ISO 19115 Experiences in NASA's Earth Observing System (EOS) ClearingHOuse (ECHO)
NASA Astrophysics Data System (ADS)
Cechini, M. F.; Mitchell, A.
2011-12-01
Metadata is an important entity in the process of cataloging, discovering, and describing earth science data. As science research and the gathered data increase in complexity, so do the complexity and importance of descriptive metadata. To meet these growing needs, the required metadata models utilize richer and more mature metadata attributes. Categorizing, standardizing, and promulgating these metadata models to a politically, geographically, and scientifically diverse community is a difficult process. An integral component of metadata management within NASA's Earth Observing System Data and Information System (EOSDIS) is the Earth Observing System (EOS) ClearingHOuse (ECHO). ECHO is the core metadata repository for the EOSDIS data centers, providing a centralized mechanism for metadata and data discovery and retrieval. ECHO has undertaken an internal restructuring to meet the changing needs of scientists, the consistent advancement in technology, and the advent of new standards such as ISO 19115. These improvements were based on the following tenets for data discovery and retrieval:
+ There exists a set of 'core' metadata fields recommended for data discovery.
+ There exists a set of users who will require the entire metadata record for advanced analysis.
+ There exists a set of users who will require a 'core' set of metadata fields for discovery only.
+ There will never be a cessation of new formats or a total retirement of all old formats.
+ Users should be presented metadata in a consistent format of their choosing.
In order to address the previously listed items, ECHO's new metadata processing paradigm utilizes the following approach:
+ Identify a cross-format set of 'core' metadata fields necessary for discovery.
+ Implement format-specific indexers to extract the 'core' metadata fields into an optimized query capability.
+ Archive the original metadata in its entirety for presentation to users requiring the full record.
+ Provide on-demand translation of 'core' metadata to any supported result format.
Lessons learned by the ECHO team while implementing its new metadata approach to support usage of the ISO 19115 standard will be presented. These lessons learned highlight some discovered strengths and weaknesses in the ISO 19115 standard as it is introduced to an existing metadata processing system.
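The four-step approach listed in this abstract can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not ECHO's implementation: the two toy input formats, the core field names (`title`, `start_date`), and every class and function name are invented for the example.

```python
import json
import xml.etree.ElementTree as ET


def index_toy_xml(raw):
    """Format-specific indexer for an invented XML layout."""
    root = ET.fromstring(raw)
    return {
        "title": root.findtext("Title"),
        "start_date": root.findtext("Temporal/Start"),
    }


def index_toy_json(raw):
    """Format-specific indexer for an invented JSON layout."""
    doc = json.loads(raw)
    return {"title": doc["name"], "start_date": doc["time"]["begin"]}


# One indexer per supported input format (hypothetical format names).
INDEXERS = {"toy-xml": index_toy_xml, "toy-json": index_toy_json}


class Catalog:
    def __init__(self):
        self.core_index = {}  # record id -> core fields, queried for discovery
        self.archive = {}     # record id -> (format, original record, untouched)

    def ingest(self, record_id, fmt, raw):
        self.core_index[record_id] = INDEXERS[fmt](raw)
        self.archive[record_id] = (fmt, raw)

    def search(self, text):
        """Discovery runs against the core index only."""
        return sorted(
            rid for rid, core in self.core_index.items()
            if text.lower() in core["title"].lower()
        )

    def full_record(self, record_id):
        """Users who need the entire record get the archived original."""
        return self.archive[record_id][1]

    def translate(self, record_id, fmt):
        """On-demand translation of the core fields to a result format."""
        core = self.core_index[record_id]
        if fmt == "toy-json":
            return json.dumps(
                {"name": core["title"], "time": {"begin": core["start_date"]}}
            )
        raise ValueError(f"unsupported result format: {fmt}")


catalog = Catalog()
catalog.ingest(
    "g1", "toy-xml",
    "<Granule><Title>Sea Ice Extent</Title>"
    "<Temporal><Start>2011-01-01</Start></Temporal></Granule>",
)
catalog.ingest(
    "g2", "toy-json",
    '{"name": "Ocean Color", "time": {"begin": "2010-06-15"}}',
)
```

In this sketch, supporting a new format means adding one indexer function, while search and translation keep working on the shared core fields.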
Transforming Dermatologic Imaging for the Digital Era: Metadata and Standards.
Caffery, Liam J; Clunie, David; Curiel-Lewandrowski, Clara; Malvehy, Josep; Soyer, H Peter; Halpern, Allan C
2018-01-17
Imaging is increasingly being used in dermatology for documentation, diagnosis, and management of cutaneous disease. The lack of standards for dermatologic imaging is an impediment to clinical uptake. Standardization can occur in image acquisition, terminology, interoperability, and metadata. This paper presents the International Skin Imaging Collaboration position on standardization of metadata for dermatologic imaging. Metadata is essential to ensure that dermatologic images are properly managed and interpreted. There are two standards-based approaches to recording and storing metadata in dermatologic imaging. The first uses standard consumer image file formats, and the second is the file format and metadata model developed for the Digital Imaging and Communication in Medicine (DICOM) standard. DICOM would appear to provide an advantage over using consumer image file formats for metadata, as it includes all the patient, study, and technical metadata necessary to use images clinically. Consumer image file formats, by contrast, include only technical metadata and need to be used in conjunction with another actor (for example, an electronic medical record) to supply the patient and study metadata. The use of DICOM may have some ancillary benefits in dermatologic imaging, including leveraging DICOM network and workflow services, interoperability of images and metadata, leveraging existing enterprise imaging infrastructure, greater patient safety, and better compliance with legislative requirements for image retention.
Challenges to Standardization: A Case Study Using Coastal and Deep-Ocean Water Level Data
NASA Astrophysics Data System (ADS)
Sweeney, A. D.; Stroker, K. J.; Mungov, G.; McLean, S. J.
2015-12-01
Sea levels recorded at coastal stations and inferred from deep-ocean pressure observations at the seafloor are submitted for archive in multiple data and metadata formats. These formats include two forms of schema-less XML and a custom binary format accompanied by metadata in a spreadsheet. The authors report on efforts to use existing standards to make these data more discoverable and more useful beyond their initial use in detecting tsunamis. An initial review of data formats for sea level data around the globe revealed heterogeneity in presentation and content. In the absence of a widely used domain-specific format, we adopted the general model for structuring data and metadata expressed by the Network Common Data Form (netCDF). netCDF has been endorsed by the Open Geospatial Consortium, is small in size compared to an equivalent plain text representation, and provides a standard way of embedding metadata in the same file. We followed the orthogonal time-series profile of the Climate and Forecast discrete sampling geometries as the convention for structuring the data and describing metadata relevant for use. We adhered to the Attribute Convention for Data Discovery for capturing metadata to support user search. Beyond making it possible to structure data and metadata in a standard way, netCDF is supported by multiple software tools that provide programmatic cataloging, access, subsetting, and transformation to other formats. We will describe our successes and failures in adhering to existing standards and provide requirements for either augmenting existing conventions or developing new ones. Some of these enhancements are specific to sea level data, while others are applicable to time-series data in general.
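As a rough illustration of the kind of check that adherence to a discovery convention makes possible, the Python sketch below looks for missing global attributes in a plain dictionary standing in for a netCDF file's global attributes. Everything here is an assumption for illustration: the attribute list is a small subset and should be checked against the Attribute Convention for Data Discovery itself, and the sample values (including the convention version strings) are invented.

```python
# Illustrative subset of discovery attributes; consult the ACDD text for the
# authoritative list and recommendation levels.
DISCOVERY_ATTRS = ("title", "summary", "keywords", "Conventions")


def missing_discovery_attrs(global_attrs):
    """Return the discovery attributes that are absent or empty."""
    return [
        name for name in DISCOVERY_ATTRS
        if not str(global_attrs.get(name, "")).strip()
    ]


# Hypothetical global attributes for a water-level time-series file.
attrs = {
    "title": "Coastal water level, station EXAMPLE",
    "summary": "One-minute water level observations.",
    "featureType": "timeSeries",        # CF discrete sampling geometry type
    "Conventions": "CF-1.6, ACDD-1.3",  # example values only
}

print(missing_discovery_attrs(attrs))
```

For the sample dictionary the check reports `keywords` as the only missing attribute. A real workflow would read the attributes from the file with a netCDF library rather than from a hand-written dictionary.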
Federal Register 2010, 2011, 2012, 2013, 2014
2012-06-07
... Information Technology. SUMMARY: As part of the HHS Open Government Plan, the HealthData.gov Platform (HDP) is... application of existing voluntary consensus standards for metadata common to all open government data, and... vocabulary recommendations for Linked Data publishers, defining cross domain semantic metadata of open...
Emerging Network Storage Management Standards for Intelligent Data Storage Subsystems
NASA Technical Reports Server (NTRS)
Podio, Fernando; Vollrath, William; Williams, Joel; Kobler, Ben; Crouse, Don
1998-01-01
This paper discusses the need for intelligent storage devices and subsystems that can provide data integrity metadata, the content of the existing data integrity standard for optical disks, and techniques and metadata to verify stored data on optical tapes developed by the Association for Information and Image Management (AIIM) Optical Tape Committee.
A Generic Metadata Editor Supporting System Using Drupal CMS
NASA Astrophysics Data System (ADS)
Pan, J.; Banks, N. G.; Leggott, M.
2011-12-01
Metadata handling is a key factor in preserving and reusing scientific data. In recent years, standardized structural metadata has become widely used in Geoscience communities. However, there exist many different standards in Geosciences, such as the current version of the Federal Geographic Data Committee's Content Standard for Digital Geospatial Metadata (FGDC CSDGM), the Ecological Metadata Language (EML), the Geography Markup Language (GML), and the emerging ISO 19115 and related standards. In addition, there are many different subsets within the Geoscience subdomain, such as the Biological Profile of the FGDC (CSDGM), or for geopolitical regions, such as the European Profile or the North American Profile in the ISO standards. It is therefore desirable to have a software foundation to support metadata creation and editing for multiple standards and profiles, without reinventing the wheel. We have developed a generic, flexible software system to do just that: to facilitate the support for multiple metadata standards and profiles. The software consists of a set of modules for the Drupal Content Management System (CMS), with minimal dependencies on other Drupal modules. There are two steps in using the system's metadata functions. First, an administrator can use the system to design a user form, based on an XML schema and its instances. The form definition is named and stored in the Drupal database as an XML blob. Second, users in an editor role can then use the persisted XML definition to render an actual metadata entry form, for creating or editing a metadata record. Behind the scenes, the form definition XML is transformed into a PHP array, which is then rendered via the Drupal Form API. When the form is submitted, the posted values are used to modify a metadata record. Drupal hooks can be used to perform custom processing on the metadata record before and after submission.
It is trivial to store the metadata record as an actual XML file or in a storage/archive system. We are working on adding many features to help editor users, such as auto-completion, pre-population of forms, partial saving, and automatic schema validation. In this presentation we will demonstrate a few sample editors, including an FGDC editor and a bare-bones editor for ISO 19115/19139. We will also demonstrate the use of templates during the definition phase, with the support of export and import functions. Form pre-population and input validation will also be covered. These modules are available as open-source software from the Islandora software foundation, as a component of a larger Drupal-based data archive system. They can be easily installed as a stand-alone system or plugged into other existing metadata platforms.
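The transformation step described in this abstract (a stored XML form definition turned into a nested array that a form renderer can consume) can be sketched as below. The abstract describes a PHP array rendered by the Drupal Form API; this Python version only mirrors the idea, and the form-definition layout, element names, and attribute names are all invented, since the abstract does not show the module's actual XML.

```python
import xml.etree.ElementTree as ET

# Invented form-definition layout, used only for this illustration.
FORM_DEFINITION = """
<form name="minimal_record">
  <field name="title" type="textfield" required="true"/>
  <group name="contact">
    <field name="email" type="textfield" required="false"/>
  </group>
</form>
"""


def to_form_array(element):
    """Recursively convert a definition element into a nested dict."""
    if element.tag == "field":
        return {
            "type": element.get("type"),
            "required": element.get("required") == "true",
        }
    # <form> and <group> elements become containers keyed by child name.
    return {child.get("name"): to_form_array(child) for child in element}


form = to_form_array(ET.fromstring(FORM_DEFINITION))
print(form)
```

The resulting nested dictionary plays the role of the render array: a renderer would walk it to emit input widgets, and the same structure could drive pre-population and validation.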
NASA Astrophysics Data System (ADS)
Benedict, K. K.; Scott, S.
2013-12-01
While there has been a convergence towards a limited number of standards for representing knowledge (metadata) about geospatial (and other) data objects and collections, there exist a variety of community conventions around the specific use of those standards and within specific data discovery and access systems. This combination of limited (but multiple) standards and conventions creates a challenge for system developers who aspire to participate in multiple data infrastructures, each of which may use a different combination of standards and conventions. While Extensible Markup Language (XML) is a shared standard for encoding most metadata, traditional direct XML transformations (XSLT) from one standard to another often result in an imperfect transfer of information due to incomplete mapping from one standard's content model to another. This paper presents the work at the University of New Mexico's Earth Data Analysis Center (EDAC) in which a unified data and metadata management system has been developed in support of the storage, discovery and access of heterogeneous data products. This system, the Geographic Storage, Transformation and Retrieval Engine (GSTORE) platform, has adopted a polyglot database model in which a combination of relational and document-based databases are used to store both data and metadata, with some metadata stored in a custom XML schema designed as a superset of the requirements for multiple target metadata standards: ISO 19115-2/19139/19110/19119, FGDC CSDGM (both with and without remote sensing extensions) and Dublin Core. Metadata stored within this schema is complemented by additional service, format and publisher information that is dynamically "injected" into produced metadata documents when they are requested from the system.
While mapping from the underlying common metadata schema is relatively straightforward, the generation of valid metadata within each target standard is necessary but not sufficient for integration into multiple data infrastructures, as has been demonstrated through EDAC's testing and deployment of metadata into multiple external systems: Data.Gov, the GEOSS Registry, the DataONE network, the DSpace based institutional repository at UNM and semantic mediation systems developed as part of the NASA ACCESS ELSeWEB project. Each of these systems requires valid metadata as a first step, but to make most effective use of the delivered metadata each also has a set of conventions that are specific to the system. This presentation will provide an overview of the underlying metadata management model, the processes and web services that have been developed to automatically generate metadata in a variety of standard formats and highlight some of the specific modifications made to the output metadata content to support the different conventions used by the multiple metadata integration endpoints.
A Metadata Standard for Hydroinformatic Data Conforming to International Standards
NASA Astrophysics Data System (ADS)
Notay, Vikram; Carstens, Georg; Lehfeldt, Rainer
2017-04-01
The affordable availability of computing power and digital storage has been a boon for the scientific community. The hydroinformatics community has also benefitted from the so-called digital revolution, which has enabled the tackling of more and more complex physical phenomena using hydroinformatic models, instruments, sensors, etc. With models getting more and more complex, computational domains getting larger and the resolution of computational grids and measurement data getting finer, a large amount of data is generated and consumed in any hydroinformatics-related project. The ubiquitous availability of the internet also contributes to this phenomenon, with data being collected through sensor networks connected to telecommunications networks and the internet long before the term Internet of Things existed. Although generally good, this exponential increase in the number of available datasets gives rise to the need to describe this data in a standardised way, not only to provide a quick overview of the data but also to facilitate interoperability of data from different sources. The Federal Waterways Engineering and Research Institute (BAW) is a federal authority of the German Federal Ministry of Transport and Digital Infrastructure. BAW acts as a consultant for the safe and efficient operation of the German waterways. As part of its consultation role, BAW operates a number of physical and numerical models for sections of inland and marine waterways. In order to uniformly describe the data produced and consumed by these models throughout BAW and to ensure interoperability with other federal and state institutes on the one hand and with EU countries on the other, a metadata profile for hydroinformatic data has been developed at BAW. The metadata profile is composed in its entirety using the ISO 19115 international standard for metadata related to geographic information.
Due to the widespread use of the ISO 19115 standard in the existing geodata infrastructure worldwide, the profile provides a means to describe hydroinformatic data that conforms to existing metadata standards. Additionally, EU and German national standards, INSPIRE and GDI-DE have been considered to ensure interoperability on an international and national level. Finally, elements of the GovData profile of the Federal Government of Germany have been integrated to be able to participate in its Open Data initiative. All these factors make the metadata profile developed at BAW highly suitable for describing hydroinformatic data in particular and physical state variables in general. Further details about this metadata profile will be presented at the conference. Acknowledgements: The authors would like to thank Christoph Wosniok and Peter Schade for their contributions towards the development of this metadata standard.
The Metadata Cloud: The Last Piece of a Distributed Data System Model
NASA Astrophysics Data System (ADS)
King, T. A.; Cecconi, B.; Hughes, J. S.; Walker, R. J.; Roberts, D.; Thieman, J. R.; Joy, S. P.; Mafi, J. N.; Gangloff, M.
2012-12-01
Distributed data systems have existed ever since systems were networked together. Over the years the model for distributed data systems has evolved from basic file transfer to client-server to multi-tiered to grid and finally to cloud-based systems. Initially metadata was tightly coupled to the data, either by embedding the metadata in the same file containing the data or by co-locating the metadata in commonly named files. As the sources of data have multiplied, data volumes have increased, and services have specialized to improve efficiency, a cloud system model has emerged. In a cloud system, computing and storage are provided as services, with accessibility emphasized over physical location. Computation and data clouds are common implementations. Effectively using the data and computation capabilities requires metadata. When metadata is stored separately from the data, a metadata cloud is formed. With a metadata cloud, information and knowledge about data resources can migrate efficiently from system to system, enabling services and allowing the data to remain efficiently stored until used. This is especially important with "Big Data", where movement of the data is limited by bandwidth. We examine how the metadata cloud completes a general distributed data system model and how standards play a role, and we relate this to the existing types of cloud computing. We also look at the major science data systems in existence and compare each to the generalized cloud system model.
XAFS Data Interchange: A single spectrum XAFS data file format
NASA Astrophysics Data System (ADS)
Ravel, B.; Newville, M.
2016-05-01
We propose a standard data format for the interchange of XAFS data. The XAFS Data Interchange (XDI) standard is meant to encapsulate a single spectrum of XAFS along with relevant metadata. XDI is a text-based format with a simple syntax which clearly delineates metadata from the data table in a way that is easily interpreted both by a computer and by a human. The metadata header is inspired by the format of an electronic mail header, representing metadata names and values as an associative array. The data table is represented as columns of numbers. This format can be imported as is into most existing XAFS data analysis, spreadsheet, or data visualization programs. Along with a specification and a dictionary of metadata types, we provide an application-programming interface written in C and bindings for dynamic programming languages.
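The layout described in this abstract (a mail-style header of name and value pairs, followed by a table of numeric columns) lends itself to a very small reader. The Python sketch below handles a simplified stand-in for such a file; it is not the XDI parser and does not follow the XDI specification. The `#` comment prefix, the metadata names, and the sample values are assumptions made for illustration.

```python
def parse_simple_spectrum(text):
    """Split a text file into a metadata dict and a list of numeric rows."""
    metadata, rows = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header line: "name: value", as in an e-mail header.
            name, sep, value = line[1:].partition(":")
            if sep:  # skip header lines that are not name/value pairs
                metadata[name.strip()] = value.strip()
        else:
            # Data line: whitespace-separated numbers, one row per line.
            rows.append([float(token) for token in line.split()])
    return metadata, rows


# Invented sample content; names are illustrative, not checked against
# the XDI metadata dictionary.
SAMPLE = """\
# Element.symbol: Cu
# Column.1: energy eV
# Column.2: mutrans
8970.0 0.105
8975.0 0.112
8980.0 0.498
"""

metadata, rows = parse_simple_spectrum(SAMPLE)
print(metadata["Element.symbol"], len(rows))
```

Because the header maps directly onto an associative array and the table onto rows of floats, the same file stays readable by eye and loadable by a spreadsheet, which is the property the abstract emphasizes.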
NASA Astrophysics Data System (ADS)
Hills, S. J.; Richard, S. M.; Doniger, A.; Danko, D. M.; Derenthal, L.; Energistics Metadata Work Group
2011-12-01
A diverse group of organizations representative of the international community involved in disciplines relevant to the upstream petroleum industry (energy companies; suppliers and publishers of information to the energy industry; vendors of software applications used by the industry; and partner government and academic organizations) has engaged in the Energy Industry Metadata Standards Initiative. This Initiative envisions the use of standard metadata within the community to enable significant improvements in the efficiency with which users discover, evaluate, and access distributed information resources. The metadata standard needed to realize this vision is the initiative's primary deliverable. In addition to developing the metadata standard, the initiative is promoting its adoption to accelerate realization of the vision, and publishing metadata exemplars conformant with the standard. Implementation of the standard by community members, in the form of published metadata which document the information resources each organization manages, will allow use of tools requiring consistent metadata for efficient discovery and evaluation of, and access to, information resources. While metadata are expected to be widely accessible, access to associated information resources may be more constrained. The initiative is being conducted by Energistics' Metadata Work Group, in collaboration with the USGIN Project. Energistics is a global standards group in the oil and natural gas industry. The Work Group determined early in the initiative, based on input solicited from 40+ organizations and on an assessment of existing metadata standards, to develop the target metadata standard as a profile of a revised version of ISO 19115, formally the "Energy Industry Profile of ISO/DIS 19115-1 v1.0" (EIP). The Work Group is participating on the ISO/TC 211 project team responsible for the revision of ISO 19115, now ready for "Draft International Standard" (DIS) status.
With ISO 19115 an established, capability-rich, open standard for geographic metadata, EIP v1 is expected to be widely acceptable within the community and readily sustainable over the long term. The EIP design, also per community requirements, will enable discovery, evaluation, and access to types of information resources considered important to the community, including structured and unstructured digital resources, and physical assets such as hardcopy documents and material samples. This presentation will briefly review the development of this initiative as well as the current and planned Work Group activities. More time will be spent providing an overview of the EIP v1, including the requirements it prescribes, design efforts made to enable automated metadata capture and processing, and the structure and content of its documentation, which was written to minimize ambiguity and facilitate implementation. The Work Group considers EIP v1 a solid initial design for interoperable metadata, and a first step toward the vision of the Initiative.
Park, Yu Rang; Yoon, Young Jo; Kim, Hye Hyeon; Kim, Ju Han
2013-01-01
Achieving semantic interoperability is critical for biomedical data sharing between individuals, organizations and systems. The ISO/IEC 11179 MetaData Registry (MDR) standard has been recognized as one of the solutions for this purpose. The standard model, however, is limited. For instance, concepts consisting of two or more values, such as blood pressure with systolic and diastolic values, cannot be represented. We addressed the structural limitations of ISO/IEC 11179 with an integrated metadata object model in our previous research. In the present study, we introduce semantic extensions for the model by defining three new types of semantic relationships: dependency, composite and variable relationships. To evaluate our extensions in a real-world setting, we measured the efficiency of metadata reduction by means of mapping to existing metadata. We extracted metadata from the College of American Pathologists Cancer Protocols and then evaluated our extensions. With no semantic loss, one third of the extracted metadata could be successfully eliminated, suggesting a better strategy for implementing clinical MDRs with improved efficiency and utility.
Towards Data Value-Level Metadata for Clinical Studies.
Zozus, Meredith Nahm; Bonner, Joseph
2017-01-01
While several standards for metadata describing clinical studies exist, comprehensive metadata to support traceability of data from clinical studies has not been articulated. We examine uses of metadata in clinical studies. We examine and enumerate seven sources of data value-level metadata in clinical studies inclusive of research designs across the spectrum of the National Institutes of Health definition of clinical research. The sources of metadata inform categorization in terms of metadata describing the origin of a data value, the definition of a data value, and operations to which the data value was subjected. The latter is further categorized into information about changes to a data value, movement of a data value, retrieval of a data value, and data quality checks, constraints or assessments to which the data value was subjected. The implications of tracking and managing data value-level metadata are explored.
Harvey, Matthew J; Mason, Nicholas J; McLean, Andrew; Rzepa, Henry S
2015-01-01
We describe three different procedures based on metadata standards for enabling automated retrieval of scientific data from digital repositories, utilising the persistent identifier of the dataset with optional specification of the attributes of the data document, such as filename or media type. The procedures are demonstrated using the JSmol molecular visualizer as a component of a web page and Avogadro as a stand-alone modelling program. We compare our methods for automated retrieval of data from a standards-compliant data repository with those currently in operation for a selection of existing molecular databases and repositories. Our methods illustrate the importance of adopting a standards-based approach of using metadata declarations to increase access to and discoverability of repository-based data.
McMahon, Christiana; Denaxas, Spiros
2017-11-06
Informed consent is an important feature of longitudinal research studies as it enables the linking of the baseline participant information with administrative data. The lack of standardized models to capture consent elements can lead to substantial challenges. A structured approach to capturing consent-related metadata can address these. Our objectives were to: a) explore the state of the art for recording consent; b) identify key elements of consent required for record linkage; and c) create and evaluate a novel metadata management model to capture consent-related metadata. The main methodological components of our work were: a) a systematic literature review and qualitative analysis of consent forms; b) the development and evaluation of a novel metadata model. We qualitatively analyzed 61 manuscripts and 30 consent forms. We extracted data elements related to obtaining consent for linkage. We created a novel metadata management model for consent and evaluated it by comparison with the existing standards and by iteratively applying it to case studies. The developed model can facilitate the standardized recording of consent for linkage in longitudinal research studies and enable the linkage of external participant data. Furthermore, it can provide a structured way of recording consent-related metadata and facilitate the harmonization and streamlining of processes.
Metadata Design in the New PDS4 Standards - Something for Everybody
NASA Astrophysics Data System (ADS)
Raugh, Anne C.; Hughes, John S.
2015-11-01
The Planetary Data System (PDS) archives, supports, and distributes data of diverse targets, from diverse sources, to diverse users. One of the core problems addressed by the PDS4 data standard redesign was that of metadata - how to accommodate the increasingly sophisticated demands of search interfaces, analytical software, and observational documentation into label standards without imposing limits and constraints that would impinge on the quality or quantity of metadata that any particular observer or team could supply. And yet, as an archive, PDS must have detailed documentation for the metadata in the labels it supports, or the institutional knowledge encoded into those attributes will be lost - putting the data at risk. The PDS4 metadata solution is based on a three-step approach. First, it is built on two key ISO standards: ISO 11179 "Information Technology - Metadata Registries", which provides a common framework and vocabulary for defining metadata attributes; and ISO 14721 "Space Data and Information Transfer Systems - Open Archival Information System (OAIS) Reference Model", which provides the framework for the information architecture that enforces the object-oriented paradigm for metadata modeling. Second, PDS has defined a hierarchical system that allows it to divide its metadata universe into namespaces ("data dictionaries", conceptually), and more importantly to delegate stewardship for a single namespace to a local authority. This means that a mission can develop its own data model with a high degree of autonomy and effectively extend the PDS model to accommodate its own metadata needs within the common ISO 11179 framework.
Finally, within a single namespace - even the core PDS namespace - existing metadata structures can be extended and new structures added to the model as new needs are identified. This poster illustrates the PDS4 approach to metadata management and highlights the expected return on the development investment for PDS, users and data preparers.
Using RDF and Git to Realize a Collaborative Metadata Repository.
Stöhr, Mark R; Majeed, Raphael W; Günther, Andreas
2018-01-01
The German Center for Lung Research (DZL) is a research network with the aim of researching respiratory diseases. The participating study sites' register data differs in terms of software and coding system as well as data field coverage. To perform meaningful consortium-wide queries through one single interface, a uniform conceptual structure is required covering the DZL common data elements. No single existing terminology includes all our concepts. Potential candidates such as LOINC and SNOMED only cover specific subject areas or are not granular enough for our needs. To achieve a broadly accepted and complete ontology, we developed a platform for collaborative metadata management. The DZL data management group formulated detailed requirements regarding the metadata repository and the user interfaces for metadata editing. Our solution builds upon existing standard technologies allowing us to meet those requirements. Its key parts are RDF and the distributed version control system Git. We developed a software system to publish updated metadata automatically and immediately after performing validation tests for completeness and consistency.
Metadata and Service at the GFZ ISDC Portal
NASA Astrophysics Data System (ADS)
Ritschel, B.
2008-05-01
The online service portal of the GFZ Potsdam Information System and Data Center (ISDC) is an access point for all manner of geoscientific geodata, its corresponding metadata, scientific documentation and software tools. At present almost 2000 national and international users and user groups have the opportunity to request Earth science data from a portfolio of 275 different product types and more than 20 million single data files with a total volume of approximately 12 TByte. The majority of the data and information that the portal currently offers to the public are global geomonitoring products such as satellite orbit and Earth gravity field data as well as geomagnetic and atmospheric data for the exploration. These products for Earth's changing system are provided via state-of-the-art retrieval techniques. The data product catalog system behind these techniques is based on the extensive usage of standardized metadata, which describe the different geoscientific product types and data products in a uniform way. Whereas all ISDC product types are specified by NASA's Directory Interchange Format (DIF) Version 9.0 parent XML DIF metadata files, the individual data files are described by extended DIF metadata documents. Depending on when the scientific project began, some of the data files are described by extended DIF Version 6 metadata documents and the others are specified by data child XML DIF metadata documents. Both the product-type-dependent parent DIF metadata documents and the data-file-dependent child DIF metadata documents are derived from a base-DIF.xsd XML schema file. The ISDC metadata philosophy defines a geoscientific product as a package consisting of usually one, or sometimes more than one, data file plus one extended DIF metadata file.
Because NASA's DIF metadata standard has been developed in order to specify a collection of data only, the extension of the DIF standard consists of new and specific attributes, which are necessary for an explicit identification of single data files and the set-up of a comprehensive Earth science data catalog. The huge ISDC data catalog is realized by product-type-dependent tables filled with data-file-related metadata, which have relations to corresponding metadata tables. The parent DIF XML metadata documents that describe the product types are stored and managed in ORACLE's XML storage structures. In order to improve the interoperability of the ISDC service portal, the existing proprietary catalog system will be extended by an ISO 19115 based web catalog service. In addition to this development, there is ISDC-related work concerning a semantic network of different kinds of metadata resources, such as standardized and non-standardized metadata documents and literature as well as Web 2.0 user-generated information derived from tagging activities and social navigation data.
International Metadata Standards and Enterprise Data Quality Metadata Systems
NASA Astrophysics Data System (ADS)
Habermann, T.
2016-12-01
Well-documented data quality is critical in situations where scientists and decision-makers need to combine multiple datasets from different disciplines and collection systems to address scientific questions or difficult decisions. Standardized data quality metadata could be very helpful in these situations. Many efforts at developing data quality standards falter because of the diversity of approaches to measuring and reporting data quality. The "one size fits all" paradigm does not generally work well in this situation. The ISO data quality standard (ISO 19157) takes a different approach with the goal of systematically describing how data quality is measured rather than how it should be measured. It introduces the idea of standard data quality measures that can be well documented in a measure repository and used for consistently describing how data quality is measured across an enterprise. The standard includes recommendations for properties of these measures that include unique identifiers, references, illustrations and examples. Metadata records can reference these measures using the unique identifier and reuse them along with details (and references) that describe how the measure was applied to a particular dataset. A second important feature of ISO 19157 is the inclusion of citations to existing papers or reports that describe quality of a dataset. This capability allows users to find this information in a single location, i.e. the dataset metadata, rather than searching the web or other catalogs. I will describe these and other capabilities of ISO 19157 with examples of how they are being used to describe data quality across the NASA EOS Enterprise and also compare these approaches with other standards.
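The idea of a measure repository whose entries are referenced from dataset metadata by unique identifier can be sketched as below. This is a simplified illustration: the class and field names are assumptions made for the example and are not the element names defined in ISO 19157, and the measure shown is invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class QualityMeasure:
    """One entry in a (hypothetical) measure repository."""
    identifier: str
    name: str
    definition: str


@dataclass
class QualityReport:
    """How a registered measure was applied to one particular dataset."""
    measure_id: str          # reference to QualityMeasure.identifier
    value: float
    evaluation_note: str


@dataclass
class DatasetMetadata:
    title: str
    quality_reports: List[QualityReport] = field(default_factory=list)


def summarize_quality(
    dataset: DatasetMetadata, registry: Dict[str, QualityMeasure]
) -> List[str]:
    """Resolve each report's measure reference and return one line per report."""
    lines = []
    for report in dataset.quality_reports:
        measure = registry[report.measure_id]  # KeyError if the id is unknown
        lines.append(
            f"{measure.name} = {report.value} ({report.evaluation_note})"
        )
    return lines
```

Because the measure is defined once and only referenced from each record, its definition stays consistent across every dataset that reuses it, while the dataset-specific details live in the report.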
NOAA's Data Catalog and the Federal Open Data Policy
NASA Astrophysics Data System (ADS)
Wengren, M. J.; de la Beaujardiere, J.
2014-12-01
The 2013 Open Data Policy Presidential Directive requires Federal agencies to create and maintain a 'public data listing' that includes all agency data that is currently or will be made publicly available in the future. The directive requires the use of machine-readable and open formats that make use of 'common core' and extensible metadata formats according to the best practices published in an online repository called 'Project Open Data', to use open licenses where possible, and to adhere to existing metadata and other technology standards to promote interoperability. In order to meet the requirements of the Open Data Policy, the National Oceanic and Atmospheric Administration (NOAA) has implemented an online data catalog that combines metadata from all subsidiary NOAA metadata catalogs into a single master inventory. The NOAA Data Catalog is available to the public for search and discovery, providing access to the NOAA master data inventory through multiple means, including web-based text search, an OGC CS-W endpoint, as well as a native Application Programming Interface (API) for programmatic query. It generates on a daily basis the Project Open Data JavaScript Object Notation (JSON) file required for compliance with the Presidential directive. The Data Catalog is based on the open source Comprehensive Knowledge Archive Network (CKAN) software and runs on the Amazon Federal GeoCloud. This presentation will cover topics including mappings of existing metadata in standard formats (FGDC-CSDGM and ISO 19115 XML) to the Project Open Data JSON metadata schema, representation of metadata elements within the catalog, and compatible metadata sources used to feed the catalog, including Web Accessible Folder (WAF), Catalog Services for the Web (CS-W), and Esri ArcGIS.com. It will also discuss related open source technologies that can be used together to build a spatial data infrastructure compliant with the Open Data Policy.
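A mapping from an ISO 19115-style record to Project Open Data JSON might look roughly like the sketch below. It assumes the XML has already been parsed into a flat dictionary; the input keys (`abstract`, `date_stamp`, `file_identifier`, and so on) are hypothetical, and the output keys follow the commonly documented Project Open Data fields in simplified form. It is not the NOAA catalog's actual mapping.

```python
import json
from typing import Any, Dict, List


def iso_to_pod(record: Dict[str, Any]) -> Dict[str, Any]:
    """Map a simplified, pre-parsed ISO 19115-style record to POD-style keys."""
    return {
        "title": record["title"],
        "description": record["abstract"],
        # De-duplicate and sort keywords so the output is stable between runs.
        "keyword": sorted(set(record.get("keywords", []))),
        "modified": record["date_stamp"],
        "identifier": record["file_identifier"],
        "publisher": {"name": record["organisation"]},
        "contactPoint": {
            "fn": record["contact_name"],
            "hasEmail": "mailto:" + record["contact_email"],
        },
        # Assumption for the sketch: every harvested record is public.
        "accessLevel": "public",
    }


def to_catalog_json(records: List[Dict[str, Any]]) -> str:
    """Serialize the mapped records as a simplified catalog listing."""
    return json.dumps({"dataset": [iso_to_pod(r) for r in records]}, indent=2)
```

A daily job of the kind the abstract describes would call `to_catalog_json` on the full inventory and write the result to the published JSON file.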
ERDDAP: Reducing Data Friction with an Open Source Data Platform
NASA Astrophysics Data System (ADS)
O'Brien, K.
2017-12-01
Data friction is not just an issue facing interdisciplinary research. Oftentimes, even within disciplines, significant data friction can exist. Differing formats, limited metadata and non-existent machine-to-machine data access are all issues that exist within disciplines and make successful interdisciplinary cooperation that much harder. Therefore, reducing data friction within disciplines is a crucial first step in providing better overall collaboration. ERDDAP, an open source data platform developed at NOAA's Southwest Fisheries Center, is well poised to improve data usability and understanding and reduce data friction, both in single and multi-disciplinary research. By virtue of its ability to integrate data of varying formats and provide RESTful user access to data and metadata, use of ERDDAP has grown substantially throughout the ocean data community. ERDDAP also supports standards such as the DAP data protocol, the Climate and Forecast (CF) metadata conventions and the BagIt document standard for data archival. In this presentation, we will discuss the advantages of using ERDDAP as a data platform. We will also show specific use cases where utilizing ERDDAP has reduced friction within a single discipline (physical oceanography) and improved interdisciplinary collaboration as well.
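As an illustration of the RESTful access mentioned above, an ERDDAP tabular request is an ordinary URL that names the dataset, the output file type, the variables and any constraints. The sketch below only builds such a URL; the server address and dataset ID in the usage note are placeholders, and the helper function itself is hypothetical rather than part of ERDDAP.

```python
from typing import Iterable, Sequence
from urllib.parse import quote


def tabledap_url(
    server: str,
    dataset_id: str,
    variables: Sequence[str],
    constraints: Iterable[str] = (),
    file_type: str = "csv",
) -> str:
    """Build a tabledap-style request URL: dataset.fileType?vars&constraint..."""
    query = ",".join(variables)
    for constraint in constraints:
        # Percent-encode characters such as '>' and '<', but keep '=' readable.
        query += "&" + quote(constraint, safe="=")
    return f"{server.rstrip('/')}/tabledap/{dataset_id}.{file_type}?{query}"
```

For example, `tabledap_url("https://example.org/erddap", "demo_dataset", ["time", "temperature"], ["time>=2017-01-01"])` yields a URL that a browser, a script or another service could fetch in the same way, and changing `file_type` requests the same data in another format.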
The Road to Independently Understandable Information
NASA Astrophysics Data System (ADS)
Habermann, T.; Robinson, E.
2017-12-01
The turn of the 21st century was a pivotal time in the Earth and Space Science information ecosystem. The Content Standard for Digital Geospatial Metadata (CSDGM) had existed for nearly a decade and ambitious new standards were just emerging. The U.S. Federal Geographic Data Committee (FGDC) had extended many of the concepts from CSDGM into the international community with ISO 19115:2003, and the Consultative Committee for Space Data Systems (CCSDS) had migrated their Open Archival Information System (OAIS) Reference Model into an international standard (ISO 14721:2003). The OAIS model outlined the roles and responsibilities of archives, with the principal role being preserving information and making it available to users, a "designated community", as a service to the data producer. It was mandatory for the archive to ensure that information is "independently understandable" to the designated community and to maintain that understanding through on-going partnerships between archives and designated communities. Standards can play a role in supporting these partnerships as designated communities expand across disciplinary and geographic boundaries. The ISO metadata standards include many capabilities that might make critical contributions to this goal. These include connections to resources outside of the metadata record (i.e. documentation) and mechanisms for ongoing incorporation of user feedback into the metadata stream. We will demonstrate these capabilities with examples of how they can increase understanding.
ISO, FGDC, DIF and Dublin Core - Making Sense of Metadata Standards for Earth Science Data
NASA Astrophysics Data System (ADS)
Jones, P. R.; Ritchey, N. A.; Peng, G.; Toner, V. A.; Brown, H.
2014-12-01
Metadata standards provide common definitions of metadata fields for information exchange across user communities. Despite the broad adoption of metadata standards for Earth science data, there are still heterogeneous and incompatible representations of information due to differences between the many standards in use and how each standard is applied. Federal agencies are required to manage and publish metadata in different metadata standards and formats for various data catalogs. In 2014, the NOAA National Climatic Data Center (NCDC) managed metadata for its scientific datasets in ISO 19115-2 in XML, GCMD Directory Interchange Format (DIF) in XML, DataCite Schema in XML, Dublin Core in XML, and Data Catalog Vocabulary (DCAT) in JSON, with more standards and profiles of standards planned. Of these standards, the ISO 19115-series metadata is the most complete and feature-rich, and for this reason it is used by NCDC as the source for the other metadata standards. We will discuss the capabilities of metadata standards and how these standards are being implemented to document datasets. Successful implementations include developing translations and displays using XSLTs, creating links to related data and resources, documenting dataset lineage, and establishing best practices. Benefits, gaps, and challenges will be highlighted with suggestions for improved approaches to metadata storage and maintenance.
A metadata reporting framework (FRAMES) for synthesis of ecohydrological observations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Christianson, Danielle S.; Varadharajan, Charuleka; Christoffersen, Bradley; ...
2017-06-20
Metadata describe the ancillary information needed for data interpretation, comparison across heterogeneous datasets, and quality control and quality assessment (QA/QC). Metadata enable the synthesis of diverse ecohydrological and biogeochemical observations, an essential step in advancing a predictive understanding of earth systems. Environmental observations can be taken across a wide range of spatiotemporal scales in a variety of measurement settings and approaches, and saved in multiple formats. Thus, well-organized, consistent metadata are required to produce usable data products from diverse observations collected in disparate field sites. However, existing metadata reporting protocols do not support the complex data synthesis needs of interdisciplinary earth system research. We developed a metadata reporting framework (FRAMES) to enable predictive understanding of carbon cycling in tropical forests under global change. FRAMES adheres to best practices for data and metadata organization, enabling consistent data reporting and thus compatibility with a variety of standardized data protocols. We used an iterative scientist-centered design process to develop FRAMES. The resulting modular organization streamlines metadata reporting and can be expanded to incorporate additional data types. The flexible data reporting format incorporates existing field practices to maximize data-entry efficiency. With FRAMES's multi-scale measurement position hierarchy, data can be reported at observed spatial resolutions and then easily aggregated and linked across measurement types to support model-data integration. FRAMES is in early use by both data providers and users. In this article, we describe FRAMES, identify lessons learned, and discuss areas of future development.
NASA Astrophysics Data System (ADS)
Riddick, Andrew; Hughes, Andrew; Harpham, Quillon; Royse, Katherine; Singh, Anubha
2014-05-01
There has been increasing interest from both academic and commercial organisations over recent years in developing hydrologic and other environmental models in response to some of the major challenges facing the environment, for example environmental change and its effects, and ensuring water resource security. This has resulted in a significant investment in modelling by many organisations, both in terms of financial resources and intellectual capital. To capitalise on the effort of producing models, it is necessary for the models to be both discoverable and appropriately described. If this is not undertaken then the effort in producing the models will be wasted. However, whilst there are some recognised metadata standards relating to datasets, these may not completely address the needs of modellers, for example regarding input data. There also appears to be a lack of metadata schemes configured to encourage the discovery and re-use of the models themselves. The lack of an established standard for model metadata is considered to be a factor inhibiting the more widespread use of environmental models, particularly the use of linked model compositions which fuse together hydrologic models with models from other environmental disciplines. This poster presents the results of a Natural Environment Research Council (NERC) funded scoping study to understand the requirements of modellers and other end users for metadata about data and models. A user consultation exercise using an on-line questionnaire has been undertaken to capture the views of a wide spectrum of stakeholders on how they are currently managing metadata for modelling. This has provided a strong confirmation of our original supposition that there is a lack of systems and facilities to capture metadata about models. 
A number of specific gaps in current provision for data and model metadata were also identified, including a need for a standard means to record detailed information about the modelling environment and the model code used, to assist the selection of models for linked compositions. Existing best practice, including the use of current metadata standards (e.g. ISO 19110, ISO 19115 and ISO 19119) and the metadata components of WaterML were also evaluated. In addition to commonly used metadata attributes (e.g. spatial reference information) there was significant interest in recording a variety of additional metadata attributes. These included more detailed information about temporal data, and also providing estimates of data accuracy and uncertainty within metadata. This poster describes the key results of this study, including a number of gaps in the provision of metadata for modelling, and outlines how these might be addressed. Overall the scoping study has highlighted significant interest in addressing this issue within the environmental modelling community. There is therefore an impetus for on-going research, and we are seeking to take this forward through collaboration with other interested organisations. Progress towards an internationally recognised model metadata standard is suggested.
The Development of the Learning Object Standard Using a Pedagogic Approach: A Comparative Study.
ERIC Educational Resources Information Center
Yahya, Yazrina; Jenkins, John; Yusoff, Mohammed
Education is moving towards revenue generation from such channels as electronic learning, distance learning and virtual education. Hence learning technology standards are critical to the sector's success. Existing learning technology standards have focused on various topics such as metadata, question and test interoperability and others. However,…
NASA Astrophysics Data System (ADS)
Andre, Francois; Fleury, Laurence; Gaillardet, Jerome; Nord, Guillaume
2015-04-01
RBV (Réseau des Bassins Versants) is a French initiative to consolidate the national efforts made by more than 15 elementary observatories funded by various research institutions (CNRS, INRA, IRD, IRSTEA, Universities) that study river and drainage basins. The RBV Metadata Catalogue aims at giving a unified vision of the work produced by every observatory to both the members of the RBV network and any external person interested in this domain of research. Another goal is to share this information with other existing metadata portals. Metadata management is heterogeneous among observatories, ranging from absence to mature harvestable catalogues. Here, we explain the strategy used to design a state-of-the-art catalogue in this situation. The main features are as follows:
- Multiple input methods: Metadata records in the catalogue can be entered with the graphical user interface, harvested from an existing catalogue or imported from an information system through simplified web services.
- Hierarchical levels: Metadata records may describe an observatory, one of its experimental sites or a single dataset produced by one instrument.
- Multilingualism: Metadata can be easily entered in several configurable languages.
- Compliance with standards: The back-office part of the catalogue is based on a CSW metadata server (Geosource), which ensures ISO19115 compatibility and the ability to be harvested (globally or partially). Ongoing tasks focus on the use of SKOS thesauri and SensorML descriptions of the sensors.
- Ergonomics: The user interface is built with the GWT Framework to offer a rich client application with fully Ajax-based navigation.
- Source code sharing: The work has led to the development of reusable components which can be used to quickly create new metadata forms in other GWT applications.
You can visit the catalogue (http://portailrbv.sedoo.fr/) or contact us by email at rbv@sedoo.fr.
SAS- Semantic Annotation Service for Geoscience resources on the web
NASA Astrophysics Data System (ADS)
Elag, M.; Kumar, P.; Marini, L.; Li, R.; Jiang, P.
2015-12-01
There is a growing need for increased integration across the data and model resources that are disseminated on the web to advance their reuse across different earth science applications. Meaningful reuse of resources requires semantic metadata to realize the semantic web vision of pragmatic linkage and integration among resources. Semantic metadata associates standard metadata with resources to turn them into semantically-enabled resources on the web. However, the lack of a common standardized metadata framework, as well as the uncoordinated use of metadata fields across different geo-information systems, has led to a situation in which standards and related Standard Names abound. To address this need, we have designed SAS to provide a bridge between the core ontologies required to annotate resources and information systems in order to enable queries and analysis over annotations from a single environment (the web). SAS is one of the services provided by the Geosemantic framework, which is a decentralized semantic framework to support the integration between models and data and to allow semantically heterogeneous resources to interact with minimum human intervention. Here we present the design of SAS and demonstrate its application for annotating data and models. First we describe how predicates and their attributes are extracted from standards and ingested into the knowledge base of the Geosemantic framework. Then we illustrate the application of SAS in annotating data managed by SEAD and annotating simulation models that have a web interface. SAS is a step in a broader approach to raise the quality of geoscience data and models that are published on the web and to allow users to better search, access, and use the existing resources based on standard vocabularies that are encoded and published using semantic technologies.
Content Metadata Standards for Marine Science: A Case Study
Riall, Rebecca L.; Marincioni, Fausto; Lightsom, Frances L.
2004-01-01
The U.S. Geological Survey developed a content metadata standard to meet the demands of organizing electronic resources in the marine sciences for a broad, heterogeneous audience. These metadata standards are used by the Marine Realms Information Bank project, a Web-based public distributed library of marine science from academic institutions and government agencies. The development and deployment of this metadata standard serve as a model, complete with lessons about mistakes, for the creation of similarly specialized metadata standards for digital libraries.
NASA Astrophysics Data System (ADS)
Delory, E.; Jirka, S.
2016-02-01
Discovering sensors and observation data is important when enabling the exchange of oceanographic data between observatories and scientists that need the data sets for their work. To better support this discovery process, one task of the European project FixO3 (Fixed-point Open Ocean Observatories) addresses the question of which elements are needed for developing a better registry for sensors. This has resulted in four items which are addressed by the FixO3 project in cooperation with further European projects such as NeXOS (http://www.nexosproject.eu/).
1.) Metadata description format: To store and retrieve information about sensors and platforms, it is necessary to have a common approach to providing and encoding the metadata. For this purpose, the OGC Sensor Model Language (SensorML) 2.0 standard was selected. In particular, the ability to distinguish between sensor types and instances allows a more efficient provision and maintenance of sensor metadata.
2.) Conversion of existing metadata into a SensorML 2.0 representation: In order to ensure a sustainable re-use of already provided metadata content (e.g. from ESONET-FixO3 yellow pages), it is important to provide a mechanism which is capable of transforming these already available metadata sets into the new SensorML 2.0 structure.
3.) Metadata editor: When creating descriptions of sensors and platforms, users cannot be expected to manually edit XML-based description files. Thus, a visual interface is necessary to help during the metadata creation. We will outline a prototype of this editor, building upon the development of the ESONET sensor registry interface.
4.) Sensor Metadata Store: A server is needed for storing and querying the created sensor descriptions. For this purpose, different options exist, which will be discussed.
In summary, we will present a set of different elements enabling sensor discovery ranging from metadata formats, metadata conversion and editing to metadata storage. Furthermore, the current development status will be demonstrated.
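The benefit of separating sensor types from sensor instances can be illustrated with a small sketch: shared properties are described once per type, and each instance records only what differs. The class names, property names and values below are hypothetical and do not reproduce the SensorML 2.0 encoding.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class SensorType:
    """Properties shared by every unit of one sensor model (described once)."""
    type_id: str
    properties: Dict[str, str]


@dataclass(frozen=True)
class SensorInstance:
    """One deployed unit: a reference to its type plus instance-specific values."""
    instance_id: str
    sensor_type: SensorType
    overrides: Dict[str, str] = field(default_factory=dict)

    def describe(self) -> Dict[str, str]:
        """Merge type-level properties with instance-level overrides."""
        merged = dict(self.sensor_type.properties)
        merged.update(self.overrides)  # instance values win over type defaults
        return merged


# Hypothetical example data for illustration only.
ctd_type = SensorType(
    type_id="example-ctd-model",
    properties={"observed_property": "sea water temperature", "units": "degC"},
)
ctd_unit = SensorInstance(
    instance_id="example-ctd-0042",
    sensor_type=ctd_type,
    overrides={"serial_number": "0042", "platform": "example mooring"},
)
```

With this split, correcting a type-level property updates the description of every instance that refers to it, which is the maintenance advantage the first item points to.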
Community-Driven Initiatives to Achieve Interoperability for Ecological and Environmental Data
NASA Astrophysics Data System (ADS)
Madin, J.; Bowers, S.; Jones, M.; Schildhauer, M.
2007-12-01
Advances in ecology and environmental science increasingly depend on information from multiple disciplines to tackle broader and more complex questions about the natural world. Such advances, however, are hindered by data heterogeneity, which impedes the ability of researchers to discover, interpret, and integrate relevant data that have been collected by others. Here, we outline two community-building initiatives for improving data interoperability in the ecological and environmental sciences, one that is well-established (the Ecological Metadata Language [EML]), and another that is actively underway (a unified model for observations and measurements). EML is a metadata specification developed for the ecology discipline, and is based on prior work done by the Ecological Society of America and associated efforts to ensure a modular and extensible framework to document ecological data. EML "modules" are designed to describe one logical part of the total metadata that should be included with any ecological dataset. EML was developed through a series of working meetings, ongoing discussion forums and email lists, with participation from a broad range of ecological and environmental scientists, as well as computer scientists and software developers. Where possible, EML adopted syntax from the other metadata standards for other disciplines (e.g., Dublin Core, Content Standard for Digital Geospatial Metadata, and more). Although EML has not yet been ratified through a standards body, it has become the de facto metadata standard for a large range of ecological data management projects, including for the Long Term Ecological Research Network, the National Center for Ecological Analysis and Synthesis, and the Ecological Society of America. The second community-building initiative is based on work through the Scientific Environment for Ecological Knowledge (SEEK) as well as a recent workshop on multi-disciplinary data management. 
This initiative aims at improving interoperability by describing the semantics of data at the level of observation and measurement (rather than the traditional focus at the level of the data set) and will define the necessary specifications and technologies to facilitate semantic interpretation and integration of observational data for the environmental sciences. As such, this initiative will focus on unifying the various existing approaches for representing and describing observation data (e.g., SEEK's Observation Ontology, CUAHSI's Observation Data Model, NatureServe's Observation Data Standard, to name a few). Products of this initiative will be compatible with existing standards and build upon recent advances in knowledge representation (e.g., W3C's recommended Web Ontology Language, OWL) that have demonstrated practical utility in enhancing scientific communication and data interoperability in other communities (e.g., the genomics community). A community-sanctioned, extensible, and unified model for observational data will support metadata standards such as EML while reducing the "babel" of scientific dialects that currently impede effective data integration, which will in turn provide a strong foundation for enabling cross-disciplinary synthetic research in the ecological and environmental sciences.
The Role of Metadata Standards in EOSDIS Search and Retrieval Applications
NASA Technical Reports Server (NTRS)
Pfister, Robin
1999-01-01
Metadata standards play a critical role in data search and retrieval systems. Metadata tie software to data so the data can be processed, stored, searched, retrieved and distributed. Without metadata these actions are not possible. The process of populating metadata to describe science data is an important service to the end user community, so that a user who is unfamiliar with the data can easily find and learn about a particular dataset before an order decision is made. Once a good set of standards is in place, the accuracy with which data search can be performed depends on the degree to which metadata standards are adhered to during product definition. NASA's Earth Observing System Data and Information System (EOSDIS) provides examples of how metadata standards are used in data search and retrieval.
Why can't I manage my digital images like MP3s? The evolution and intent of multimedia metadata
NASA Astrophysics Data System (ADS)
Goodrum, Abby; Howison, James
2005-01-01
This paper considers the deceptively simple question: Why can't digital images be managed in the simple and effective manner in which digital music files are managed? We make the case that the answer is different treatments of metadata in different domains with different goals. A central difference between the two formats stems from the fact that digital music metadata lookup services are collaborative and automate the movement from a digital file to the appropriate metadata, while image metadata services do not. To understand why this difference exists, we examine the divergent evolution of metadata standards for digital music and digital images and observe that the processes differ in interesting ways according to their intent. Specifically, music metadata was developed primarily for personal file management and community resource sharing, while the focus of image metadata has largely been on information retrieval. We argue that lessons from MP3 metadata can assist individuals facing their growing personal image management challenges. Our focus therefore is not on metadata for cultural heritage institutions or the publishing industry; it is limited to the personal libraries growing on our hard drives. This bottom-up approach to file management combined with p2p distribution radically altered the music landscape. Might such an approach have a similar impact on image publishing? This paper outlines plans for improving the personal management of digital images, doing image metadata and file management the MP3 way, and considers the likelihood of success.
PIMMS tools for capturing metadata about simulations
NASA Astrophysics Data System (ADS)
Pascoe, Charlotte; Devine, Gerard; Tourte, Gregory; Pascoe, Stephen; Lawrence, Bryan; Barjat, Hannah
2013-04-01
PIMMS (Portable Infrastructure for the Metafor Metadata System) provides a method for consistent and comprehensive documentation of modelling activities that enables the sharing of simulation data and model configuration information. The aim of PIMMS is to package the metadata infrastructure developed by Metafor for CMIP5 so that it can be used by climate modelling groups in UK Universities. PIMMS tools capture information about simulations from the design of experiments to the implementation of experiments via simulations that run models. PIMMS uses the Metafor methodology, which consists of a Common Information Model (CIM), Controlled Vocabularies (CV) and software tools. PIMMS software tools provide for the creation and consumption of CIM content via a web services infrastructure and portal developed by the ES-DOC community. PIMMS metadata integrates with the ESGF data infrastructure via the mapping of vocabularies onto ESGF facets. There are three paradigms of PIMMS metadata collection: Model Intercomparison Projects (MIPs), where a standard set of questions is asked of all models which perform standard sets of experiments; disciplinary-level metadata collection, where a standard set of questions is asked of all models but experiments are specified by users; and bespoke metadata creation, where the users define questions about both models and experiments. Examples will be shown of how PIMMS has been configured to suit each of these three paradigms. In each case PIMMS allows users to provide additional metadata beyond that which is asked for in an initial deployment. The primary target for PIMMS is the UK climate modelling community, where it is common practice to reuse model configurations from other researchers. This culture of collaboration exists in part because climate models are very complex, with many variables that can be modified.
Therefore it has become common practice to begin a series of experiments by using another climate model configuration as a starting point. Usually this other configuration is provided by a researcher in the same research group or by a previous collaborator with whom there is an existing scientific relationship. Some efforts have been made at the university department level to create documentation but there is a wide diversity in the scope and purpose of this information. The consistent and comprehensive documentation enabled by PIMMS will enable the wider sharing of climate model data and configuration information. The PIMMS methodology assumes an initial effort to document standard model configurations. Once these descriptions have been created users need only describe the specific way in which their model configuration is different from the standard. Thus the documentation burden on the user is specific to the experiment they are performing and fits easily into the workflow of doing their science. PIMMS metadata is independent of data and as such is ideally suited for documenting model development. PIMMS provides a framework for sharing information about failed model configurations for which data are not kept, the negative results that don't appear in scientific literature. PIMMS is a UK project funded by JISC, The University of Reading, The University of Bristol and STFC.
Assessing Metadata Quality of a Federally Sponsored Health Data Repository
Marc, David T.; Beattie, James; Herasevich, Vitaly; Gatewood, Laël; Zhang, Rui
2016-01-01
The U.S. Federal Government developed HealthData.gov to disseminate healthcare datasets to the public. Metadata is provided for each dataset and is the sole source of information to find and retrieve data. This study employed automated quality assessments of the HealthData.gov metadata published from 2012 to 2014 to measure completeness, accuracy, and consistency of applying standards. The results demonstrated that metadata published in earlier years had lower completeness, accuracy, and consistency. Also, metadata that underwent modifications following their original creation were of higher quality. HealthData.gov did not uniformly apply the Dublin Core Metadata Initiative, a widely accepted metadata standard, to the metadata. These findings suggested that the HealthData.gov metadata suffered from quality issues, particularly related to information that wasn't frequently updated. The results supported the need for policies to standardize metadata and contributed to the development of automated measures of metadata quality. PMID:28269883
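As an illustration of the kind of automated completeness measure the abstract above describes, here is a minimal Python sketch. It is not the study's method: the required-field list, the function names, and the rule that a whitespace-only value counts as missing are all assumptions made for the example.

```python
from typing import Iterable, Mapping

# Hypothetical required fields, loosely modelled on Dublin Core element names.
REQUIRED_FIELDS = ("title", "description", "publisher", "modified")


def completeness(record: Mapping[str, object],
                 required: Iterable[str] = REQUIRED_FIELDS) -> float:
    """Return the fraction of required fields that are present and non-empty."""
    required = tuple(required)
    if not required:
        return 1.0
    # A field counts as filled only if its value is non-empty after trimming.
    filled = sum(1 for name in required
                 if str(record.get(name) or "").strip())
    return filled / len(required)


def mean_completeness(records: Iterable[Mapping[str, object]]) -> float:
    """Average completeness over a collection of records (0.0 if empty)."""
    records = list(records)
    if not records:
        return 0.0
    return sum(completeness(r) for r in records) / len(records)
```

Grouping records by publication year before calling `mean_completeness` would give a per-year comparison of the sort reported in the abstract.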
NCPP's Use of Standard Metadata to Promote Open and Transparent Climate Modeling
NASA Astrophysics Data System (ADS)
Treshansky, A.; Barsugli, J. J.; Guentchev, G.; Rood, R. B.; DeLuca, C.
2012-12-01
The National Climate Predictions and Projections (NCPP) Platform is developing comprehensive regional and local information about the evolving climate to inform decision making and adaptation planning. This includes both creating and providing tools to create metadata about the models and processes used to create its derived data products. NCPP is using the Common Information Model (CIM), an ontology developed by a broad set of international partners in climate research, as its metadata language. This use of a standard ensures interoperability within the climate community as well as permitting access to the ecosystem of tools and services emerging alongside the CIM. The CIM itself is divided into a general-purpose (UML & XML) schema, which structures metadata documents, and a project or community-specific (XML) Controlled Vocabulary (CV), which constrains the content of metadata documents. NCPP has already modified the CIM Schema to accommodate downscaling models, simulations, and experiments. NCPP is currently developing a CV for use by the downscaling community. Incorporating downscaling into the CIM will lead to several benefits: easy access to the existing CIM Documents describing CMIP5 models and simulations that are being downscaled, access to software tools that have been developed in order to search, manipulate, and visualize CIM metadata, and coordination with national and international efforts such as ES-DOC that are working to make climate model descriptions and datasets interoperable. Providing detailed metadata descriptions which include the full provenance of derived data products will contribute to making that data (and the models and processes which generated that data) more open and transparent to the user community.
Interoperability Gap Challenges for Learning Object Repositories & Learning Management Systems
ERIC Educational Resources Information Center
Mason, Robert T.
2011-01-01
An interoperability gap exists between Learning Management Systems (LMSs) and Learning Object Repositories (LORs). Learning Objects (LOs) and the associated Learning Object Metadata (LOM) that is stored within LORs adhere to a variety of LOM standards. A common LOM standard found in LORs is the Sharable Content Object Reference Model (SCORM)…
Metadata Authoring with Versatility and Extensibility
NASA Technical Reports Server (NTRS)
Pollack, Janine; Olsen, Lola
2004-01-01
NASA's Global Change Master Directory (GCMD) assists the scientific community in the discovery of and linkage to Earth science data sets and related services. The GCMD holds over 13,800 data set descriptions in Directory Interchange Format (DIF) and 700 data service descriptions in Service Entry Resource Format (SERF), encompassing the disciplines of geology, hydrology, oceanography, meteorology, and ecology. Data descriptions also contain geographic coverage information and direct links to the data, thus allowing researchers to discover data pertaining to a geographic location of interest, then quickly acquire those data. The GCMD strives to be the preferred data locator for world-wide directory-level metadata. In this vein, scientists and data providers must have access to intuitive and efficient metadata authoring tools. Existing GCMD tools are attracting widespread usage; however, a need for tools that are portable, customizable and versatile still exists. With tool usage directly influencing metadata population, it has become apparent that new tools are needed to fill these voids. As a result, the GCMD has released a new authoring tool allowing for both web-based and stand-alone authoring of descriptions. Furthermore, this tool incorporates the ability to plug-and-play the metadata format of choice, offering users options of DIF, SERF, FGDC, ISO or any other defined standard. Allowing data holders to work with their preferred format, as well as an option of a stand-alone application or web-based environment, docBUILDER will assist the scientific community in efficiently creating quality data and services metadata.
MPEG-7: standard metadata for multimedia content
NASA Astrophysics Data System (ADS)
Chang, Wo
2005-08-01
The eXtensible Markup Language (XML) metadata technology for describing media content has emerged as a dominant mode of making media searchable for both human and machine consumption. To realize this premise, many online Web applications are pushing this concept to its fullest potential. However, a good metadata model does require a robust standardization effort so that the metadata content and its structure can reach their maximum usage across various applications. An effective media content description technology should also use standard metadata structures, especially when dealing with various multimedia content. A new metadata technology called MPEG-7 content description has emerged from the ISO MPEG standards body with the charter of defining standard metadata to describe audiovisual content. This paper will give an overview of MPEG-7 technology and the impact it can have on the next generation of multimedia indexing and retrieval applications.
Metadata, Identifiers, and Physical Samples
NASA Astrophysics Data System (ADS)
Arctur, D. K.; Lenhardt, W. C.; Hills, D. J.; Jenkyns, R.; Stroker, K. J.; Todd, N. S.; Dassie, E. P.; Bowring, J. F.
2016-12-01
Physical samples are integral to much of the research conducted by geoscientists. The samples used in this research are often obtained at significant cost and represent an important investment for future research. However, making information about samples - whether considered data or metadata - available for researchers to enable discovery is difficult: a number of key elements related to samples are difficult to characterize in common ways, such as classification, location, sample type, sampling method, repository information, subsample distribution, and instrumentation, because these differ from one domain to the next. Unifying these elements or developing metadata crosswalks is needed. The iSamples (Internet of Samples) NSF-funded Research Coordination Network (RCN) is investigating ways to develop these types of interoperability and crosswalks. Within the iSamples RCN, one of its working groups, WG1, has focused on the metadata related to physical samples. This includes identifying existing metadata standards and systems, and how they might interoperate with the International Geo Sample Number (IGSN) schema (schema.igsn.org) in order to help inform leading practices for metadata. For example, we are examining lifecycle metadata beyond the IGSN 'birth certificate.' As a first step, this working group is developing a list of relevant standards and comparing their various attributes. In addition, the working group is looking toward technical solutions to facilitate developing a linked set of registries to build the web of samples. Finally, the group is also developing a comparison of sample identifiers and locators. This paper will provide an overview and comparison of the standards identified thus far, as well as an update on the technical solutions examined for integration.
We will discuss how various sample identifiers might work in complementary fashion with the IGSN to more completely describe samples, facilitate retrieval of contextual information, and access research work on related samples. Finally, we welcome suggestions and community input to move physical sample unique identifiers forward.
Migrating the Dawn Data Archive to the PDS4 Standard
NASA Astrophysics Data System (ADS)
Joy, S. P.; Mafi, J. N.; King, T. A.; Raymond, C. A.; Russell, C. T.
2017-12-01
The Dawn mission was proposed prior to the development of the PDS4 standard and all of its data are archived at the PDS Small Bodies Node (SBN) using the older PDS3 standard. Plans to migrate the existing PDS archives to PDS4 have been discussed within PDS for some time, and have been reemphasized in the PDS Roadmap Study for 2017 - 2026 (https://pds.nasa.gov/roadmap/PlanetaryDataSystemRMS17-26_20jun17.pdf). Updating the Dawn metadata to PDS4 would enable users of those data to take advantage of new capabilities offered by PDS4, and ensure the full compatibility of past archives with current and future PDS4 tools and services. The Dawn data themselves will not require any reformatting during the migration to PDS4. The data and documentation will need to be reorganized and the metadata enhanced to fill in the gaps in the PDS3 metadata. The planned migration to PDS4 would be primarily carried out at the Dawn Science Center (DSC) at UCLA, but the activity will require close coordination with the PDS-SBN. The PDS4 standard allows individual nodes to customize the metadata through the use of optional parameters and local data dictionaries to satisfy discipline and mission specific search and retrieval requirements and support node tools and services. The DSC shares much of its staff with the Planetary Plasma Interactions (PPI) Node of the PDS. This sharing of personnel means that the DSC staff are well versed in the PDS4 standard, have actively participated in the development of this standard, and are fully trained in the use of PPI tools for PDS4 metadata migration and/or generation. The combination of PDS4 training and detailed understanding of the Dawn mission, instruments, and datasets makes the DSC the most cost-effective organization to migrate these data to PDS4.
A standard for measuring metadata quality in spectral libraries
NASA Astrophysics Data System (ADS)
Rasaiah, B.; Jones, S. D.; Bellman, C.
2013-12-01
Barbara Rasaiah, Simon Jones, Chris Bellman; RMIT University, Melbourne, Australia (barbara.rasaiah@rmit.edu.au, simon.jones@rmit.edu.au, chris.bellman@rmit.edu.au). There is an urgent need within the international remote sensing community to establish a metadata standard for field spectroscopy that ensures high quality, interoperable metadata sets that can be archived and shared efficiently within Earth observation data sharing systems. Metadata are an important component in the cataloguing and analysis of in situ spectroscopy datasets because of their central role in identifying and quantifying the quality and reliability of spectral data and the products derived from them. This paper presents approaches to measuring metadata completeness and quality in spectral libraries to determine reliability, interoperability, and reusability of a dataset. Explored are quality parameters that meet the unique requirements of in situ spectroscopy datasets, across many campaigns. Examined are the challenges of ensuring that data creators, owners, and users maintain a high level of data integrity throughout the lifecycle of a dataset. Issues such as field measurement methods, instrument calibration, and data representativeness are investigated. The proposed metadata standard incorporates expert recommendations that include metadata protocols critical to all campaigns, and those that are restricted to campaigns for specific target measurements. The implications of semantics and syntax for a robust and flexible metadata standard are also considered. Approaches towards an operational and logistically viable implementation of a quality standard are discussed. This paper also proposes a way forward for adapting and enhancing current geospatial metadata standards to the unique requirements of field spectroscopy metadata quality.
[0430] BIOGEOSCIENCES / Computational methods and data processing [0480] BIOGEOSCIENCES / Remote sensing [1904] INFORMATICS / Community standards [1912] INFORMATICS / Data management, preservation, rescue [1926] INFORMATICS / Geospatial [1930] INFORMATICS / Data and information governance [1946] INFORMATICS / Metadata [1952] INFORMATICS / Modeling [1976] INFORMATICS / Software tools and services [9810] GENERAL OR MISCELLANEOUS / New fields
NASA Technical Reports Server (NTRS)
Duggan, Brian
2012-01-01
Downloading and organizing large amounts of files is challenging, and often done using ad hoc methods. This software is capable of downloading and organizing files as an OpenSearch client. It can subscribe to RSS (Really Simple Syndication) feeds and Atom feeds containing arbitrary metadata, and maintains a local content addressable data store. It uses existing standards for obtaining the files, and uses efficient techniques for storing the files. Novel features include symbolic links to maintain a sane directory structure, checksums for validating file integrity during transfer and storage, and flexible use of server-provided metadata.
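The combination of a content-addressable store, checksums, and symbolic links mentioned in the abstract above can be sketched in a few lines of Python. This is an illustration only, not the software being described: the directory layout (`objects/` and `files/`), the choice of SHA-256, and the function names are assumptions, and symbolic links are assumed to be available (as on POSIX systems).

```python
import hashlib
import os
import tempfile
from pathlib import Path


def store(data: bytes, name: str, root: Path) -> Path:
    """Store data under its SHA-256 digest and link a readable name to it."""
    digest = hashlib.sha256(data).hexdigest()
    blob = root / "objects" / digest[:2] / digest
    blob.parent.mkdir(parents=True, exist_ok=True)
    if not blob.exists():
        # Identical content is written only once, whatever its name.
        blob.write_bytes(data)
    link = root / "files" / name
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink() or link.exists():
        link.unlink()
    # A relative link keeps the store valid if the root directory is moved.
    link.symlink_to(os.path.relpath(blob, link.parent))
    return link


def verify(link: Path) -> bool:
    """Recompute the checksum of the linked blob and compare it to its name."""
    blob = link.resolve()
    return hashlib.sha256(blob.read_bytes()).hexdigest() == blob.name
```

For example, `store(b"granule", "2012/a.dat", Path(tempfile.mkdtemp()))` returns the path of a symbolic link under `files/2012/`, and `verify` on that path recomputes the digest to check integrity after transfer or storage.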
Descriptive Metadata: Emerging Standards.
ERIC Educational Resources Information Center
Ahronheim, Judith R.
1998-01-01
Discusses metadata, digital resources, cross-disciplinary activity, and standards. Highlights include Standard Generalized Markup Language (SGML); Extensible Markup Language (XML); Dublin Core; Resource Description Framework (RDF); Text Encoding Initiative (TEI); Encoded Archival Description (EAD); art and cultural-heritage metadata initiatives;…
METADATA REGISTRY, ISO/IEC 11179
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pon, R K; Buttler, D J
2008-01-03
ISO/IEC-11179 is an international standard that documents the standardization and registration of metadata to make data understandable and shareable. This standardization and registration allows for easier locating, retrieving, and transmitting of data from disparate databases. The standard defines how metadata are conceptually modeled and how they are shared among parties, but does not define how data is physically represented as bits and bytes. The standard consists of six parts. Part 1 provides a high-level overview of the standard and defines the basic element of a metadata registry - a data element. Part 2 defines the procedures for registering classification schemes and classifying administered items in a metadata registry (MDR). Part 3 specifies the structure of an MDR. Part 4 specifies requirements and recommendations for constructing definitions for data and metadata. Part 5 defines how administered items are named and identified. Part 6 defines how administered items are registered and assigned an identifier.
Evaluating and Improving Metadata for Data Use and Understanding
NASA Astrophysics Data System (ADS)
Habermann, T.
2013-12-01
The last several decades have seen an extraordinary increase in the number and breadth of environmental data available to the scientific community and the general public. These increases have focused the environmental data community on creating metadata for discovering data and on the creation and population of catalogs and portals for facilitating discovery. This focus is reflected in the fields required by commonly used metadata standards and has resulted in collections populated with metadata that meet, but don't go far beyond, minimal discovery requirements. Discovery is the first step towards addressing scientific questions using data. As more data are discovered and accessed, users need metadata that 1) automates use and integration of these data in tools and 2) facilitates understanding the data when it is compared to similar datasets or as internal variations are observed. When data discovery is the primary goal, it is important to create records for as many datasets as possible. The content of these records is controlled by minimum requirements, and evaluation is generally limited to testing for required fields and counting records. As the use and understanding needs become more important, more comprehensive evaluation tools are needed. An approach is described for evaluating existing metadata in the light of these new requirements and for improving the metadata to meet them.
Interpreting the ASTM 'content standard for digital geospatial metadata'
Nebert, Douglas D.
1996-01-01
ASTM and the Federal Geographic Data Committee have developed a content standard for spatial metadata to facilitate documentation, discovery, and retrieval of digital spatial data using vendor-independent terminology. Spatial metadata elements are identifiable quality and content characteristics of a data set that can be tied to a geographic location or area. Several Office of Management and Budget Circulars and initiatives have been issued that specify improved cataloguing of and accessibility to federal data holdings. An Executive Order further requires the use of the metadata content standard to document digital spatial data sets. Collection and reporting of spatial metadata for field investigations performed for the federal government is an anticipated requirement. This paper provides an overview of the draft spatial metadata content standard and a description of how the standard could be applied to investigations collecting spatially-referenced field data.
Progress in defining a standard for file-level metadata
NASA Technical Reports Server (NTRS)
Williams, Joel; Kobler, Ben
1996-01-01
In the following narrative, metadata required to locate a file on tape or collection of tapes will be referred to as file-level metadata. This paper describes the rationale for and the history of the effort to define a standard for this metadata.
The Planetary Data System Distributed Inventory System
NASA Technical Reports Server (NTRS)
Hughes, J. Steven; McMahon, Susan K.
1996-01-01
The advent of the World Wide Web (Web) and the ability to easily put data repositories on-line has resulted in a proliferation of digital libraries. The heterogeneity of the underlying systems, the autonomy of the individual sites, and distributed nature of the technology has made both interoperability across the sites and the search for resources within a site major research topics. This article will describe a system that addresses both issues using standard Web protocols and meta-data labels to implement an inventory of on-line resources across a group of sites. The success of this system is strongly dependent on the existence of and adherence to a standards architecture that guides the management of meta-data within participating sites.
NASA Astrophysics Data System (ADS)
Maffei, A. R.; Chandler, C. L.; Work, T.; Allen, J.; Groman, R. C.; Fox, P. A.
2009-12-01
Content Management Systems (CMSs) provide powerful features that can be of use to oceanographic (and other geo-science) data managers. However, in many instances, geo-science data management offices have previously designed customized schemas for their metadata. The WHOI Ocean Informatics initiative and the NSF-funded Biological and Chemical Oceanography Data Management Office (BCO-DMO) have jointly sponsored a project to port an existing relational database containing oceanographic metadata, along with an existing interface coded in Cold Fusion middleware, to a Drupal6 Content Management System. The goal was to translate all the existing database tables, input forms, website reports, and other features present in the existing system to employ Drupal CMS features. The replacement features include Drupal content types, CCK node-reference fields, themes, RDB, SPARQL, workflow, and a number of other supporting modules. Strategic use of some Drupal6 CMS features enables three separate but complementary interfaces that provide access to oceanographic research metadata via the MySQL database: 1) a Drupal6-powered front-end; 2) a standard SQL port (used to provide a Mapserver interface to the metadata and data); and 3) a SPARQL port (feeding a new faceted search capability being developed). Future plans include the creation of science ontologies, by scientist/technologist teams, that will drive semantically-enabled faceted search capabilities planned for the site. Incorporation of semantic technologies included in the future Drupal 7 core release is also anticipated. Using a public domain CMS as opposed to proprietary middleware, and taking advantage of the many features of Drupal 6 that are designed to support semantically-enabled interfaces, will help prepare the BCO-DMO database for interoperability with other ecosystem databases.
NASA Astrophysics Data System (ADS)
Hernández, B. E.; Bugbee, K.; le Roux, J.; Beaty, T.; Hansen, M.; Staton, P.; Sisco, A. W.
2017-12-01
Earth observation (EO) data collected as part of NASA's Earth Observing System Data and Information System (EOSDIS) is now searchable via the Common Metadata Repository (CMR). The Analysis and Review of CMR (ARC) Team at Marshall Space Flight Center has been tasked with reviewing all NASA metadata records in the CMR (approximately 7,000 records). Each collection-level record and its constituent granule-level metadata are reviewed for both completeness and compliance with the CMR's set of metadata standards, as specified in the Unified Metadata Model (UMM). NASA's Distributed Active Archive Centers (DAACs) have been harmonizing priority metadata records within the context of the inter-agency federal Big Earth Data Initiative (BEDI), which seeks to improve the discoverability, accessibility, and usability of EO data. Thus, the first phase of this project constitutes reviewing BEDI metadata records, while the second phase will constitute reviewing the remaining non-BEDI records in CMR. This presentation will discuss the ARC team's findings in terms of the overall quality of BEDI records across all DAACs as well as compliance with UMM standards. For instance, only a fifth of the collection-level metadata fields needed correction, compared to a quarter of the granule-level fields. It should be noted that the degree to which DAACs' metadata did not comply with the UMM standards may reflect multiple factors, such as recent changes in the UMM standards, and the utilization of different metadata formats (e.g. DIF 10, ECHO 10, ISO 19115-1) across the DAACs. Insights, constructive criticism, and lessons learned from this metadata review process will be contributed from both ORNL and SEDAC. Further inquiry along such lines may lead to insights which may improve the metadata curation process moving forward.
In terms of the broader implications for metadata compliance with the UMM standards, this research has shown that a large proportion of the prioritized collections have already been made compliant, although the process of improving metadata quality is ongoing and iterative. Further research is also warranted into whether or not the gains in metadata quality are also driving gains in data use.
Expanding Access and Usage of NASA Near Real-Time Imagery and Data
NASA Astrophysics Data System (ADS)
Cechini, M.; Murphy, K. J.; Boller, R. A.; Schmaltz, J. E.; Thompson, C. K.; Huang, T.; McGann, J. M.; Ilavajhala, S.; Alarcon, C.; Roberts, J. T.
2013-12-01
In late 2009, the Land Atmosphere Near-real-time Capability for EOS (LANCE) was created to greatly expand the range of near real-time data products from a variety of Earth Observing System (EOS) instruments. Since that time, NASA's Earth Observing System Data and Information System (EOSDIS) developed the Global Imagery Browse Services (GIBS) to provide highly responsive, scalable, and expandable imagery services that distribute near real-time imagery in an intuitive and geo-referenced format. The GIBS imagery services provide access through standards-based protocols such as the Open Geospatial Consortium (OGC) Web Map Tile Service (WMTS) and standard mapping file formats such as the Keyhole Markup Language (KML). Leveraging these standard mechanisms opens NASA near real-time imagery to a broad landscape of mapping libraries supporting mobile applications. By easily integrating with mobile application development libraries, GIBS makes it possible for NASA imagery to become a reliable and valuable source for end-user applications. Recently, EOSDIS has taken steps to integrate near real-time metadata products into the EOS ClearingHOuse (ECHO) metadata repository. Registration of near real-time metadata allows for near real-time data discovery through ECHO clients. In keeping with the near real-time data processing requirements, the ECHO ingest model allows for low-latency metadata insertion and updates. Combined with the ECHO repository, the fast visual access of GIBS imagery can now be linked directly back to the source data file(s). Through the use of discovery standards such as OpenSearch, desktop and mobile applications can connect users to more than just an image. As data services, such as OGC Web Coverage Service, become more prevalent within the EOSDIS system, applications may even be able to connect users from imagery to data values. In addition, the full resolution GIBS imagery provides visual context to other GIS data and tools.
The NASA near real-time imagery covers a broad set of Earth science disciplines. By leveraging the ECHO and GIBS services, these data can become a visual context within which other GIS activities are performed. The focus of this presentation is to discuss the GIBS imagery and ECHO metadata services facilitating near real-time discovery and usage. Existing synergies and future possibilities will also be discussed. The NASA Worldview demonstration client will be used to show an existing application combining the ECHO and GIBS services.
Lightweight Advertising and Scalable Discovery of Services, Datasets, and Events Using Feedcasts
NASA Astrophysics Data System (ADS)
Wilson, B. D.; Ramachandran, R.; Movva, S.
2010-12-01
Broadcast feeds (Atom or RSS) are a mechanism for advertising the existence of new data objects on the web, with metadata and links to further information. Users then subscribe to the feed to receive updates. This concept has already been used to advertise the new granules of science data as they are produced (datacasting), with browse images and metadata, and to advertise bundles of web services (service casting). Structured metadata is introduced into the XML feed format by embedding new XML tags (in defined namespaces), using typed links, and reusing built-in Atom feed elements. This “infocasting” concept can be extended to include many other science artifacts, including data collections, workflow documents, topical geophysical events (hurricanes, forest fires, etc.), natural hazard warnings, and short articles describing a new science result. The common theme is that each infocast contains machine-readable, structured metadata describing the object and enabling further manipulation. For example, service casts contain typed links pointing to the service interface description (e.g., WSDL for SOAP services), service endpoint, and human-readable documentation. Our Infocasting project has three main goals: (1) define and evangelize micro-formats (metadata standards) so that providers can easily advertise their web services, datasets, and topical geophysical events by adding structured information to broadcast feeds; (2) develop authoring tools so that anyone can easily author such service advertisements, data casts, and event descriptions; and (3) provide a one-stop, Google-like search box in the browser that allows discovery of service, data and event casts visible on the web, and services & data registered in the GEOSS repository and other NASA repositories (GCMD & ECHO).
To demonstrate the event casting idea, a series of micro-articles (with accompanying event casts containing links to relevant datasets, web services, and science analysis workflows) will be authored for several kinds of geophysical events, such as hurricanes, smoke plume events, and tsunamis. The talk will describe our progress so far and some of the issues with leveraging existing metadata standards to define lightweight micro-formats.
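The typed-link mechanism described above can be sketched with Python's standard library. This is a minimal illustration, not the project's actual micro-format: the `cast` namespace URI, the `rel` values, the `serviceType` tag, and the example URLs are all hypothetical, since the abstract does not name them.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Hypothetical extension namespace; the abstract does not name the real one.
CAST_NS = "http://example.org/ns/servicecast"

ET.register_namespace("", ATOM_NS)
ET.register_namespace("cast", CAST_NS)


def build_service_cast_entry(title, interface_url, endpoint_url, docs_url, service_type):
    """Build one Atom <entry> that advertises a web service."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title

    # Typed links: the rel value tells a client what each URL is for.
    for rel, href in (
        (f"{CAST_NS}#interface", interface_url),
        (f"{CAST_NS}#endpoint", endpoint_url),
        (f"{CAST_NS}#documentation", docs_url),
    ):
        ET.SubElement(entry, f"{{{ATOM_NS}}}link", {"rel": rel, "href": href})

    # An embedded tag in a defined namespace carries extra structured metadata.
    ET.SubElement(entry, f"{{{CAST_NS}}}serviceType").text = service_type
    return entry


def typed_links(entry):
    """Return {rel: href} for every Atom link in the entry."""
    return {
        link.get("rel"): link.get("href")
        for link in entry.findall(f"{{{ATOM_NS}}}link")
    }


entry = build_service_cast_entry(
    title="Example subsetting service",
    interface_url="http://example.org/subset?wsdl",
    endpoint_url="http://example.org/subset",
    docs_url="http://example.org/subset/help.html",
    service_type="SOAP",
)
xml_text = ET.tostring(entry, encoding="unicode")
links = typed_links(entry)
```

A subscribing client only needs to know the `rel` vocabulary to find the interface description, endpoint, and documentation, which is what makes the feed machine-readable as well as human-readable.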
106-17 Telemetry Standards Metadata Configuration Chapter 23
2017-07-01
23.2 Metadata Description Language ... Chapter 23, July 2017. Acronyms: HTML, Hypertext Markup Language; MDL, Metadata Description Language; PCM, pulse code modulation; TMATS, Telemetry...Attributes Transfer Standard; W3C, World Wide Web Consortium; XML, eXtensible Markup Language; XSD, XML schema document. Telemetry Network Standard
The Self-Organized Archive: SPASE, PDS and Archive Cooperatives
NASA Astrophysics Data System (ADS)
King, T. A.; Hughes, J. S.; Roberts, D. A.; Walker, R. J.; Joy, S. P.
2005-05-01
Information systems with high quality metadata enable uses and services which often go beyond the original purpose. There are two types of metadata: annotations, which are items that comment on or describe the content of a resource, and identification attributes, which describe the external properties of the resource itself. For example, annotations may indicate which columns are present in a table of data, whereas an identification attribute would indicate the source of the table, such as the observatory, instrument, organization, and data type. When the identification attributes are collected and used as the basis of a search engine, a user can constrain on an attribute and the archive can then self-organize around the constraint, presenting the user with a particular view of the archive. In an archive cooperative where each participating data system or archive may have its own metadata standards, providing a multi-system search engine requires that individual archive metadata be mapped to a broad-based standard. To explore how cooperative archives can form a larger self-organized archive, we will show how the Space Physics Archive Search and Extract (SPASE) data model will allow different systems to create a cooperative, and we will use the Planetary Data System (PDS) plus existing space physics activities as a demonstration.
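The mapping and self-organization steps can be illustrated with a short Python sketch. All archive names, field names, and record values below are hypothetical placeholders; they are not actual SPASE or PDS keywords.

```python
from collections import defaultdict

# Hypothetical per-archive field names mapped onto one shared vocabulary.
FIELD_MAPS = {
    "archive_a": {
        "id": "resource_id",
        "obs": "observatory",
        "inst": "instrument",
        "dtype": "data_type",
    },
    "archive_b": {
        "PRODUCT_ID": "resource_id",
        "OBSERVATORY_NAME": "observatory",
        "INSTRUMENT_ID": "instrument",
        "PRODUCT_TYPE": "data_type",
    },
}


def to_common(archive, record):
    """Translate one archive-specific record into the shared attribute names."""
    mapping = FIELD_MAPS[archive]
    common = {mapping[key]: value for key, value in record.items() if key in mapping}
    common["source_archive"] = archive
    return common


def constrain(records, **constraints):
    """Keep only records whose identification attributes match every constraint."""
    return [
        record for record in records
        if all(record.get(key) == value for key, value in constraints.items())
    ]


def organize(records, attribute):
    """Group records by one attribute, giving a 'view' of the combined archive."""
    view = defaultdict(list)
    for record in records:
        view[record.get(attribute, "unknown")].append(record["resource_id"])
    return dict(view)


combined = [
    to_common("archive_a", {"id": "A-1", "obs": "OBS-1", "inst": "magnetometer", "dtype": "time series"}),
    to_common("archive_b", {"PRODUCT_ID": "B-7", "OBSERVATORY_NAME": "OBS-2", "INSTRUMENT_ID": "magnetometer", "PRODUCT_TYPE": "time series"}),
    to_common("archive_b", {"PRODUCT_ID": "B-8", "OBSERVATORY_NAME": "OBS-2", "INSTRUMENT_ID": "plasma analyzer", "PRODUCT_TYPE": "spectra"}),
]

magnetometer_view = organize(constrain(combined, instrument="magnetometer"), "observatory")
```

Once every archive's records share attribute names, a constraint on any one attribute yields a consistent cross-archive view without either archive changing its native metadata.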
Making Metadata Better with CMR and MMT
NASA Technical Reports Server (NTRS)
Gilman, Jason Arthur; Shum, Dana
2016-01-01
Ensuring complete, consistent and high quality metadata is a challenge for metadata providers and curators. The CMR and MMT systems give providers and curators options to build in metadata quality from the start and to assess and improve the quality of already existing metadata.
NASA Astrophysics Data System (ADS)
Vines, Aleksander; Hansen, Morten W.; Korosov, Anton
2017-04-01
Existing international and Norwegian infrastructure projects, e.g., NorDataNet, NMDC and NORMAP, provide open data access through the OPeNDAP protocol following the conventions for CF (Climate and Forecast) metadata, designed to promote the processing and sharing of files created with the NetCDF application programming interface (API). This approach is now also being implemented in the Norwegian Sentinel Data Hub (satellittdata.no) to provide satellite EO data to the user community. Simultaneously with providing simplified and unified data access, these projects also seek to use and establish common standards for use and discovery metadata. This then allows development of standardized tools for data search and (subset) streaming over the internet to perform actual scientific analysis. A combination of software tools, which we call a Scientific Platform as a Service (SPaaS), will take advantage of these opportunities to harmonize and streamline the search, retrieval and analysis of integrated satellite and auxiliary observations of the oceans in a seamless system. The SPaaS is a cloud solution for integration of analysis tools with scientific datasets via an API. The core part of the SPaaS is a distributed metadata catalog to store granular metadata describing the structure, location and content of available satellite, model, and in situ datasets. The analysis tools include software for visualization (also online), interactive in-depth analysis, and server-based processing chains. The API conveys search requests between system nodes (i.e., interactive and server tools) and provides easy access to the metadata catalog, data repositories, and the tools. The SPaaS components are integrated in virtual machines, whose provisioning and deployment are automated using existing state-of-the-art open-source tools (e.g., Vagrant, Ansible, Docker).
The open-source code for scientific tools and virtual machine configurations is under version control at https://github.com/nansencenter/, and is coupled to an online continuous integration system (e.g., Travis CI).
Standardized Metadata for Human Pathogen/Vector Genomic Sequences
Dugan, Vivien G.; Emrich, Scott J.; Giraldo-Calderón, Gloria I.; Harb, Omar S.; Newman, Ruchi M.; Pickett, Brett E.; Schriml, Lynn M.; Stockwell, Timothy B.; Stoeckert, Christian J.; Sullivan, Dan E.; Singh, Indresh; Ward, Doyle V.; Yao, Alison; Zheng, Jie; Barrett, Tanya; Birren, Bruce; Brinkac, Lauren; Bruno, Vincent M.; Caler, Elizabet; Chapman, Sinéad; Collins, Frank H.; Cuomo, Christina A.; Di Francesco, Valentina; Durkin, Scott; Eppinger, Mark; Feldgarden, Michael; Fraser, Claire; Fricke, W. Florian; Giovanni, Maria; Henn, Matthew R.; Hine, Erin; Hotopp, Julie Dunning; Karsch-Mizrachi, Ilene; Kissinger, Jessica C.; Lee, Eun Mi; Mathur, Punam; Mongodin, Emmanuel F.; Murphy, Cheryl I.; Myers, Garry; Neafsey, Daniel E.; Nelson, Karen E.; Nierman, William C.; Puzak, Julia; Rasko, David; Roos, David S.; Sadzewicz, Lisa; Silva, Joana C.; Sobral, Bruno; Squires, R. Burke; Stevens, Rick L.; Tallon, Luke; Tettelin, Herve; Wentworth, David; White, Owen; Will, Rebecca; Wortman, Jennifer; Zhang, Yun; Scheuermann, Richard H.
2014-01-01
High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium’s minimal information (MIxS) and NCBI’s BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. 
The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant. PMID:24936976
Creating preservation metadata from XML-metadata profiles
NASA Astrophysics Data System (ADS)
Ulbricht, Damian; Bertelmann, Roland; Gebauer, Petra; Hasler, Tim; Klump, Jens; Kirchner, Ingo; Peters-Kottig, Wolfgang; Mettig, Nora; Rusch, Beate
2014-05-01
Registration of dataset DOIs at DataCite makes research data citable and comes with the obligation to keep the data accessible in the future. In addition, many universities and research institutions measure data that are unique and not repeatable, such as the data produced by an observational network, and they want to keep these data for future generations. Consequently, such data should be ingested into preservation systems that automatically take care of file format changes. Open source preservation software that is developed along the definitions of the ISO OAIS reference model is available, but during ingest of data and metadata there are still problems to be solved. File format validation is difficult: format validators are remarkably slow and, because of the variety of file formats, different validators return conflicting identification profiles for identical data. These conflicts are hard to resolve. Preservation systems have a deficit in the support of custom metadata. Furthermore, data producers are sometimes not aware that quality metadata is a key issue for the re-use of data. In the project EWIG, a university institute and a research institute work together with Zuse-Institute Berlin, which acts as an infrastructure facility, to generate exemplary workflows for ingesting research data into OAIS-compliant archives, with emphasis on the geosciences. The Institute for Meteorology provides time series data from an urban monitoring network, whereas GFZ Potsdam delivers file-based data from research projects. To identify problems in existing preservation workflows, the technical work is complemented by interviews with data practitioners. Policies for handling data and metadata are developed. Furthermore, university teaching material is created to raise future scientists' awareness of research data management. As a testbed for ingest workflows, the digital preservation system Archivematica [1] is used.
During the ingest process, metadata is generated that is compliant with the Metadata Encoding and Transmission Standard (METS). To find datasets in future portals and to make use of these data in one's own scientific work, proper selection of discovery metadata and application metadata is very important. Some XML-metadata profiles are not suitable for preservation because version changes are very fast and make it nearly impossible to automate the migration. For other XML-metadata profiles, schema definitions are changed after publication of the profile or the schema definitions become inaccessible, which might cause problems during validation of the metadata inside the preservation system [2]. Some metadata profiles are not used widely enough and might not even exist in the future. Eventually, discovery and application metadata have to be embedded into the mdWrap-subtree of the METS-XML. [1] http://www.archivematica.org [2] http://dx.doi.org/10.2218/ijdc.v7i1.215
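Embedding discovery metadata into the mdWrap subtree can be sketched as follows. This is a minimal fragment under stated assumptions, not the EWIG or Archivematica workflow: it wraps a Dublin Core element as the payload, the section ID and sample title are hypothetical, and the result omits other parts a complete METS document needs (such as a structMap).

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DC_NS = "http://purl.org/dc/elements/1.1/"

ET.register_namespace("mets", METS_NS)
ET.register_namespace("dc", DC_NS)


def wrap_in_mets(discovery_xml, section_id="dmd-001", mdtype="DC"):
    """Embed an XML metadata record in mets/dmdSec/mdWrap/xmlData."""
    payload = ET.fromstring(discovery_xml)
    mets = ET.Element(f"{{{METS_NS}}}mets")
    dmd_sec = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", {"ID": section_id})
    # MDTYPE declares which metadata profile the wrapped record follows.
    md_wrap = ET.SubElement(dmd_sec, f"{{{METS_NS}}}mdWrap", {"MDTYPE": mdtype})
    xml_data = ET.SubElement(md_wrap, f"{{{METS_NS}}}xmlData")
    xml_data.append(payload)
    return mets


# Hypothetical discovery record; a real one would carry many more elements.
sample_record = (
    '<dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "Urban monitoring network time series"
    "</dc:title>"
)

mets_doc = wrap_in_mets(sample_record)
mets_text = ET.tostring(mets_doc, encoding="unicode")
```

Because the original record is carried unchanged inside `xmlData`, the choice of a stable, widely used profile for the payload matters more than the wrapping step itself, which is the point the abstract makes about fast-changing profiles.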
NASA Technical Reports Server (NTRS)
Shum, Dana; Bugbee, Kaylin
2017-01-01
This talk explains the ongoing metadata curation activities in the Common Metadata Repository. It explores tools that exist today which are useful for building quality metadata and also opens up the floor for discussions on other potentially useful tools.
Patridge, Jeff; Namulanda, Gonza
2008-01-01
The Environmental Public Health Tracking (EPHT) Network provides an opportunity to bring together diverse environmental and health effects data by integrating local, state, and national databases of environmental hazards, environmental exposures, and health effects. To help users locate data on the EPHT Network, the network will utilize descriptive metadata that provide critical information about the purpose, location, content, and source of these data. Since 2003, the Centers for Disease Control and Prevention's EPHT Metadata Subgroup has been working to initiate the creation and use of descriptive metadata. Efforts undertaken by the group include the adoption of a metadata standard, creation of an EPHT-specific metadata profile, development of an open-source metadata creation tool, and promotion of the creation of descriptive metadata by changing the perception of metadata in the public health culture.
Standardization of Questions in Rare Disease Registries: The PRISM Library Project.
Richesson, Rachel Lynn; Shereff, Denise; Andrews, James Everett
2012-10-10
Patient registries are often a helpful first step in estimating the impact and understanding the etiology of rare diseases - both requisites for the development of new diagnostics and therapeutics. The value and utility of patient registries rely on the use of both well-constructed structured research questions and relevant answer sets accompanying them. There are currently no clear standards or specifications for developing registry questions, and there are no banks of existing questions to support registry developers. This paper introduces the [Rare Disease] PRISM (Patient Registry Item Specifications and Metadata for Rare Disease) project, a library of standardized questions covering a broad spectrum of rare diseases that can be used to support the development of new registries, including Internet-based registries. A convenience sample of questions was identified from well-established (>5 years) natural history studies in various diseases and from several existing registries. Face validity of the questions was determined by review by many experts (both terminology experts at the College of American Pathologists (CAP) and research and informatics experts at the University of South Florida (USF)) for commonality, clarity, and organization. Questions were re-worded slightly, as needed, to make the full semantics of the question clear and to make the questions generalizable to multiple diseases where possible. Questions were indexed with metadata (structured and descriptive information) using a standard metadata framework to record such information as context, format, question asker and responder, and data standards information. At present, PRISM contains over 2,200 questions, with content of PRISM relevant to virtually all rare diseases. 
While the inclusion of disease-specific questions for thousands of rare disease organizations seeking to develop registries would present a challenge for traditional standards development organizations, the PRISM library could serve as a platform to liaise between rare disease communities and existing standardized controlled terminologies, item banks, and coding systems. If widely used, PRISM will enable the re-use of questions across registries, reduce variation in registry data collection, and facilitate a bottom-up standardization of patient registries. Although it was initially developed to fulfill an urgent need in the rare disease community for shared resources, the PRISM library of patient-directed registry questions can be a valuable resource for registries in any disease, whether common or rare.
Standardization of Questions in Rare Disease Registries: The PRISM Library Project
Shereff, Denise; Andrews, James Everett
2012-01-01
Background: Patient registries are often a helpful first step in estimating the impact and understanding the etiology of rare diseases - both requisites for the development of new diagnostics and therapeutics. The value and utility of patient registries rely on the use of both well-constructed structured research questions and relevant answer sets accompanying them. There are currently no clear standards or specifications for developing registry questions, and there are no banks of existing questions to support registry developers. Objective: This paper introduces the [Rare Disease] PRISM (Patient Registry Item Specifications and Metadata for Rare Disease) project, a library of standardized questions covering a broad spectrum of rare diseases that can be used to support the development of new registries, including Internet-based registries. Methods: A convenience sample of questions was identified from well-established (>5 years) natural history studies in various diseases and from several existing registries. Face validity of the questions was determined by review by many experts (both terminology experts at the College of American Pathologists (CAP) and research and informatics experts at the University of South Florida (USF)) for commonality, clarity, and organization. Questions were re-worded slightly, as needed, to make the full semantics of the question clear and to make the questions generalizable to multiple diseases where possible. Questions were indexed with metadata (structured and descriptive information) using a standard metadata framework to record such information as context, format, question asker and responder, and data standards information. Results: At present, PRISM contains over 2,200 questions, with content of PRISM relevant to virtually all rare diseases.
While the inclusion of disease-specific questions for thousands of rare disease organizations seeking to develop registries would present a challenge for traditional standards development organizations, the PRISM library could serve as a platform to liaise between rare disease communities and existing standardized controlled terminologies, item banks, and coding systems. Conclusions: If widely used, PRISM will enable the re-use of questions across registries, reduce variation in registry data collection, and facilitate a bottom-up standardization of patient registries. Although it was initially developed to fulfill an urgent need in the rare disease community for shared resources, the PRISM library of patient-directed registry questions can be a valuable resource for registries in any disease, whether common or rare. Trial Registration: N/A. PMID: 23611924
Park, Yu Rang; Yoon, Young Jo; Jang, Tae Hun; Seo, Hwa Jeong; Kim, Ju Han
2014-01-01
Extension of the standard model while retaining compliance with it is a challenging issue because there is currently no method for semantically or syntactically verifying an extended data model. A metadata-based extended model, named CCR+, was designed and implemented to achieve interoperability between standard and extended models. Furthermore, a multilayered validation method was devised to validate the standard and extended models. The American Society for Testing and Materials (ASTM) Community Care Record (CCR) standard was selected to evaluate the CCR+ model; two CCR and one CCR+ XML files were evaluated. In total, 188 metadata were extracted from the ASTM CCR standard; these metadata are semantically interconnected and registered in the metadata registry. An extended-data-model-specific validation file was generated from these metadata. This file can be used in a smartphone application (Health Avatar CCR+) as a part of a multilayered validation. The new CCR+ model was successfully evaluated via a patient-centric exchange scenario involving multiple hospitals, with the results supporting both syntactic and semantic interoperability between the standard CCR and extended, CCR+, model. A feasible method for delivering an extended model that complies with the standard model is presented herein. There is a great need to extend static standard models such as the ASTM CCR in various domains: the methods presented here represent an important reference for achieving interoperability between standard and extended models.
Building Format-Agnostic Metadata Repositories
NASA Astrophysics Data System (ADS)
Cechini, M.; Pilone, D.
2010-12-01
This presentation will discuss the problems that surround persisting and discovering metadata in multiple formats; a set of tenets that must be addressed in a solution; and NASA’s Earth Observing System (EOS) ClearingHOuse’s (ECHO) proposed approach. In order to facilitate cross-discipline data analysis, Earth Scientists will potentially interact with more than one data source. The most common data discovery paradigm relies on services and/or applications facilitating the discovery and presentation of metadata. What may not be common are the formats in which the metadata are formatted. As the number of sources and datasets utilized for research increases, it becomes more likely that a researcher will encounter conflicting metadata formats. Metadata repositories, such as the EOS ClearingHOuse (ECHO), along with data centers, must identify ways to address this issue. In order to define the solution to this problem, the following tenets are identified:
- There exists a set of ‘core’ metadata fields recommended for data discovery.
- There exists a set of users who will require the entire metadata record for advanced analysis.
- There exists a set of users who will require a ‘core’ set of metadata fields for discovery only.
- There will never be a cessation of new formats or a total retirement of all old formats.
- Users should be presented metadata in a consistent format.
ECHO has undertaken an effort to transform its metadata ingest and discovery services in order to support the growing set of metadata formats. In order to address the previously listed items, ECHO’s new metadata processing paradigm utilizes the following approach:
- Identify a cross-format set of ‘core’ metadata fields necessary for discovery.
- Implement format-specific indexers to extract the ‘core’ metadata fields into an optimized query capability.
- Archive the original metadata in its entirety for presentation to users requiring the full record.
- Provide on-demand translation of ‘core’ metadata to any supported result format.
With this identified approach, the Earth Scientist is provided with a consistent data representation as they interact with a variety of datasets that utilize multiple metadata formats. They are then able to focus their efforts on the more critical research activities which they are undertaking.
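The four-step approach listed above can be sketched in Python. This is an illustrative toy under stated assumptions, not ECHO's implementation: the core field set, the two input formats, their element and key names, and the `Repository` class are all hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical cross-format 'core' field set used only for discovery.
CORE_FIELDS = ("id", "title", "start_date")


def index_format_a(raw):
    """Format-specific indexer for a hypothetical XML metadata format."""
    root = ET.fromstring(raw)
    return {
        "id": root.findtext("ShortName"),
        "title": root.findtext("LongName"),
        "start_date": root.findtext("Start"),
    }


def index_format_b(raw):
    """Format-specific indexer for a hypothetical JSON metadata format."""
    doc = json.loads(raw)
    return {
        "id": doc["identifier"],
        "title": doc["name"],
        "start_date": doc["temporal"]["begin"],
    }


INDEXERS = {"format_a": index_format_a, "format_b": index_format_b}


class Repository:
    def __init__(self):
        self.index = {}    # core fields only, used for queries
        self.archive = {}  # original records, stored untouched

    def ingest(self, fmt, raw):
        core = INDEXERS[fmt](raw)
        self.index[core["id"]] = core
        self.archive[core["id"]] = (fmt, raw)
        return core["id"]

    def search(self, text):
        """Query the core index, regardless of each record's native format."""
        return [c["id"] for c in self.index.values() if text.lower() in c["title"].lower()]

    def full_record(self, record_id):
        """Return the original metadata in its entirety."""
        return self.archive[record_id]

    def translate(self, record_id, result_format):
        """Render the core fields on demand in a supported result format."""
        core = self.index[record_id]
        if result_format == "json":
            return json.dumps(core, sort_keys=True)
        if result_format == "text":
            return "\n".join(f"{name}: {core[name]}" for name in CORE_FIELDS)
        raise ValueError(f"unsupported result format: {result_format}")


repo = Repository()
xml_raw = "<Collection><ShortName>C1</ShortName><LongName>Sea Ice Extent</LongName><Start>2001-01-01</Start><Extra>kept</Extra></Collection>"
json_raw = json.dumps({"identifier": "C2", "name": "Sea Surface Height", "temporal": {"begin": "1993-01-01"}})
repo.ingest("format_a", xml_raw)
repo.ingest("format_b", json_raw)
hits = repo.search("sea")
```

Supporting a new metadata format then means adding one indexer to `INDEXERS`; the query path, the archive, and the translators stay unchanged, which matches the tenet that new formats will keep arriving.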
Integrated Array/Metadata Analytics
NASA Astrophysics Data System (ADS)
Misev, Dimitar; Baumann, Peter
2015-04-01
Data comes in various forms and types, and integration usually presents a problem that is often simply ignored or solved with ad-hoc solutions. Multidimensional arrays are a ubiquitous data type that we find at the core of virtually all science and engineering domains, as sensor, model, image, and statistics data. Naturally, arrays are richly described by and intertwined with additional metadata (alphanumeric relational data, XML, JSON, etc.). Database systems, however, a fundamental building block of what we call "Big Data", lack adequate support for modelling and expressing these array data/metadata relationships. Array analytics is hence quite primitive or non-existent in modern relational DBMS. Recognizing this, we extended SQL with a new SQL/MDA part seamlessly integrating multidimensional array analytics into the standard database query language. We demonstrate the benefits of SQL/MDA with real-world examples executed in ASQLDB, an open-source mediator system based on HSQLDB and rasdaman that already implements SQL/MDA.
Development of an open metadata schema for prospective clinical research (openPCR) in China.
Xu, W; Guan, Z; Sun, J; Wang, Z; Geng, Y
2014-01-01
In China, deployment of electronic data capture (EDC) and clinical data management systems (CDMS) for clinical research (CR) is at a very early stage, and about 90% of clinical studies collected and submitted clinical data manually. This work aims to build an open metadata schema for Prospective Clinical Research (openPCR) in China based on openEHR archetypes, in order to help Chinese researchers easily create specific data entry templates for registration, study design and clinical data collection. The Singapore Framework for Dublin Core Application Profiles (DCAP) is used to develop openPCR, and four steps are followed: defining the core functional requirements and deducing the core metadata items; developing archetype models; defining metadata terms and creating archetype records; and finally developing implementation syntax. The core functional requirements are divided into three categories: requirements for research registration, requirements for trial design, and requirements for case report form (CRF). A total of 74 metadata items are identified and their Chinese authority names are created. The minimum metadata set of openPCR includes 3 documents, 6 sections, 26 top level data groups, 32 lower data groups and 74 data elements. The top level container in openPCR is composed of public document, internal document and clinical document archetypes. A hierarchical structure of openPCR is established according to Data Structure of Electronic Health Record Architecture and Data Standard of China (Chinese EHR Standard). Metadata attributes are grouped into six parts: identification, definition, representation, relation, usage guides, and administration. OpenPCR is an open metadata schema based on research registration standards, standards of the Clinical Data Interchange Standards Consortium (CDISC) and Chinese healthcare related standards, and is to be publicly available throughout China.
It considers future integration of EHR and CR by adopting the data structure and data terms in the Chinese EHR Standard. Archetypes in openPCR are modular models and can be separated, recombined, and reused. The authors recommend that the method used to develop openPCR be referenced by other countries when designing metadata schemas for clinical research. In the next steps, openPCR should be used in a number of CR projects to test its applicability and to continuously improve its coverage. In addition, a metadata schema for research protocols can be developed to structure and standardize protocols, and syntactic interoperability of openPCR with other related standards can be considered.
Park, Yu Rang; Yoon, Young Jo; Jang, Tae Hun; Seo, Hwa Jeong
2014-01-01
Objectives: Extension of the standard model while retaining compliance with it is a challenging issue because there is currently no method for semantically or syntactically verifying an extended data model. A metadata-based extended model, named CCR+, was designed and implemented to achieve interoperability between standard and extended models. Methods: A multilayered validation method was devised to validate the standard and extended models. The American Society for Testing and Materials (ASTM) Community Care Record (CCR) standard was selected to evaluate the CCR+ model; two CCR and one CCR+ XML files were evaluated. Results: In total, 188 metadata were extracted from the ASTM CCR standard; these metadata are semantically interconnected and registered in the metadata registry. An extended-data-model-specific validation file was generated from these metadata. This file can be used in a smartphone application (Health Avatar CCR+) as a part of a multilayered validation. The new CCR+ model was successfully evaluated via a patient-centric exchange scenario involving multiple hospitals, with the results supporting both syntactic and semantic interoperability between the standard CCR and extended, CCR+, model. Conclusions: A feasible method for delivering an extended model that complies with the standard model is presented herein. There is a great need to extend static standard models such as the ASTM CCR in various domains: the methods presented here represent an important reference for achieving interoperability between standard and extended models. PMID: 24627817
NASA Astrophysics Data System (ADS)
Prasad, U.; Rahabi, A.
2001-05-01
The following utilities, developed for dumping HDF-EOS format data, are of special use for Earth science data from NASA's Earth Observing System (EOS). This poster demonstrates their use and application. The first four tools take HDF-EOS data files as input. HDF-EOS Metadata Dumper - metadmp The metadata dumper extracts metadata from EOS data granules. It operates by simply copying blocks of metadata from the file to the standard output. It does not process the metadata in any way. Since all metadata in EOS granules is encoded in the Object Description Language (ODL), the output of metadmp will be in the form of complete ODL statements. EOS data granules may contain up to three different sets of metadata (Core, Archive, and Structural Metadata). HDF-EOS Contents Dumper - heosls The heosls dumper displays the contents of HDF-EOS files. This utility provides detailed information on the POINT, SWATH, and GRID data sets in the files. For example, it will list the geolocation fields, data fields, and objects. HDF-EOS ASCII Dumper - asciidmp The ASCII dump utility extracts fields from EOS data granules into plain ASCII text. The output from asciidmp should be easily human readable. With minor editing, asciidmp's output can be made ingestible by any application with ASCII import capabilities. HDF-EOS Binary Dumper - bindmp The binary dumper utility dumps HDF-EOS objects in binary format. This is useful for feeding the output into existing programs that do not understand HDF, for example custom software and COTS products. HDF-EOS User Friendly Metadata - UFM The UFM utility is useful for viewing ECS metadata. UFM takes an EOSDIS ODL metadata file and produces an HTML report of the metadata for display using a web browser. HDF-EOS METCHECK - METCHECK METCHECK can be invoked from either a Unix or DOS environment with a set of command-line options that direct the tool's inputs and output.
METCHECK validates the inventory metadata in the .met file using the descriptor file (.desc) as the reference. The tool takes the .desc file and the .met file (an ODL file) as inputs and generates a simple output file containing the results of the checking process.
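The kind of check described for METCHECK can be illustrated with a rough sketch. This is not the METCHECK implementation: the parser below handles only flat `OBJECT`/`END_OBJECT` blocks with `NAME = VALUE` lines, the list of required names stands in for whatever the real descriptor file specifies, and the function names and sample text are assumptions made for illustration.

```python
def parse_odl_objects(text):
    """Collect {object_name: {key: value}} from flat OBJECT ... END_OBJECT blocks."""
    objects = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue  # skips blank lines and the bare END terminator
        key, value = (part.strip() for part in line.split("=", 1))
        value = value.strip('"')
        if key == "OBJECT":
            current = value
            objects[current] = {}
        elif key == "END_OBJECT":
            current = None
        elif current is not None:
            objects[current][key] = value
    return objects


def missing_objects(met_text, required_names):
    """Return required names that are absent or carry no VALUE entry."""
    found = parse_odl_objects(met_text)
    return [
        name for name in required_names
        if name not in found or "VALUE" not in found[name]
    ]


# Illustrative sample only; not taken from a real granule.
SAMPLE_MET = """\
GROUP = INVENTORYMETADATA
  OBJECT = SHORTNAME
    NUM_VAL = 1
    VALUE = "MOD021KM"
  END_OBJECT = SHORTNAME
  OBJECT = VERSIONID
    NUM_VAL = 1
  END_OBJECT = VERSIONID
END_GROUP = INVENTORYMETADATA
END
"""

# Hypothetical stand-in for the reference list a descriptor file would supply.
REQUIRED = ["SHORTNAME", "VERSIONID", "RANGEDATETIME"]
print(missing_objects(SAMPLE_MET, REQUIRED))
```

A fuller checker would also need to handle nested groups, multi-valued attributes, and type constraints, none of which this sketch attempts.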
NetCDF4/HDF5 and Linked Data in the Real World - Enriching Geoscientific Metadata without Bloat
NASA Astrophysics Data System (ADS)
Ip, Alex; Car, Nicholas; Druken, Kelsey; Poudjom-Djomani, Yvette; Butcher, Stirling; Evans, Ben; Wyborn, Lesley
2017-04-01
NetCDF4 has become the dominant generic format for many forms of geoscientific data, leveraging (and constraining) the versatile HDF5 container format, while providing metadata conventions for interoperability. However, the encapsulation of detailed metadata within each file can lead to metadata "bloat", and difficulty in maintaining consistency where metadata is replicated to multiple locations. Complex conceptual relationships are also difficult to represent in simple key-value netCDF metadata. Linked Data provides a practical mechanism to address these issues by associating the netCDF files and their internal variables with complex metadata stored in Semantic Web vocabularies and ontologies, while complying with and complementing existing metadata conventions. One of the stated objectives of the netCDF4/HDF5 formats is that they should be self-describing: containing metadata sufficient for cataloguing and using the data. However, this objective can be regarded as only partially-met where details of conventions and definitions are maintained externally to the data files. For example, one of the most widely used netCDF community standards, the Climate and Forecasting (CF) Metadata Convention, maintains standard vocabularies for a broad range of disciplines across the geosciences, but this metadata is currently neither readily discoverable nor machine-readable. We have previously implemented useful Linked Data and netCDF tooling (ncskos) that associates netCDF files, and individual variables within those files, with concepts in vocabularies formulated using the Simple Knowledge Organization System (SKOS) ontology. NetCDF files contain Uniform Resource Identifier (URI) links to terms represented as SKOS Concepts, rather than plain-text representations of those terms, so we can use simple, standardised web queries to collect and use rich metadata for the terms from any Linked Data-presented SKOS vocabulary. 
Geoscience Australia (GA) manages a large volume of diverse geoscientific data, much of which is being translated from proprietary formats to netCDF at NCI Australia. This data is made available through the NCI National Environmental Research Data Interoperability Platform (NERDIP) for programmatic access and interdisciplinary analysis. The netCDF files contain both scientific data variables (e.g. gravity, magnetic or radiometric values) and domain-specific operational values (e.g. specific instrument parameters) best described fully in formal vocabularies. Our ncskos codebase provides access to multiple stores of detailed external metadata in a standardised fashion. Geophysical datasets are generated from a "survey" event, and GA maintains corporate databases of all surveys and their associated metadata. It is impractical to replicate the full source survey metadata into each netCDF dataset so, instead, we link the netCDF files to survey metadata using public Linked Data URIs. These URIs link to Survey class objects which we model as a subclass of Activity objects as defined by the PROV Ontology, and we provide URI resolution for them via a custom Linked Data API which draws current survey metadata from GA's in-house databases. We have demonstrated that Linked Data is a practical way to associate netCDF data with detailed, external metadata. This allows us to ensure that catalogued metadata is kept consistent with metadata points-of-truth, and we can infer complex conceptual relationships not possible with netCDF key-value attributes alone.
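The idea of storing a concept URI on a variable and looking up its label in a SKOS vocabulary can be sketched as follows. This is not the ncskos code: the attribute name `skos__concept_uri`, the `example.org` URIs, and the inline RDF/XML document are assumptions for illustration, and a real client would retrieve the vocabulary over HTTP rather than from a string.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Stand-in for the RDF/XML a vocabulary service might return.
SAMPLE_RDF = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/vocab/gravity">
    <skos:prefLabel xml:lang="en">gravity</skos:prefLabel>
    <skos:definition xml:lang="en">Illustrative definition text.</skos:definition>
  </skos:Concept>
</rdf:RDF>
"""


def pref_label(rdf_xml, concept_uri, lang="en"):
    """Return the skos:prefLabel of concept_uri in the given language, or None."""
    root = ET.fromstring(rdf_xml)
    for concept in root.iter(f"{{{SKOS_NS}}}Concept"):
        if concept.get(f"{{{RDF_NS}}}about") != concept_uri:
            continue
        for label in concept.findall(f"{{{SKOS_NS}}}prefLabel"):
            if label.get(XML_LANG) == lang:
                return label.text
    return None


# Hypothetical netCDF variable attributes: the file holds a link, not the term text.
variable_attrs = {
    "long_name": "grav",
    "skos__concept_uri": "http://example.org/vocab/gravity",
}
label = pref_label(SAMPLE_RDF, variable_attrs["skos__concept_uri"])
print(label)
```

Keeping only the URI in the file means the label and definition are maintained in one place, which is the consistency benefit the abstract describes.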
Introducing a Web API for Dataset Submission into a NASA Earth Science Data Center
NASA Astrophysics Data System (ADS)
Moroni, D. F.; Quach, N.; Francis-Curley, W.
2016-12-01
As the landscape of data becomes increasingly more diverse in the domain of Earth Science, the challenges of managing and preserving data become more onerous and complex, particularly for data centers on fixed budgets and limited staff. Many solutions already exist to ease the cost burden for the downstream component of the data lifecycle, yet most archive centers are still racing to keep up with the influx of new data that still needs to find a quasi-permanent resting place. For instance, having well-defined metadata that is consistent across the entire data landscape provides for well-managed and preserved datasets throughout the latter end of the data lifecycle. Translators between different metadata dialects are already in operational use, and facilitate keeping older datasets relevant in today's world of rapidly evolving metadata standards. However, very little is done to address the first phase of the lifecycle, which deals with the entry of both data and the corresponding metadata into a system that is traditionally opaque and closed off to external data producers, thus resulting in a significant bottleneck to the dataset submission process. The ATRAC system was the NOAA NCEI's answer to this previously obfuscated barrier to scientists wishing to find a home for their climate data records, providing a web-based entry point to submit timely and accurate metadata and information about a very specific dataset. A couple of NASA's Distributed Active Archive Centers (DAACs) have implemented their own versions of a web-based dataset and metadata submission form including the ASDC and the ORNL DAAC. The Physical Oceanography DAAC is the most recent in the list of NASA-operated DAACs who have begun to offer their own web-based dataset and metadata submission services to data producers. 
What makes the PO.DAAC dataset and metadata submission service stand out from these pre-existing services is the option of utilizing both a web browser GUI and a RESTful API to facilitate rapid and efficient updating of dataset metadata records by external data producers. Here we present this new service and demonstrate the variety of ways in which a multitude of Earth Science datasets may be submitted in a manner that significantly reduces the time in ensuring that new, vital data reaches the public domain.
Mitogenome metadata: current trends and proposed standards.
Strohm, Jeff H T; Gwiazdowski, Rodger A; Hanner, Robert
2016-09-01
Mitogenome metadata are descriptive terms about a sequence and its specimen that allow both to be digitally discoverable and interoperable. Here, we review a sampling of mitogenome metadata published in the journal Mitochondrial DNA between 2005 and 2014. Specifically, we have focused on a subset of metadata fields that are available for GenBank records, and specified by the Genomic Standards Consortium (GSC) and other biodiversity metadata standards; and we assessed their presence across three main categories: collection, biological and taxonomic information. To do this we reviewed 146 mitogenome manuscripts, and their associated GenBank records, and scored them for 13 metadata fields. We also explored the potential for mitogenome misidentification using their sequence diversity, and taxonomic metadata on the Barcode of Life Data Systems (BOLD). For this, we focused on all Lepidoptera and Perciformes mitogenomes included in the review, along with additional mitogenome sequence data mined from GenBank. Overall, we found that none of the 146 mitogenome projects provided all the metadata we looked for; and only 17 projects provided at least one category of metadata across the three main categories. Comparisons using mtDNA sequences from BOLD suggest that some mitogenomes may be misidentified. Lastly, we appreciate the research potential of mitogenomes announced through this journal; and we conclude with a suggestion of 13 metadata fields, available on GenBank, that if provided in a mitogenome's GenBank record, would increase its research value.
Metadata squared: enhancing its usability for volunteered geographic information and the GeoWeb
Poore, Barbara S.; Wolf, Eric B.; Sui, Daniel Z.; Elwood, Sarah; Goodchild, Michael F.
2013-01-01
The Internet has brought many changes to the way geographic information is created and shared. One aspect that has not changed is metadata. Static spatial data quality descriptions were standardized in the mid-1990s and cannot accommodate the current climate of data creation where nonexperts are using mobile phones and other location-based devices on a continuous basis to contribute data to Internet mapping platforms. The usability of standard geospatial metadata is being questioned by academics and neogeographers alike. This chapter analyzes current discussions of metadata to demonstrate how the media shift that is occurring has affected requirements for metadata. Two case studies of metadata use are presented—online sharing of environmental information through a regional spatial data infrastructure in the early 2000s, and new types of metadata that are being used today in OpenStreetMap, a map of the world created entirely by volunteers. Changes in metadata requirements are examined for usability, the ease with which metadata supports coproduction of data by communities of users, how metadata enhances findability, and how the relationship between metadata and data has changed. We argue that traditional metadata associated with spatial data infrastructures is inadequate and suggest several research avenues to make this type of metadata more interactive and effective in the GeoWeb.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Inigo, Gil San; Servilla, Mark; Brunt, James
2008-06-01
The Genomic Standards Consortium (GSC) invited a representative of the Long-Term Ecological Research (LTER) to its fifth workshop to present the Ecological Metadata Language (EML) metadata standard and its relationship to the Minimum Information about a Genome/Metagenome Sequence (MIGS/MIMS) and its implementation, the Genomic Contextual Data Markup Language (GCDML). The LTER has been one of the top National Science Foundation (NSF) programs in biology since 1980, representing diverse ecosystems and creating long-term, interdisciplinary research, synthesis of information, and theory. The adoption of EML as the LTER network standard has been key to building network synthesis architectures based on high-quality standardized metadata. EML is the NSF-recognized metadata standard for LTER, and EML is a criterion used to review the LTER program progress. At the workshop, a potential crosswalk between the GCDML and EML was explored. Also, collaboration between the LTER and GSC developers was proposed to join efforts toward a common metadata cataloging designer's tool. The community adoption success of a metadata standard depends, among other factors, on the tools and trainings developed to use the standard. LTER's experience in embracing EML may help GSC to achieve similar success. A possible collaboration between LTER and GSC to provide training opportunities for GCDML and the associated tools is being explored. Finally, LTER is investigating EML enhancements to better accommodate genomics data, possibly integrating the GCDML schema into EML. All these action items have been accepted by the LTER contingent, and further collaboration between the GSC and LTER is expected.
Gil, Inigo San; Sheldon, Wade; Schmidt, Tom; Servilla, Mark; Aguilar, Raul; Gries, Corinna; Gray, Tanya; Field, Dawn; Cole, James; Pan, Jerry Yun; Palanisamy, Giri; Henshaw, Donald; O'Brien, Margaret; Kinkel, Linda; McMahon, Katherine; Kottmann, Renzo; Amaral-Zettler, Linda; Hobbie, John; Goldstein, Philip; Guralnick, Robert P; Brunt, James; Michener, William K
2008-06-01
The Genomic Standards Consortium (GSC) invited a representative of the Long-Term Ecological Research (LTER) to its fifth workshop to present the Ecological Metadata Language (EML) metadata standard and its relationship to the Minimum Information about a Genome/Metagenome Sequence (MIGS/MIMS) and its implementation, the Genomic Contextual Data Markup Language (GCDML). The LTER has been one of the top National Science Foundation (NSF) programs in biology since 1980, representing diverse ecosystems and creating long-term, interdisciplinary research, synthesis of information, and theory. The adoption of EML as the LTER network standard has been key to building network synthesis architectures based on high-quality standardized metadata. EML is the NSF-recognized metadata standard for LTER, and EML is a criterion used to review the LTER program progress. At the workshop, a potential crosswalk between the GCDML and EML was explored. Also, collaboration between the LTER and GSC developers was proposed to join efforts toward a common metadata cataloging designer's tool. The community adoption success of a metadata standard depends, among other factors, on the tools and trainings developed to use the standard. LTER's experience in embracing EML may help GSC to achieve similar success. A possible collaboration between LTER and GSC to provide training opportunities for GCDML and the associated tools is being explored. Finally, LTER is investigating EML enhancements to better accommodate genomics data, possibly integrating the GCDML schema into EML. All these action items have been accepted by the LTER contingent, and further collaboration between the GSC and LTER is expected.
Creating FGDC and NBII metadata with Metavist 2005.
David J. Rugg
2004-01-01
This report documents a computer program for creating metadata compliant with the Federal Geographic Data Committee (FGDC) 1998 metadata standard or the National Biological Information Infrastructure (NBII) 1999 Biological Data Profile for the FGDC standard. The software runs under the Microsoft Windows 2000 and XP operating systems, and requires the presence of...
Metadata for WIS and WIGOS: GAW Profile of ISO19115 and Draft WIGOS Core Metadata Standard
NASA Astrophysics Data System (ADS)
Klausen, Jörg; Howe, Brian
2014-05-01
The World Meteorological Organization (WMO) Integrated Global Observing System (WIGOS) is a key WMO priority to underpin all WMO Programs and new initiatives such as the Global Framework for Climate Services (GFCS). The development of the WIGOS Operational Information Resource (WIR) is central to the WIGOS Framework Implementation Plan (WIGOS-IP). The WIR shall provide information on WIGOS and its observing components, as well as requirements of WMO application areas. An important aspect is the description of the observational capabilities by way of structured metadata. The Global Atmosphere Watch is the WMO program addressing the chemical composition and selected physical properties of the atmosphere. Observational data are collected and archived by GAW World Data Centres (WDCs) and related data centres. The Task Team on GAW WDCs (ET-WDC) have developed a profile of the ISO19115 metadata standard that is compliant with the WMO Information System (WIS) specification for the WMO Core Metadata Profile v1.3. This profile is intended to harmonize certain aspects of the documentation of observations as well as the interoperability of the WDCs. The Inter-Commission-Group on WIGOS (ICG-WIGOS) has established the Task Team on WIGOS Metadata (TT-WMD) with representation of all WMO Technical Commissions and the objective to define the WIGOS Core Metadata. The result of this effort is a draft semantic standard comprising a set of metadata classes that are considered to be of critical importance for the interpretation of observations relevant to WIGOS. The purpose of the presentation is to acquaint the audience with the standard and to solicit informal feedback from experts in the various disciplines of meteorology and climatology. This feedback will help ET-WDC and TT-WMD to refine the GAW metadata profile and the draft WIGOS metadata standard, thereby increasing their utility and acceptance.
Karst database development in Minnesota: Design and data assembly
Gao, Y.; Alexander, E.C.; Tipping, R.G.
2005-01-01
The Karst Feature Database (KFD) of Minnesota is a relational GIS-based Database Management System (DBMS). Previous karst feature datasets used inconsistent attributes to describe karst features in different areas of Minnesota. Existing metadata were modified and standardized to represent a comprehensive metadata for all the karst features in Minnesota. Microsoft Access 2000 and ArcView 3.2 were used to develop this working database. Existing county and sub-county karst feature datasets have been assembled into the KFD, which is capable of visualizing and analyzing the entire data set. By November 17, 2002, 11,682 karst features were stored in the KFD of Minnesota. Data tables are stored in a Microsoft Access 2000 DBMS and linked to corresponding ArcView applications. The current KFD of Minnesota has been moved from a Windows NT server to a Windows 2000 Citrix server accessible to researchers and planners through networked interfaces. © Springer-Verlag 2005.
Long-term Science Data Curation Using a Digital Object Model and Open-Source Frameworks
NASA Astrophysics Data System (ADS)
Pan, J.; Lenhardt, W.; Wilson, B. E.; Palanisamy, G.; Cook, R. B.
2010-12-01
Scientific digital content, including Earth Science observations and model output, has become more heterogeneous in format and more distributed across the Internet. In addition, data and metadata are becoming necessarily linked internally and externally on the Web. As a result, such content has become more difficult for providers to manage and preserve and for users to locate, understand, and consume. Specifically, it is increasingly hard to deliver relevant metadata and data processing lineage information along with the actual content consistently. Readme files, data quality information, production provenance, and other descriptive metadata are often separated at the storage level as well as in the data search and retrieval interfaces available to a user. Critical archival metadata, such as auditing trails and integrity checks, are often even more difficult for users to access, if they exist at all. We investigate the use of several open-source software frameworks to address these challenges. We use Fedora Commons Framework and its digital object abstraction as the repository, Drupal CMS as the user-interface, and the Islandora module as the connector from Drupal to Fedora Repository. With the digital object model, metadata of data description and data provenance can be associated with data content in a formal manner, as can external references and other arbitrary auxiliary information. Changes are formally audited on an object, and digital contents are versioned and have checksums automatically computed. Further, relationships among objects are formally expressed with RDF triples. Data replication, recovery, and metadata export are supported with standard protocols, such as OAI-PMH. 
We provide a tentative comparative analysis of the chosen software stack with the Open Archival Information System (OAIS) reference model, along with our initial results with the existing terrestrial ecology data collections at NASA’s ORNL Distributed Active Archive Center for Biogeochemical Dynamics (ORNL DAAC).
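The digital object abstraction described in this abstract (versioned content with automatic checksums, an audit trail, and relationships expressed as triples) can be sketched in a few lines. This is a toy model under stated assumptions, not the Fedora Commons API: the class names, the `put`/`relate`/`verify` methods, the identifier `demo:1`, and the predicate string are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _now():
    return datetime.now(timezone.utc).isoformat()


@dataclass
class DatastreamVersion:
    content: bytes
    checksum: str   # SHA-256 hex digest computed when the version is stored
    created: str


@dataclass
class DigitalObject:
    pid: str
    datastreams: dict = field(default_factory=dict)   # name -> list of versions
    relations: list = field(default_factory=list)     # (subject, predicate, object)
    audit_trail: list = field(default_factory=list)   # (timestamp, action)

    def put(self, name, content):
        """Append a new version of a datastream; older versions are kept."""
        version = DatastreamVersion(
            content, hashlib.sha256(content).hexdigest(), _now()
        )
        self.datastreams.setdefault(name, []).append(version)
        self.audit_trail.append(
            (_now(), f"put {name} v{len(self.datastreams[name])}")
        )
        return version

    def relate(self, predicate, target):
        """Record a relationship from this object as a triple."""
        self.relations.append((self.pid, predicate, target))
        self.audit_trail.append((_now(), f"relate {predicate} {target}"))

    def verify(self, name):
        """Recompute the checksum of the latest version and compare."""
        latest = self.datastreams[name][-1]
        return hashlib.sha256(latest.content).hexdigest() == latest.checksum


obj = DigitalObject("demo:1")
obj.put("DATA", b"site,value\nA,1\n")
obj.put("DATA", b"site,value\nA,2\n")        # second version; the first is retained
obj.relate("isDescribedBy", "demo:readme-1")  # hypothetical predicate
print(len(obj.datastreams["DATA"]), obj.verify("DATA"), len(obj.audit_trail))
```

The point of the sketch is that description, provenance links, and integrity information travel with the content in one object instead of living in separate systems.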
NASA Astrophysics Data System (ADS)
Galbraith, N. R.; Graybeal, J.; Bermudez, L. E.; Wright, D.
2005-12-01
The Marine Metadata Interoperability (MMI) initiative promotes the exchange, integration and use of marine data through enhanced data publishing, discovery, documentation and accessibility. The project, operating since late 2004, presents several cultural and organizational challenges because of the diversity of participants: scientists, technical experts, and data managers from around the world, all working in organizations with different corporate cultures, funding structures, and systems of decision-making. MMI provides educational resources at several levels. For instance, short introductions to metadata concepts are available, as well as guides and "cookbooks" for the quick and efficient preparation of marine metadata. For those who are building major marine data systems, including ocean-observing capabilities, there are training materials, marine metadata content examples, and resources for mapping elements between different metadata standards. The MMI also provides examples of good metadata practices in existing data systems, including the EU's Marine XML project, and functioning ocean/coastal clearinghouses and atlases developed by MMI team members. Communication tools that help build community: 1) Website, used to introduce the initiative to new visitors, and to provide in-depth guidance and resources to members and visitors. The site is built using Plone, an open source web content management system. Plone allows the site to serve as a wiki, to which every user can contribute material. This keeps the membership engaged and spreads the responsibility for the tasks of updating and expanding the site. 2) Email-lists, to engage the broad ocean sciences community. The discussion forums "news," "ask," and "site-help" are available for receiving regular updates on MMI activities, seeking advice or support on projects and standards, or for assistance with using the MMI site. 
Internal email lists are provided for the Technical Team, the Steering Committee and Executive Committee, and for several content-centered teams. These lists help keep committee members connected, and have been very successful in building consensus and momentum. 3) Regularly scheduled telecons, to provide the chance for interaction between members without the need to physically attend meetings. Both the steering committee and the technical team convene via phone every month. Discussions are guided by agendas published in advance, and minutes are kept on-line for reference. These telecons have been an important tool in moving the MMI project forward; they give members an opportunity for informal discussion and provide a timeframe for accomplishing tasks. 4) Workshops, to make progress towards community agreement, such as the technical workshop "Advancing Domain Vocabularies" August 9-11, 2005, in Boulder, Colorado, where featured domain and metadata experts developed mappings between existing marine metadata vocabularies. Most of the work of the meeting was performed in six small, carefully organized breakout teams, oriented around specific domains. 5) Calendar of events, to keep users up to date, where any event related to marine metadata and interoperability can be posted. 6) Specific tools to reach agreements among distributed communities. For example, we developed a tool called Vocabulary Integration Environment (VINE) that allows formalized agreements of mappings across different vocabularies.
Evolving Metadata in NASA Earth Science Data Systems
NASA Astrophysics Data System (ADS)
Mitchell, A.; Cechini, M. F.; Walter, J.
2011-12-01
NASA's Earth Observing System (EOS) is a coordinated series of satellites for long term global observations. NASA's Earth Observing System Data and Information System (EOSDIS) is a petabyte-scale archive of environmental data that supports global climate change research by providing end-to-end services from EOS instrument data collection to science data processing to full access to EOS and other earth science data. On a daily basis, the EOSDIS ingests, processes, archives and distributes over 3 terabytes of data from NASA's Earth Science missions representing over 3500 data products spanning a range of science disciplines. EOSDIS is currently comprised of 12 discipline specific data centers that are collocated with centers of science discipline expertise. Metadata is used in all aspects of NASA's Earth Science data lifecycle from the initial measurement gathering to the accessing of data products. Missions use metadata in their science data products when describing information such as the instrument/sensor, operational plan, and geographic region. Acting as the curators of the data products, data centers employ metadata for preservation, access and manipulation of data. EOSDIS provides a centralized metadata repository called the Earth Observing System (EOS) ClearingHouse (ECHO) for data discovery and access via a service-oriented-architecture (SOA) between data centers and science data users. ECHO receives inventory metadata from data centers, which generate metadata files that comply with the ECHO Metadata Model. NASA's Earth Science Data and Information System (ESDIS) Project established a Tiger Team to study and make recommendations regarding the adoption of the international metadata standard ISO 19115 in EOSDIS. The result was a technical report recommending an evolution of NASA data systems towards a consistent application of ISO 19115 and related standards including the creation of a NASA-specific convention for core ISO 19115 elements. 
Part of NASA's effort to continually evolve its data systems led ECHO to enhance the method by which it receives inventory metadata from the data centers to allow for multiple metadata formats including ISO 19115. ECHO's metadata model will also be mapped to the NASA-specific convention for ingesting science metadata into the ECHO system. As NASA's new Earth Science missions and data centers are migrating to the ISO 19115 standards, EOSDIS is developing metadata management resources to assist in the reading, writing and parsing of ISO 19115 compliant metadata. To foster interoperability with other agencies and international partners, NASA is working to ensure that a common ISO 19115 convention is developed, enhancing data sharing capabilities and other data analysis initiatives. NASA is also investigating the use of ISO 19115 standards to encode data quality, lineage and provenance with stored values. A common metadata standard across NASA's Earth Science data systems promotes interoperability, enhances data utilization and removes levels of uncertainty found in data products.
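Reading a handful of discovery fields out of an ISO 19115 record can be sketched with the standard library, assuming the record uses the ISO 19139 XML encoding with the `gmd` and `gco` namespaces. The choice of three "core" fields, the names `CORE_PATHS` and `extract_core`, and the sample record are assumptions for illustration; they are not the ECHO metadata model or the NASA-specific convention.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Hypothetical set of discovery fields, as paths relative to gmd:MD_Metadata.
_IDENT = "gmd:identificationInfo/gmd:MD_DataIdentification"
CORE_PATHS = {
    "id": "gmd:fileIdentifier/gco:CharacterString",
    "title": _IDENT + "/gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString",
    "abstract": _IDENT + "/gmd:abstract/gco:CharacterString",
}

# Illustrative record only.
SAMPLE_ISO = """\
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:fileIdentifier>
    <gco:CharacterString>example-collection-001</gco:CharacterString>
  </gmd:fileIdentifier>
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:title>
            <gco:CharacterString>Example Sea Surface Temperature</gco:CharacterString>
          </gmd:title>
        </gmd:CI_Citation>
      </gmd:citation>
      <gmd:abstract>
        <gco:CharacterString>Illustrative abstract.</gco:CharacterString>
      </gmd:abstract>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>
"""


def extract_core(xml_text):
    """Return the selected fields; a field missing from the record maps to None."""
    root = ET.fromstring(xml_text)
    return {
        name: root.findtext(path, default=None, namespaces=NS)
        for name, path in CORE_PATHS.items()
    }


core = extract_core(SAMPLE_ISO)
print(core)
```

An indexer of this shape can be written once per input format while the original record is stored unchanged for users who need it in full.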
Achieving interoperability for metadata registries using comparative object modeling.
Park, Yu Rang; Kim, Ju Han
2010-01-01
Achieving data interoperability between organizations relies upon agreed meaning and representation (metadata) of data. For managing and registering metadata, many organizations have built metadata registries (MDRs) in various domains based on the international standard for the MDR framework, ISO/IEC 11179. Following this trend, two public MDRs in the biomedical domain have been created: the United States Health Information Knowledgebase (USHIK) and the cancer Data Standards Registry and Repository (caDSR), from the U.S. Department of Health & Human Services and the National Cancer Institute (NCI), respectively. Most MDRs are implemented with indiscriminate extensions to satisfy organization-specific needs and to work around the semantic and structural limitations of ISO/IEC 11179. As a result it is difficult to address interoperability among multiple MDRs. In this paper, we propose an integrated metadata object model for achieving interoperability among multiple MDRs. To evaluate this model, we developed an XML Schema Definition (XSD)-based metadata exchange format. We created an XSD-based metadata exporter, supporting both the integrated metadata object model and organization-specific MDR formats.
The Benefits and Future of Standards: Metadata and Beyond
NASA Astrophysics Data System (ADS)
Stracke, Christian M.
This article discusses the benefits and future of standards and presents the generic multi-dimensional Reference Model. First the importance and the tasks of interoperability as well as quality development and their relationship are analyzed. Especially in e-Learning their connection and interdependence is evident: Interoperability is one basic requirement for quality development. In this paper, it is shown how standards and specifications are supporting these crucial issues. The upcoming ISO metadata standard MLR (Metadata for Learning Resource) will be introduced and used as example for identifying the requirements and needs for future standardization. In conclusion a vision of the challenges and potentials for e-Learning standardization is outlined.
A case for user-generated sensor metadata
NASA Astrophysics Data System (ADS)
Nüst, Daniel
2015-04-01
Cheap and easy to use sensing technology and new developments in ICT towards a global network of sensors and actuators promise previously unthought-of changes for our understanding of the environment. Large professional as well as amateur sensor networks exist, and they are used for specific yet diverse applications across domains such as hydrology, meteorology or early warning systems. However, the impact this "abundance of sensors" has had so far is somewhat disappointing. There is a gap between (community-driven) sensor networks that could provide very useful data and the users of the data. In our presentation, we argue this is due to a lack of metadata which allows determining the fitness for use of a dataset. Syntactic or semantic interoperability for sensor webs have made great progress and continue to be an active field of research, yet they often are quite complex, which is of course due to the complexity of the problem at hand. But still, we see the most generic information to determine fitness for use is a dataset's provenance, because it allows users to make up their own minds independently from existing classification schemes for data quality. In this work we will make the case that curated user-contributed metadata has the potential to improve this situation. This especially applies for scenarios in which an observed property is applicable in different domains, and for set-ups where the understanding about metadata concepts and (meta-)data quality differs between data provider and user. On the one hand a citizen does not understand the ISO provenance metadata. On the other hand a researcher might find issues in publicly accessible time series published by citizens, which the latter might not be aware of or care about. Because users will have to determine fitness for use for each application on their own anyway, we suggest an online collaboration platform for user-generated metadata based on an extremely simplified data model. 
In the most basic fashion, metadata generated by users can be boiled down to a basic property of the world wide web: many information items, such as news or blog posts, allow users to create comments and rate the content. Therefore we argue to focus a core data model on one text field for a textual comment, one optional numerical field for a rating, and a resolvable identifier for the dataset that is commented on. We present a conceptual framework that integrates user comments in existing standards and relevant applications of online sensor networks and discuss possible approaches, such as linked data, brokering, or standalone metadata portals. We relate this framework to existing work in user generated content, such as proprietary rating systems on commercial websites, microformats, the GeoViQua User Quality Model, the CHARMe annotations, or W3C Open Annotation. These systems are also explored for commonalities and based on their very useful concepts and ideas; we present an outline for future extensions of the minimal model. Building on this framework we present a concept how a simplistic comment-rating-system can be extended to capture provenance information for spatio-temporal observations in the sensor web, and how this framework can be evaluated.
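The minimal model proposed in this abstract (a text comment, an optional numeric rating, and a resolvable identifier for the dataset) maps naturally onto a small record type. The sketch below is one possible reading: the class and field names, the 1 to 5 rating scale, and the rule that the identifier must be an http(s) URL are assumptions, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class UserComment:
    dataset_id: str                 # resolvable identifier of the dataset
    comment: str                    # free-text comment
    rating: Optional[float] = None  # optional rating; 1-5 scale assumed here

    def __post_init__(self):
        parsed = urlparse(self.dataset_id)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError("dataset_id must be an http(s) URL")
        if not self.comment.strip():
            raise ValueError("comment must not be empty")
        if self.rating is not None and not 1 <= self.rating <= 5:
            raise ValueError("rating must be between 1 and 5")


def mean_rating(comments):
    """Average the ratings that were given; None when nobody rated."""
    ratings = [c.rating for c in comments if c.rating is not None]
    return sum(ratings) / len(ratings) if ratings else None


# Hypothetical dataset identifier and comments.
DATASET = "https://example.org/datasets/river-gauge-42"
comments = [
    UserComment(DATASET, "Sensor drifts after heavy rain.", 2),
    UserComment(DATASET, "Timestamps appear to be local time, not UTC."),
    UserComment(DATASET, "Good match with the official gauge in 2014.", 4),
]
print(mean_rating(comments))
```

Provenance or other structured annotations could later be layered on as further optional fields without changing these three.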
Obuch, Raymond C.; Carlino, Jennifer; Zhang, Lin; Blythe, Jonathan; Dietrich, Christopher; Hawkinson, Christine
2018-04-12
The Department of the Interior (DOI) is a Federal agency with over 90,000 employees across 10 bureaus and 8 agency offices. Its primary mission is to protect and manage the Nation’s natural resources and cultural heritage; provide scientific and other information about those resources; and honor its trust responsibilities or special commitments to American Indians, Alaska Natives, and affiliated island communities. Data and information are critical in day-to-day operational decision making and scientific research. DOI is committed to creating, documenting, managing, and sharing high-quality data and metadata in and across its various programs that support its mission. Documenting data through metadata is essential in realizing the value of data as an enterprise asset. The completeness, consistency, and timeliness of metadata affect users’ ability to search for and discover the most relevant data for the intended purpose, and facilitate the interoperability and usability of these data among DOI bureaus and offices. Fully documented metadata describe data usability, quality, accuracy, provenance, and meaning. Across DOI, there are different maturity levels and phases of information and metadata management implementations. The Department has organized a committee consisting of bureau-level points of contact to collaborate on the development of more consistent, standardized, and more effective metadata management practices and guidance to support this shared mission and the information needs of the Department. DOI’s metadata implementation plans establish key roles and responsibilities associated with metadata management processes, procedures, and a series of actions defined in three major metadata implementation phases including: (1) Getting started—Planning Phase, (2) Implementing and Maintaining Operational Metadata Management Phase, and (3) the Next Steps towards Improving Metadata Management Phase. 
DOI’s phased approach for metadata management addresses some of the major data and metadata management challenges that exist across the diverse missions of the bureaus and offices. All employees who create, modify, or use data are involved with data and metadata management. Identifying, establishing, and formalizing the roles and responsibilities associated with metadata management are key to institutionalizing a framework of best practices, methodologies, processes, and common approaches throughout all levels of the organization; these are the foundation for effective data resource management. For executives and managers, metadata management strengthens their overarching views of data assets, holdings, and data interoperability; and clarifies how metadata management can help accelerate the compliance of multiple policy mandates. For employees, data stewards, and data professionals, formalized metadata management will help with the consistency of definitions, and approaches addressing data discoverability, data quality, and data lineage. In addition to data professionals and others associated with information technology; data stewards and program subject matter experts take on important metadata management roles and responsibilities as data flow through their respective business and science-related workflows. The responsibilities of establishing, practicing, and governing the actions associated with their specific metadata management roles are critical to successful metadata implementation.
NASA Astrophysics Data System (ADS)
Agarwal, D.; Varadharajan, C.; Cholia, S.; Snavely, C.; Hendrix, V.; Gunter, D.; Riley, W. J.; Jones, M.; Budden, A. E.; Vieglais, D.
2017-12-01
The ESS-DIVE archive is a new U.S. Department of Energy (DOE) data archive designed to provide long-term stewardship and use of data from observational, experimental, and modeling activities in the earth and environmental sciences. The ESS-DIVE infrastructure is constructed with the long-term vision of enabling broad access to and usage of the DOE-sponsored data stored in the archive. It is designed as a scalable framework that incentivizes data providers to contribute well-structured, high-quality data to the archive and that enables the user community to easily build data processing, synthesis, and analysis capabilities using those data. The key innovations in our design include: (1) application of user-experience research methods to understand the needs of users and data contributors; (2) support for early data archiving during project data QA/QC and before public release; (3) focus on implementation of data standards in collaboration with the community; (4) support for community-built tools for data search, interpretation, analysis, and visualization; (5) a data fusion database to support search of the data extracted from submitted packages and of data available in partner data systems such as the Earth System Grid Federation (ESGF) and DataONE; and (6) support for archiving of data packages that are not to be released to the public. ESS-DIVE data contributors will be able to archive and version their data and metadata, obtain data DOIs, search for and access ESS data and metadata via web and programmatic portals, and provide data and metadata in standardized forms. The ESS-DIVE archive and catalog will be federated with other existing catalogs, allowing cross-catalog metadata search and data exchange with existing systems, including DataONE's Metacat search. ESS-DIVE is operated by a multidisciplinary team from Berkeley Lab, the National Center for Ecological Analysis and Synthesis (NCEAS), and DataONE.
The primary data copies are hosted at DOE's NERSC supercomputing facility with replicas at DataONE nodes.
Harvesting NASA's Common Metadata Repository (CMR)
NASA Technical Reports Server (NTRS)
Shum, Dana; Durbin, Chris; Norton, James; Mitchell, Andrew
2017-01-01
As part of NASA's Earth Observing System Data and Information System (EOSDIS), the Common Metadata Repository (CMR) stores metadata for over 30,000 datasets from both NASA and international providers along with over 300M granules. This metadata enables sub-second discovery and facilitates data access. While the CMR offers a robust temporal, spatial and keyword search functionality to the general public and international community, it is sometimes more desirable for international partners to harvest the CMR metadata and merge the CMR metadata into a partner's existing metadata repository. This poster will focus on best practices to follow when harvesting CMR metadata to ensure that any changes made to the CMR can also be updated in a partner's own repository. Additionally, since each partner has distinct metadata formats they are able to consume, the best practices will also include guidance on retrieving the metadata in the desired metadata format using CMR's Unified Metadata Model translation software.
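The harvesting pattern described above (request only what changed since the last harvest, in the format the partner can consume, then merge into the local repository) can be sketched roughly as follows. This is a minimal illustration, not the poster's recommended implementation: the search root, the `updated_since`, `page_size`, and `page_num` parameters, and the `umm_json` format extension reflect our understanding of the public CMR search API and should be checked against the CMR documentation. The record shape used by `merge_records` (a dict with `concept_id` and `revision_id`) is a simplified, hypothetical stand-in for real CMR responses. No network request is made.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumed search root; verify against the CMR documentation before use.
CMR_SEARCH_ROOT = "https://cmr.earthdata.nasa.gov/search"


def build_harvest_url(concept="collections", fmt="umm_json",
                      updated_since=None, page_size=100, page_num=1):
    """Build a search URL for one page of records in the requested format.

    `updated_since` must be a timezone-aware datetime; it restricts the
    result to records changed since the previous harvest.
    """
    params = {"page_size": page_size, "page_num": page_num}
    if updated_since is not None:
        stamp = updated_since.astimezone(timezone.utc)
        params["updated_since"] = stamp.strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{CMR_SEARCH_ROOT}/{concept}.{fmt}?{urlencode(params)}"


def merge_records(local, harvested):
    """Merge harvested records into a local repository keyed by concept id.

    A record replaces the local copy only when it is new or carries a
    higher revision id. Returns the number of records added or updated.
    """
    changed = 0
    for record in harvested:
        key = record["concept_id"]
        current = local.get(key)
        if current is None or record["revision_id"] > current["revision_id"]:
            local[key] = record
            changed += 1
    return changed


last_harvest = datetime(2017, 1, 1, tzinfo=timezone.utc)
print(build_harvest_url(updated_since=last_harvest))
```

Keeping the last successful harvest time on the partner side and passing it back on the next run is what keeps the two repositories in step without re-downloading every record.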
Harvesting NASA's Common Metadata Repository
NASA Astrophysics Data System (ADS)
Shum, D.; Mitchell, A. E.; Durbin, C.; Norton, J.
2017-12-01
As part of NASA's Earth Observing System Data and Information System (EOSDIS), the Common Metadata Repository (CMR) stores metadata for over 30,000 datasets from both NASA and international providers along with over 300M granules. This metadata enables sub-second discovery and facilitates data access. While the CMR offers a robust temporal, spatial and keyword search functionality to the general public and international community, it is sometimes more desirable for international partners to harvest the CMR metadata and merge the CMR metadata into a partner's existing metadata repository. This poster will focus on best practices to follow when harvesting CMR metadata to ensure that any changes made to the CMR can also be updated in a partner's own repository. Additionally, since each partner has distinct metadata formats they are able to consume, the best practices will also include guidance on retrieving the metadata in the desired metadata format using CMR's Unified Metadata Model translation software.
EPA Metadata Style Guide Keywords and EPA Organization Names
The keywords and EPA organization names listed below, along with EPA’s Metadata Style Guide, are intended to provide suggestions and guidance to assist with the standardization of metadata records.
The Metadata Coverage Index (MCI): A standardized metric for quantifying database metadata richness.
Liolios, Konstantinos; Schriml, Lynn; Hirschman, Lynette; Pagani, Ioanna; Nosrat, Bahador; Sterk, Peter; White, Owen; Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Taylor, Chris; Kyrpides, Nikos C; Field, Dawn
2012-07-30
Variability in the extent of the descriptions of data ('metadata') held in public repositories forces users to assess the quality of records individually, which rapidly becomes impractical. The scoring of records on the richness of their description provides a simple, objective proxy measure for quality that enables filtering that supports downstream analysis. Pivotally, such descriptions should spur on improvements. Here, we introduce such a measure - the 'Metadata Coverage Index' (MCI): the percentage of available fields actually filled in a record or description. MCI scores can be calculated across a database, for individual records or for their component parts (e.g., fields of interest). There are many potential uses for this simple metric, for example: to filter, rank or search for records; to assess the metadata availability of an ad hoc collection; to determine the frequency with which fields in a particular record type are filled, especially with respect to standards compliance; to assess the utility of specific tools and resources, and of data capture practice more generally; to prioritize records for further curation; to serve as performance metrics of funded projects; or to quantify the value added by curation. Here we demonstrate the utility of MCI scores using metadata from the Genomes Online Database (GOLD), including records compliant with the 'Minimum Information about a Genome Sequence' (MIGS) standard developed by the Genomic Standards Consortium. We discuss challenges and address the further application of MCI scores: to show improvements in annotation quality over time, to inform the work of standards bodies and repository providers on the usability and popularity of their products, and to assess and credit the work of curators. Such an index provides a step towards putting metadata capture practices and, in the future, standards compliance, into a quantitative and objective framework.
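As a worked illustration of the definition above (MCI as the percentage of available fields actually filled), the sketch below computes the score for a single record, across a whole collection, and for one field of interest. The field names and sample records are hypothetical, and the rule that `None` and blank strings count as unfilled is our assumption; the abstract does not say how the authors treat empty values.

```python
from typing import Mapping, Sequence


def is_filled(value) -> bool:
    """Assumed rule: None and blank strings are unfilled, anything else counts."""
    if value is None:
        return False
    if isinstance(value, str) and not value.strip():
        return False
    return True


def record_mci(record: Mapping[str, object], fields: Sequence[str]) -> float:
    """MCI of one record: percentage of the available fields that are filled."""
    if not fields:
        raise ValueError("at least one field is required")
    filled = sum(1 for name in fields if is_filled(record.get(name)))
    return 100.0 * filled / len(fields)


def database_mci(records: Sequence[Mapping[str, object]],
                 fields: Sequence[str]) -> float:
    """MCI across a collection: filled cells over all available cells."""
    if not records or not fields:
        raise ValueError("records and fields must be non-empty")
    filled = sum(1 for rec in records for name in fields
                 if is_filled(rec.get(name)))
    return 100.0 * filled / (len(records) * len(fields))


def field_fill_rate(records: Sequence[Mapping[str, object]], name: str) -> float:
    """Percentage of records in which one field of interest is filled."""
    if not records:
        raise ValueError("records must be non-empty")
    return 100.0 * sum(1 for rec in records if is_filled(rec.get(name))) / len(records)


# Hypothetical field list and records, for illustration only.
FIELDS = ["organism", "isolation_site", "latitude", "longitude"]
RECORDS = [
    {"organism": "E. coli", "isolation_site": "soil", "latitude": "", "longitude": None},
    {"organism": "B. subtilis", "isolation_site": "lake", "latitude": "45.1", "longitude": "7.6"},
]

print(record_mci(RECORDS[0], FIELDS))        # 2 of 4 fields filled
print(database_mci(RECORDS, FIELDS))         # 6 of 8 cells filled
print(field_fill_rate(RECORDS, "latitude"))  # 1 of 2 records
```

Sorting records by `record_mci` gives the filtering and curation-priority uses mentioned in the abstract, while `field_fill_rate` corresponds to checking how often a particular field is filled.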
Inigo San Gil; Wade Sheldon; Tom Schmidt; Mark Servilla; Raul Aguilar; Corinna Gries; Tanya Gray; Dawn Field; James Cole; Jerry Yun Pan; Giri Palanisamy; Donald Henshaw; Margaret O'Brien; Linda Kinkel; Kathrine McMahon; Renzo Kottmann; Linda Amaral-Zettler; John Hobbie; Philip Goldstein; Robert P. Guralnick; James Brunt; William K. Michener
2008-01-01
The Genomic Standards Consortium (GSC) invited a representative of the Long-Term Ecological Research (LTER) to its fifth workshop to present the Ecological Metadata Language (EML) metadata standard and its relationship to the Minimum Information about a Genome/Metagenome Sequence (MIGS/MIMS) and its implementation, the Genomic Contextual Data Markup Language (GCDML)....
UAV field demonstration of social media enabled tactical data link
NASA Astrophysics Data System (ADS)
Olson, Christopher C.; Xu, Da; Martin, Sean R.; Castelli, Jonathan C.; Newman, Andrew J.
2015-05-01
This paper addresses the problem of enabling Command and Control (C2) and data exfiltration functions for missions using small, unmanned, airborne surveillance and reconnaissance platforms. The authors demonstrated the feasibility of using existing commercial wireless networks as the data transmission infrastructure to support Unmanned Aerial Vehicle (UAV) autonomy functions such as transmission of commands, imagery, metadata, and multi-vehicle coordination messages. The authors developed and integrated a C2 Android application for ground users with a common smart phone, a C2 and data exfiltration Android application deployed on-board the UAVs, and a web server with database to disseminate the collected data to distributed users using standard web browsers. The authors performed a mission-relevant field test and demonstration in which operators commanded a UAV from an Android device to search and loiter; and remote users viewed imagery, video, and metadata via web server to identify and track a vehicle on the ground. Social media served as the tactical data link for all command messages, images, videos, and metadata during the field demonstration. Imagery, video, and metadata were transmitted from the UAV to the web server via multiple Twitter, Flickr, Facebook, YouTube, and similar media accounts. The web server reassembled images and video with corresponding metadata for distributed users. The UAV autopilot communicated with the on-board Android device via on-board Bluetooth network.
NASA Astrophysics Data System (ADS)
Christianson, D. S.; Varadharajan, C.; Detto, M.; Faybishenko, B.; Gimenez, B.; Jardine, K.; Negron Juarez, R. I.; Pastorello, G.; Powell, T.; Warren, J.; Wolfe, B.; McDowell, N. G.; Kueppers, L. M.; Chambers, J.; Agarwal, D.
2016-12-01
The U.S. Department of Energy's (DOE) Next Generation Ecosystem Experiment (NGEE) Tropics project aims to develop a process-rich tropical forest ecosystem model that is parameterized and benchmarked by field observations. Thus, data synthesis, quality assurance and quality control (QA/QC), and data product generation of a diverse and complex set of ecohydrological observations, including sapflux, leaf surface temperature, soil water content, and leaf gas exchange from sites across the Tropics, are required to support model simulations. We have developed a metadata reporting framework, implemented in conjunction with the NGEE Tropics Data Archive tool, to enable cross-site and cross-method comparison, data interpretability, and QA/QC. We employed a modified User-Centered Design approach, which involved short development cycles based on user-identified needs, and iterative testing with data providers and users. The metadata reporting framework currently has been implemented for sensor-based observations and leverages several existing metadata protocols. The framework consists of templates that define a multi-scale measurement position hierarchy, descriptions of measurement settings, and details about data collection and data file organization. The framework also enables data providers to define data-access permission settings, provenance, and referencing to enable appropriate data usage, citation, and attribution. In addition to describing the metadata reporting framework, we discuss tradeoffs and impressions from both data providers and users during the development process, focusing on the scalability, usability, and efficiency of the framework.
Quality Assurance for Digital Learning Object Repositories: Issues for the Metadata Creation Process
ERIC Educational Resources Information Center
Currier, Sarah; Barton, Jane; O'Beirne, Ronan; Ryan, Ben
2004-01-01
Metadata enables users to find the resources they require, therefore it is an important component of any digital learning object repository. Much work has already been done within the learning technology community to assure metadata quality, focused on the development of metadata standards, specifications and vocabularies and their implementation…
Enhancing SCORM Metadata for Assessment Authoring in E-Learning
ERIC Educational Resources Information Center
Chang, Wen-Chih; Hsu, Hui-Huang; Smith, Timothy K.; Wang, Chun-Chia
2004-01-01
With the rapid development of distance learning and the XML technology, metadata play an important role in e-Learning. Nowadays, many distance learning standards, such as SCORM, AICC CMI, IEEE LTSC LOM and IMS, use metadata to tag learning materials. However, most metadata models are used to define learning materials and test problems. Few…
An Approach to Information Management for AIR7000 with Metadata and Ontologies
2009-10-01
metadata. We then propose an approach based on Semantic Technologies including the Resource Description Framework (RDF) and Upper Ontologies, for the...mandating specific metadata schemas can result in interoperability problems. For example, many standards within the ADO mandate the use of XML for metadata...such problems, we propose an architecture in which different metadata schemes can interoperate. By using RDF (Resource Description Framework) as a
NASA Astrophysics Data System (ADS)
Troyan, D.
2016-12-01
The Atmospheric Radiation Measurement (ARM) program has been collecting data from instruments in diverse climate regions for nearly twenty-five years. These data are made available to all interested parties at no cost via specially designed tools found on the ARM website (www.arm.gov). Metadata is created and applied to the various datastreams to facilitate information retrieval using the ARM website, the ARM Data Discovery Tool, and data quality reporting tools. Over the last year, the Metadata Manager - a relatively new position within the ARM program - created two documents that summarize the state of ARM metadata processes: ARM Metadata Workflow, and ARM Metadata Standards. These documents serve as guides to the creation and management of ARM metadata. With many of ARM's data functions spread around the Department of Energy national laboratory complex and with many of the original architects of the metadata structure no longer working for ARM, there is increased importance on using these documents to resolve issues from data flow bottlenecks and inaccurate metadata to improving data discovery and organizing web pages. This presentation will provide some examples from the workflow and standards documents. The examples will illustrate the complexity of the ARM metadata processes and the efficiency by which the metadata team works towards achieving the goal of providing access to data collected under the auspices of the ARM program.
NASA Technical Reports Server (NTRS)
Smit, Christine; Hegde, Mahabaleshwara; Strub, Richard; Bryant, Keith; Li, Angela; Petrenko, Maksym
2017-01-01
Giovanni is a data exploration and visualization tool at the NASA Goddard Earth Sciences Data Information Services Center (GES DISC). It has been around in one form or another for more than 15 years. Giovanni calculates simple statistics and produces 22 different visualizations for more than 1600 geophysical parameters from more than 90 satellite and model products. Giovanni relies on external data format standards to ensure interoperability, including the NetCDF CF Metadata Conventions. Unfortunately, these standards were insufficient to make Giovanni's internal data representation truly simple to use. Finding and working with dimensions can be convoluted with the CF Conventions. Furthermore, the CF Conventions are silent on machine-friendly descriptive metadata such as the parameter's source product and product version. In order to simplify analyzing disparate earth science data parameters in a unified way, we developed Giovanni's internal standard. First, the format standardizes parameter dimensions and variables so they can be easily found. Second, the format adds all the machine-friendly metadata Giovanni needs to present our parameters to users in a consistent and clear manner. At a glance, users can grasp all the pertinent information about parameters both during parameter selection and after visualization.
Park, Yu Rang; Kim, Ju Han
2006-01-01
Standardized management of data elements (DEs) for Case Report Forms (CRFs) is crucial in a Clinical Trials Information System (CTIS). Traditional CTISs utilize organization-specific definitions and storage methods for DEs and CRFs. We developed a metadata-based DE management system for clinical trials, the Clinical and Histopathological Metadata Registry (CHMR), using the international standard for metadata registries (ISO 11179) for the management of cancer clinical trials information. CHMR was evaluated in cancer clinical trials with 1625 DEs extracted from the College of American Pathologists Cancer Protocols for 20 major cancers. PMID:17238675
NASA Astrophysics Data System (ADS)
Schweitzer, R. H.
2001-05-01
The Climate Diagnostics Center maintains a collection of gridded climate data primarily for use by local researchers. Because this data is available on fast digital storage and because it has been converted to netCDF using a standard metadata convention (called COARDS), we recognize that this data collection is also useful to the community at large. At CDC we try to use technology and metadata standards to reduce our costs associated with making these data available to the public. The World Wide Web has been an excellent technology platform for meeting that goal. Specifically we have developed Web-based user interfaces that allow users to search, plot and download subsets from the data collection. We have also been exploring use of the Pacific Marine Environmental Laboratory's Live Access Server (LAS) as an engine for this task. This would result in further savings by allowing us to concentrate on customizing the LAS where needed, rather than developing and maintaining our own system. One such customization currently under development is the use of Java Servlets and JavaServer Pages in conjunction with a metadata database to produce a hierarchical user interface to LAS. In addition to these Web-based user interfaces, all of our data are available via the Distributed Oceanographic Data System (DODS). This allows other sites using LAS and individuals using DODS-enabled clients to use our data as if it were a local file. All of these technology systems are driven by metadata. When we began to create netCDF files, we collaborated with several other agencies to develop a netCDF convention (COARDS) for metadata. At CDC we have extended that convention to incorporate additional metadata elements to make the netCDF files as self-describing as possible. Part of the local metadata is a set of controlled names for the variable, level in the atmosphere and ocean, statistic and data set for each netCDF file.
To allow searching and easy reorganization of these metadata, we loaded the metadata from the netCDF files into a mySQL database. The combination of the mySQL database and the controlled names makes it possible to automate the construction of user interfaces and standard format metadata descriptions, like Federal Geographic Data Committee (FGDC) and Directory Interchange Format (DIF). These standard descriptions also include an association between our controlled names and standard keywords such as those developed by the Global Change Master Directory (GCMD). This talk will give an overview of each of these technology and metadata standards as it applies to work at the Climate Diagnostics Center. The talk will also discuss the pros and cons of each approach and discuss areas for future development.
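The idea of loading per-file metadata into a relational database and then generating interface structure from the controlled names can be sketched as below. This is an illustration under stated assumptions, not CDC's system: Python's built-in `sqlite3` stands in for the mySQL database, plain dictionaries stand in for attributes read from netCDF files, and the table layout, file names, and controlled names are all hypothetical.

```python
import sqlite3

# Hypothetical per-file metadata, as it might be read from netCDF attributes.
FILE_METADATA = [
    {"path": "air.sfc.mon.mean.nc", "variable": "air", "level": "surface",
     "statistic": "monthly mean", "dataset": "example_reanalysis"},
    {"path": "air.500.mon.mean.nc", "variable": "air", "level": "500mb",
     "statistic": "monthly mean", "dataset": "example_reanalysis"},
    {"path": "uwnd.500.mon.mean.nc", "variable": "uwnd", "level": "500mb",
     "statistic": "monthly mean", "dataset": "example_reanalysis"},
]


def load_metadata(conn, rows):
    """Create the (assumed) table layout and load one row per netCDF file."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, variable TEXT, level TEXT, "
        "statistic TEXT, dataset TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO files "
        "VALUES (:path, :variable, :level, :statistic, :dataset)",
        rows,
    )
    conn.commit()


def menu_for(conn, dataset):
    """Return {variable: [levels]} for one data set.

    A nested mapping like this is the kind of structure from which a
    hierarchical user interface could be generated automatically.
    """
    cursor = conn.execute(
        "SELECT DISTINCT variable, level FROM files "
        "WHERE dataset = ? ORDER BY variable, level",
        (dataset,),
    )
    menu = {}
    for variable, level in cursor:
        menu.setdefault(variable, []).append(level)
    return menu


connection = sqlite3.connect(":memory:")
load_metadata(connection, FILE_METADATA)
print(menu_for(connection, "example_reanalysis"))
```

Because the names are controlled, the same query results could also feed templates for standard-format descriptions rather than a menu.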
NASA Reverb: Standards-Driven Earth Science Data and Service Discovery
NASA Astrophysics Data System (ADS)
Cechini, M. F.; Mitchell, A.; Pilone, D.
2011-12-01
NASA's Earth Observing System Data and Information System (EOSDIS) is a core capability in NASA's Earth Science Data Systems Program. NASA's EOS ClearingHOuse (ECHO) is a metadata catalog for the EOSDIS, providing a centralized catalog of data products and registry of related data services. Working closely with the EOSDIS community, the ECHO team identified a need to develop the next generation EOS data and service discovery tool. This development effort relied on the following principles: + Metadata Driven User Interface - Users should be presented with data and service discovery capabilities based on dynamic processing of metadata describing the targeted data. + Integrated Data & Service Discovery - Users should be able to discover data and associated data services that facilitate their research objectives. + Leverage Common Standards - Users should be able to discover and invoke services that utilize common interface standards. Metadata plays a vital role in facilitating data discovery and access. As data providers enhance their metadata, more advanced search capabilities become available, enriching a user's search experience. Maturing metadata formats such as ISO 19115 provide the necessary depth of metadata that facilitates advanced data discovery capabilities. Data discovery and access is not limited to simply the retrieval of data granules, but is growing into the more complex discovery of data services. These services include, but are not limited to, services facilitating additional data discovery, subsetting, reformatting, and re-projecting. The discovery and invocation of these data services is made significantly simpler through the use of consistent and interoperable standards. Once a standard is adopted, a single standard-specific adapter can be used to communicate with multiple services implementing that protocol. The emergence of metadata standards such as ISO 19119 plays as important a role in discovery as the ISO 19115 standard.
After a yearlong design, development, and testing process, the ECHO team successfully released "Reverb - The Next Generation Earth Science Discovery Tool." Reverb relies heavily on the information contained in dataset and granule metadata, such as ISO 19115, to provide a dynamic experience to users based on identified search facet values extracted from science metadata. Such an approach allows users to perform cross-dataset correlation and searches, discovering additional data that they may not previously have been aware of. In addition to data discovery, Reverb users may discover services associated with their data of interest. When services utilize supported standards and/or protocols, Reverb can facilitate the invocation of both synchronous and asynchronous data processing services. This greatly enhances a user's ability to discover data of interest and accomplish their research goals. Extrapolating from the current movement towards interoperable standards and an increase in available services, data service invocation and chaining will become a natural part of data discovery. Reverb is one example of a discovery tool that provides a mechanism for transforming the earth science data discovery paradigm.
NASA Astrophysics Data System (ADS)
Wang, Jingbo; Bastrakova, Irina; Evans, Ben; Gohar, Kashif; Santana, Fabiana; Wyborn, Lesley
2015-04-01
National Computational Infrastructure (NCI) manages national environmental research data collections (10+ PB) as part of its specialized high performance data node of the Research Data Storage Infrastructure (RDSI) program. We manage 40+ data collections using NCI's Data Management Plan (DMP), which is compatible with the ISO 19100 metadata standards. We utilize ISO standards to make sure our metadata is transferable and interoperable for sharing and harvesting. The DMP is used along with metadata from the data itself to create a hierarchy of data collection, dataset and time series catalogues that is then exposed through GeoNetwork for standard discoverability. These hierarchical catalogues are linked using a parent-child relationship. The hierarchical infrastructure of our GeoNetwork catalogue system aims to address both discoverability and in-house administrative use-cases. At NCI, we are currently improving the metadata interoperability in our catalogue by linking with standardized community vocabulary services. These emerging vocabulary services are being established to help harmonise data from different national and international scientific communities. One such vocabulary service is currently being established by the Australian National Data Service (ANDS). Data citation is another important aspect of the NCI data infrastructure, which allows tracking of data usage and infrastructure investment, encourages data sharing, and increases trust in research that is reliant on these data collections. We incorporate the standard vocabularies into the data citation metadata so that the data citation becomes machine readable and semantically friendly for web-search purposes as well. By standardizing our metadata structure across our entire data corpus, we are laying the foundation to enable the application of appropriate semantic mechanisms to enhance discovery and analysis of NCI's national environmental research data information.
We expect that this will further increase the data discoverability and encourage the data sharing and reuse within the community, increasing the value of the data much further than its current use.
Tools for proactive collection and use of quality metadata in GEOSS
NASA Astrophysics Data System (ADS)
Bastin, L.; Thum, S.; Maso, J.; Yang, K. X.; Nüst, D.; Van den Broek, M.; Lush, V.; Papeschi, F.; Riverola, A.
2012-12-01
The GEOSS Common Infrastructure allows interactive evaluation and selection of Earth Observation datasets by the scientific community and decision makers, but the data quality information needed to assess fitness for use is often patchy and hard to visualise when comparing candidate datasets. In a number of studies over the past decade, users repeatedly identified the same types of gaps in quality metadata, specifying the need for enhancements such as peer and expert review, better traceability and provenance information, information on citations and usage of a dataset, warning about problems identified with a dataset and potential workarounds, and 'soft knowledge' from data producers (e.g. recommendations for use which are not easily encoded using the existing standards). Despite clear identification of these issues in a number of recommendations, the gaps persist in practice and are highlighted once more in our own, more recent, surveys. This continuing deficit may well be the result of a historic paucity of tools to support the easy documentation and continual review of dataset quality. However, more recent developments in tools and standards, as well as more general technological advances, present the opportunity for a community of scientific users to adopt a more proactive attitude by commenting on their uses of data, and for that feedback to be federated with more traditional and static forms of metadata, allowing a user to more accurately assess the suitability of a dataset for their own specific context and reliability thresholds. The EU FP7 GeoViQua project aims to develop this opportunity by adding data quality representations to the existing search and visualisation functionalities of the Geo Portal. Subsequently we will help to close the gap by providing tools to easily create quality information, and to permit user-friendly exploration of that information as the ultimate incentive for improved data quality documentation. 
Quality information is derived from producer metadata, from the data themselves, from validation of in-situ sensor data, from provenance information and from user feedback, and will be aggregated to produce clear and useful summaries of quality, including a GEO Label. GeoViQua's conceptual quality information models for users and producers are specifically described and illustrated in this presentation. These models (which have been encoded as XML schemas and can be accessed at http://schemas.geoviqua.org/) are designed to satisfy the identified user needs while remaining consistent with current standards such as ISO 19115 and advanced drafts such as ISO 19157. The resulting components being developed for the GEO Portal are designed to lower the entry barrier to users who wish to help to generate and explore rich and useful metadata. This metadata will include reviews, comments and ratings, reports of usage in specific domains and specification of datasets used for benchmarking, as well as rich quantitative information encoded in more traditional data quality elements such as thematic correctness and positional accuracy. The value of the enriched metadata will also be enhanced by graphical tools for visualizing spatially distributed uncertainties. We demonstrate practical example applications in selected environmental application domains.
Improving Scientific Metadata Interoperability And Data Discoverability using OAI-PMH
NASA Astrophysics Data System (ADS)
Devarakonda, Ranjeet; Palanisamy, Giri; Green, James M.; Wilson, Bruce E.
2010-12-01
While general-purpose search engines (such as Google or Bing) are useful for finding many things on the Internet, they are often of limited usefulness for locating Earth Science data relevant (for example) to a specific spatiotemporal extent. By contrast, tools that search repositories of structured metadata can locate relevant datasets with fairly high precision, but the search is limited to that particular repository. Federated searches (such as Z39.50) have been used, but can be slow and the comprehensiveness can be limited by downtime in any search partner. An alternative approach to improve comprehensiveness is for a repository to harvest metadata from other repositories, possibly with limits based on subject matter or access permissions. Searches through harvested metadata can be extremely responsive, and the search tool can be customized with semantic augmentation appropriate to the community of practice being served. However, there are a number of different protocols for harvesting metadata, with some challenges for ensuring that updates are propagated and for collaborations with repositories using differing metadata standards. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a standard that is seeing increased use as a means for exchanging structured metadata. OAI-PMH implementations must support Dublin Core as a metadata standard, with other metadata formats as optional. We have developed tools which enable our structured search tool (Mercury; http://mercury.ornl.gov) to consume metadata from OAI-PMH services in any of the metadata formats we support (Dublin Core, Darwin Core, FGDC CSDGM, GCMD DIF, EML, and ISO 19115/19137). We are also making ORNL DAAC metadata available through OAI-PMH for other metadata tools to utilize (such as the NASA Global Change Master Directory, GCMD).
This paper describes Mercury capabilities with multiple metadata formats, in general, and, more specifically, the results of our OAI-PMH implementations and the lessons learned. References: [1] R. Devarakonda, G. Palanisamy, B.E. Wilson, and J.M. Green, "Mercury: reusable metadata management data discovery and access system", Earth Science Informatics, vol. 3, no. 1, pp. 87-94, May 2010. [2] R. Devarakonda, G. Palanisamy, J.M. Green, B.E. Wilson, "Data sharing and retrieval using OAI-PMH", Earth Science Informatics DOI: 10.1007/s12145-010-0073-0, (2010). [3] Devarakonda, R.; Palanisamy, G.; Green, J.; Wilson, B. E. "Mercury: An Example of Effective Software Reuse for Metadata Management Data Discovery and Access", Eos Trans. AGU, 89(53), Fall Meet. Suppl., IN11A-1019 (2008).
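The harvesting pattern described in this abstract can be illustrated with a minimal Python sketch. It is not Mercury's implementation: the base URL, record identifier, title, and token value below are hypothetical, and the response is an inline sample rather than a live request. The sketch only shows how a ListRecords request is formed and how Dublin Core titles and the resumptionToken (which signals that more pages remain) might be read from one response.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL.

    A resumptionToken is an exclusive argument in OAI-PMH: when it is
    present, metadataPrefix is not sent again.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) for one ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI_NS + "record"):
        identifier = record.findtext(f"{OAI_NS}header/{OAI_NS}identifier")
        titles = [t.text for t in record.iter(DC_NS + "title")]
        records.append({"identifier": identifier, "titles": titles})
    token = root.findtext(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
    return records, (token or None)  # an empty token means the list is complete


# Hypothetical sample response (assumption: not taken from any real repository).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:dataset-1</identifier>
        <datestamp>2010-01-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Hypothetical soil moisture dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("http://example.org/oai", resumption_token=token)
```

A harvester would repeat the request with each returned token until none is returned, then index the accumulated records locally.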
Collaborative Metadata Curation in Support of NASA Earth Science Data Stewardship
NASA Technical Reports Server (NTRS)
Sisco, Adam W.; Bugbee, Kaylin; le Roux, Jeanne; Staton, Patrick; Freitag, Brian; Dixon, Valerie
2018-01-01
A growing collection of NASA Earth science data is archived and distributed by EOSDIS’s 12 Distributed Active Archive Centers (DAACs). Each collection and granule is described by a metadata record housed in the Common Metadata Repository (CMR). Multiple metadata standards are in use, and core elements of each are mapped to and from a common model – the Unified Metadata Model (UMM). This work was done by the Analysis and Review of CMR (ARC) Team.
A Metadata Element Set for Project Documentation
NASA Technical Reports Server (NTRS)
Hodge, Gail; Templeton, Clay; Allen, Robert B.
2003-01-01
NASA Goddard Space Flight Center is a large engineering enterprise with many projects. We describe our efforts to develop standard metadata sets across project documentation which we term the "Goddard Core". We also address broader issues for project management metadata.
A Grid Metadata Service for Earth and Environmental Sciences
NASA Astrophysics Data System (ADS)
Fiore, Sandro; Negro, Alessandro; Aloisio, Giovanni
2010-05-01
Critical challenges for climate modeling researchers are strongly connected with increasingly complex simulation models and the huge quantities of datasets they produce. Future trends in climate modeling will only increase computational and storage requirements. For this reason, the ability to transparently access both computational and data resources for large-scale complex climate simulations must be considered a key requirement for Earth Science and Environmental distributed systems. From the data management perspective, (i) the quantity of data will continuously increase, (ii) data will become more and more distributed and widespread, (iii) data sharing/federation will represent a key challenge among different sites distributed worldwide, and (iv) the potential community of users (large and heterogeneous) will be interested in discovering experimental results, searching metadata, browsing collections of files, comparing different results, displaying output, etc. A key element for carrying out data search and discovery, and for managing and accessing huge and distributed amounts of data, is the metadata handling framework. What we propose for the management of distributed datasets is the GRelC service (a data grid solution focusing on metadata management). Unlike classical approaches, the proposed data-grid solution is able to address scalability, transparency, security, efficiency and interoperability. The GRelC service we propose is able to provide access to metadata stored in different and widespread data sources (relational databases running on top of MySQL, Oracle, DB2, etc., leveraging SQL as query language, as well as XML databases - Xindice, eXist, and libxml2-based documents, adopting either XPath or XQuery), providing a strong data virtualization layer in a grid environment.
Such a technological solution for distributed metadata management (i) leverages well-known adopted standards (W3C, OASIS, etc.); (ii) supports role-based management (based on VOMS), which increases flexibility and scalability; (iii) provides full support for the Grid Security Infrastructure (authorization, mutual authentication, data integrity, data confidentiality and delegation); (iv) is compatible with existing grid middleware such as gLite and Globus; and finally (v) is currently adopted at the Euro-Mediterranean Centre for Climate Change (CMCC - Italy) to manage the entire CMCC data production activity as well as in the international Climate-G testbed.
Ignizio, Drew A.; O'Donnell, Michael S.; Talbert, Colin B.
2014-01-01
Creating compliant metadata for scientific data products is mandated for all federal Geographic Information Systems professionals and is a best practice for members of the geospatial data community. However, the complexity of the Federal Geographic Data Committee’s Content Standards for Digital Geospatial Metadata, the limited availability of easy-to-use tools, and recent changes in the ESRI software environment continue to make metadata creation a challenge. Staff at the U.S. Geological Survey Fort Collins Science Center have developed a Python toolbox for ESRI ArcDesktop to facilitate a semi-automated workflow to create and update metadata records in ESRI’s 10.x software. The U.S. Geological Survey Metadata Wizard tool automatically populates several metadata elements: the spatial reference, spatial extent, geospatial presentation format, vector feature count or raster column/row count, native system/processing environment, and the metadata creation date. Once the software auto-populates these elements, users can easily add attribute definitions and other relevant information in a simple Graphical User Interface. The tool, which offers a simple design free of esoteric metadata language, has the potential to save many government and non-government organizations a significant amount of time and costs by facilitating the development of metadata compliant with the Federal Geographic Data Committee’s Content Standards for Digital Geospatial Metadata for ESRI software users. A working version of the tool is now available for ESRI ArcDesktop, versions 10.0, 10.1, and 10.2 (downloadable at http://www.sciencebase.gov/metadatawizard).
Effective use of metadata in the integration and analysis of multi-dimensional optical data
NASA Astrophysics Data System (ADS)
Pastorello, G. Z.; Gamon, J. A.
2012-12-01
Data discovery and integration rely on adequate metadata. However, creating and maintaining metadata is time-consuming and often poorly addressed or avoided altogether, leading to problems in later data analysis and exchange. This is particularly true for research fields in which metadata standards do not yet exist or are under development, or within smaller research groups without enough resources. Vegetation monitoring using in-situ and remote optical sensing is an example of such a domain. In this area, data are inherently multi-dimensional, with spatial, temporal and spectral dimensions usually being well characterized. Other equally important aspects, however, might be inadequately translated into metadata. Examples include equipment specifications and calibrations, field/lab notes and field/lab protocols (e.g., sampling regimen, spectral calibration, atmospheric correction, sensor view angle, illumination angle), data processing choices (e.g., methods for gap filling, filtering and aggregation of data), quality assurance, and documentation of data sources, ownership and licensing. Each of these aspects can be important as metadata for search and discovery, but they can also be used as key data fields in their own right. If each of these aspects is also understood as an "extra dimension," it is possible to take advantage of them to simplify the data acquisition, integration, analysis, visualization and exchange cycle. Simple examples include selecting data sets of interest early in the integration process (e.g., only data collected according to a specific field sampling protocol) or applying appropriate data processing operations to different parts of a data set (e.g., adaptive processing for data collected under different sky conditions). More interesting scenarios involve guided navigation and visualization of data sets based on these extra dimensions, as well as partitioning data sets to highlight relevant subsets to be made available for exchange.
The DAX (Data Acquisition to eXchange) Web-based tool uses a flexible metadata representation model and takes advantage of multi-dimensional data structures to translate metadata types into data dimensions, effectively reshaping data sets according to available metadata. With that, metadata is tightly integrated into the acquisition-to-exchange cycle, allowing for more focused exploration of data sets while also increasing the value of, and incentives for, keeping good metadata. The tool is being developed and tested with optical data collected in different settings, including laboratory, field, airborne, and satellite platforms.
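The idea of treating metadata fields as "extra dimensions" can be made concrete with a small Python sketch. This is not the DAX tool's code; the field names ("sky", "protocol") and values are hypothetical. It shows one way a flat list of observations might be reshaped into cells addressed by whichever metadata fields are chosen as dimensions.

```python
from collections import defaultdict


def reshape_by_metadata(observations, dimensions):
    """Group observation values into cells keyed by the chosen metadata fields."""
    cells = defaultdict(list)
    for obs in observations:
        # Each chosen metadata field contributes one coordinate of the cell key.
        key = tuple(obs["metadata"].get(dim) for dim in dimensions)
        cells[key].append(obs["value"])
    return dict(cells)


# Hypothetical reflectance-like values with illustrative metadata.
observations = [
    {"value": 0.61, "metadata": {"sky": "clear", "protocol": "A"}},
    {"value": 0.58, "metadata": {"sky": "overcast", "protocol": "A"}},
    {"value": 0.64, "metadata": {"sky": "clear", "protocol": "B"}},
]

by_sky = reshape_by_metadata(observations, ["sky"])
by_sky_and_protocol = reshape_by_metadata(observations, ["sky", "protocol"])
```

Selecting only data collected under one protocol, or applying different processing per sky condition, then becomes a lookup on the cell key.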
Web Standard: PDF - When to Use, Document Metadata, PDF Sections
PDF files provide some benefits when used appropriately. PDF files should not be used for short documents (fewer than 5 pages) unless retaining the format for printing is important. PDFs should have internal file metadata and meet Section 508 standards.
NASA Astrophysics Data System (ADS)
Koppe, Roland; Scientific MaNIDA-Team
2013-04-01
The Marine Network for Integrated Data Access (MaNIDA) aims to build a sustainable e-infrastructure to support discovery and re-use of marine data from distinct data providers in Germany (see related abstracts in session ESSI 1.2). In order to provide users integrated access and retrieval of expedition or cruise metadata, data, services and publications as well as relationships among the various objects, we are developing (web) applications based on state of the art technologies: the Data Portal of German Marine Research. Since the German network of distributed content providers has distinct objectives and mandates for storing digital objects (e.g. long-term data preservation, near real time data, publication repositories), we have to cope with heterogeneous metadata in terms of syntax and semantics, data types and formats as well as access solutions. We have defined a set of core metadata elements which are common to our content providers and therefore useful for discovery and building relationships among objects. Existing catalogues for various types of vocabularies are being used to assure the mapping to community-wide used terms. We distinguish between expedition metadata and continuously harvestable metadata objects from distinct data providers.
• Existing expedition metadata from distinct sources is integrated and validated in order to create an expedition metadata catalogue which is used as authoritative source for expedition-related content. The web application allows browsing by e.g. research vessel and date, exploring expeditions and research gaps by tracklines and viewing expedition details (begin/end, ports, platforms, chief scientists, events, etc.). Also expedition-related objects from harvesting are dynamically associated with expedition information and presented to the user. Hence we will provide web services for detailed expedition information.
• Other harvestable content is separated into four categories: archived data and data products, near real time data, publications and reports. Reports are a special case of publication, describing cruise planning, cruise reports or popular reports on expeditions and are orthogonal to e.g. peer-reviewed articles. Each object's metadata contains at least: identifier(s) e.g. doi/hdl, title, author(s), date, expedition(s), platform(s) e.g. research vessel Polarstern. Furthermore project(s), parameter(s), device(s) and e.g. geographic coverage are of interest. An international gazetteer resolves geographic coverage to region names and annotates to object metadata. Information is homogeneously presented to the user, independent of the underlying format, but adaptable to specific disciplines e.g. bathymetry. Also data access and dissemination information is available to the user as data download link or web services (e.g. WFS, WMS). Based on relationship metadata we are dynamically building graphs of objects to support the user in finding possible relevant associated objects. Technically metadata is based on ISO / OGC standards or provider specification. Metadata is harvested via OAI-PMH or OGC CSW and indexed with Apache Lucene. This enables powerful full-text search, geographic and temporal search as well as faceting. In this presentation we will illustrate the architecture and the current implementation of our integrated approach.
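The core-element approach described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the MaNIDA implementation: the provider names, source field names, identifiers and the expedition label are all hypothetical. The sketch maps two differently shaped provider records onto the minimum element set the abstract lists (identifier, title, authors, date, expeditions, platforms) and then associates objects through a shared expedition.

```python
from dataclasses import dataclass, field


@dataclass
class CoreRecord:
    """Minimum element set named in the abstract; field names are illustrative."""
    identifier: str
    title: str
    authors: list = field(default_factory=list)
    date: str = ""
    expeditions: list = field(default_factory=list)
    platforms: list = field(default_factory=list)


# Hypothetical per-provider mappings: source field -> core element.
FIELD_MAPS = {
    "archive": {"doi": "identifier", "name": "title", "creators": "authors",
                "published": "date", "cruise": "expeditions", "vessel": "platforms"},
    "publications": {"handle": "identifier", "headline": "title", "writers": "authors",
                     "year": "date", "expedition": "expeditions", "ship": "platforms"},
}


def to_core(provider, raw):
    """Translate one provider-specific record into a CoreRecord."""
    values = {core: raw[src] for src, core in FIELD_MAPS[provider].items() if src in raw}
    for key in ("authors", "expeditions", "platforms"):
        value = values.get(key, [])
        values[key] = value if isinstance(value, list) else [value]
    return CoreRecord(**values)


def group_by_expedition(records):
    """Index record identifiers by expedition to relate objects to one another."""
    index = {}
    for record in records:
        for expedition in record.expeditions:
            index.setdefault(expedition, []).append(record.identifier)
    return index


records = [
    to_core("archive", {"doi": "doi:10.0000/example-1", "name": "Example CTD profiles",
                        "creators": ["A. Author"], "published": "2012",
                        "cruise": "EXP-1", "vessel": "Polarstern"}),
    to_core("publications", {"handle": "hdl:0000/example-2", "headline": "Example cruise report",
                             "writers": ["B. Author"], "year": "2013",
                             "expedition": ["EXP-1"], "ship": "Polarstern"}),
]
by_expedition = group_by_expedition(records)
```

Keeping the mapping tables as data rather than code means that a new provider format only needs a new entry in the mapping table.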
The Value of Data and Metadata Standardization for Interoperability in Giovanni
NASA Astrophysics Data System (ADS)
Smit, C.; Hegde, M.; Strub, R. F.; Bryant, K.; Li, A.; Petrenko, M.
2017-12-01
Giovanni (https://giovanni.gsfc.nasa.gov/giovanni/) is a data exploration and visualization tool at the NASA Goddard Earth Sciences Data Information Services Center (GES DISC). It has been around in one form or another for more than 15 years. Giovanni calculates simple statistics and produces 22 different visualizations for more than 1600 geophysical parameters from more than 90 satellite and model products. Giovanni relies on external data format standards to ensure interoperability, including the NetCDF CF Metadata Conventions. Unfortunately, these standards were insufficient to make Giovanni's internal data representation truly simple to use. Finding and working with dimensions can be convoluted with the CF Conventions. Furthermore, the CF Conventions are silent on machine-friendly descriptive metadata such as the parameter's source product and product version. In order to simplify analyzing disparate earth science data parameters in a unified way, we developed Giovanni's internal standard. First, the format standardizes parameter dimensions and variables so they can be easily found. Second, the format adds all the machine-friendly metadata Giovanni needs to present our parameters to users in a consistent and clear manner. At a glance, users can grasp all the pertinent information about parameters both during parameter selection and after visualization. This poster gives examples of how our metadata and data standards, both external and internal, have both simplified our code base and improved our users' experiences.
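The remark that finding dimensions can be convoluted under the CF Conventions can be illustrated with a short Python sketch. It is not Giovanni's code and does not reproduce Giovanni's internal standard; variables are modelled as plain dictionaries of attributes, and the variable names are hypothetical. Under CF, a latitude or longitude coordinate may be identified by its `standard_name` or by any of several accepted `units` spellings, so a reader has to test several attributes instead of relying on a fixed name.

```python
# Units strings that CF accepts for latitude and longitude coordinates.
CF_UNITS = {
    "latitude": {"degrees_north", "degree_north", "degree_N",
                 "degrees_N", "degreeN", "degreesN"},
    "longitude": {"degrees_east", "degree_east", "degree_E",
                  "degrees_E", "degreeE", "degreesE"},
}


def find_coordinate(variables, kind):
    """Return the name of the first variable that CF metadata marks as `kind`.

    `variables` maps variable names to attribute dictionaries;
    `kind` is "latitude" or "longitude". Returns None when nothing matches.
    """
    for name, attrs in variables.items():
        if attrs.get("standard_name") == kind:
            return name
        if attrs.get("units") in CF_UNITS[kind]:
            return name
    return None


# Hypothetical file layout: the coordinate names give no hint on their own.
variables = {
    "time": {"units": "days since 2000-01-01"},
    "y": {"units": "degrees_north"},
    "x": {"standard_name": "longitude"},
    "precip": {"units": "mm/day"},
}

lat_name = find_coordinate(variables, "latitude")
lon_name = find_coordinate(variables, "longitude")
```

An internal format that fixes the dimension names up front, as the abstract describes, removes the need for this kind of search.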
NASA Astrophysics Data System (ADS)
Craig, N.; Mendez, B. J.; Hanisch, R. J.; Christian, C. A.; Summers, F.; Haisch, B.; Lindblom, J.
2005-05-01
We will describe the development of protocols to make Astronomy press-release quality images from HST and other sources publicly available through compatibility with the National Virtual Observatory (NVO). We will present the designs for a public portal to these resources, based on a robust evaluation of our intended audience. The availability of press-release quality materials via the NVO through a simplified interface will greatly enhance the utility of these materials for the public. Behind any portal to NVO data there is a standard registry and data structures that allow collections of data (such as the press release images) to be located and acquired. We will describe our design of the necessary protocols and metadata being used within the NVO framework for this project. We base our meta-tags on the considerable existing work done in the science community as well as the NASA education community. These refined metadata are applied to new HST press-release images as they are produced and registered with the NVO. We will describe methods for retrofitting pre-existing imagery with the metadata standards. The rich media, 3D navigation and visualization capabilities of the browser created by ManyOne Network Inc. are particularly well suited to the presentation of astronomical information and ever more detailed models of the local neighborhood, the Milky Way, etc. We will discuss the 3D navigation and visualization capabilities of the browser with particular focus on the Milky Way Galaxy. Development of an online encyclopedia to accompany the ManyOne portals as part of the Virtual Cosmos will also be described. Support from NASA's AISR Program is gratefully acknowledged.
International Metadata Standards and Enterprise Data Quality Metadata Systems
NASA Technical Reports Server (NTRS)
Habermann, Ted
2016-01-01
Well-documented data quality is critical in situations where scientists and decision-makers need to combine multiple datasets from different disciplines and collection systems to address scientific questions or difficult decisions. Standardized data quality metadata could be very helpful in these situations. Many efforts at developing data quality standards falter because of the diversity of approaches to measuring and reporting data quality. The one size fits all paradigm does not generally work well in this situation. I will describe these and other capabilities of ISO 19157 with examples of how they are being used to describe data quality across the NASA EOS Enterprise and also compare these approaches with other standards.
Inferring Metadata for a Semantic Web Peer-to-Peer Environment
ERIC Educational Resources Information Center
Brase, Jan; Painter, Mark
2004-01-01
Learning Objects Metadata (LOM) aims at describing educational resources in order to allow better reusability and retrieval. In this article we show how additional inference rules allow us to derive additional metadata from existing ones. Additionally, using these rules as integrity constraints helps us to define the constraints on LOM elements,…
Sharma, Deepak K; Solbrig, Harold R; Tao, Cui; Weng, Chunhua; Chute, Christopher G; Jiang, Guoqian
2017-06-05
Detailed Clinical Models (DCMs) have been regarded as the basis for retaining computable meaning when data are exchanged between heterogeneous computer systems. To better support clinical cancer data capturing and reporting, there is an emerging need to develop informatics solutions for standards-based clinical models in cancer study domains. The objective of the study is to develop and evaluate a cancer genome study metadata management system that serves as a key infrastructure in supporting clinical information modeling in cancer genome study domains. We leveraged a Semantic Web-based metadata repository enhanced with both ISO11179 metadata standard and Clinical Information Modeling Initiative (CIMI) Reference Model. We used the common data elements (CDEs) defined in The Cancer Genome Atlas (TCGA) data dictionary, and extracted the metadata of the CDEs using the NCI Cancer Data Standards Repository (caDSR) CDE dataset rendered in the Resource Description Framework (RDF). The ITEM/ITEM_GROUP pattern defined in the latest CIMI Reference Model is used to represent reusable model elements (mini-Archetypes). We produced a metadata repository with 38 clinical cancer genome study domains, comprising a rich collection of mini-Archetype pattern instances. We performed a case study of the domain "clinical pharmaceutical" in the TCGA data dictionary and demonstrated enriched data elements in the metadata repository are very useful in support of building detailed clinical models. Our informatics approach leveraging Semantic Web technologies provides an effective way to build a CIMI-compliant metadata repository that would facilitate the detailed clinical modeling to support use cases beyond TCGA in clinical cancer study domains.
Data Management Rubric for Video Data in Organismal Biology.
Brainerd, Elizabeth L; Blob, Richard W; Hedrick, Tyson L; Creamer, Andrew T; Müller, Ulrike K
2017-07-01
Standards-based data management facilitates data preservation, discoverability, and access for effective data reuse within research groups and across communities of researchers. Data sharing requires community consensus on standards for data management, such as storage and formats for digital data preservation, metadata (i.e., contextual data about the data) that should be recorded and stored, and data access. Video imaging is a valuable tool for measuring time-varying phenotypes in organismal biology, with particular application for research in functional morphology, comparative biomechanics, and animal behavior. The raw data are the videos, but videos alone are not sufficient for scientific analysis. Nearly endless videos of animals can be found on YouTube and elsewhere on the web, but these videos have little value for scientific analysis because essential metadata such as true frame rate, spatial calibration, genus and species, weight, age, etc. of organisms, are generally unknown. We have embarked on a project to build community consensus on video data management and metadata standards for organismal biology research. We collected input from colleagues at early stages, organized an open workshop, "Establishing Standards for Video Data Management," at the Society for Integrative and Comparative Biology meeting in January 2017, and then collected two more rounds of input on revised versions of the standards. The result we present here is a rubric consisting of nine standards for video data management, with three levels within each standard: good, better, and best practices. The nine standards are: (1) data storage; (2) video file formats; (3) metadata linkage; (4) video data and metadata access; (5) contact information and acceptable use; (6) camera settings; (7) organism(s); (8) recording conditions; and (9) subject matter/topic. 
The first four standards address data preservation and interoperability for sharing, whereas standards 5-9 establish minimum metadata standards for organismal biology video, and suggest additional metadata that may be useful for some studies. This rubric was developed with substantial input from researchers and students, but still should be viewed as a living document that should be further refined and updated as technology and research practices change. The audience for these standards includes researchers, journals, and granting agencies, and also the developers and curators of databases that may contribute to video data sharing efforts. We offer this project as an example of building community consensus for data management, preservation, and sharing standards, which may be useful for future efforts by the organismal biology research community. © The Author 2017. Published by Oxford University Press on behalf of the Society for Integrative and Comparative Biology.
Automated Transformation of CDISC ODM to OpenClinica.
Gessner, Sophia; Storck, Michael; Hegselmann, Stefan; Dugas, Martin; Soto-Rey, Iñaki
2017-01-01
Due to the increasing use of electronic data capture systems for clinical research, the interest in saving resources by automatically generating and reusing case report forms in clinical studies is growing. OpenClinica, an open-source electronic data capture system, enables the reuse of metadata only through its own Excel import template, which hampers the reuse of metadata defined in other standard formats. One of these standard formats is the Operational Data Model for metadata, administrative and clinical data in clinical studies. This work suggests a mapping from the Operational Data Model to OpenClinica and describes the implementation of a converter that automatically generates OpenClinica-conformant case report forms based upon metadata in the Operational Data Model.
Developing Cyberinfrastructure Tools and Services for Metadata Quality Evaluation
NASA Astrophysics Data System (ADS)
Mecum, B.; Gordon, S.; Habermann, T.; Jones, M. B.; Leinfelder, B.; Powers, L. A.; Slaughter, P.
2016-12-01
Metadata and data quality are at the core of reusable and reproducible science. While great progress has been made over the years, much of the metadata collected only addresses data discovery, covering concepts such as titles and keywords. Improving metadata beyond the discoverability plateau means documenting detailed concepts within the data such as sampling protocols, instrumentation used, and variables measured. Given that metadata commonly do not describe their data at this level, how might we improve the state of things? Giving scientists and data managers easy to use tools to evaluate metadata quality that utilize community-driven recommendations is the key to producing high-quality metadata. To achieve this goal, we created a set of cyberinfrastructure tools and services that integrate with existing metadata and data curation workflows which can be used to improve metadata and data quality across the sciences. These tools work across metadata dialects (e.g., ISO19115, FGDC, EML, etc.) and can be used to assess aspects of quality beyond what is internal to the metadata such as the congruence between the metadata and the data it describes. The system makes use of a user-friendly mechanism for expressing a suite of checks as code in popular data science programming languages such as Python and R. This reduces the burden on scientists and data managers to learn yet another language. We demonstrated these services and tools in three ways. First, we evaluated a large corpus of datasets in the DataONE federation of data repositories against a metadata recommendation modeled after existing recommendations such as the LTER best practices and the Attribute Convention for Dataset Discovery (ACDD). Second, we showed how this service can be used to display metadata and data quality information to data producers during the data submission and metadata creation process, and to data consumers through data catalog search and access tools. 
Third, we showed how the centrally deployed DataONE quality service can achieve major efficiency gains by allowing member repositories to customize and use recommendations that fit their specific needs without having to create de novo infrastructure at their site.
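The notion of "expressing a suite of checks as code" in Python can be sketched as below. This is not the DataONE quality service's API: the check names, the record layout, and the scoring rule are all assumptions made for illustration. Each check is an ordinary function that returns True or False, and a suite is just a list of such functions.

```python
def check_title(record):
    """Discovery-level check: the record has a non-empty title."""
    return bool(record.get("title", "").strip())


def check_keywords(record):
    """Discovery-level check: at least one keyword is present."""
    return len(record.get("keywords", [])) > 0


def check_variables_defined(record):
    """Beyond discovery: every measured variable carries a definition."""
    variables = record.get("variables", [])
    return bool(variables) and all(v.get("definition") for v in variables)


# A suite is a plain list, so a repository could swap checks in or out.
SUITE = [check_title, check_keywords, check_variables_defined]


def evaluate(record, suite=SUITE):
    """Run every check and return (fraction passed, per-check results)."""
    results = {check.__name__: check(record) for check in suite}
    score = sum(results.values()) / len(results)
    return score, results


# Hypothetical record: discoverable, but the variable lacks a definition.
record = {
    "title": "Hypothetical stream temperature dataset",
    "keywords": ["temperature", "streams"],
    "variables": [{"name": "temp_c", "definition": ""}],
}
score, results = evaluate(record)
```

The example record passes the discovery-level checks but fails the variable-level one, which is the "discoverability plateau" the abstract refers to.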
A Window to the World: Lessons Learned from NASA's Collaborative Metadata Curation Effort
NASA Astrophysics Data System (ADS)
Bugbee, K.; Dixon, V.; Baynes, K.; Shum, D.; le Roux, J.; Ramachandran, R.
2017-12-01
Well-written descriptive metadata adds value to data by making the data easier to discover, and increases the use of data by conveying its context and appropriateness of use. While many data centers acknowledge the importance of correct, consistent and complete metadata, allocating resources to curate existing metadata is often difficult. To lower resource costs, many data centers seek guidance on best practices for curating metadata but struggle to identify those recommendations. In order to assist data centers in curating metadata and to also develop best practices for creating and maintaining metadata, NASA has formed a collaborative effort to improve the Earth Observing System Data and Information System (EOSDIS) metadata in the Common Metadata Repository (CMR). This effort has taken significant steps in building consensus around metadata curation best practices. However, this effort has also revealed gaps in EOSDIS enterprise policies and procedures within the core metadata curation task. This presentation will explore the mechanisms used for building consensus on metadata curation, the gaps identified in policies and procedures, the lessons learned from collaborating with both the data centers and metadata curation teams, and the proposed next steps for the future.
The Importance of Metadata in System Development and IKM
2003-02-01
Defence R&D Canada – Atlantic. The Importance of Metadata in System Development and IKM. Anthony W. Isenor. Technical Memorandum DRDC Atlantic TM 2003-011. February... it is important for searches and providing relevant information to the client. A comparison of metadata standards was conducted with emphasis on
Design and Implementation of a Metadata-rich File System
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ames, S; Gokhale, M B; Maltzahn, C
2010-01-19
Despite continual improvements in the performance and reliability of large scale file systems, the management of user-defined file system metadata has changed little in the past decade. The mismatch between the size and complexity of large scale data stores and their ability to organize and query their metadata has led to a de facto standard in which raw data is stored in traditional file systems, while related, application-specific metadata is stored in relational databases. This separation of data and semantic metadata requires considerable effort to maintain consistency and can result in complex, slow, and inflexible system operation. To address these problems, we have developed the Quasar File System (QFS), a metadata-rich file system in which files, user-defined attributes, and file relationships are all first class objects. In contrast to hierarchical file systems and relational databases, QFS defines a graph data model composed of files and their relationships. QFS incorporates Quasar, an XPATH-extended query language for searching the file system. Results from our QFS prototype show the effectiveness of this approach. Compared to the de facto standard, the QFS prototype shows superior ingest performance and comparable query performance on user metadata-intensive operations and superior performance on normal file metadata operations.
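The graph data model described in this abstract, in which files, user-defined attributes, and relationships are all first class objects, can be illustrated with a toy in-memory Python sketch. It is not QFS and does not use the Quasar query language; the class, method names, paths and attribute values are hypothetical, and queries are plain Python calls.

```python
class FileGraph:
    """Toy in-memory graph of files, attributes, and labelled relationships."""

    def __init__(self):
        self.attributes = {}  # path -> {attribute name: value}
        self.links = []       # (source path, relationship label, target path)

    def add_file(self, path, **attributes):
        """Register a file together with its user-defined attributes."""
        self.attributes[path] = dict(attributes)

    def link(self, source, label, target):
        """Record a directed, labelled relationship between two files."""
        self.links.append((source, label, target))

    def find(self, **criteria):
        """Return paths whose attributes match every given criterion."""
        return sorted(
            path
            for path, attrs in self.attributes.items()
            if all(attrs.get(key) == value for key, value in criteria.items())
        )

    def related(self, path, label):
        """Follow one kind of relationship outward from a file."""
        return sorted(t for s, l, t in self.links if s == path and l == label)


graph = FileGraph()
graph.add_file("/data/run1.raw", instrument="lidar", year=2009)
graph.add_file("/data/run1.cal", instrument="lidar", year=2009, kind="calibration")
graph.add_file("/data/run2.raw", instrument="radar", year=2009)
graph.link("/data/run1.raw", "calibrated_by", "/data/run1.cal")

lidar_files = graph.find(instrument="lidar")
calibration = graph.related("/data/run1.raw", "calibrated_by")
```

Because attributes and relationships live alongside the files, no separate relational database has to be kept consistent with the file system, which is the separation the abstract identifies as costly.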
Facilitating Stewardship of scientific data through standards based workflows
NASA Astrophysics Data System (ADS)
Bastrakova, I.; Kemp, C.; Potter, A. K.
2013-12-01
There are three main suites of standards that can be used to define the fundamental scientific methodology of data, methods and results. These are, firstly, metadata standards to enable discovery of the data (ISO 19115); secondly, the Sensor Web Enablement (SWE) suite of standards, which includes the O&M and SensorML standards; and thirdly, ontologies that provide vocabularies to define the scientific concepts and the relationships between these concepts. All three types of standards have to be utilised by the practicing scientist so that those who ultimately steward the data can ensure that it is preserved, curated, reused and repurposed. Additional benefits of this approach include transparency of scientific processes from the data acquisition to creation of scientific concepts and models, and provision of context to inform data use. Collecting and recording metadata is the first step in scientific data flow. The primary role of metadata is to provide details of geographic extent, availability and high-level description of data suitable for its initial discovery through common search engines. The SWE suite provides standardised patterns to describe observations and measurements taken for these data, capture detailed information about observation or analytical methods and the instruments used, and define quality determinations. This information standardises browsing capability over discrete data types. The standardised patterns of the SWE standards simplify aggregation of observation and measurement data, enabling scientists to transfer disintegrated data to scientific concepts. The first two steps provide a necessary basis for reasoning about concepts of 'pure' science, building relationships between concepts of different domains (linked-data), and identifying domain classification and vocabularies.
Geoscience Australia is re-examining its marine data flows, including metadata requirements and business processes, to achieve a clearer link between scientific data acquisition and analysis requirements and effective interoperable data management and delivery. This includes participating in national and international dialogue on development of standards, embedding data management activities in business processes, and developing scientific staff as effective data stewards. A similar approach is applied to geophysical data. By ensuring the geophysical datasets at GA strictly follow metadata and industry standards, we are able to implement a provenance based workflow where the data is easily discoverable, geophysical processing can be applied to it and results can be stored. The provenance based workflow enables metadata records for the results to be produced automatically from the input dataset metadata.
36 CFR 1235.48 - What documentation must agencies transfer with electronic records?
Code of Federal Regulations, 2010 CFR
2010-07-01
... digital geospatial data files can include metadata that conforms to the Federal Geographic Data Committee's Content Standards for Digital Geospatial Metadata, as specified in Executive Order 12906 of April... number (301) 837-2903 for digital photographs and metadata, or the National Archives and Records...
36 CFR 1235.48 - What documentation must agencies transfer with electronic records?
Code of Federal Regulations, 2012 CFR
2012-07-01
... digital geospatial data files can include metadata that conforms to the Federal Geographic Data Committee's Content Standards for Digital Geospatial Metadata, as specified in Executive Order 12906 of April... number (301) 837-2903 for digital photographs and metadata, or the National Archives and Records...
Leveraging Metadata to Create Better Web Services
ERIC Educational Resources Information Center
Mitchell, Erik
2012-01-01
Libraries have been increasingly concerned with data creation, management, and publication. This increase is partly driven by shifting metadata standards in libraries and partly by the growth of data and metadata repositories being managed by libraries. In order to manage these data sets, libraries are looking for new preservation and discovery…
36 CFR § 1235.48 - What documentation must agencies transfer with electronic records?
Code of Federal Regulations, 2013 CFR
2013-07-01
... digital geospatial data files can include metadata that conforms to the Federal Geographic Data Committee's Content Standards for Digital Geospatial Metadata, as specified in Executive Order 12906 of April... number (301) 837-2903 for digital photographs and metadata, or the National Archives and Records...
36 CFR 1235.48 - What documentation must agencies transfer with electronic records?
Code of Federal Regulations, 2011 CFR
2011-07-01
... digital geospatial data files can include metadata that conforms to the Federal Geographic Data Committee's Content Standards for Digital Geospatial Metadata, as specified in Executive Order 12906 of April... number (301) 837-2903 for digital photographs and metadata, or the National Archives and Records...
36 CFR 1235.48 - What documentation must agencies transfer with electronic records?
Code of Federal Regulations, 2014 CFR
2014-07-01
... digital geospatial data files can include metadata that conforms to the Federal Geographic Data Committee's Content Standards for Digital Geospatial Metadata, as specified in Executive Order 12906 of April... number (301) 837-2903 for digital photographs and metadata, or the National Archives and Records...
Shared Geospatial Metadata Repository for Ontario University Libraries: Collaborative Approaches
ERIC Educational Resources Information Center
Forward, Erin; Leahey, Amber; Trimble, Leanne
2015-01-01
Successfully providing access to special collections of digital geospatial data in academic libraries relies upon complete and accurate metadata. Creating and maintaining metadata using specialized standards is a formidable challenge for libraries. The Ontario Council of University Libraries' Scholars GeoPortal project, which created a shared…
Making metadata usable in a multi-national research setting.
Ellul, Claire; Foord, Joanna; Mooney, John
2013-11-01
SECOA (Solutions for Environmental Contrasts in Coastal Areas) is a multi-national research project examining the effects of human mobility on urban settlements in fragile coastal environments. This paper describes the setting up of a SECOA metadata repository for non-specialist researchers such as environmental scientists and tourism experts. Conflicting usability requirements of two groups - metadata creators and metadata users - are identified along with associated limitations of current metadata standards. A description is given of a configurable metadata system designed to grow as the project evolves. This work is of relevance for similar projects such as INSPIRE. Copyright © 2012 Elsevier Ltd and The Ergonomics Society. All rights reserved.
THE NEW ONLINE METADATA EDITOR FOR GENERATING STRUCTURED METADATA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Devarakonda, Ranjeet; Shrestha, Biva; Palanisamy, Giri
Nobody is better suited to describe data than the scientist who created it. This description of the data is called metadata. In general terms, metadata represents the who, what, when, where, why and how of the dataset [1]. eXtensible Markup Language (XML) is the preferred output format for metadata, as it makes it portable and, more importantly, suitable for system discoverability. The newly developed ORNL Metadata Editor (OME) is a Web-based tool that allows users to create and maintain XML files containing key information, or metadata, about the research. Metadata include information about the specific projects, parameters, time periods, and locations associated with the data. Such information helps put the research findings in context. In addition, the metadata produced using OME will allow other researchers to find these data via metadata clearinghouses like Mercury [2][4]. OME is part of ORNL's Mercury software fleet [2][3]. It was jointly developed to support projects funded by the United States Geological Survey (USGS), U.S. Department of Energy (DOE), National Aeronautics and Space Administration (NASA) and National Oceanic and Atmospheric Administration (NOAA). OME's architecture provides a customizable interface to support project-specific requirements. Using this new architecture, the ORNL team developed OME instances for USGS's Core Science Analytics, Synthesis, and Libraries (CSAS&L), DOE's Next Generation Ecosystem Experiments (NGEE) and Atmospheric Radiation Measurement (ARM) Program, and the international Surface Ocean Carbon Dioxide ATlas (SOCAT). Researchers simply use the ORNL Metadata Editor to enter relevant metadata into a Web-based form. From the information on the form, the Metadata Editor can create an XML file on the server where the editor is installed or on the user's personal computer. Researchers can also use the ORNL Metadata Editor to modify existing XML metadata files.
As an example, NGEE Arctic scientists use OME to register their datasets with the NGEE data archive, which allows the archive to publish these datasets via a data search portal (http://ngee.ornl.gov/data). The highly descriptive metadata created using OME allow the archive to enable advanced data search options using keyword, geo-spatial, temporal and ontology filters. Similarly, ARM OME allows scientists or principal investigators (PIs) to submit their data products to the ARM data archive. How would OME help Big Data Centers like the Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC)? The ORNL DAAC is one of NASA's Earth Observing System Data and Information System (EOSDIS) data centers managed by the Earth Science Data and Information System (ESDIS) Project. The ORNL DAAC archives data produced by NASA's Terrestrial Ecology Program. The DAAC provides data and information relevant to biogeochemical dynamics, ecological data, and environmental processes, critical for understanding the dynamics relating to the biological, geological, and chemical components of the Earth's environment. Typically, data produced, archived and analyzed is at a scale of multiple petabytes, which makes the discoverability of the data very challenging. Without proper metadata associated with the data, it is difficult to find the data you are looking for and equally difficult to use and understand the data. OME will allow data centers like the NGEE and ORNL DAAC to produce meaningful, high quality, standards-based, descriptive information about their data products, in turn helping with data discoverability and interoperability. Useful Links: USGS OME: http://mercury.ornl.gov/OME/ NGEE OME: http://ngee-arctic.ornl.gov/ngeemetadata/ ARM OME: http://archive2.ornl.gov/armome/ Contact: Ranjeet Devarakonda (devarakondar@ornl.gov) References: [1] Federal Geographic Data Committee. Content standard for digital geospatial metadata.
Federal Geographic Data Committee, 1998. [2] Devarakonda, Ranjeet, et al. "Mercury: reusable metadata management, data discovery and access system." Earth Science Informatics 3.1-2 (2010): 87-94. [3] Wilson, B. E., Palanisamy, G., Devarakonda, R., Rhyne, B. T., Lindsley, C., & Green, J. (2010). Mercury Toolset for Spatiotemporal Metadata. [4] Pouchard, L. C., Branstetter, M. L., Cook, R. B., Devarakonda, R., Green, J., Palanisamy, G., ... & Noy, N. F. (2013). A Linked Science investigation: enhancing climate change data discovery with semantic technologies. Earth Science Informatics, 6(3), 175-185.
Metadata-Driven SOA-Based Application for Facilitation of Real-Time Data Warehousing
NASA Astrophysics Data System (ADS)
Pintar, Damir; Vranić, Mihaela; Skočir, Zoran
Service-oriented architecture (SOA) has already been widely recognized as an effective paradigm for achieving integration of diverse information systems. SOA-based applications can cross boundaries of platforms, operating systems and proprietary data standards, commonly through the usage of Web Services technology. On the other hand, metadata is also commonly referred to as a potential integration tool, given that standardized metadata objects can provide useful information about the specifics of unknown information systems with which one wishes to communicate, using an approach commonly called "model-based integration". This paper presents the results of research into the possible synergy between these two integration facilitators. This is accomplished with a vertical example of a metadata-driven SOA-based business process that provides ETL (Extraction, Transformation and Loading) and metadata services to a data warehousing system in need of real-time ETL support.
CellML metadata standards, associated tools and repositories
Beard, Daniel A.; Britten, Randall; Cooling, Mike T.; Garny, Alan; Halstead, Matt D.B.; Hunter, Peter J.; Lawson, James; Lloyd, Catherine M.; Marsh, Justin; Miller, Andrew; Nickerson, David P.; Nielsen, Poul M.F.; Nomura, Taishin; Subramanium, Shankar; Wimalaratne, Sarala M.; Yu, Tommy
2009-01-01
The development of standards for encoding mathematical models is an important component of model building and model sharing among scientists interested in understanding multi-scale physiological processes. CellML provides such a standard, particularly for models based on biophysical mechanisms, and a substantial number of models are now available in the CellML Model Repository. However, there is an urgent need to extend the current CellML metadata standard to provide biological and biophysical annotation of the models in order to facilitate model sharing, automated model reduction and connection to biological databases. This paper gives a broad overview of a number of new developments on CellML metadata and provides links to further methodological details available from the CellML website. PMID:19380315
Expanding Access to NCAR's Digital Assets: Towards a Unified Scientific Data Management System
NASA Astrophysics Data System (ADS)
Stott, D.
2016-12-01
In 2014 the National Center for Atmospheric Research (NCAR) Directorate created the Data Stewardship Engineering Team (DSET) to plan and implement the strategic vision of an integrated front door for data discovery and access across the organization, including all laboratories, the library, and UCAR Community Programs. The DSET is focused on improving the quality of users' experiences in finding and using NCAR's digital assets. This effort also supports new policies included in federal mandates, NSF requirements, and journal publication rules. An initial survey with 97 respondents identified 68 persons responsible for more than 3 petabytes of data. An inventory, using the Data Asset Framework produced by the UK Digital Curation Centre as a starting point, identified asset types that included files and metadata, publications, images, and software (visualization, analysis, model codes). User story sessions with representatives from each lab identified and ranked desired features for a unified Scientific Data Management System (SDMS). A process beginning with an organization-wide assessment of metadata by the HDF Group and followed by meetings with labs to identify key documentation concepts culminated in the development of an NCAR metadata dialect that leverages the DataCite and ISO 19115 standards. The tasks ahead are to build out an SDMS and populate it with rich standardized metadata. Software packages have been prototyped and currently are being tested and reviewed by DSET members. Key challenges for the DSET include technical and non-technical issues. First, the status quo with regard to how assets are managed varies widely across the organization. There are differences in file format standards, technologies, and discipline-specific vocabularies. Metadata diversity is another real challenge. The types of metadata, the standards used, and the capacity to create new metadata vary across the organization.
Significant effort is required to develop tools to create new standard metadata across the organization, adapt and integrate current digital assets, and establish consistent data management practices going forward. To be successful, best practices must be infused into daily activities. This poster will highlight the processes, lessons learned, and current status of the DSET effort at NCAR.
Simplified Metadata Curation via the Metadata Management Tool
NASA Astrophysics Data System (ADS)
Shum, D.; Pilone, D.
2015-12-01
The Metadata Management Tool (MMT) is the newest capability developed as part of NASA Earth Observing System Data and Information System's (EOSDIS) efforts to simplify metadata creation and improve metadata quality. The MMT was developed via an agile methodology, taking into account inputs from GCMD's science coordinators and other end-users. In its initial release, the MMT uses the Unified Metadata Model for Collections (UMM-C) to allow metadata providers to easily create and update collection records in the ISO-19115 format. Through a simplified UI experience, metadata curators can create and edit collections without full knowledge of the NASA Best Practices implementation of ISO-19115 format, while still generating compliant metadata. More experienced users are also able to access raw metadata to build more complex records as needed. In future releases, the MMT will build upon recent work done in the community to assess metadata quality and compliance with a variety of standards through application of metadata rubrics. The tool will provide users with clear guidance as to how to easily change their metadata in order to improve their quality and compliance. Through these features, the MMT allows data providers to create and maintain compliant and high quality metadata in a short amount of time.
Community-Based Development of Standards for Geochemical and Geochronological Data
NASA Astrophysics Data System (ADS)
Lehnert, K. A.; Walker, D.; Vinay, S.; Djapic, B.; Ash, J.; Falk, B.
2007-12-01
The Geoinformatics for Geochemistry (GfG) Program (www.geoinfogeochem.org) and the EarthChem project (www.earthchem.org) aim to maximize the application of geochemical data in Geoscience research and education by building a new advanced data infrastructure for geochemistry that facilitates the compilation, communication, serving, and visualization of geochemical data and their integration with the broad Geoscience data set. Building this new data infrastructure poses substantial challenges that are primarily cultural in nature, and require broad community involvement in the development and implementation of standards for data reporting (e.g., metadata for analytical procedures, data quality, and analyzed samples), data publication, and data citation to achieve broad acceptance and use. Working closely with the science community, with professional societies, and with editors and publishers, recommendations for standards for the reporting of geochemical and geochronological data in publications and to data repositories have been established, which are now under consideration for adoption in journal and agency policies. The recommended standards are aligned with the GfG and EarthChem data models as well as the EarthChem XML schema for geochemical data. Through partnerships with other national and international data management efforts in geochemistry and in the broader marine and terrestrial geosciences, GfG and EarthChem seek to integrate their development of geochemical metadata standards, data format, and semantics with relevant existing and emerging standards and ensure compatibility and compliance.
Air Quality uFIND: User-oriented Tool Set for Air Quality Data Discovery and Access
NASA Astrophysics Data System (ADS)
Hoijarvi, K.; Robinson, E. M.; Husar, R. B.; Falke, S. R.; Schultz, M. G.; Keating, T. J.
2012-12-01
Historically, there have been major impediments to seamless and effective data usage encountered by both data providers and users. Over the last five years, the international Air Quality (AQ) Community has worked through forums such as the Group on Earth Observations AQ Community of Practice, the ESIP AQ Working Group, and the Task Force on Hemispheric Transport of Air Pollution to converge on data format standards (e.g., netCDF), data access standards (e.g., Open Geospatial Consortium Web Coverage Services), metadata standards (e.g., ISO 19115), as well as other conventions (e.g., CF Naming Convention) in order to build an Air Quality Data Network. The centerpiece of the AQ Data Network is the web service-based tool set: user-oriented Filtering and Identification of Networked Data. The purpose of uFIND is to provide rich and powerful facilities for the user to: a) discover and choose a desired dataset by navigation through the multi-dimensional metadata space using faceted search, b) seamlessly access and browse datasets, and c) use uFIND's facilities as a web service for mashups with other AQ applications and portals. In a user-centric information system such as uFIND, the user experience is improved by metadata that includes the general fields for discovery as well as community-specific metadata to narrow the search beyond space, time and generic keyword searches. However, even with the community-specific additions, the ISO 19115 records were formed in compliance with the standard, so that other standards-based search interfaces could leverage this additional information. To identify the fields necessary for metadata discovery we started with the ISO 19115 Core Metadata fields and fields that were needed for a Catalog Service for the Web (CSW) Record. This fulfilled two goals - one to create valid ISO 19115 records and the other to be able to retrieve the records through a Catalog Service for the Web query.
Beyond the required set of fields, the AQ Community added further fields using a combination of keywords and ISO 19115 fields. These extensions allow discovery by measurement platform or observed phenomena. Beyond discovery metadata, the AQ records include service identification objects that allow standards-based clients, such as some brokers, to access the data found via OGC WCS or WMS data access protocols. uFIND is one such smart client: this combination of discovery and access metadata allows the user to preview each registered dataset through spatial and temporal views, observe the data access and usage pattern, and find links to dataset-specific metadata directly in uFIND. The AQ data providers also benefit from this architecture since their data products are easier to find and re-use, enhancing the relevance and importance of their products. Finally, the earth science community at large benefits from the Service Oriented Architecture of uFIND, since it is a service itself and allows service-based interfacing with providers and users of the metadata, allowing uFIND facets to be further refined for a particular AQ application or completely repurposed for other Earth Science domains that use the same set of data access and metadata standards.
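The faceted navigation described in this abstract (narrowing by community-specific fields such as measurement platform or observed phenomenon) can be sketched as follows. This is a minimal illustration, not uFIND's implementation; the record fields and values are hypothetical and far simpler than real ISO 19115 records.

```python
from collections import Counter

# Hypothetical, heavily simplified discovery records.
records = [
    {"id": "ds1", "platform": "satellite", "parameter": "aerosol optical depth"},
    {"id": "ds2", "platform": "surface station", "parameter": "ozone"},
    {"id": "ds3", "platform": "satellite", "parameter": "ozone"},
]


def filter_records(records, **selected):
    """Keep records whose fields equal every selected facet value."""
    return [
        r for r in records
        if all(r.get(name) == value for name, value in selected.items())
    ]


def facet_counts(records, facet):
    """Count how many records carry each value of the given facet."""
    return Counter(r[facet] for r in records if facet in r)


# Select one facet value, then show what the remaining facet offers.
hits = filter_records(records, parameter="ozone")
remaining = facet_counts(hits, "platform")
print([r["id"] for r in hits], dict(remaining))
```

Recomputing the counts after each selection is what lets a faceted interface show only the choices that still lead to at least one dataset.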
A Solution to Metadata: Using XML Transformations to Automate Metadata
2010-06-01
developed their own metadata standards: Directory Interchange Format (DIF), Ecological Metadata Language (EML), and International Organization for...mented all their data using the EML standard. However, when later attempting to publish to a data clearinghouse, such as the Geospatial One-Stop (GOS...construct calls to its transform(s) method by providing the type of the incoming content (e.g., eml), the type of the resulting content (e.g., fgdc) and
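The idea of selecting a transformation by incoming and resulting content type can be sketched as below. This is an assumption-laden illustration and not the report's tool: the registry, the function names, and the one-field mapping are hypothetical, the input is a toy document rather than schema-valid EML, and plain Python functions stand in for whatever XML transformation technology the report actually uses.

```python
import xml.etree.ElementTree as ET

# Registry of transforms keyed by (source type, target type); hypothetical.
TRANSFORMS = {}


def register(source_type, target_type):
    """Decorator that records a transform under its type pair."""
    def decorator(func):
        TRANSFORMS[(source_type, target_type)] = func
        return func
    return decorator


@register("eml", "fgdc")
def eml_to_fgdc(root):
    # Toy mapping of a single field (the title); real crosswalks map many more.
    title = root.findtext("dataset/title", default="")
    out = ET.Element("metadata")
    idinfo = ET.SubElement(out, "idinfo")
    citation = ET.SubElement(idinfo, "citation")
    citeinfo = ET.SubElement(citation, "citeinfo")
    ET.SubElement(citeinfo, "title").text = title
    return out


def transform(xml_text, source_type, target_type):
    """Look up the transform for the type pair and apply it to the document."""
    try:
        func = TRANSFORMS[(source_type, target_type)]
    except KeyError:
        raise ValueError(
            f"no transform from {source_type} to {target_type}"
        ) from None
    return ET.tostring(func(ET.fromstring(xml_text)), encoding="unicode")


# Simplified stand-in for an EML record (no namespaces, one field).
eml_doc = "<eml><dataset><title>Stream temperature 2009</title></dataset></eml>"
fgdc_doc = transform(eml_doc, "eml", "fgdc")
print(fgdc_doc)
```

Keying the registry on the type pair means a new crosswalk can be added without changing the calling code.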
The International Learning Object Metadata Survey
ERIC Educational Resources Information Center
Friesen, Norm
2004-01-01
A wide range of projects and organizations is currently making digital learning resources (learning objects) available to instructors, students, and designers via systematic, standards-based infrastructures. One standard that is central to many of these efforts and infrastructures is known as Learning Object Metadata (IEEE 1484.12.1-2002, or LOM).…
Metadata: Standards for Retrieving WWW Documents (and Other Digitized and Non-Digitized Resources)
NASA Astrophysics Data System (ADS)
Rusch-Feja, Diann
The use of metadata for indexing digitized and non-digitized resources for resource discovery in a networked environment is being increasingly implemented all over the world. Greater precision is achieved using metadata than by relying on universal search engines; furthermore, metadata can be used as a filtering mechanism for search results. An overview of various metadata sets is given, followed by a more focussed presentation of Dublin Core Metadata, including examples of sub-elements and qualifiers. In particular, the use of the Dublin Core Relation element provides connections between the metadata of various related electronic resources, as well as the metadata for physical, non-digitized resources. This facilitates more comprehensive search results without losing precision and brings together different genres of information which would otherwise be searchable only in separate databases. Furthermore, the advantages of Dublin Core Metadata in comparison with library cataloging and the use of universal search engines are discussed briefly, followed by a listing of types of implementation of Dublin Core Metadata.
A practical implementation for a data dictionary in an environment of diverse data sets
Sprenger, Karla K.; Larsen, Dana M.
1993-01-01
The need for a data dictionary database at the U.S. Geological Survey's EROS Data Center (EDC) was reinforced with the Earth Observing System Data and Information System (EOSDIS) requirement for consistent field definitions of data sets residing at more than one archive center. The EDC requirement addresses the existence of multiple sets with identical field definitions using various naming conventions. The EDC is developing a data dictionary database to accomplish the following goals: to standardize field names for ease in software development; to facilitate querying and updating of the data; and to generate ad hoc reports. The structure of the EDC electronic data dictionary database supports different metadata systems as well as many different data sets. A series of reports is used to keep consistency among data sets and various metadata systems.
Manifestations of Metadata: From Alexandria to the Web--Old is New Again
ERIC Educational Resources Information Center
Kennedy, Patricia
2008-01-01
This paper is a discussion of the use of metadata, in its various manifestations, to access information. Information management standards are discussed. The connection between the ancient world and the modern world is highlighted. Individual perspectives are paramount in fulfilling information seeking. Metadata is interpreted and reflected upon in…
iLOG: A Framework for Automatic Annotation of Learning Objects with Empirical Usage Metadata
ERIC Educational Resources Information Center
Miller, L. D.; Soh, Leen-Kiat; Samal, Ashok; Nugent, Gwen
2012-01-01
Learning objects (LOs) are digital or non-digital entities used for learning, education or training commonly stored in repositories searchable by their associated metadata. Unfortunately, based on the current standards, such metadata is often missing or incorrectly entered making search difficult or impossible. In this paper, we investigate…
Standardizing paleoclimate variables for data-intensive science
NASA Astrophysics Data System (ADS)
Lockshin, S.; Morrill, C.; Gille, E.; Gross, W.; McNeill, S.; Shepherd, E.; Wahl, E. R.; Bauer, B.
2017-12-01
Paleoclimate data are extremely heterogeneous. Scientists routinely make hundreds of types of measurements on a variety of physical samples. This heterogeneity is one of the biggest barriers to developing and accessing exhaustive, standardized paleoclimate data products. Moreover, it hinders the use of paleo data outside of the paleoclimate specialist community. We present our progress on creating a set of standards for documenting paleoclimate variables at the World Data Service for Paleoclimatology (WDS-Paleo). The current WDS-Paleo nine-part variable naming scheme provides the foundation for this project. This framework is designed for use with all eighteen proxy and reconstruction data types archived by the WDS-Paleo. Under the guidance of advisory panels consisting of subject matter experts, we have generated controlled vocabularies for use within this framework that are specific to individual data types yet integrated across data types. These vocabularies are thorough, precise, standardized and extensible. We have applied these new controlled vocabularies to existing WDS-Paleo datasets, creating homogeneous variable metadata as well as enabling a new paleoclimate metadata search by variable that is integrated across data types. This work will allow for the reuse of studies in larger compilations to forward scientific discovery that would not be possible from any single study. It will also facilitate new, interdisciplinary uses for paleoclimate datasets.
Representing Hydrologic Models as HydroShare Resources to Facilitate Model Sharing and Collaboration
NASA Astrophysics Data System (ADS)
Castronova, A. M.; Goodall, J. L.; Mbewe, P.
2013-12-01
The CUAHSI HydroShare project is a collaborative effort that aims to provide software for sharing data and models within the hydrologic science community. One of the early focuses of this work has been establishing metadata standards for describing models and model-related data as HydroShare resources. By leveraging this metadata definition, a prototype extension has been developed to create model resources that can be shared within the community using the HydroShare system. The extension uses a general model metadata definition to create resource objects, and was designed so that model-specific parsing routines can extract and populate metadata fields from model input and output files. The long term goal is to establish a library of supported models where, for each model, the system has the ability to extract key metadata fields automatically, thereby establishing standardized model metadata that will serve as the foundation for model sharing and collaboration within HydroShare. The Soil Water & Assessment Tool (SWAT) is used to demonstrate this concept through a case study application.
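The design described in this abstract, a general model metadata definition combined with model-specific parsing routines, can be sketched as a registry of parsers. This is not the HydroShare extension's code; the model name, the control-file format, and the metadata field names are hypothetical, and no real SWAT file format is assumed.

```python
# Registry of model-specific parsing routines; names are hypothetical.
PARSERS = {}


def parser_for(model_name):
    """Decorator that registers a parsing routine for one model."""
    def decorator(func):
        PARSERS[model_name] = func
        return func
    return decorator


@parser_for("toy-model")
def parse_toy_model(text):
    """Extract metadata from a made-up `key = value` control file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        fields[key.strip()] = value.strip()
    return {
        "simulation_start": fields.get("start"),
        "simulation_end": fields.get("end"),
        "time_step": fields.get("step"),
    }


def build_resource(model_name, title, input_text):
    """Combine generic metadata with whatever the model parser can extract."""
    generic = {"title": title, "model": model_name}
    extract = PARSERS.get(model_name)
    specific = extract(input_text) if extract else {}
    return {**generic, **specific}


control_file = """
# hypothetical control file
start = 2001-01-01
end = 2001-12-31
step = daily
"""
resource = build_resource("toy-model", "Example watershed run", control_file)
print(resource)
```

A model without a registered parser still yields a valid resource carrying only the generic fields, which matches the abstract's goal of growing a library of supported models over time.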
A document centric metadata registration tool constructing earth environmental data infrastructure
NASA Astrophysics Data System (ADS)
Ichino, M.; Kinutani, H.; Ono, M.; Shimizu, T.; Yoshikawa, M.; Masuda, K.; Fukuda, K.; Kawamoto, H.
2009-12-01
DIAS (Data Integration and Analysis System) is one of the GEOSS activities in Japan. It is also a leading part of the GEOSS task with the same name defined in the GEOSS Ten Year Implementation Plan. The main mission of DIAS is to construct a data infrastructure that can effectively integrate earth environmental data such as observation data, numerical model outputs, and socio-economic data provided from the fields of climate, water cycle, ecosystem, ocean, biodiversity and agriculture. Some of DIAS's data products are available at http://www.jamstec.go.jp/e/medid/dias. Most earth environmental data have common spatial and temporal attributes, such as the geographic extent covered or the creation date. The metadata standards including these common attributes are published by the geographic information technical committee (TC211) in ISO (the International Organization for Standardization) as the specifications ISO 19115:2003 and 19139:2007. Accordingly, DIAS metadata is based on the ISO/TC211 metadata standards. From the viewpoint of data users, metadata is useful not only for data retrieval and analysis but also for interoperability and information sharing among experts, beginners and nonprofessionals. On the other hand, from the viewpoint of data providers, two problems were pointed out after discussions. One is that data providers prefer to minimize the additional tasks and time spent creating metadata. The other is that data providers want to manage and publish documents that explain their data sets more comprehensively. To solve these problems, we have been developing a document centric metadata registration tool. The features of our tool are that the generated documents are available instantly and there is no extra cost for data providers to generate metadata. The tool is also developed as a Web application, so it requires no additional software from data providers beyond a web browser.
The interface of the tool presents the section titles of the documents; once the content of each section is filled in, the documents for the data sets are automatically published in PDF and HTML format. Furthermore, a metadata XML file compliant with ISO19115 and ISO19139 is created at the same moment. The generated metadata are managed in the metadata database of the DIAS project and will be used in various ISO19139-compliant metadata management tools, such as GeoNetwork.
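As a rough illustration of how filled-in document sections could be turned into ISO 19139-style XML, the sketch below builds a small fragment of an MD_Metadata record with Python's standard library. It is not the DIAS tool's implementation: the `sections` dictionary and both function names are assumptions made for this example, and the output omits many elements that a schema-valid record requires.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the ISO 19139 XML encoding of ISO 19115 metadata.
GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def char_string(parent, tag, text):
    """Append <gmd:tag><gco:CharacterString>text</...></...> to parent."""
    wrapper = ET.SubElement(parent, f"{{{GMD}}}{tag}")
    ET.SubElement(wrapper, f"{{{GCO}}}CharacterString").text = text
    return wrapper


def sections_to_metadata(sections):
    """Build a partial MD_Metadata element from document sections.

    `sections` is a hypothetical dict with 'title' and 'abstract' keys.
    """
    root = ET.Element(f"{{{GMD}}}MD_Metadata")
    ident_info = ET.SubElement(root, f"{{{GMD}}}identificationInfo")
    ident = ET.SubElement(ident_info, f"{{{GMD}}}MD_DataIdentification")
    citation = ET.SubElement(ident, f"{{{GMD}}}citation")
    ci_citation = ET.SubElement(citation, f"{{{GMD}}}CI_Citation")
    char_string(ci_citation, "title", sections["title"])
    char_string(ident, "abstract", sections["abstract"])
    return root


record = sections_to_metadata(
    {"title": "Example data set", "abstract": "Free-text description."}
)
print(ET.tostring(record, encoding="unicode"))
```

The same section text could feed the PDF and HTML renderings, which is what makes the approach document-centric: the provider writes the description once.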
Inheritance rules for Hierarchical Metadata Based on ISO 19115
NASA Astrophysics Data System (ADS)
Zabala, A.; Masó, J.; Pons, X.
2012-04-01
ISO 19115 has mainly been used to describe metadata for datasets and services. Furthermore, the ISO 19115 standard (as well as the new draft ISO 19115-1) includes a conceptual model that allows metadata to be described at different levels of granularity structured in hierarchical levels, both in aggregated resources such as series and datasets, and in more disaggregated resources such as types of entities (feature type), types of attributes (attribute type), entities (feature instances) and attributes (attribute instances). In theory, it is possible to apply a complete metadata structure to all hierarchical levels of metadata, from the whole series down to an individual feature attribute, but storing all metadata at all levels is completely impractical. An inheritance mechanism is needed to store each piece of metadata and quality information at the optimum hierarchical level and to allow easy and efficient documentation of metadata both in an Earth observation scenario, such as multiband imagery from a multi-satellite mission, and in a complex vector topographic map that includes several feature types separated into layers (e.g. administrative limits, contour lines, building polygons, road lines, etc.). Moreover, because maps are traditionally split into tiles, either for map handling at detailed scales or because of satellite characteristics, each of the previous thematic layers (e.g. 1:5000 roads for a country) or bands (Landsat-5 TM coverage of the Earth) is tiled into several parts (sheets or scenes respectively). According to the hierarchy in ISO 19115, the definition of general metadata can be supplemented by spatially specific metadata that, when required, either inherits or overrides the general case (G.1.3). Annex H of this standard states that only metadata exceptions are defined at lower levels, so it is not necessary to generate the full registry of metadata for each level but only to link particular values to the general value that they inherit.
Conceptually the metadata registry is complete for each metadata hierarchical level, but at the implementation level most of the metadata elements are not stored at both levels but only at the more generic one. This communication defines a metadata system that covers 4 levels, describes which metadata has to support series-layer inheritance and in which way, and how hierarchical levels are defined and stored. Metadata elements are classified according to the type of inheritance between products, series, tiles and datasets. It explains the classification of metadata elements and exemplifies it using core metadata elements. The communication also presents a metadata viewing and editing tool that uses the described model to propagate metadata elements and to show the user a complete set of metadata for each level in a transparent way. This tool is integrated in the MiraMon GIS software.
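The inherit-or-override behaviour described above can be sketched in a few lines. This is only an illustration of the general idea, not the MiraMon implementation; the class name `MetadataNode`, its methods, and the element names and values are all assumptions made for the example.

```python
class MetadataNode:
    """One level in a metadata hierarchy; stores only its own overrides."""

    def __init__(self, name, parent=None, **own):
        self.name = name
        self.parent = parent
        self.own = own  # exceptions defined at this level only

    def get(self, key):
        """Return the local value if present, else the inherited one."""
        node = self
        while node is not None:
            if key in node.own:
                return node.own[key]
            node = node.parent
        raise KeyError(key)

    def resolved(self):
        """Merge inherited and local values into one complete record."""
        base = self.parent.resolved() if self.parent else {}
        return {**base, **self.own}


# Hypothetical values: the series carries the general case,
# the tile stores only what differs.
series = MetadataNode("series", scale="1:5000", contact="mapping agency")
tile = MetadataNode("sheet", parent=series, contact="regional office")

print(tile.resolved())
```

Only the exception (`contact`) is stored on the tile, while a viewer can still present the complete record for that level by calling `resolved()`.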
High-performance metadata indexing and search in petascale data storage systems
NASA Astrophysics Data System (ADS)
Leung, A. W.; Shao, M.; Bisson, T.; Pasupathy, S.; Miller, E. L.
2008-07-01
Large-scale storage systems used for scientific applications can store petabytes of data and billions of files, making the organization and management of data in these systems a difficult, time-consuming task. The ability to search file metadata in a storage system can address this problem by allowing scientists to quickly navigate experiment data and code while allowing storage administrators to gather the information they need to properly manage the system. In this paper, we present Spyglass, a file metadata search system that achieves scalability by exploiting storage system properties, providing the scalability that existing file metadata search tools lack. In doing so, Spyglass can achieve search performance up to several thousand times faster than existing database solutions. We show that Spyglass enables important functionality that can aid data management for scientists and storage administrators.
Elements of a next generation time-series ASCII data file format for Earth Sciences
NASA Astrophysics Data System (ADS)
Webster, C. J.
2015-12-01
Data in ASCII comma separated value (CSV) format are recognized as the most simple, straightforward and readable type of data present in the geosciences. Many scientific workflows developed over the years rely on data using this simple format. However, there is a need for a lightweight ASCII header format standard that is easy to create and easy to work with. Current OGC grade XML standards are complex and difficult to implement for researchers with few resources. Ideally, such a format should provide the data in CSV for easy consumption by generic applications such as spreadsheets. The format should use an existing time standard. The header should be easily human readable as well as machine parsable. The metadata format should be extendable to allow vocabularies to be adopted as they are created by external standards bodies. The creation of such a format will increase the productivity of software engineers and scientists because fewer translators and checkers would be required.
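To make the idea of a lightweight, human-readable header concrete, here is a minimal sketch of one possible layout and a parser for it. The layout is an assumption made for illustration and not the format proposed by the author: header lines start with `#` and hold `key: value` pairs, the body is plain CSV, and timestamps follow ISO 8601.

```python
import csv
from datetime import datetime

# Hypothetical file content; keys and values are invented for the example.
SAMPLE = """\
# title: hypothetical station temperature
# time_standard: ISO 8601
# units.temp: degC
time,temp
2015-12-01T00:00:00+00:00,4.2
2015-12-01T01:00:00+00:00,3.9
"""


def read_annotated_csv(text):
    """Split '#' header lines from the CSV body and parse both."""
    header, body = {}, []
    for line in text.splitlines():
        if line.startswith("#"):
            key, _, value = line[1:].partition(":")
            header[key.strip()] = value.strip()
        elif line.strip():
            body.append(line)
    rows = []
    for row in csv.DictReader(body):
        # Timestamps are parsed with the standard library's ISO 8601 support.
        row["time"] = datetime.fromisoformat(row["time"])
        rows.append(row)
    return header, rows


header, rows = read_annotated_csv(SAMPLE)
print(header["units.temp"], len(rows))
```

Because header keys are free-form strings, new vocabulary terms could be added without changing the parser, which is one way to read the extensibility requirement.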
ERIC Educational Resources Information Center
Armstrong, C. J.
1997-01-01
Discusses PICS (Platform for Internet Content Selection), the Centre for Information Quality Management (CIQM), and metadata. Highlights include filtering networked information; the quality of information; and standardizing search engines. (LRW)
NASA Astrophysics Data System (ADS)
Ventouras, Spiros; Lawrence, Bryan; Woolf, Andrew; Cox, Simon
2010-05-01
The Metadata Objects for Linking Environmental Sciences (MOLES) model has been developed within the Natural Environment Research Council (NERC) DataGrid project [NERC DataGrid] to fill a missing part of the ‘metadata spectrum’. It is a framework within which to encode the relationships between the tools used to obtain data, the activities which organised their use, and the datasets produced. MOLES is primarily of use to consumers of data, especially in an interdisciplinary context, to allow them to establish details of provenance, and to compare and contrast such information without recourse to discipline-specific metadata or private communications with the original investigators [Lawrence et al 2009]. MOLES is also of use to the custodians of data, providing an organising paradigm for the data and metadata. The work described in this paper is a high-level view of the structure and content of a recent major revision of MOLES (v3.3) carried out as part of a NERC DataGrid extension project. The concepts of MOLES v3.3 are rooted in the harmonised ISO model [Harmonised ISO model], particularly in metadata standards (ISO 19115, ISO 19115-2) and the ‘Observations and Measurements’ conceptual model (ISO 19156). MOLES exploits existing concepts and relationships, and specialises information in these standards. A typical sequence of data capturing involves one or more projects under which a number of activities are undertaken, using appropriate tools and methods to produce the datasets. Following this typical sequence, the relevant metadata can be partitioned into the following main sections, which are helpful in mapping onto the most suitable standards from the ISO 19100 series:
• Project section
• Activity section (including both observation acquisition and numerical computation)
• Observation section (metadata regarding the methods used to obtain the data, the spatial and temporal sampling regime, quality etc.)
• Observation collection section
The key concepts in MOLES v3.3 are:
a) The result of an observation is defined uniquely from the property (of a feature-of-interest), the sampling-feature (carrying the targeted property values), the procedure used to obtain the result and the time (discrete instant or period) at which the observation takes place.
b) An ‘Acquisition’ and a ‘Computation’ can serve as the basis for describing any observation process chain (procedure). The ‘Acquisition’ uses an instrument (sensor or human being) to produce the results and is associated with field trips, flights, cruises etc., whereas the ‘Computation’ class involves specific processing steps. A process chain may consist of any combination of ‘Acquisitions’ and/or ‘Computations’ occurring in parallel or in any order during the data capturing sequence.
c) The results can be organised in collections with significantly more flexibility than if one used the original project alone.
d) The structure of individual observation collections may be domain-specific, in general; however, we are investigating the use of CSML (Climate Science Modelling Language) for atmospheric data.
The model has been tested as a desk exercise by constructing object models for scenarios from various disciplines.
References
NERC DataGrid: http://ndg.nerc.ac.uk
Lawrence et al., Information in environmental data grids, Phil. Trans. R. Soc. A, March 2009, vol. 367, no. 1890, 1003-1014
Harmonised ISO model: All relevant ISO standards for geographic metadata from the TC211 series (e.g. ISO 19xxx), harmonised within a formal UML description in the ‘HollowWorld’ packages available at https://www.seegrid.csiro.au/twiki/bin/view/AppSchemas/HollowWorld
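Key concepts (a) and (b) can be sketched as a small data model. The class and field names below are assumptions chosen for this example and do not reproduce the MOLES v3.3 UML classes; the point is only that an observation's identity follows from property, sampling feature, procedure and time, and that a procedure is a chain of acquisitions and computations.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Acquisition:
    instrument: str  # a sensor or a human observer


@dataclass(frozen=True)
class Computation:
    step: str  # description of one processing step


ProcessStep = Union[Acquisition, Computation]


@dataclass(frozen=True)
class Observation:
    """Identity is the combination of all four fields."""

    observed_property: str
    sampling_feature: str
    procedure: Tuple[ProcessStep, ...]  # any mix of acquisitions/computations
    time: str  # instant or period, kept as plain text in this sketch


chain = (Acquisition("thermometer"), Computation("hourly averaging"))
first = Observation("air temperature", "station A", chain, "2010-05-01")
same = Observation("air temperature", "station A", chain, "2010-05-01")
later = Observation("air temperature", "station A", chain, "2010-05-02")

# Frozen dataclasses compare and hash by value, so duplicates collapse.
print(len({first, same, later}))
```

Grouping such observations into collections independent of the originating project, as in concept (c), would then be a matter of building sets or lists over these objects.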
Big Earth Data Initiative: Metadata Improvement: Case Studies
NASA Technical Reports Server (NTRS)
Kozimor, John; Habermann, Ted; Farley, John
2016-01-01
The Big Earth Data Initiative (BEDI) invests in standardizing and optimizing the collection, management and delivery of the U.S. Government's civil Earth observation data to improve discovery, access and use, and understanding of Earth observations by the broader user community. Complete and consistent standard metadata helps address all three goals.
A model for enhancing Internet medical document retrieval with "medical core metadata".
Malet, G; Munoz, F; Appleyard, R; Hersh, W
1999-01-01
Finding documents on the World Wide Web relevant to a specific medical information need can be difficult. The goal of this work is to define a set of document content description tags, or metadata encodings, that can be used to promote disciplined search access to Internet medical documents. The authors based their approach on a proposed metadata standard, the Dublin Core Metadata Element Set, which has recently been submitted to the Internet Engineering Task Force. Their model also incorporates the National Library of Medicine's Medical Subject Headings (MeSH) vocabulary and MEDLINE-type content descriptions. The model defines a medical core metadata set that can be used to describe the metadata for a wide variety of Internet documents. The authors propose that their medical core metadata set be used to assign metadata to medical documents to facilitate document retrieval by Internet search engines.
linkedISA: semantic representation of ISA-Tab experimental metadata.
González-Beltrán, Alejandra; Maguire, Eamonn; Sansone, Susanna-Assunta; Rocca-Serra, Philippe
2014-01-01
Reporting and sharing experimental metadata, such as the experimental design, characteristics of the samples, and procedures applied, along with the analysis results, in a standardised manner ensures that datasets are comprehensible and, in principle, reproducible, comparable and reusable. Furthermore, sharing datasets in formats designed for consumption by humans and machines will also maximize their use. The Investigation/Study/Assay (ISA) open source metadata tracking framework facilitates standards-compliant collection, curation, visualization, storage and sharing of datasets, leveraging other platforms to enable analysis and publication. The ISA software suite includes several components used in an increasingly diverse set of life science and biomedical domains; it is underpinned by a general-purpose format, ISA-Tab, and conversions exist into formats required by public repositories. While ISA-Tab works well mainly as a human readable format, we have also implemented a linked data approach to semantically define the ISA-Tab syntax. We present a semantic web representation of the ISA-Tab syntax that complements ISA-Tab's syntactic interoperability with semantic interoperability. We introduce the linkedISA conversion tool from ISA-Tab to the Resource Description Framework (RDF), supporting mappings from the ISA syntax to multiple community-defined, open ontologies and capitalising on user-provided ontology annotations in the experimental metadata. We describe insights into the implementation and how annotations can be expanded driven by the metadata. We applied the conversion tool as part of Bio-GraphIIn, a web-based application supporting integration of the semantically-rich experimental descriptions.
Designed in a user-friendly manner, the Bio-GraphIIn interface hides most of the complexities from the users, exposing a familiar tabular view of the experimental description to allow seamless interaction with the RDF representation, and visualising descriptors to drive the query over the semantic representation of the experimental design. In addition, we defined queries over the linkedISA RDF representation and demonstrated its use over the linkedISA conversion of datasets from Nature's Scientific Data online publication. Our linked data approach has allowed us to: 1) make the ISA-Tab semantics explicit and machine-processable, 2) exploit the existing ontology-based annotations in the ISA-Tab experimental descriptions, 3) augment the ISA-Tab syntax with new descriptive elements, 4) visualise and query elements related to the experimental design. Reasoning over ISA-Tab metadata and associated data will facilitate data integration and knowledge discovery.
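The general step of turning a tab-delimited metadata table into RDF can be illustrated with a very small converter that emits N-Triples. This is not the linkedISA tool or its ontology mappings: the `BASE` namespace, the function names and the sample table are all hypothetical, and every value is written as a plain literal rather than being mapped to an ontology term.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/isa/"  # hypothetical namespace, not linkedISA's


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def table_to_ntriples(tsv_text, subject_column):
    """Emit one triple per non-empty cell, keyed by the subject column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    triples = []
    for row in reader:
        subject = f"<{BASE}sample/{quote(row[subject_column], safe='')}>"
        for column, value in row.items():
            if column == subject_column or not value:
                continue
            predicate = f"<{BASE}property/{quote(column, safe='')}>"
            triples.append(
                f'{subject} {predicate} "{escape_literal(value)}" .'
            )
    return triples


table = "Sample Name\torganism\ns1\tHomo sapiens\n"
for triple in table_to_ntriples(table, "Sample Name"):
    print(triple)
```

A semantic conversion of the kind the abstract describes would go further by replacing the ad hoc predicates and literals with terms from community ontologies.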
EARS : Repositioning data management near data acquisition.
NASA Astrophysics Data System (ADS)
Sinquin, Jean-Marc; Sorribas, Jordi; Diviacco, Paolo; Vandenberghe, Thomas; Munoz, Raquel; Garcia, Oscar
2016-04-01
The EU FP7 projects Eurofleets and Eurofleets2 are a Europe-wide alliance of marine research centers that aim to share their research vessels, to improve information sharing on planned, current and completed cruises and on details of ocean-going research vessels and specialized equipment, and to durably improve the cost-effectiveness of cruises. Within this context, a log of how, when and where anything happens on board the vessel is crucial information for data users at a later stage. This forms a primordial step in the process of data quality control, as it can assist in the understanding of anomalies and unexpected trends recorded in the acquired data sets. In this way the completeness of the metadata is improved, as it is recorded accurately at the origin of the measurement. Within the European research fleet, this crucial information has been collected in very different ways, using different procedures, formats and pieces of software. At the time the Eurofleets project started, every institution and country had adopted different strategies and approaches, which complicated the task of users who need to log general-purpose information and events on board whenever they access a different platform, so that the opportunity to produce this valuable metadata on board was lost. Among the many goals of the Eurofleets project, a very important task is the development of "event log software" called EARS (Eurofleets Automatic Reporting System) that enables scientists and operators to record what happens during a survey. EARS will allow users to fill, in a standardized way, the gap that currently exists in metadata description, which only very seldom links data with its history. Events generated automatically by acquisition instruments will also be handled, enhancing the granularity and precision of the event annotation.
The adoption of a common procedure to log survey events and a common terminology to describe them is crucial to providing a user-friendly and successful procedure for creating metadata on board across the whole European fleet. The possibility of automatically reporting metadata and general-purpose data will simplify the work of scientists and data managers with regard to data transmission. Improved accuracy and completeness of metadata are expected when events are recorded at acquisition time. This will also enhance multiple uses of the data, as it allows verification of the different requirements existing in different disciplines.
ISAIA: Interoperable Systems for Archival Information Access
NASA Technical Reports Server (NTRS)
Hanisch, Robert J.
2002-01-01
The ISAIA project was originally proposed in 1999 as a successor to the informal AstroBrowse project. AstroBrowse, which provided a data location service for astronomical archives and catalogs, was a first step toward data system integration and interoperability. The goals of ISAIA were ambitious: '...To develop an interdisciplinary data location and integration service for space science. Building upon existing data services and communications protocols, this service will allow users to transparently query hundreds or thousands of WWW-based resources (catalogs, data, computational resources, bibliographic references, etc.) from a single interface. The service will collect responses from various resources and integrate them in a seamless fashion for display and manipulation by the user.' Funding was approved only for a one-year pilot study, a decision that in retrospect was wise given the rapid changes in information technology in the past few years and the emergence of the Virtual Observatory initiatives in the US and worldwide. Indeed, the ISAIA pilot study was influential in shaping the science goals, system design, metadata standards, and technology choices for the virtual observatory. The ISAIA pilot project also helped to cement working relationships among the NASA data centers, US ground-based observatories, and international data centers. The ISAIA project was formed as a collaborative effort between thirteen institutions that provided data to astronomers, space physicists, and planetary scientists. Among the fruits we ultimately hoped would come from this project was a central site on the Web that any space scientist could use to efficiently locate existing data relevant to a particular scientific question. Furthermore, we hoped that the needed technology would be general enough that smaller, more focused communities within space science could use the same technologies and standards to provide more specialized services.
A major challenge to searching for data across a broad community is that information that describes some data products is either not relevant to other data or not applicable in the same way. Some previous metadata standard development efforts (e.g., in the earth science and library communities) have produced standards that are very large and difficult to support. To address this problem, we studied how a standard may be divided into separable pieces. Data providers that wish to participate in interoperable searches can support only those parts of the standard that are relevant to them. We prototyped a top-level metadata standard that was small and applicable to all space science data.
Visualization of JPEG Metadata
NASA Astrophysics Data System (ADS)
Malik Mohamad, Kamaruddin; Deris, Mustafa Mat
A JPEG image embeds much more information than just graphics. Visualizing its metadata would help digital forensic investigators view embedded data, including in corrupted images where no graphics can be displayed, in order to assist in evidence collection for cases such as child pornography or steganography. Tools such as metadata readers, editors and extraction tools already exist, but they mostly focus on visualizing the attribute information of JPEG Exif. However, none visualizes metadata by consolidating a marker summary, header structure, Huffman table and quantization table in a single program. In this paper, metadata visualization is done by developing a program that is able to summarize all existing markers, header structure, Huffman table and quantization table in a JPEG. The results show that visualization of metadata makes it easier to view the hidden information within a JPEG.
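A marker summary of the kind mentioned above can be produced by walking the segment headers of a JPEG byte stream. The sketch below is a simplified illustration and not the authors' program: the function name `list_markers` is an assumption, only a handful of marker codes are named, and it stops at the start-of-scan marker without handling fill bytes or markers that carry no length field.

```python
import struct

# A few common JPEG marker codes (second byte after 0xFF).
MARKER_NAMES = {
    0xD8: "SOI", 0xE0: "APP0", 0xE1: "APP1", 0xDB: "DQT",
    0xC0: "SOF0", 0xC4: "DHT", 0xDA: "SOS", 0xD9: "EOI",
}


def list_markers(data):
    """Return (offset, name, segment_length) for header markers up to SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    markers = [(0, "SOI", 0)]
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected marker at offset {pos}")
        code = data[pos + 1]
        # The 2-byte big-endian length counts itself plus the payload.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        markers.append((pos, MARKER_NAMES.get(code, f"0x{code:02X}"), length))
        if code == 0xDA:  # entropy-coded image data follows; stop here
            break
        pos += 2 + length
    return markers


# Synthetic byte string for illustration only, not a decodable image.
fake = b"\xff\xd8" + b"\xff\xdb\x00\x04\x00\x00" + b"\xff\xda\x00\x02" + b"\x12\x34\xff\xd9"
for entry in list_markers(fake):
    print(entry)
```

Because the walk relies only on segment headers, it can still report quantization (DQT) and Huffman (DHT) segments in a file whose image data no longer decodes.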
Geospatial resources for supporting data standards, guidance and best practice in health informatics
2011-01-01
Background The 1980s marked the occasion when Geographical Information System (GIS) technology was broadly introduced into the geo-spatial community through the establishment of a strong GIS industry. This technology quickly disseminated across many countries, and has now become established as an important research, planning and commercial tool for a wider community that includes organisations in the public and private health sectors. The broad acceptance of GIS technology and the nature of its functionality have meant that numerous datasets have been created over the past three decades. Most of these datasets have been created independently, and without any structured documentation systems in place. However, search and retrieval systems can only work if there is a mechanism for the existence of datasets to be discovered, and this is where proper metadata creation and management can greatly help. This situation must be addressed through support mechanisms such as Web-based portal technologies, metadata editor tools, automation, metadata standards and guidelines and collaborative efforts with relevant individuals and organisations. Engagement with data developers or administrators should also include a strategy of identifying the benefits associated with metadata creation and publication. Findings The establishment of numerous Spatial Data Infrastructures (SDIs), and other Internet resources, is a testament to the recognition of the importance of supporting good data management and sharing practices across the geographic information community. These resources extend to health informatics in support of research, public services and teaching and learning. This paper identifies many of these resources available to the UK academic health informatics community. It also reveals the reluctance of many spatial data creators across the wider UK academic community to use these resources to create and publish metadata, or deposit their data in repositories for sharing. The Go-Geo!
service is introduced as an SDI developed to provide UK academia with the necessary resources to address the concerns surrounding metadata creation and data sharing. The Go-Geo! portal, Geodoc metadata editor tool, ShareGeo spatial data repository, and a range of other support resources, are described in detail. Conclusions This paper describes a variety of resources available for the health research and public health sector to use for managing and sharing their data. The Go-Geo! service is one resource which offers an SDI for the eclectic range of disciplines using GIS in UK academia, including health informatics. The benefits of data management and sharing are immense, and in these times of cost restraints, these resources can be seen as solutions to find cost savings which can be reinvested in more research. PMID:21269487
NASA Astrophysics Data System (ADS)
Richard, S. M.
2011-12-01
The USGIN project has drafted and is using a specification for use of ISO 19115/19139 metadata, recommendations for simple metadata content, and a proposal for a URI scheme to identify resources using resolvable http URIs (see http://lab.usgin.org/usgin-profiles). The principal target use case is a catalog in which resources can be registered and described by data providers for discovery by users. We are currently using the ESRI Geoportal (Open Source), with configuration files for the USGIN profile. The metadata offered by the catalog must provide sufficient content to guide search engines to locate requested resources, to describe the resource content, provenance, and quality so users can determine if the resource will serve for intended usage, and finally to enable human users and software clients to obtain or access the resource. In order to achieve an operational federated catalog system, provisions in the ISO specification must be restricted and usage clarified to reduce the heterogeneity of 'standard' metadata and service implementations such that a single client can search against different catalogs, and the metadata returned by catalogs can be parsed reliably to locate required information. Usage of the complex ISO 19139 XML schema allows for a great deal of structured metadata content, but the heterogeneity in approaches to content encoding has hampered development of sophisticated client software that can take advantage of the rich metadata; the lack of such clients in turn reduces motivation for metadata producers to produce content-rich metadata. If the only significant use of the detailed, structured metadata is to format into text for people to read, then the detailed information could be put in free text elements and be just as useful.
In order for complex metadata encoding and content to be useful, there must be clear and unambiguous conventions on the encoding that are utilized by the community that wishes to take advantage of advanced metadata content. The use cases for the detailed content must be well understood, and the degree of metadata complexity should be determined by requirements for those use cases. The ISO standard provides sufficient flexibility that relatively simple metadata records can be created that will serve for text-indexed search/discovery, resource evaluation by a user reading text content from the metadata, and access to the resource via http, ftp, or well-known service protocols (e.g. Thredds; OGC WMS, WFS, WCS).
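The point about clients needing to parse returned metadata reliably can be illustrated with a short reader that pulls a title and access links out of an ISO 19139 record. This is a sketch under assumptions, not part of the USGIN profile: the function name and the sample record are invented, and the code simply takes the first citation title it finds, which is exactly the kind of shortcut that only works when encoding conventions are agreed.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}


def summarize_record(xml_text):
    """Extract the first citation title and all online resource URLs."""
    root = ET.fromstring(xml_text)
    title = root.findtext(
        ".//gmd:CI_Citation/gmd:title/gco:CharacterString",
        default="",
        namespaces=NS,
    )
    links = [
        el.text.strip()
        for el in root.findall(
            ".//gmd:CI_OnlineResource/gmd:linkage/gmd:URL", NS
        )
        if el.text
    ]
    return {"title": title.strip(), "links": links}


# Hypothetical, heavily abbreviated record used only to exercise the parser.
RECORD = """
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:identificationInfo><gmd:MD_DataIdentification><gmd:citation>
    <gmd:CI_Citation><gmd:title>
      <gco:CharacterString>Example resource</gco:CharacterString>
    </gmd:title></gmd:CI_Citation>
  </gmd:citation></gmd:MD_DataIdentification></gmd:identificationInfo>
  <gmd:distributionInfo><gmd:MD_Distribution><gmd:transferOptions>
    <gmd:MD_DigitalTransferOptions><gmd:onLine><gmd:CI_OnlineResource>
      <gmd:linkage><gmd:URL>http://example.org/data</gmd:URL></gmd:linkage>
    </gmd:CI_OnlineResource></gmd:onLine></gmd:MD_DigitalTransferOptions>
  </gmd:transferOptions></gmd:MD_Distribution></gmd:distributionInfo>
</gmd:MD_Metadata>
"""

print(summarize_record(RECORD))
```

A record that stored the same information in a different but equally schema-valid place would defeat this client, which is the heterogeneity problem the abstract describes.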
Sinaci, A Anil; Laleci Erturkmen, Gokce B
2013-10-01
In order to enable secondary use of Electronic Health Records (EHRs) by bridging the interoperability gap between clinical care and research domains, this paper introduces a unified methodology and supporting framework that brings together the power of metadata registries (MDR) and semantic web technologies. We introduce a federated semantic metadata registry framework by extending the ISO/IEC 11179 standard, and enable integration of data element registries through Linked Open Data (LOD) principles where each Common Data Element (CDE) can be uniquely referenced, queried and processed to enable syntactic and semantic interoperability. Each CDE and its components are maintained as LOD resources enabling semantic links with other CDEs, terminology systems and implementation-dependent content models, hence facilitating semantic search, much more effective reuse and semantic interoperability across different application domains. There are several important efforts addressing semantic interoperability in the healthcare domain, such as the IHE DEX profile proposal, CDISC SHARE and CDISC2RDF. Our architecture complements these by providing a framework to interlink existing data element registries and repositories, multiplying their potential for semantic interoperability. The open source implementation of the federated semantic MDR framework presented in this paper is the core of the semantic interoperability layer of the SALUS project, which enables the execution of post-marketing safety analysis studies on top of existing EHR systems.
Deck, John; Gaither, Michelle R; Ewing, Rodney; Bird, Christopher E; Davies, Neil; Meyer, Christopher; Riginos, Cynthia; Toonen, Robert J; Crandall, Eric D
2017-08-01
The Genomic Observatories Metadatabase (GeOMe, http://www.geome-db.org/) is an open access repository for geographic and ecological metadata associated with biosamples and genetic data. Whereas public databases have served as vital repositories for nucleotide sequences, they do not accession all the metadata required for ecological or evolutionary analyses. GeOMe fills this need, providing a user-friendly, web-based interface for both data contributors and data recipients. The interface allows data contributors to create a customized yet standard-compliant spreadsheet that captures the temporal and geospatial context of each biosample. These metadata are then validated and permanently linked to archived genetic data stored in the National Center for Biotechnology Information's (NCBI's) Sequence Read Archive (SRA) via unique persistent identifiers. By linking ecologically and evolutionarily relevant metadata with publicly archived sequence data in a structured manner, GeOMe sets a gold standard for data management in biodiversity science.
Metadata management and semantics in microarray repositories.
Kocabaş, F; Can, T; Baykal, N
2011-12-01
The number of microarray and other high-throughput experiments on primary repositories keeps increasing, as do the size and complexity of the results in response to biomedical investigations. Initiatives have been started on standardization of content, object model, exchange format and ontology. However, there are backlogs and an inability to exchange data between microarray repositories, which indicate that there is a great need for a standard format and data management. We have introduced a metadata framework that includes a metadata card and semantic nets that make experimental results visible, understandable and usable. These are encoded in syntax encoding schemes and represented in RDF (Resource Description Framework), can be integrated with other metadata cards and semantic nets, and can be exchanged, shared and queried. We demonstrated the performance and potential benefits through a case study on a selected microarray repository. We concluded that the backlogs can be reduced and that exchange of information and asking of knowledge discovery questions can become possible with the use of this metadata framework.
Deck, John; Gaither, Michelle R.; Ewing, Rodney; Bird, Christopher E.; Davies, Neil; Meyer, Christopher; Riginos, Cynthia; Toonen, Robert J.; Crandall, Eric D.
2017-01-01
The Genomic Observatories Metadatabase (GeOMe, http://www.geome-db.org/) is an open access repository for geographic and ecological metadata associated with biosamples and genetic data. Whereas public databases have served as vital repositories for nucleotide sequences, they do not accession all the metadata required for ecological or evolutionary analyses. GeOMe fills this need, providing a user-friendly, web-based interface for both data contributors and data recipients. The interface allows data contributors to create a customized yet standard-compliant spreadsheet that captures the temporal and geospatial context of each biosample. These metadata are then validated and permanently linked to archived genetic data stored in the National Center for Biotechnology Information’s (NCBI’s) Sequence Read Archive (SRA) via unique persistent identifiers. By linking ecologically and evolutionarily relevant metadata with publicly archived sequence data in a structured manner, GeOMe sets a gold standard for data management in biodiversity science. PMID:28771471
NASA Astrophysics Data System (ADS)
Zschocke, Thomas; Beniest, Jan
The Consultative Group on International Agricultural Research (CGIAR) has established a digital repository to share its teaching and learning resources along with descriptive educational information based on the IEEE Learning Object Metadata (LOM) standard. As a key component of any digital repository, quality metadata are critical not only to enable users to find the resources they require more easily, but also for the operation and interoperability of the repository itself. Studies show that repositories have difficulties in obtaining good quality metadata from their contributors, especially when this process involves many different stakeholders as is the case with the CGIAR as an international organization. To address this issue the CGIAR began investigating the Open ECBCheck as well as the ISO/IEC 19796-1 standard to establish quality protocols for its training. The paper highlights the implications and challenges posed by strengthening the metadata creation workflow for disseminating learning objects of the CGIAR.
A metadata schema for data objects in clinical research.
Canham, Steve; Ohmann, Christian
2016-11-24
A large number of stakeholders have accepted the need for greater transparency in clinical research and, in the context of various initiatives and systems, have developed a diverse and expanding number of repositories for storing the data and documents created by clinical studies (collectively known as data objects). To make the best use of such resources, we assert that it is also necessary for stakeholders to agree and deploy a simple, consistent metadata scheme. The relevant data objects and their likely storage are described, and the requirements for metadata to support data sharing in clinical research are identified. Issues concerning persistent identifiers, for both studies and data objects, are explored. A scheme is proposed that is based on the DataCite standard, with extensions to cover the needs of clinical researchers, specifically to provide (a) study identification data, including links to clinical trial registries; (b) data object characteristics and identifiers; and (c) data covering location, ownership and access to the data object. The components of the metadata scheme are described. The metadata schema is proposed as a natural extension of a widely agreed standard to fill a gap not tackled by other standards related to clinical research (e.g., Clinical Data Interchange Standards Consortium, Biomedical Research Integrated Domain Group). The proposal could be integrated with, but is not dependent on, other moves to better structure data in clinical research.
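The three extension groups proposed above can be pictured as nested records. The following is a minimal sketch, not the authors' schema: every class and field name is hypothetical, chosen only to mirror groups (a) to (c), and the identifier and registry values are placeholders.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class StudyLink:
    """(a) Study identification data, including a trial-registry link."""
    study_title: str
    registry_name: Optional[str] = None   # hypothetical field
    registry_id: Optional[str] = None     # hypothetical field


@dataclass
class AccessDetails:
    """(c) Location, ownership and access to the data object."""
    repository: str
    owner: str
    access_type: str                      # e.g. "public" or "on request"


@dataclass
class DataObjectRecord:
    """(b) Data object characteristics and identifiers."""
    identifier: str                       # persistent identifier (placeholder)
    title: str
    object_type: str                      # e.g. "protocol" or "dataset"
    study: StudyLink
    access: AccessDetails


def missing_fields(record: DataObjectRecord) -> List[str]:
    """Return dotted names of fields that are None or empty."""
    missing: List[str] = []

    def walk(node: dict, prefix: str) -> None:
        for key, value in node.items():
            path = prefix + key
            if isinstance(value, dict):
                walk(value, path + ".")
            elif value is None or value == "":
                missing.append(path)

    # asdict() converts nested dataclasses into nested dicts.
    walk(asdict(record), "")
    return missing


example = DataObjectRecord(
    identifier="example-object-001",
    title="Statistical analysis plan",
    object_type="document",
    study=StudyLink(study_title="Example trial", registry_name="Example Registry"),
    access=AccessDetails(repository="Example Repository",
                         owner="Example Sponsor",
                         access_type="on request"),
)
print(missing_fields(example))
```

A check like `missing_fields` is one way a repository might flag records that lack a registry link before accepting them.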
Lessons Learned From 104 Years of Mobile Observatories
NASA Astrophysics Data System (ADS)
Miller, S. P.; Clark, P. D.; Neiswender, C.; Raymond, L.; Rioux, M.; Norton, C.; Detrick, R.; Helly, J.; Sutton, D.; Weatherford, J.
2007-12-01
As the oceanographic community ventures into a new era of integrated observatories, it may be helpful to look back on the era of "mobile observatories" to see what Cyberinfrastructure lessons might be learned. For example, SIO has been operating research vessels for 104 years, supporting a wide range of disciplines: marine geology and geophysics, physical oceanography, geochemistry, biology, seismology, ecology, fisheries, and acoustics. In the last 6 years progress has been made with diverse data types, formats and media, resulting in a fully-searchable online SIOExplorer Digital Library of more than 800 cruises (http://SIOExplorer.ucsd.edu). Public access to SIOExplorer is considerable, with 795,351 files (206 GB) downloaded last year. During the last 3 years the efforts have been extended to WHOI, with a "Multi-Institution Testbed for Scalable Digital Archiving" funded by the Library of Congress and NSF (IIS 0455998). The project has created a prototype digital library of data from both institutions, including cruises, Alvin submersible dives, and ROVs. In the process, the team encountered technical and cultural issues that will be facing the observatory community in the near future. Technological Lessons Learned: Shipboard data from multiple institutions are extraordinarily diverse, and provide a good training ground for observatories. Data are gathered from a wide range of authorities, laboratories, servers and media, with little documentation. Conflicting versions exist, generated by alternative processes. Domain- and institution-specific issues were addressed during initial staging. Data files were categorized and metadata harvested with automated procedures. With our second-generation approach to staging, we achieve higher levels of automation with greater use of controlled vocabularies. 
Database and XML-based procedures deal with the diversity of raw metadata values and map them to agreed-upon standard values, in collaboration with the Marine Metadata Interoperability (MMI) community. All objects are tagged with an expert level, thus serving an educational audience, as well as research users. After staging, publication into the digital library is completely automated. The technical challenges have been largely overcome, thanks to a scalable, federated digital library architecture from the San Diego Supercomputer Center, implemented at SIO, WHOI and other sites. The metadata design is flexible, supporting modular blocks of metadata tailored to the needs of instruments, samples, documents, derived products, cruises or dives, as appropriate. Controlled metadata vocabularies, with content and definitions negotiated by all parties, are critical. Metadata may be mapped to required external standards and formats, as needed. Cultural Lessons Learned: The cultural challenges have been more formidable than expected. They became most apparent during attempts to categorize and stage digital data objects across two institutions, each with their own naming conventions and practices, generally undocumented, and evolving across decades. Whether the questions concerned data ownership, collection techniques, data diversity or institutional practices, the solution involved a joint discussion with scientists, data managers, technicians and archivists, working together. Because metadata discussions go on endlessly, significant benefit comes from dictionaries with definitions of all community-authorized metadata values.
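The staging step that maps diverse raw metadata values onto agreed-upon standard values can be illustrated with a small lookup. This is a sketch under stated assumptions, not the SIOExplorer procedure: the vocabulary entries, the normalization rules, and the function names are all hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical controlled vocabulary: normalized raw value -> standard term.
CONTROLLED_TERMS: Dict[str, str] = {
    "multibeam": "multibeam_bathymetry",
    "mb": "multibeam_bathymetry",
    "ctd": "ctd_profile",
    "ctd cast": "ctd_profile",
}


def normalize(raw: str) -> str:
    """Lower-case, treat '_' and '-' as spaces, and collapse whitespace."""
    cleaned = raw.strip().lower().replace("_", " ").replace("-", " ")
    return " ".join(cleaned.split())


def map_to_standard(raw: str) -> Optional[str]:
    """Return the standard term for a raw value, or None if unmapped."""
    return CONTROLLED_TERMS.get(normalize(raw))


def stage(raw_values: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split raw values into mapped terms and values needing human review."""
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for raw in raw_values:
        term = map_to_standard(raw)
        if term is None:
            unmapped.append(raw)
        else:
            mapped[raw] = term
    return mapped, unmapped


mapped, unmapped = stage(["  CTD_Cast ", "MB", "sidescan"])
print(mapped)
print(unmapped)
```

Values that fall into `unmapped` are the ones that would go back to the joint discussion between scientists, data managers and archivists described above.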
A Lifecycle Approach to Brokered Data Management for Hydrologic Modeling Data Using Open Standards.
NASA Astrophysics Data System (ADS)
Blodgett, D. L.; Booth, N.; Kunicki, T.; Walker, J.
2012-12-01
The U.S. Geological Survey Center for Integrated Data Analytics has formalized an information management architecture to facilitate hydrologic modeling and subsequent decision support throughout a project's lifecycle. The architecture is based on open standards and open source software to decrease the adoption barrier and to build on existing, community supported software. The components of this system have been developed and evaluated to support data management activities of the interagency Great Lakes Restoration Initiative, Department of Interior's Climate Science Centers and WaterSmart National Water Census. Much of the research and development of this system has been in cooperation with international interoperability experiments conducted within the Open Geospatial Consortium. Community-developed standards and software, implemented to meet the unique requirements of specific disciplines, are used as a system of interoperable, discipline specific, data types and interfaces. This approach has allowed adoption of existing software that satisfies the majority of system requirements. Four major features of the system include: 1) assistance in model parameter and forcing creation from large enterprise data sources; 2) conversion of model results and calibrated parameters to standard formats, making them available via standard web services; 3) tracking a model's processes, inputs, and outputs as a cohesive metadata record, allowing provenance tracking via reference to web services; and 4) generalized decision support tools which rely on a suite of standard data types and interfaces, rather than particular manually curated model-derived datasets. Recent progress made in data and web service standards related to sensor and/or model derived station time series, dynamic web processing, and metadata management are central to this system's function and will be presented briefly along with a functional overview of the applications that make up the system.
As the separate pieces of this system progress, they will be combined and generalized to form a sort of social network for nationally consistent hydrologic modeling.
Marine Profiles for OGC Sensor Web Enablement Standards
NASA Astrophysics Data System (ADS)
Jirka, Simon
2016-04-01
The use of OGC Sensor Web Enablement (SWE) standards in oceanology is increasing. Several projects are developing SWE-based infrastructures to ease the sharing of marine sensor data. This work ranges from developments on sensor level to efforts addressing interoperability of data flows between observatories and organisations. The broad range of activities using SWE standards leads to a risk of diverging approaches to how the SWE specifications are applied. Because the SWE standards are designed in a domain independent manner, they intentionally offer a high degree of flexibility enabling implementation across different domains and usage scenarios. At the same time this flexibility allows one to achieve similar goals in different ways. To avoid interoperability issues, an agreement is needed on how to apply SWE concepts and how to use vocabularies in a common way that will be shared by different projects, implementations, and users. To address this need, partners from several projects and initiatives (AODN, BRIDGES, envri+, EUROFLEETS/EUROFLEETS2, FixO3, FRAM, IOOS, Jerico/Jerico-Next, NeXOS, ODIP/ODIP II, RITMARE, SeaDataNet, SenseOcean, X-DOMES) have teamed up to develop marine profiles of OGC SWE standards that can serve as a common basis for developments in multiple projects and organisations. The following aspects will be especially considered: 1.) Provision of metadata: For discovering sensors/instruments as well as observation data, to facilitate the interpretation of observations, and to integrate instruments in sensor platforms, the provision of metadata is crucial. Thus, a marine profile of the OGC Sensor Model Language 2.0 (SensorML 2.0) will be developed, allowing metadata to be provided for different levels (e.g. observatory, instrument, and detector) and sensor types. The latter will enable metadata of a specific type to be automatically inherited by all devices/sensors of the same type.
The application of further standards such as OGC PUCK will benefit from this encoding, too, by facilitating the communication with instruments. 2.) Encoding and modelling of observation data: For delivering observation data, the ISO/OGC Observations and Measurements 2.0 (O&M 2.0) standard serves as a good basis. Within an O&M profile, recommendations will be given on needed observation types that cover different aspects of marine sensing (trajectory, stationary, or profile measurements, etc.). Besides XML, further O&M encodings (e.g. JSON-based) will be considered. 3.) Data access: A profile of the OGC Sensor Observation Service 2.0 (SOS 2.0) standard will be specified to offer a common way on how this web service interface can be used for requesting marine observations and metadata. At the same time this will offer a common interface to cross-domain applications based upon tools such as the GEOSS DAB. Lightweight approaches such as REST will be considered as further bindings for the SOS interface. 4.) Backward compatibility: The profile will consider the existing observation systems so that migration paths towards the specified profiles can be offered. We will present the current state of the profile development. In particular, a comparative analysis of SWE usage in different projects, an outline of the requirements, and fundamental aspects of profiles of SWE standards will be shown.
Federal Register 2010, 2011, 2012, 2013, 2014
2011-11-21
... development of standardized metadata in hundreds of organizations, and funded numerous implementations of OGC... of emphasis include: Metadata documentation, clearinghouse establishment, framework development...
OSCAR/Surface: Metadata for the WMO Integrated Observing System WIGOS
NASA Astrophysics Data System (ADS)
Klausen, Jörg; Pröscholdt, Timo; Mannes, Jürg; Cappelletti, Lucia; Grüter, Estelle; Calpini, Bertrand; Zhang, Wenjian
2016-04-01
The World Meteorological Organization (WMO) Integrated Global Observing System (WIGOS) is a key WMO priority underpinning all WMO Programs and new initiatives such as the Global Framework for Climate Services (GFCS). It does this by better integrating WMO and co-sponsored observing systems, as well as partner networks. For this, an important aspect is the description of the observational capabilities by way of structured metadata. The 17th Congress of the World Meteorological Organization (Cg-17) has endorsed the semantic WIGOS metadata standard (WMDS) developed by the Task Team on WIGOS Metadata (TT-WMD). The standard comprises a set of metadata classes that are considered to be of critical importance for the interpretation of observations and the evolution of observing systems relevant to WIGOS. The WMDS serves all recognized WMO Application Areas, and its use for all internationally exchanged observational data generated by WMO Members is mandatory. The standard will be introduced in three phases between 2016 and 2020. The Observing Systems Capability Analysis and Review (OSCAR) platform operated by MeteoSwiss on behalf of WMO is the official repository of WIGOS metadata and an implementation of the WMDS. OSCAR/Surface deals with all surface-based observations from land, air and oceans, combining metadata managed by a number of complementary, more domain-specific systems (e.g., GAWSIS for the Global Atmosphere Watch, JCOMMOPS for the marine domain, the WMO Radar database). It is a modern, web-based client-server application with extended information search, filtering and mapping capabilities including a fully developed management console to add and edit observational metadata. In addition, a powerful application programming interface (API) is being developed to allow machine-to-machine metadata exchange. The API is based on an ISO/OGC-compliant XML schema for the WMDS using the Observations and Measurements (ISO19156) conceptual model.
The purpose of the presentation is to acquaint the audience with OSCAR, the WMDS and the current XML schema; and, to explore the relationship to the INSPIRE XML schema. Feedback from experts in the various disciplines of meteorology, climatology, atmospheric chemistry, hydrology on the utility of the new standard and the XML schema will be solicited and will guide WMO in further evolving the WMDS.
The Service Environment for Enhanced Knowledge and Research (SEEKR) Framework
NASA Astrophysics Data System (ADS)
King, T. A.; Walker, R. J.; Weigel, R. S.; Narock, T. W.; McGuire, R. E.; Candey, R. M.
2011-12-01
The Service Environment for Enhanced Knowledge and Research (SEEKR) Framework is a configurable service oriented framework to enable the discovery, access and analysis of data shared in a community. The SEEKR framework integrates many existing independent services through the use of web technologies and standard metadata. Services are hosted on systems by using an application server and are callable by using REpresentational State Transfer (REST) protocols. Messages and metadata are transferred with eXtensible Markup Language (XML) encoding that conforms to a published XML schema. Space Physics Archive Search and Extract (SPASE) metadata is central to utilizing the services. Resources (data, documents, software, etc.) are described with SPASE and the associated Resource Identifier is used to access and exchange resources. The configurable options for the service can be set by using a web interface. Services are packaged as web application resource (WAR) files for direct deployment on application servers such as Tomcat or Jetty. We discuss the composition of the SEEKR framework, how new services can be integrated and the steps necessary to deploy the framework. The SEEKR Framework emerged from NASA's Virtual Magnetospheric Observatory (VMO) and other systems and we present an overview of these systems from a SEEKR Framework perspective.
Using URIs to effectively transmit sensor data and metadata
NASA Astrophysics Data System (ADS)
Kokkinaki, Alexandra; Buck, Justin; Darroch, Louise; Gardner, Thomas
2017-04-01
Autonomous ocean observation is massively increasing the number of sensors in the ocean. Accordingly, the continuing increase in datasets produced makes selecting sensors that are fit for purpose a growing challenge. Decision making on selecting quality sensor data is based on the sensor's metadata, i.e. manufacturer specifications, history of calibrations etc. The Open Geospatial Consortium (OGC) has developed the Sensor Web Enablement (SWE) standards to facilitate integration and interoperability of sensor data and metadata. The World Wide Web Consortium (W3C) Semantic Web technologies enable machine comprehensibility promoting sophisticated linking and processing of data published on the web. Linking the sensor's data and metadata according to the above-mentioned standards can yield practical difficulties, because of internal hardware bandwidth restrictions and a requirement to constrain data transmission costs. Our approach addresses these practical difficulties by uniquely identifying sensor and platform models and instances through URIs, which resolve via content negotiation to either OGC's Sensor Model Language (SensorML) or W3C's Linked Data. Data transmitted by a sensor incorporate the sensor's unique URI to refer to its metadata. Sensor and platform model URIs and descriptions are created and hosted by the British Oceanographic Data Centre (BODC) linked systems service. The sensor owner creates the sensor and platform instance URIs prior to and during sensor deployment, through an updatable web form, the Sensor Instance Form (SIF). SIF enables model and instance URI association but also platform and sensor linking. The use of URIs, which are dynamically generated through the SIF, offers both practical and economical benefits to the implementation of SWE and Linked Data standards in near real time systems. Data can be linked to metadata dynamically in-situ while saving on the costs associated with the transmission of long metadata descriptions.
The transmission of short URIs also enables the implementation of standards on systems where it is impractical, such as legacy hardware.
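Content negotiation, by which one sensor URI resolves to either a SensorML document or a Linked Data representation, can be sketched as a choice driven by the HTTP `Accept` header. This is an illustrative simplification, not the BODC service: the mapping of `application/xml` to SensorML and `text/turtle` to Linked Data is an assumption, and the parser handles only the `q` parameter.

```python
from typing import Dict, List, Optional, Tuple

# Assumed mapping of media types to the two representations named above.
REPRESENTATIONS: Dict[str, str] = {
    "application/xml": "SensorML document",      # assumed media type
    "text/turtle": "Linked Data (RDF Turtle)",   # assumed media type
}


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media type, q) pairs, best first."""
    prefs: List[Tuple[str, float]] = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0]
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(prefs, key=lambda item: item[1], reverse=True)


def negotiate(accept_header: str) -> Optional[str]:
    """Return the media type to serve, or None if nothing is acceptable."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in REPRESENTATIONS:
            return media_type
        if media_type == "*/*":
            return next(iter(REPRESENTATIONS))  # fall back to the first entry
    return None


print(negotiate("text/turtle;q=0.9, application/xml;q=0.5"))
print(negotiate("text/html"))
```

Because the client states its preferred format when it dereferences the URI, the sensor itself only ever needs to transmit the short URI.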
Costs and Benefits of Mission Participation in PDS4 Migrations
NASA Astrophysics Data System (ADS)
Mafi, J. N.; King, T. A.; Cecconi, B.; Faden, J.; Piker, C.; Kazden, D. P.; Gordon, M. K.; Joy, S. P.
2017-12-01
The Planetary Data System, Version 4 (PDS4) Standard, was a major reworking of the previous, PDS3 standard. According to PDS policy, "NASA missions confirmed for flight after [1 November 2011 were] required to archive their data according to PDS4 standards." Accordingly, NASA missions starting with LADEE (launched September 2013), and MAVEN (launched November 2013) have used the PDS4 standard. However, a large legacy of previously archived NASA planetary mission data already reside in the PDS archive in PDS3 and older formats. Plans to migrate the existing PDS archives to PDS4 have been discussed within PDS for some time, and have been reemphasized in the PDS Roadmap Study for 2017 - 2026 (https://pds.nasa.gov/roadmap/PlanetaryDataSystemRMS17-26_20jun17.pdf). Updating older PDS metadata to PDS4 would enable those data to take advantage of new capabilities offered by PDS4, and ensure the full compatibility of past archives with current and future PDS4 tools and services. Responsibility for performing the migration to PDS4 falls primarily upon the PDS discipline nodes, though some support by the active (or recently active) instrument teams would be required in order to help augment the existing metadata to include information that is unique to PDS4. However, there may be some value in mission data providers becoming more actively involved in the migration process. The upfront costs of this approach may be offset by the long term benefits of data providers' understanding of PDS4, their ability to take fuller advantage of PDS4 tools and services, and in their preparation for producing PDS4 archives for future missions. This presentation will explore the costs and benefits associated with this approach.
Improving Metadata Compliance for Earth Science Data Records
NASA Astrophysics Data System (ADS)
Armstrong, E. M.; Chang, O.; Foster, D.
2014-12-01
One of the recurring challenges of creating earth science data records is to ensure a consistent level of metadata compliance at the granule level where important details of contents, provenance, producer, and data references are necessary to obtain a sufficient level of understanding. These details are important not just for individual data consumers but also for autonomous software systems. Two of the most popular metadata standards at the granule level are the Climate and Forecast (CF) Metadata Conventions and the Attribute Conventions for Dataset Discovery (ACDD). Many data producers have implemented one or both of these models including the Group for High Resolution Sea Surface Temperature (GHRSST) for their global SST products and the Ocean Biology Processing Group for NASA ocean color and SST products. While both the CF and ACDD models contain various levels of metadata richness, the actual "required" attributes are quite small in number. Metadata at the granule level becomes much more useful when recommended or optional attributes are implemented that document spatial and temporal ranges, lineage and provenance, sources, keywords, and references etc. In this presentation we report on a new open source tool to check the compliance of netCDF and HDF5 granules to the CF and ACDD metadata models. The tool, written in Python, was originally implemented to support metadata compliance for netCDF records as part of NOAA's Integrated Ocean Observing System. It outputs standardized scoring for metadata compliance for both CF and ACDD, produces an objective summary weight, and can be implemented for remote records via OPeNDAP calls. Originally a command-line tool, we have extended it to provide a user-friendly web interface. Reports on metadata testing are grouped in hierarchies that make it easier to track flaws and inconsistencies in the record.
We have also extended it to support explicit metadata structures and semantic syntax for the GHRSST project that can be easily adapted to other satellite missions as well. Overall, we hope this tool will provide the community with a useful mechanism to improve metadata quality and consistency at the granule level by providing objective scoring and assessment, as well as encourage data producers to improve metadata quality and quantity.
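The idea of scoring a granule's global attributes against required and recommended lists can be sketched in a few lines. This is not the tool described above: the attribute lists are a small illustrative subset of ACDD-style names, the weights are hypothetical, and the granule is represented as a plain dictionary rather than read from a netCDF or HDF5 file.

```python
from typing import Dict, List, Tuple

# Illustrative attribute groups as (names, weight); weights are hypothetical.
GROUPS: Dict[str, Tuple[List[str], int]] = {
    "required": (["title", "summary", "keywords", "Conventions"], 2),
    "recommended": (
        ["time_coverage_start", "time_coverage_end",
         "geospatial_lat_min", "geospatial_lat_max",
         "history", "source"],
        1,
    ),
}


def is_present(attrs: dict, name: str) -> bool:
    """An attribute counts only if it exists and is not blank."""
    value = attrs.get(name)
    return value is not None and str(value).strip() != ""


def check_granule(attrs: dict) -> dict:
    """Return per-group findings plus a weighted score between 0 and 1."""
    report: dict = {}
    earned = 0
    possible = 0
    for group, (names, weight) in GROUPS.items():
        missing = [n for n in names if not is_present(attrs, n)]
        present = len(names) - len(missing)
        report[group] = {"present": present, "missing": missing}
        earned += weight * present
        possible += weight * len(names)
    report["score"] = earned / possible
    return report


granule_attrs = {
    "title": "Example SST granule",
    "summary": "Illustrative record",
    "keywords": "ocean, temperature",
    "Conventions": "CF-1.6, ACDD-1.3",
    "history": "created for illustration",
    "source": "satellite",
    "time_coverage_start": " ",   # blank, so treated as missing
}
report = check_granule(granule_attrs)
print(report["recommended"]["missing"])
print(round(report["score"], 3))
```

Grouping the findings by category mirrors the hierarchical reports mentioned in the abstract and makes it clear which omissions cost the most.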
NASA Astrophysics Data System (ADS)
Peckham, S. D.; Kelbert, A.; Rudan, S.; Stoica, M.
2016-12-01
Standardized metadata for models is the key to reliable and greatly simplified coupling in model coupling frameworks like CSDMS (Community Surface Dynamics Modeling System). This model metadata also helps model users to understand the important details that underpin computational models and to compare the capabilities of different models. These details include simplifying assumptions on the physics, governing equations and the numerical methods used to solve them, discretization of space (the grid) and time (the time-stepping scheme), state variables (input or output), model configuration parameters. This kind of metadata provides a "deep description" of a computational model that goes well beyond other types of metadata (e.g. author, purpose, scientific domain, programming language, digital rights, provenance, execution) and captures the science that underpins a model. While having this kind of standardized metadata for each model in a repository opens up a wide range of exciting possibilities, it is difficult to collect this information and a carefully conceived "data model" or schema is needed to store it. Automated harvesting and scraping methods can provide some useful information, but they often result in metadata that is inaccurate or incomplete, and this is not sufficient to enable the desired capabilities. In order to address this problem, we have developed a browser-based tool called the MCM Tool (Model Component Metadata) which runs on notebooks, tablets and smart phones. This tool was partially inspired by the TurboTax software, which greatly simplifies the necessary task of preparing tax documents. It allows a model developer or advanced user to provide a standardized, deep description of a computational geoscience model, including hydrologic models. Under the hood, the tool uses a new ontology for models built on the CSDMS Standard Names, expressed as a collection of RDF files (Resource Description Framework). 
This ontology is based on core concepts such as variables, objects, quantities, operations, processes and assumptions. The purpose of this talk is to present details of the new ontology and to then demonstrate the MCM Tool for several hydrologic models.
Scalable Metadata Management for a Large Multi-Source Seismic Data Repository
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gaylord, J. M.; Dodge, D. A.; Magana-Zook, S. A.
In this work, we implemented the key metadata management components of a scalable seismic data ingestion framework to address limitations in our existing system, and to position it for anticipated growth in volume and complexity.
Making Interoperability Easier with the NASA Metadata Management Tool
NASA Astrophysics Data System (ADS)
Shum, D.; Reese, M.; Pilone, D.; Mitchell, A. E.
2016-12-01
ISO 19115 has enabled interoperability amongst tools, yet many users find it hard to build ISO metadata for their collections because it can be large and overly flexible for their needs. The Metadata Management Tool (MMT), part of NASA's Earth Observing System Data and Information System (EOSDIS), offers users a modern, easy to use browser based tool to develop ISO compliant metadata. Through a simplified UI experience, metadata curators can create and edit collections without any understanding of the complex ISO-19115 format, while still generating compliant metadata. The MMT is also able to assess the completeness of collection level metadata by evaluating it against a variety of metadata standards. The tool provides users with clear guidance as to how to change their metadata in order to improve its quality and compliance. It is based on NASA's Unified Metadata Model for Collections (UMM-C), a simpler metadata model that can be cleanly mapped to ISO 19115. This allows metadata authors and curators to meet ISO compliance requirements faster and more accurately. The MMT and UMM-C have been developed in an agile fashion, with recurring end user tests and reviews to continually refine the tool, the model and the ISO mappings. This process is allowing for continual improvement and evolution to meet the community's needs.
Evaluating and Evolving Metadata in Multiple Dialects
NASA Astrophysics Data System (ADS)
Kozimor, J.; Habermann, T.; Powers, L. A.; Gordon, S.
2016-12-01
Despite many long-term homogenization efforts, communities continue to develop focused metadata standards along with related recommendations and (typically) XML representations (aka dialects) for sharing metadata content. Different representations easily become obstacles to sharing information because each representation generally requires a set of tools and skills that are designed, built, and maintained specifically for that representation. In contrast, community recommendations are generally described, at least initially, at a more conceptual level and are more easily shared. For example, most communities agree that dataset titles should be included in metadata records although they write the titles in different ways. This situation has led to the development of metadata repositories that can ingest and output metadata in multiple dialects. As an operational example, the NASA Common Metadata Repository (CMR) includes three different metadata dialects (DIF, ECHO, and ISO 19115-2). These systems raise a new question for metadata providers: if I have a choice of metadata dialects, which should I use and how do I make that decision? We have developed a collection of metadata evaluation tools that can be used to evaluate metadata records in many dialects for completeness with respect to recommendations from many organizations and communities. We have applied these tools to over 8000 collection and granule metadata records in four different dialects. This large collection of identical content in multiple dialects enables us to address questions about metadata and dialect evolution and to answer those questions quantitatively. We will describe those tools and results from evaluating the NASA CMR metadata collection.
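Evaluating the same conceptual recommendation against records written in different dialects amounts to a crosswalk from concepts to dialect-specific fields. The sketch below illustrates that idea only: `dialect_a`, `dialect_b` and all field names are hypothetical stand-ins, not the actual DIF, ECHO or ISO 19115-2 element paths, and records are flattened to dictionaries.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical crosswalk: concept -> field name in each dialect.
CROSSWALK: Dict[str, Dict[str, str]] = {
    "title": {"dialect_a": "entry_title", "dialect_b": "dataset_name"},
    "abstract": {"dialect_a": "summary", "dialect_b": "description"},
    "temporal_extent": {"dialect_a": "temporal_coverage",
                        "dialect_b": "time_range"},
}

# Hypothetical community recommendation, expressed as concepts.
RECOMMENDATION: List[str] = ["title", "abstract", "temporal_extent"]


def completeness(record: dict, dialect: str,
                 concepts: Sequence[str] = RECOMMENDATION
                 ) -> Tuple[float, List[str]]:
    """Return (fraction of concepts present, list of missing concepts)."""
    found: List[str] = []
    for concept in concepts:
        field_name = CROSSWALK[concept].get(dialect)
        value = record.get(field_name) if field_name else None
        if value not in (None, ""):
            found.append(concept)
    missing = [c for c in concepts if c not in found]
    return len(found) / len(concepts), missing


# The same content expressed in two dialects.
record_a = {"entry_title": "Example collection", "summary": "Illustrative"}
record_b = {"dataset_name": "Example collection", "description": "Illustrative"}

print(completeness(record_a, "dialect_a"))
print(completeness(record_b, "dialect_b"))
```

Because the recommendation is stated at the concept level, identical content scores identically regardless of dialect, which is what makes quantitative comparison across dialects possible.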
Explorative Analyses of Nursing Research Data.
Kim, Hyeoneui; Jang, Imho; Quach, Jimmy; Richardson, Alex; Kim, Jaemin; Choi, Jeeyae
2016-10-26
As a first step of pursuing the vision of "big data science in nursing," we described the characteristics of nursing research data reported in 194 published nursing studies. We also explored how completely the Version 1 metadata specification of biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) represents these metadata. The metadata items of the nursing studies were all related to one or more of the bioCADDIE metadata entities. However, values of many metadata items of the nursing studies were not sufficiently represented through the bioCADDIE metadata. This was partly due to the differences in the scope of the content that the bioCADDIE metadata are designed to represent. The 194 nursing studies reported a total of 1,181 unique data items, the majority of which take non-numeric values. This indicates the importance of data standardization to enable the integrative analyses of these data to support big data science in nursing. © The Author(s) 2016.
A Model for Enhancing Internet Medical Document Retrieval with “Medical Core Metadata”
Malet, Gary; Munoz, Felix; Appleyard, Richard; Hersh, William
1999-01-01
Objective: Finding documents on the World Wide Web relevant to a specific medical information need can be difficult. The goal of this work is to define a set of document content description tags, or metadata encodings, that can be used to promote disciplined search access to Internet medical documents. Design: The authors based their approach on a proposed metadata standard, the Dublin Core Metadata Element Set, which has recently been submitted to the Internet Engineering Task Force. Their model also incorporates the National Library of Medicine's Medical Subject Headings (MeSH) vocabulary and Medline-type content descriptions. Results: The model defines a medical core metadata set that can be used to describe the metadata for a wide variety of Internet documents. Conclusions: The authors propose that their medical core metadata set be used to assign metadata to medical documents to facilitate document retrieval by Internet search engines. PMID:10094069
NASA Astrophysics Data System (ADS)
Mencin, David; Hodgkinson, Kathleen; Sievers, Charlie; Phillips, David; Meertens, Charles; Mattioli, Glen
2017-04-01
UNAVCO has been providing infrastructure and support for solid-earth sciences and earthquake natural hazards for the past two decades. Recent advances in GNSS technology and data processing are now providing position solutions with centimeter-level precision at high-rate (>1 Hz) and low latency (i.e. the time required for data to arrive for analysis, in this case less than 1 second). These data have the potential to improve our understanding in diverse areas of geophysics including properties of seismic, volcanic, magmatic and tsunami sources, and thus profoundly transform rapid event characterization and warning. Scientific and operational applications also include glacier and ice sheet motions; tropospheric modeling; and space weather. These areas of geophysics represent a spectrum of research fields, including geodesy, seismology, tropospheric weather, space weather and natural hazards. Processed Real-Time GNSS (RT-GNSS) data will require formats and standards that allow this broad and diverse community to use these data and associated metadata in existing research infrastructure. These advances have critically highlighted the difficulties associated with merging data and metadata between scientific disciplines. Even seemingly very closely related fields such as geodesy and seismology, which both have rich histories of handling large volumes of data and metadata, do not go together well in any automated way. Community analysis strategies, or lack thereof, such as treatment of error prove difficult to address and are reflected in the data and metadata. In addition, these communities have differing security, accessibility and reliability requirements. We propose some solutions to the particular problem of making RT-GNSS processed solution data and metadata accessible to multiple scientific and natural hazard communities. Importantly, we discuss the roadblocks encountered and solved and those that remain to be addressed.
Legacy2Drupal: Conversion of an existing relational oceanographic database to a Drupal 7 CMS
NASA Astrophysics Data System (ADS)
Work, T. T.; Maffei, A. R.; Chandler, C. L.; Groman, R. C.
2011-12-01
Content Management Systems (CMSs) such as Drupal provide powerful features that can be of use to oceanographic (and other geo-science) data managers. However, in many instances, geo-science data management offices have already designed and implemented customized schemas for their metadata. The NSF-funded Biological and Chemical Oceanography Data Management Office (BCO-DMO) has ported an existing relational database containing oceanographic metadata, along with an existing interface coded in Cold Fusion middleware, to a Drupal 7 Content Management System. This is an update on an effort described as a proof-of-concept in poster IN21B-1051, presented at AGU2009. The BCO-DMO project has translated all the existing database tables, input forms, website reports, and other features present in the existing system into Drupal CMS features. The replacement features are made possible by the use of Drupal content types, CCK node-reference fields, a custom theme, and a number of other supporting modules. This presentation describes the process used to migrate content in the original BCO-DMO metadata database to Drupal 7, some problems encountered during migration, and the modules used to migrate the content successfully. Strategic use of Drupal 7 CMS features that enable three separate but complementary interfaces to provide access to oceanographic research metadata will also be covered: 1) a Drupal 7-powered user front-end; 2) REST-ful JSON web services (providing a Mapserver interface to the metadata and data); and 3) a SPARQL interface to a semantic representation of the repository metadata (this feeding a new faceted search capability currently under development). The existing BCO-DMO ontology, developed in collaboration with Rensselaer Polytechnic Institute's Tetherless World Constellation, makes strategic use of pre-existing ontologies and will be used to drive semantically-enabled faceted search capabilities planned for the site. 
At this point, the use of semantic technologies included in the Drupal 7 core is anticipated. Using a public domain CMS as opposed to proprietary middleware, and taking advantage of the many features of Drupal 7 that are designed to support semantically-enabled interfaces will help prepare the BCO-DMO and other science data repositories for interoperability between systems that serve ecosystem research data.
Increasing the international visibility of research data by a joint metadata schema
NASA Astrophysics Data System (ADS)
Svoboda, Nikolai; Zoarder, Muquit; Gärtner, Philipp; Hoffmann, Carsten; Heinrich, Uwe
2017-04-01
The BonaRes Project ("Soil as a sustainable resource for the bioeconomy") was launched in 2015 to promote sustainable soil management and to avoid fragmentation of efforts (Wollschläger et al., 2016). For this purpose, an IT infrastructure is being developed to upload, manage, store, and provide research data and its associated metadata. The research data provided by the BonaRes data centre are, in principle, not subject to any restrictions on reuse. For all research data, standardized metadata are the key enablers for the effective use of these data. Providing proper metadata is often viewed as an extra burden that consumes further work and resources. In our lecture we underline the benefits of structured and interoperable metadata, such as accessibility of data, discovery of data, interpretation of data, linking of data and several more, and we weigh these advantages against the required time, personnel and further costs. Building on this, we describe the framework of metadata in BonaRes combining the standards of OGC for description, visualization, exchange and discovery of geodata as well as the schema of DataCite for the publication and citation of this research data. This enables the generation of a DOI, a unique identifier that provides a permanent link to the citable research data. By using OGC standards, data and metadata become interoperable with numerous research data provided via INSPIRE. It enables further services like CSW for harvesting, WMS for visualization and WFS for downloading. We explain the mandatory fields that result from our approach and we give a general overview of our metadata architecture implementation. Literature: Wollschläger, U; Helming, K.; Heinrich, U.; Bartke, S.; Kögel-Knabner, I.; Russell, D.; Eberhardt, E. & Vogel, H.-J.: The BonaRes Centre - A virtual institute for soil research in the context of a sustainable bio-economy. Geophysical Research Abstracts, Vol. 18, EGU2016-9087, 2016.
NASA Astrophysics Data System (ADS)
Mihajlovski, Andrej; Plieger, Maarten; Som de Cerff, Wim; Page, Christian
2016-04-01
The CLIPC project is developing a portal to provide a single point of access for scientific information on climate change. This is made possible through the Copernicus Earth Observation Programme for Europe, which will deliver a new generation of environmental measurements of climate quality. The data about the physical environment which is used to inform climate change policy and adaptation measures comes from several categories: satellite measurements, terrestrial observing systems, model projections and simulations and from re-analyses (syntheses of all available observations constrained with numerical weather prediction systems). These data categories are managed by different communities: CLIPC will provide a single point of access for the whole range of data. The CLIPC portal will provide a number of indicators showing impacts on specific sectors which have been generated using a range of factors selected through structured expert consultation. It will also, as part of the transformation services, allow users to explore the consequences of using different combinations of driving factors which they consider to be of particular relevance to their work or life. The portal will provide information on the scientific quality and pitfalls of such transformations to prevent misleading usage of the results. The CLIPC project will develop an end to end processing chain (indicator tool kit), from comprehensive information on the climate state through to highly aggregated decision relevant products. Indicators of climate change and climate change impact will be provided, and a tool kit to update and post process the collection of indicators will be integrated into the portal. The CLIPC portal has a distributed architecture, making use of OGC services provided by e.g., climate4impact.eu and CEDA. CLIPC has two themes: 1. Harmonized access to climate datasets derived from models, observations and re-analyses 2. 
A climate impact tool kit to evaluate, rank and aggregate indicators. Key is the availability of standardized metadata, describing indicator data and services. This will enable standardization and interoperability between the different distributed services of CLIPC. To disseminate CLIPC indicator data, transformed data products to enable impacts assessments and climate change impact indicators, a standardized metadata infrastructure is provided. The challenge is that compliance of existing metadata to INSPIRE ISO standards and GEMINI standards needs to be extended to further allow the web portal to be generated from the available metadata blueprint. The information provided in the headers of netCDF files available through multiple catalogues allows us to generate ISO compliant metadata, which is in turn used to generate web based interface content, as well as OGC compliant web services such as WCS and WMS for the front end and WPS interactions for the scientific users to combine and generate new datasets. The goal of the metadata infrastructure is to provide a blueprint for creating a data driven science portal, generated from the underlying GIS data, web services and processing infrastructure. In the presentation we will present the results and lessons learned.
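The step from netCDF headers to ISO-style metadata can be pictured with a small sketch. It assumes the headers carry ACDD-style global attribute names (`title`, `summary`, `geospatial_lat_min` and so on); the abstract does not say which conventions CLIPC files follow, and the target field names on the right-hand side are simplified, hypothetical labels rather than real ISO 19115 XPaths. The header is represented as a plain dictionary so that no netCDF library is needed.

```python
# Hypothetical mapping from ACDD-style netCDF global attributes to simplified
# ISO 19115 field names. The right-hand names are illustrative labels only.
ATTR_TO_ISO = {
    "title": "citation.title",
    "summary": "abstract",
    "keywords": "descriptiveKeywords",
    "creator_name": "pointOfContact.name",
    "geospatial_lat_min": "extent.south",
    "geospatial_lat_max": "extent.north",
    "geospatial_lon_min": "extent.west",
    "geospatial_lon_max": "extent.east",
    "time_coverage_start": "extent.begin",
    "time_coverage_end": "extent.end",
}


def header_to_iso(global_attrs):
    """Return (iso_fields, missing_attrs) for one netCDF header dictionary."""
    iso_fields = {}
    missing = []
    for attr, iso_name in ATTR_TO_ISO.items():
        value = global_attrs.get(attr)
        if value is None or str(value).strip() == "":
            # Record the gap so a curator can complete the header later.
            missing.append(attr)
        else:
            iso_fields[iso_name] = value
    return iso_fields, missing


# Made-up header for illustration; not taken from any real CLIPC file.
example_header = {
    "title": "Example indicator",
    "summary": "Illustrative record only.",
    "geospatial_lat_min": 35.0,
    "geospatial_lat_max": 70.0,
}

fields, missing = header_to_iso(example_header)
print(fields["citation.title"])
print(missing)
```

Returning the list of missing attributes alongside the mapped fields keeps the gap between what a file header provides and what a compliant record needs visible, which is the kind of extension work the abstract describes.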
Adapting the CUAHSI Hydrologic Information System to OGC standards
NASA Astrophysics Data System (ADS)
Valentine, D. W.; Whitenack, T.; Zaslavsky, I.
2010-12-01
The CUAHSI Hydrologic Information System (HIS) provides web and desktop client access to hydrologic observations via water data web services using an XML schema called “WaterML”. The WaterML 1.x specification and the corresponding Water Data Services have been the backbone of the HIS service-oriented architecture (SOA) and have been adopted for serving hydrologic data by several federal agencies and many academic groups. The central discovery service, HIS Central, is based on a metadata catalog that references 4.7 billion observations, organized as 23 million data series from 1.5 million sites from 51 organizations. Observations data are published using HydroServer nodes that have been deployed at 18 organizations. Usage of HIS has increased by 8x from 2008 to 2010, and doubled in usage from 1600 data series a day in 2009 to 3600 data series a day in the first half of 2010. The HIS central metadata catalog currently harvests information from 56 Water Data Services. We collaborate on the catalog updates with two federal partners, USGS and US EPA: their data series are periodically reloaded into the HIS metadata catalog. We are pursuing two main development directions in the HIS project: Cloud-based computing, and further compliance with Open Geospatial Consortium (OGC) standards. The goal of moving to cloud-computing is to provide a scalable collaborative system with a simpler deployment and less dependence on hardware maintenance and staff. This move requires re-architecting the information models underlying the metadata catalog and Water Data Services to be independent of the underlying relational database model, allowing for implementation on both relational databases and cloud-based processing systems. Cloud-based HIS central resources can be managed collaboratively; partners share responsibility for their metadata by publishing data series information into the centralized catalog. 
Publishing data series will use REST-based service interfaces, like OData, as the basis for ingesting data series information into a cloud-hosted catalog. The future HIS services involve providing information via OGC Standards that will allow for observational data access from commercial GIS applications. Use of standards will allow for tools to access observational data from other projects using standards, such as the Ocean Observatories Initiative, and for tools from such projects to be integrated into the HIS toolset. With international collaborators, we have been developing a water information exchange language called “WaterML 2.0” which will be used to deliver observations data over OGC Sensor Observation Services (SOS). A software stack of OGC standard services will provide access to HIS information. In addition to SOS, Web Mapping and Feature Services (WMS and WFS) will provide access to location information. Catalog Services for the Web (CSW) will provide a catalog for water information that is both centralized and distributed. We intend the OGC standards to supplement the existing HIS service interfaces, rather than replace them. The ultimate goal of this development is to expand access to hydrologic observations data, and create an environment where these data can be seamlessly integrated with standards-compliant data resources.
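As a rough illustration of what requesting observations over SOS might look like, the sketch below assembles a GetObservation request URL. It assumes the key-value-pair binding of SOS 2.0 and a WaterML 2.0 response format, neither of which the abstract specifies; the endpoint, offering and observed-property identifiers are hypothetical placeholders, and some servers additionally expect a `namespaces` parameter declaring the `om` prefix.

```python
from urllib.parse import urlencode


def sos_get_observation_url(endpoint, offering, observed_property, begin, end):
    """Build a GetObservation URL (assumed SOS 2.0 KVP binding)."""
    params = {
        "service": "SOS",
        "version": "2.0.0",
        "request": "GetObservation",
        "offering": offering,
        "observedProperty": observed_property,
        # Time period filter on the phenomenon time, written as begin/end.
        "temporalFilter": f"om:phenomenonTime,{begin}/{end}",
        # Ask for WaterML 2.0 encoded observations.
        "responseFormat": "http://www.opengis.net/waterml/2.0",
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and identifiers, for illustration only.
url = sos_get_observation_url(
    "https://example.org/sos",
    offering="example_offering",
    observed_property="example_discharge",
    begin="2010-01-01T00:00:00Z",
    end="2010-06-30T00:00:00Z",
)
print(url)
```

Because the request is a plain URL, the same call can be issued from a desktop GIS client, a script or a browser, which is the kind of tool-independent access that adopting OGC standards is meant to enable.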
NASA Astrophysics Data System (ADS)
Leadbetter, Adam; Arko, Robert; Chandler, Cynthia; Shepherd, Adam
2014-05-01
"Linked Data" is a term used in Computer Science to encapsulate a methodology for publishing data and metadata in a structured format so that links may be created and exploited between objects. Berners-Lee (2006) outlines the following four design principles of a Linked Data system: Use Uniform Resource Identifiers (URIs) as names for things. Use HyperText Transfer Protocol (HTTP) URIs so that people can look up those names. When someone looks up a URI, provide useful information, using the standards (Resource Description Framework [RDF] and the RDF query language [SPARQL]). Include links to other URIs so that they can discover more things. In 2010, Berners-Lee revisited his original design plan for Linked Data to encourage data owners along a path to "good Linked Data". This revision involved the creation of a five star rating system for Linked Data outlined below. One star: Available on the web (in any format). Two stars: Available as machine-readable structured data (e.g. An Excel spreadsheet instead of an image scan of a table). Three stars: As two stars plus the use of a non-proprietary format (e.g. Comma Separated Values instead of Excel). Four stars: As three stars plus the use of open standards from the World Wide Web Commission (W3C) (i.e. RDF and SPARQL) to identify things, so that people can point to your data and metadata. Five stars: All the above plus link your data to other people's data to provide context Here we present work building on the SeaDataNet common vocabularies served by the NERC Vocabulary Server, connecting projects such as the Rolling Deck to Repository (R2R) and the Biological and Chemical Oceanography Data Management Office (BCO-DMO) and other vocabularies such as the Marine Metadata Interoperability Ontology Register and Repository and the NASA Global Change Master Directory to create a Linked Ocean Data cloud. 
Publishing the vocabularies and metadata in standard RDF XML and exposing SPARQL endpoints renders them five-star Linked Data repositories. The benefits of this approach include: increased interoperability between the metadata created by projects; improved data discovery, as users of SeaDataNet, R2R and BCO-DMO terms can find data using labels with which they are familiar; both standard tools and newly developed custom tools may be used to explore the data; and using standards means the custom tools are easier to develop. Linked Data is a concept which has been in existence for nearly a decade, and has a simple set of formal best practices associated with it. Linked Data is increasingly being seen as a driver of the next generation of "community science" activities. While many data providers in the oceanographic domain may be unaware of Linked Data, they may also be providing it at one of its lower levels. Here we have shown that it is possible to deliver the highest standard of Linked Oceanographic Data, and some of the benefits of the approach.
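The five star scheme described in this abstract is cumulative: each level requires all the levels below it. A minimal sketch of that rule follows; the `Dataset` class and its field names are hypothetical and exist only to make the rule concrete.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of how a dataset is published."""
    on_web: bool = False               # one star
    machine_readable: bool = False     # two stars
    open_format: bool = False          # three stars
    uses_w3c_standards: bool = False   # four stars (RDF, SPARQL)
    links_to_other_data: bool = False  # five stars


def linked_data_stars(ds: Dataset) -> int:
    """Count stars, stopping at the first criterion that is not met."""
    criteria = [
        ds.on_web,
        ds.machine_readable,
        ds.open_format,
        ds.uses_w3c_standards,
        ds.links_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


csv_on_web = Dataset(on_web=True, machine_readable=True, open_format=True)
print(linked_data_stars(csv_on_web))  # 3
```

Stopping at the first unmet criterion reflects the remark that providers may already be publishing Linked Data at one of its lower levels: a CSV file on the web earns three stars without any RDF.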
ASDC Collaborations and Processes to Ensure Quality Metadata and Consistent Data Availability
NASA Astrophysics Data System (ADS)
Trapasso, T. J.
2017-12-01
With the introduction of new tools, faster computing, and less expensive storage, increased volumes of data are expected to be managed with existing or fewer resources. Metadata management is becoming a heightened challenge from the increase in data volume, resulting in more metadata records needed to be curated for each product. To address metadata availability and completeness, NASA ESDIS has taken significant strides with the creation of the Unified Metadata Model (UMM) and Common Metadata Repository (CMR). The UMM helps address hurdles experienced by the increasing number of metadata dialects and the CMR provides a primary repository for metadata so that required metadata fields can be served through a growing number of tools and services. However, metadata quality remains an issue as metadata is not always inherent to the end-user. In response to these challenges, the NASA Atmospheric Science Data Center (ASDC) created the Collaboratory for quAlity Metadata Preservation (CAMP) and defined the Product Lifecycle Process (PLP) to work congruently. CAMP is unique in that it provides science team members a UI to directly supply metadata that is complete, compliant, and accurate for their data products. This replaces back-and-forth communication that often results in misinterpreted metadata. Upon review by ASDC staff, metadata is submitted to CMR for broader distribution through Earthdata. Further, approval of science team metadata in CAMP automatically triggers the ASDC PLP workflow to ensure appropriate services are applied throughout the product lifecycle. This presentation will review the design elements of CAMP and PLP as well as demonstrate interfaces to each. It will show the benefits that CAMP and PLP provide to the ASDC that could potentially benefit additional NASA Earth Science Data and Information System (ESDIS) Distributed Active Archive Centers (DAACs).
Metazen – metadata capture for metagenomes
2014-01-01
Background: As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. Unfortunately, these tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Results: Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Conclusions: Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility. PMID:25780508
Metazen - metadata capture for metagenomes.
Bischof, Jared; Harrison, Travis; Paczian, Tobias; Glass, Elizabeth; Wilke, Andreas; Meyer, Folker
2014-01-01
As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. Unfortunately, these tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility.
Development of Health Information Search Engine Based on Metadata and Ontology
Song, Tae-Min; Jin, Dal-Lae
2014-01-01
Objectives: The aim of the study was to develop a metadata and ontology-based health information search engine ensuring semantic interoperability to collect and provide health information using different application programs. Methods: Health information metadata ontology was developed using a distributed semantic Web content publishing model based on vocabularies used to index the contents generated by the information producers as well as those used to search the contents by the users. Vocabulary for health information ontology was mapped to the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), and a list of about 1,500 terms was proposed. The metadata schema used in this study was developed by adding an element describing the target audience to the Dublin Core Metadata Element Set. Results: A metadata schema and an ontology ensuring interoperability of health information available on the internet were developed. The metadata and ontology-based health information search engine developed in this study produced a better search result compared to existing search engines. Conclusions: Health information search engine based on metadata and ontology will provide reliable health information to both information producer and information consumers. PMID:24872907
Development of health information search engine based on metadata and ontology.
Song, Tae-Min; Park, Hyeoun-Ae; Jin, Dal-Lae
2014-04-01
The aim of the study was to develop a metadata and ontology-based health information search engine ensuring semantic interoperability to collect and provide health information using different application programs. Health information metadata ontology was developed using a distributed semantic Web content publishing model based on vocabularies used to index the contents generated by the information producers as well as those used to search the contents by the users. Vocabulary for health information ontology was mapped to the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), and a list of about 1,500 terms was proposed. The metadata schema used in this study was developed by adding an element describing the target audience to the Dublin Core Metadata Element Set. A metadata schema and an ontology ensuring interoperability of health information available on the internet were developed. The metadata and ontology-based health information search engine developed in this study produced a better search result compared to existing search engines. Health information search engine based on metadata and ontology will provide reliable health information to both information producer and information consumers.
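The schema design mentioned here, the fifteen-element Dublin Core Metadata Element Set plus one element for the target audience, can be sketched as a simple allow-list. The element name `audience`, the record layout and the sample records are assumptions made for illustration; the abstract does not give the study's actual element name or encoding.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})

# Assumed name for the added target-audience element.
EXTENDED_ELEMENTS = DC_ELEMENTS | {"audience"}


def unknown_elements(record):
    """Return the element names in a record that the extended schema does not allow."""
    return sorted(set(record) - EXTENDED_ELEMENTS)


def records_for_audience(records, audience):
    """Keep only the records whose audience list includes the given value."""
    return [r for r in records if audience in r.get("audience", [])]


# Made-up records for illustration.
records = [
    {"title": "Managing blood pressure", "subject": ["hypertension"],
     "audience": ["patients"]},
    {"title": "Hypertension guideline", "subject": ["hypertension"],
     "audience": ["clinicians"], "reading_level": "expert"},
]

print(unknown_elements(records[1]))
print([r["title"] for r in records_for_audience(records, "patients")])
```

An explicit audience element lets a search engine narrow results to material written for the person asking, in addition to matching on subject terms.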
The Water SWITCH-ON Spatial Information Platform (SIP)
NASA Astrophysics Data System (ADS)
Sala Calero, J., Sr.; Boot, G., Sr.; Dihé, P., Sr.; Arheimer, B.
2017-12-01
The amount of hydrological open data is continually growing and providing opportunities to the scientific community. Although the existing data portals (GEOSS Portal, INSPIRE community geoportal and others) enable access to open data, many users still find browsing through them difficult. Moreover, the time spent on gathering and preparing data usually is more significant than the time spent on the experiment itself. Thus, any improvement on searching, understanding, accessing or using open data is greatly beneficial. The Spatial Information Platform (SIP) has been developed to tackle these issues within the SWITCH-ON European Commission funded FP7 project. The SIP has been designed as a set of tools based on open standards that provide to the user all the necessary functionalities as described in the Publish-Find-Bind (PFB) pattern. In other words, this means that the SIP helps users to locate relevant and suitable data for their experiments analysis, to access and transform it (filtering, extraction, selection, conversion, aggregation). Moreover, the SIP can be used to provide descriptive information about the data and to publish it so others can find and use it. The SIP is based on existing open data protocols such as the OGC/CSW, OGC/WMS, OpenDAP and open-source components like PostgreSQL/PostGIS, GeoServer and pyCSW. The SIP is divided into three main user interfaces: the BYOD (Browse your open dataset) web interface, the Expert GUI tool and the Upload Data and Metadata web interface. The BYOD HTML5 client is the main entry point for users that want to browse through open data in the SIP. The BYOD has a map interface based on Leaflet JavaScript libraries so that the users can search more efficiently. The web-based Open Data Registration Tool is a user-friendly upload and metadata description interface (geographical extent, license, DOI generation). 
The Expert GUI is a desktop application that provides full metadata editing capabilities for the metadata moderators of the project. In conclusion, the Spatial Information Platform (SIP) provides to its community a set of tools for better understanding and ease of use of hydrological open-data. Moreover, the SIP has been based on well-known OGC standards that will allow the connection and data harvesting from popular open data portals such as the GEOSS system of systems.
Assessing Public Metabolomics Metadata, Towards Improving Quality.
Ferreira, João D; Inácio, Bruno; Salek, Reza M; Couto, Francisco M
2017-12-13
Public resources need to be appropriately annotated with metadata in order to make them discoverable, reproducible and traceable, further enabling them to be interoperable or integrated with other datasets. While data-sharing policies exist to promote the annotation process by data owners, these guidelines are still largely ignored. In this manuscript, we analyse automatic measures of metadata quality, and suggest their application as a means to encourage data owners to increase the metadata quality of their resources and submissions, thereby contributing to higher quality data, improved data sharing, and the overall accountability of scientific publications. We analyse these metadata quality measures in the context of a real-world repository of metabolomics data (i.e. MetaboLights), including a manual validation of the measures, and an analysis of their evolution over time. Our findings suggest that the proposed measures can be used to mimic a manual assessment of metadata quality.
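The abstract does not list the specific quality measures it analyses. As one plausible example of an automatic measure, the sketch below computes field completeness, the fraction of expected fields that carry a non-empty value. The field names and the sample record are hypothetical.

```python
def is_filled(value):
    """A field counts as filled when it is present and not blank."""
    return value is not None and str(value).strip() != ""


def completeness(record, expected_fields):
    """Fraction of the expected fields that are filled in the record."""
    if not expected_fields:
        return 1.0
    filled = sum(1 for name in expected_fields if is_filled(record.get(name)))
    return filled / len(expected_fields)


# Hypothetical expected fields and submission, for illustration only.
EXPECTED = ["title", "description", "organism", "instrument"]
submission = {
    "title": "Study A",
    "description": "   ",        # blank, so it does not count
    "organism": "Homo sapiens",
}

print(completeness(submission, EXPECTED))  # 0.5
```

A score like this can be computed at submission time and tracked over time, which is one way an automatic measure could stand in for parts of a manual assessment.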
Metadata to Describe Genomic Information.
Delgado, Jaime; Naro, Daniel; Llorente, Silvia; Gelpí, Josep Lluís; Royo, Romina
2018-01-01
Interoperable metadata is key for the management of genomic information. We propose a flexible approach that we contribute to the standardization by ISO/IEC of a new format for efficient and secure compressed storage and transmission of genomic information.
Schlue, Danijela; Mate, Sebastian; Haier, Jörg; Kadioglu, Dennis; Prokosch, Hans-Ulrich; Breil, Bernhard
2017-01-01
Heterogeneous tumor documentation and its challenges of interpretation of medical terms lead to problems in analyses of data from clinical and epidemiological cancer registries. The objective of this project was to design, implement and improve a national content delivery portal for oncological terms. Data elements of existing handbooks and documentation sources were analyzed, combined and summarized by medical experts of different comprehensive cancer centers. Informatics experts created a generic data model based on an existing metadata repository. In order to establish a national knowledge management system for standardized cancer documentation, a prototypical tumor wiki was designed and implemented. Requirements engineering techniques were applied to optimize this platform. It is targeted to user groups such as documentation officers, physicians and patients. The linkage to other information sources like PubMed and MeSH was realized.
In-field Access to Geoscientific Metadata through GPS-enabled Mobile Phones
NASA Astrophysics Data System (ADS)
Hobona, Gobe; Jackson, Mike; Jordan, Colm; Butchart, Ben
2010-05-01
Fieldwork is an integral part of much geosciences research. But whilst geoscientists have physical or online access to data collections whilst in the laboratory or at base stations, equivalent in-field access is not standard or straightforward. The increasing availability of mobile internet and GPS-supported mobile phones, however, now provides the basis for addressing this issue. The SPACER project was commissioned by the Rapid Innovation initiative of the UK Joint Information Systems Committee (JISC) to explore the potential for GPS-enabled mobile phones to access geoscientific metadata collections. Metadata collections within the geosciences and the wider geospatial domain can be disseminated through web services based on the Catalogue Service for the Web (CSW) standard of the Open Geospatial Consortium (OGC) - a global grouping of over 380 private, public and academic organisations aiming to improve interoperability between geospatial technologies. CSW offers an XML-over-HTTP interface for querying and retrieval of geospatial metadata. By default, the metadata returned by CSW is based on the ISO19115 standard and encoded in XML conformant to ISO19139. The SPACER project has created a prototype application that enables mobile phones to send queries to CSW containing user-defined keywords and coordinates acquired from GPS devices built into the phones. The prototype has been developed using the free and open source Google Android platform. The mobile application offers views for listing titles, presenting multiple metadata elements and a Google Map with an overlay of bounding coordinates of datasets. The presentation will describe the architecture and approach applied in the development of the prototype.
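To make the query described above more concrete, the sketch below turns a GPS fix into a bounding box and places it, together with a keyword, in a CSW 2.0.2 GetRecords request body. This is not the SPACER implementation: the search radius, the queryable names `AnyText` and `ows:BoundingBox`, and the longitude-first corner order are assumptions, and the axis order in particular depends on the server and coordinate reference system.

```python
import math
from xml.sax.saxutils import escape

KM_PER_DEGREE_LAT = 111.32  # approximate length of one degree of latitude

GETRECORDS_TEMPLATE = """<csw:GetRecords
    xmlns:csw="http://www.opengis.net/cat/csw/2.0.2"
    xmlns:ogc="http://www.opengis.net/ogc"
    xmlns:gml="http://www.opengis.net/gml"
    xmlns:ows="http://www.opengis.net/ows"
    service="CSW" version="2.0.2" resultType="results">
  <csw:Query typeNames="csw:Record">
    <csw:ElementSetName>summary</csw:ElementSetName>
    <csw:Constraint version="1.1.0">
      <ogc:Filter>
        <ogc:And>
          <ogc:PropertyIsLike wildCard="%" singleChar="_" escapeChar="\\">
            <ogc:PropertyName>AnyText</ogc:PropertyName>
            <ogc:Literal>%{keyword}%</ogc:Literal>
          </ogc:PropertyIsLike>
          <ogc:BBOX>
            <ogc:PropertyName>ows:BoundingBox</ogc:PropertyName>
            <gml:Envelope>
              <gml:lowerCorner>{minx:.5f} {miny:.5f}</gml:lowerCorner>
              <gml:upperCorner>{maxx:.5f} {maxy:.5f}</gml:upperCorner>
            </gml:Envelope>
          </ogc:BBOX>
        </ogc:And>
      </ogc:Filter>
    </csw:Constraint>
  </csw:Query>
</csw:GetRecords>"""


def bbox_around(lat, lon, radius_km):
    """Approximate (min_lon, min_lat, max_lon, max_lat) around a GPS fix.

    Uses a flat-earth approximation that is unsuitable close to the poles.
    """
    dlat = radius_km / KM_PER_DEGREE_LAT
    dlon = radius_km / (KM_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return (lon - dlon, lat - dlat, lon + dlon, lat + dlat)


def build_getrecords(keyword, bbox):
    """Fill the request template with an XML-escaped keyword and a bounding box."""
    minx, miny, maxx, maxy = bbox
    return GETRECORDS_TEMPLATE.format(
        keyword=escape(keyword), minx=minx, miny=miny, maxx=maxx, maxy=maxy
    )


box = bbox_around(lat=0.0, lon=10.0, radius_km=111.32)
body = build_getrecords("rock & soil", box)
print(body)
```

The resulting body would be sent with an HTTP POST to a catalogue endpoint; escaping the keyword matters because it is typed by the user in the field.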
NASA Astrophysics Data System (ADS)
Zaslavsky, I.; Richard, S. M.; Malik, T.; Hsu, L.; Gupta, A.; Grethe, J. S.; Valentine, D. W., Jr.; Lehnert, K. A.; Bermudez, L. E.; Ozyurt, I. B.; Whitenack, T.; Schachne, A.; Giliarini, A.
2015-12-01
While many geoscience-related repositories and data discovery portals exist, finding information about available resources remains a pervasive problem, especially when searching across multiple domains and catalogs. Inconsistent and incomplete metadata descriptions, disparate access protocols and semantic differences across domains, and troves of unstructured or poorly structured information which is hard to discover and use are major hindrances toward discovery, while metadata compilation and curation remain manual and time-consuming. We report on methodology, main results and lessons learned from an ongoing effort to develop a geoscience-wide catalog of information resources, with consistent metadata descriptions, traceable provenance, and automated metadata enhancement. Developing such a catalog is the central goal of CINERGI (Community Inventory of EarthCube Resources for Geoscience Interoperability), an EarthCube building block project (earthcube.org/group/cinergi). The key novel technical contributions of the project include: a) development of a metadata enhancement pipeline and a set of document enhancers to automatically improve various aspects of metadata descriptions, including keyword assignment and definition of spatial extents; b) Community Resource Viewers: online applications for crowdsourcing community resource registry development, curation and search, and channeling metadata to the unified CINERGI inventory; c) metadata provenance, validation and annotation services; d) user interfaces for advanced resource discovery; and e) geoscience-wide ontology and machine learning to support automated semantic tagging and faceted search across domains. 
We demonstrate these CINERGI components in three types of user scenarios: (1) improving existing metadata descriptions maintained by government and academic data facilities, (2) supporting work of several EarthCube Research Coordination Network projects in assembling information resources for their domains, and (3) enhancing the inventory and the underlying ontology to address several complicated data discovery use cases in hydrology, geochemistry, sedimentology, and critical zone science. Support from the US National Science Foundation under award ICER-1343816 is gratefully acknowledged.
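The idea of a metadata enhancement pipeline built from separate document enhancers can be sketched as a chain of small functions, each improving one aspect of a record and leaving a provenance note. This is not the CINERGI code: the vocabulary, the gazetteer entry, the record layout and the function names are all hypothetical.

```python
import copy

# Hypothetical controlled vocabulary and gazetteer (made-up extent values).
VOCABULARY = {"hydrology", "geochemistry", "sedimentology"}
GAZETTEER = {"Example Basin": (-120.0, 38.0, -119.0, 39.0)}


def _text(record):
    return f"{record.get('title', '')} {record.get('abstract', '')}"


def add_keywords(record):
    """Assign vocabulary terms found in the title or abstract."""
    text = _text(record).lower()
    found = {term for term in VOCABULARY if term in text}
    merged = sorted(set(record.get("keywords", [])) | found)
    changed = merged != record.get("keywords", [])
    record["keywords"] = merged
    return changed


def add_spatial_extent(record):
    """Define a bounding box when a known place name appears and none is set."""
    if "bbox" in record:
        return False
    for place, bbox in GAZETTEER.items():
        if place in _text(record):
            record["bbox"] = bbox
            return True
    return False


def run_pipeline(record, enhancers):
    """Apply each enhancer to a copy and log which ones changed the record."""
    enhanced = copy.deepcopy(record)
    enhanced.setdefault("provenance", [])
    for enhancer in enhancers:
        if enhancer(enhanced):
            enhanced["provenance"].append(enhancer.__name__)
    return enhanced


original = {
    "title": "Stream geochemistry of Example Basin",
    "abstract": "Hydrology and water samples.",
}
result = run_pipeline(original, [add_keywords, add_spatial_extent])
print(result["keywords"], result["bbox"], result["provenance"])
```

Working on a copy and recording which enhancer changed the record keeps the original description intact and the provenance of every automated addition traceable, two properties the abstract emphasizes.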
The minimum information about a genome sequence (MIGS) specification
Field, Dawn; Garrity, George; Gray, Tanya; Morrison, Norman; Selengut, Jeremy; Sterk, Peter; Tatusova, Tatiana; Thomson, Nicholas; Allen, Michael J; Angiuoli, Samuel V; Ashburner, Michael; Axelrod, Nelson; Baldauf, Sandra; Ballard, Stuart; Boore, Jeffrey; Cochrane, Guy; Cole, James; Dawyndt, Peter; De Vos, Paul; dePamphilis, Claude; Edwards, Robert; Faruque, Nadeem; Feldman, Robert; Gilbert, Jack; Gilna, Paul; Glöckner, Frank Oliver; Goldstein, Philip; Guralnick, Robert; Haft, Dan; Hancock, David; Hermjakob, Henning; Hertz-Fowler, Christiane; Hugenholtz, Phil; Joint, Ian; Kagan, Leonid; Kane, Matthew; Kennedy, Jessie; Kowalchuk, George; Kottmann, Renzo; Kolker, Eugene; Kravitz, Saul; Kyrpides, Nikos; Leebens-Mack, Jim; Lewis, Suzanna E; Li, Kelvin; Lister, Allyson L; Lord, Phillip; Maltsev, Natalia; Markowitz, Victor; Martiny, Jennifer; Methe, Barbara; Mizrachi, Ilene; Moxon, Richard; Nelson, Karen; Parkhill, Julian; Proctor, Lita; White, Owen; Sansone, Susanna-Assunta; Spiers, Andrew; Stevens, Robert; Swift, Paul; Taylor, Chris; Tateno, Yoshio; Tett, Adrian; Turner, Sarah; Ussery, David; Vaughan, Bob; Ward, Naomi; Whetzel, Trish; Gil, Ingio San; Wilson, Gareth; Wipat, Anil
2008-01-01
With the quantity of genomic data increasing at an exponential rate, it is imperative that these data be captured electronically, in a standard format. Standardization activities must proceed within the auspices of open-access and international working bodies. To tackle the issues surrounding the development of better descriptions of genomic investigations, we have formed the Genomic Standards Consortium (GSC). Here, we introduce the minimum information about a genome sequence (MIGS) specification with the intent of promoting participation in its development and discussing the resources that will be required to develop improved mechanisms of metadata capture and exchange. As part of its wider goals, the GSC also supports improving the ‘transparency’ of the information contained in existing genomic databases. PMID:18464787
OAI and NASA's Scientific and Technical Information.
ERIC Educational Resources Information Center
Nelson, Michael L.; Rocker, JoAnne; Harrison, Terry L.
2003-01-01
Details NASA's (National Aeronautics & Space Administration (USA)) involvement in defining and testing the Open Archives Initiative (OAI) Protocol for Metadata Harvesting (OAI-PMH) and experience with adapting existing NASA distributed searching DLs (digital libraries) to use the OAI-PMH and metadata harvesting. Discusses some new digital…
Metadata tables to enable dynamic data modeling and web interface design: the SEER example.
Weiner, Mark; Sherr, Micah; Cohen, Abigail
2002-04-01
A wealth of information addressing health status, outcomes and resource utilization is compiled and made available by various government agencies. While exploration of the data is possible using existing tools, in general, would-be users of the resources must acquire CD-ROMs or download data from the web, and upload the data into their own database. Where web interfaces exist, they are highly structured, limiting the kinds of queries that can be executed. This work develops a web-based database interface engine whose content and structure is generated through interaction with a metadata table. The result is a dynamically generated web interface that can easily accommodate changes in the underlying data model by altering the metadata table, rather than requiring changes to the interface code. This paper discusses the background and implementation of the metadata table and web-based front end and provides examples of its use with the NCI's Surveillance, Epidemiology and End-Results (SEER) database.
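The metadata-table idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the metadata rows, column names, and the `cases` table name are all hypothetical, and the sketch only shows how form fields and a parameterized query could both be derived from one metadata table, so that a change to the data model is a change to the table rather than to the interface code.

```python
# Hypothetical metadata table: each row describes one column of the data table.
# Column names, labels, and types are illustrative assumptions, not SEER fields.
METADATA_TABLE = [
    {"column": "site", "label": "Primary site", "type": "text", "searchable": True},
    {"column": "year_dx", "label": "Year of diagnosis", "type": "int", "searchable": True},
    {"column": "record_id", "label": "Record ID", "type": "int", "searchable": False},
]


def form_fields(metadata):
    """Return (column, label) pairs for every column marked searchable."""
    return [(row["column"], row["label"]) for row in metadata if row["searchable"]]


def build_query(metadata, table, criteria):
    """Build a parameterized SELECT from user criteria.

    Only columns listed as searchable in the metadata table are accepted,
    and each value is cast according to the declared type.
    """
    allowed = {row["column"]: row for row in metadata if row["searchable"]}
    clauses, params = [], []
    for column, value in criteria.items():
        if column not in allowed:
            raise ValueError(f"column not searchable: {column}")
        caster = int if allowed[column]["type"] == "int" else str
        clauses.append(f"{column} = ?")
        params.append(caster(value))
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT * FROM {table} WHERE {where}", params
```

Adding a new searchable column in this sketch means appending one row to `METADATA_TABLE`; neither function needs to change.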
Evaluating and Evolving Metadata in Multiple Dialects
NASA Technical Reports Server (NTRS)
Kozimore, John; Habermann, Ted; Gordon, Sean; Powers, Lindsay
2016-01-01
Despite many long-term homogenization efforts, communities continue to develop focused metadata standards along with related recommendations and (typically) XML representations (aka dialects) for sharing metadata content. Different representations easily become obstacles to sharing information because each representation generally requires a set of tools and skills that are designed, built, and maintained specifically for that representation. In contrast, community recommendations are generally described, at least initially, at a more conceptual level and are more easily shared. For example, most communities agree that dataset titles should be included in metadata records although they write the titles in different ways.
Sea Level Station Metadata for Tsunami Detection, Warning and Research
NASA Astrophysics Data System (ADS)
Stroker, K. J.; Marra, J.; Kari, U. S.; Weinstein, S. A.; Kong, L.
2007-12-01
The devastating earthquake and tsunami of December 26, 2004 have greatly increased recognition of the need for water level data both from the coasts and the deep-ocean. In 2006, the National Oceanic and Atmospheric Administration (NOAA) completed a Tsunami Data Management Report describing the management of data required to minimize the impact of tsunamis in the United States. One of the major gaps defined in this report is the access to global coastal water level data. NOAA's National Geophysical Data Center (NGDC) and National Climatic Data Center (NCDC) are working cooperatively to bridge this gap. NOAA relies on a network of global data, acquired and processed in real-time to support tsunami detection and warning, as well as high-quality global databases of archived data to support research and advanced scientific modeling. In 2005, parties interested in enhancing the access and use of sea level station data united under the NOAA NCDC's Integrated Data and Environmental Applications (IDEA) Center's Pacific Region Integrated Data Enterprise (PRIDE) program to develop a distributed metadata system describing sea level stations (Kari et al., 2006; Marra et al., in press). This effort started with pilot activities in a regional framework and is targeted at tsunami detection and warning systems being developed by various agencies. It includes development of the components of a prototype sea level station metadata web service and accompanying Google Earth-based client application, which use an XML-based schema to expose, at a minimum, information in the NOAA National Weather Service (NWS) Pacific Tsunami Warning Center (PTWC) station database needed to use the PTWC's Tide Tool application. As identified in the Tsunami Data Management Report, the need also exists for long-term retention of the sea level station data. NOAA envisions that the retrospective water level data and metadata will also be available through web services, using an XML-based schema.
Five high-priority metadata requirements identified at a water level workshop held at the XXIV IUGG Meeting in Perugia will be addressed: consistent, validated, and well defined numbers (e.g. amplitude); exact location of sea level stations; a complete record of sea level data stored in the archive; identifying high-priority sea level stations; and consistent definitions. NOAA's National Geophysical Data Center (NGDC) and co-located World Data Center for Solid Earth Geophysics (including tsunamis) would hold the archive of the sea level station data and distribute the standard metadata. Currently, NGDC is also archiving and distributing the DART buoy deep-ocean water level data and metadata in standards based formats. Kari, Uday S., John J. Marra, Stuart A. Weinstein, 2006 A Tsunami Focused Data Sharing Framework For Integration of Databases that Describe Water Level Station Specifications. AGU Fall Meeting, 2006. San Francisco, California. Marra, John, J., Uday S. Kari, and Stuart A. Weinstein (in press). A Tsunami Detection and Warning-focused Sea Level Station Metadata Web Service. IUGG XXIV, July 2-13, 2007. Perugia, Italy.
An Interactive, Web-Based Approach to Metadata Authoring
NASA Technical Reports Server (NTRS)
Pollack, Janine; Wharton, Stephen W. (Technical Monitor)
2001-01-01
NASA's Global Change Master Directory (GCMD) serves a growing number of users by assisting the scientific community in the discovery of and linkage to Earth science data sets and related services. The GCMD holds over 8000 data set descriptions in Directory Interchange Format (DIF) and 200 data service descriptions in Service Entry Resource Format (SERF), encompassing the disciplines of geology, hydrology, oceanography, meteorology, and ecology. Data descriptions also contain geographic coverage information, thus allowing researchers to discover data pertaining to a particular geographic location, as well as subject of interest. The GCMD strives to be the preeminent data locator for world-wide directory level metadata. In this vein, scientists and data providers must have access to intuitive and efficient metadata authoring tools. Existing GCMD tools are not currently attracting widespread usage. With usage being the prime indicator of utility, it has become apparent that current tools must be improved. As a result, the GCMD has released a new suite of web-based authoring tools that enable a user to create new data and service entries, as well as modify existing data entries. With these tools, a more interactive approach to metadata authoring is taken, as they feature a visual "checklist" of data/service fields that automatically updates when a field is completed. In this way, the user can quickly gauge which of the required and optional fields have not been populated. With the release of these tools, the Earth science community will be further assisted in efficiently creating quality data and services metadata. Keywords: metadata, Earth science, metadata authoring tools
An Assessment of the Need for Standard Variable Names for Airborne Field Campaigns
NASA Astrophysics Data System (ADS)
Beach, A. L., III; Chen, G.; Northup, E. A.; Kusterer, J.; Quam, B. M.
2017-12-01
The NASA Earth Venture Program has led to a dramatic increase in airborne observations, requiring updated data management practices with clearly defined data standards and protocols for metadata. An airborne field campaign can involve multiple aircraft and a variety of instruments. It is quite common to have different instruments/techniques measure the same parameter on one or more aircraft platforms. This creates a need to allow instrument Principal Investigators (PIs) to name their variables in a way that would distinguish them across various data sets. A lack of standardization of variable names presents a challenge for data search tools in enabling discovery of similar data across airborne studies, aircraft platforms, and instruments. This was also identified by data users as one of the top issues in data use. One effective approach for mitigating this problem is to enforce variable name standardization, which can effectively map the unique PI variable names to fixed standard names. In order to ensure consistency amongst the standard names, it will be necessary to choose them from a controlled list. However, no such list currently exists despite a number of previous efforts to establish a sufficient list of atmospheric variable names. The Atmospheric Composition Variable Standard Name Working Group was established under the auspices of NASA's Earth Science Data Systems Working Group (ESDSWG) to solicit research community feedback to create a list of standard names that are acceptable to data providers and data users. This presentation will discuss the challenges and recommendations of standard variable names in an effort to demonstrate how airborne metadata curation/management can be improved to streamline data ingest and improve interoperability and discoverability for a broader user community.
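The mapping of PI variable names to standard names drawn from a controlled list can be sketched as follows. Since the abstract notes that no agreed list exists yet, every name below (the controlled list, the PI variable names, and the mapping itself) is a made-up placeholder for illustration only.

```python
# Hypothetical controlled list of standard names (no official list is implied).
CONTROLLED_LIST = {"ozone_mixing_ratio", "carbon_monoxide_mixing_ratio"}

# Hypothetical mapping from PI-chosen variable names to standard names.
PI_TO_STANDARD = {
    "O3_ppbv_TeamA": "ozone_mixing_ratio",
    "Ozone_TeamB": "ozone_mixing_ratio",
    "CO_TeamC": "carbon_monoxide_mixing_ratio",
}


def standardize(pi_names):
    """Group PI variable names under their standard name.

    Returns (mapped, unmapped): `mapped` maps each standard name to the PI
    names that resolve to it; `unmapped` lists PI names with no mapping or
    with a mapping that is not in the controlled list.
    """
    mapped, unmapped = {}, []
    for name in pi_names:
        standard = PI_TO_STANDARD.get(name)
        if standard in CONTROLLED_LIST:
            mapped.setdefault(standard, []).append(name)
        else:
            unmapped.append(name)
    return mapped, unmapped
```

A search tool could then query on the standard name and still return every PI-specific variable grouped under it, while the `unmapped` list flags names that need curator attention.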
NASA Astrophysics Data System (ADS)
Hardy, D.; Janée, G.; Gallagher, J.; Frew, J.; Cornillon, P.
2006-12-01
The OPeNDAP Data Access Protocol (DAP) is a community standard for sharing scientific data across the Internet. Data providers using DAP have adopted a variety of metadata conventions to improve data utility, such as COARDS (1995) and CF (2003). Our results show, however, that metadata do not follow these conventions in practice. We collected metadata from over a hundred DAP servers, tens of thousands of data objects, and hundreds of collections. We found that a minority claim to adhere to a metadata convention, and a small percentage accurately adhere to their stated convention. We present descriptive statistics of our survey and highlight common traits such as well-populated attributes. Our empirical results indicate that unified search services cannot rely solely on metadata conventions. Although we encourage all providers to adopt a small subset of the CF convention for discovery purposes, we have no evidence to suggest that improved conventions would simplify the fundamental problem of heterogeneity. Large-scale discovery services must find methods for integrating incompatible metadata.
Metabolonote: A Wiki-Based Database for Managing Hierarchical Metadata of Metabolome Analyses
Ara, Takeshi; Enomoto, Mitsuo; Arita, Masanori; Ikeda, Chiaki; Kera, Kota; Yamada, Manabu; Nishioka, Takaaki; Ikeda, Tasuku; Nihei, Yoshito; Shibata, Daisuke; Kanaya, Shigehiko; Sakurai, Nozomu
2015-01-01
Metabolomics – technology for comprehensive detection of small molecules in an organism – lags behind the other “omics” in terms of publication and dissemination of experimental data. Among the reasons for this are difficulty precisely recording information about complicated analytical experiments (metadata), existence of various databases with their own metadata descriptions, and low reusability of the published data, resulting in submitters (the researchers who generate the data) being insufficiently motivated. To tackle these issues, we developed Metabolonote, a Semantic MediaWiki-based database designed specifically for managing metabolomic metadata. We also defined a metadata and data description format, called “Togo Metabolome Data” (TogoMD), with an ID system that is required for unique access to each level of the tree-structured metadata such as study purpose, sample, analytical method, and data analysis. Separation of the management of metadata from that of data and permission to attach related information to the metadata provide advantages for submitters, readers, and database developers. The metadata are enriched with information such as links to comparable data, thereby functioning as a hub of related data resources. They also enhance not only readers’ understanding and use of data but also submitters’ motivation to publish the data. The metadata are computationally shared among other systems via APIs, which facilitate the construction of novel databases by database developers. A permission system that allows publication of immature metadata and feedback from readers also helps submitters to improve their metadata. Hence, this aspect of Metabolonote, as a metadata preparation tool, is complementary to high-quality and persistent data repositories such as MetaboLights. A total of 808 metadata for analyzed data obtained from 35 biological species are published currently. 
Metabolonote and related tools are available free of cost at http://metabolonote.kazusa.or.jp/. PMID:25905099
EMERALD: A Flexible Framework for Managing Seismic Data
NASA Astrophysics Data System (ADS)
West, J. D.; Fouch, M. J.; Arrowsmith, R.
2010-12-01
The seismological community is challenged by the vast quantity of new broadband seismic data provided by large-scale seismic arrays such as EarthScope’s USArray. While this bonanza of new data enables transformative scientific studies of the Earth’s interior, it also illuminates limitations in the methods used to prepare and preprocess those data. At a recent seismic data processing focus group workshop, many participants expressed the need for better systems to minimize the time and tedium spent on data preparation in order to increase the efficiency of scientific research. Another challenge related to data from all large-scale transportable seismic experiments is that there currently exists no system for discovering and tracking changes in station metadata. This critical information, such as station location, sensor orientation, instrument response, and clock timing data, may change over the life of an experiment and/or be subject to post-experiment correction. Yet nearly all researchers utilize metadata acquired with the downloaded data, even though subsequent metadata updates might alter or invalidate results produced with older metadata. A third long-standing issue for the seismic community is the lack of easily exchangeable seismic processing codes. This problem stems directly from the storage of seismic data as individual time series files, and the history of each researcher developing his or her preferred data file naming convention and directory organization. Because most processing codes rely on the underlying data organization structure, such codes are not easily exchanged between investigators. To address these issues, we are developing EMERALD (Explore, Manage, Edit, Reduce, & Analyze Large Datasets). The goal of the EMERALD project is to provide seismic researchers with a unified, user-friendly, extensible system for managing seismic event data, thereby increasing the efficiency of scientific enquiry. 
EMERALD stores seismic data and metadata in a state-of-the-art open source relational database (PostgreSQL), and can, on a timed basis or on demand, download the most recent metadata, compare it with previously acquired values, and alert the user to changes. The backend relational database is capable of easily storing and managing many millions of records. The extensible, plug-in architecture of the EMERALD system allows any researcher to contribute new visualization and processing methods written in any of 12 programming languages, and a central Internet-enabled repository for such methods provides users with the opportunity to download, use, and modify new processing methods on demand. EMERALD includes data acquisition tools allowing direct importation of seismic data, and also imports data from a number of existing seismic file formats. Pre-processed clean sets of data can be exported as standard sac files with user-defined file naming and directory organization, for use with existing processing codes. The EMERALD system incorporates existing acquisition and processing tools, including SOD, TauP, GMT, and FISSURES/DHI, making much of the functionality of those tools available in a unified system with a user-friendly web browser interface. EMERALD is now in beta test. See emerald.asu.edu or contact john.d.west@asu.edu for more details.
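The compare-and-alert step described above (download the latest station metadata, compare it with previously acquired values, report changes) can be sketched in a few lines. This is an illustration under assumptions, not EMERALD's code: the dictionary layout, station codes, and field names are hypothetical, and a real system would read both snapshots from its database.

```python
def diff_station_metadata(previous, latest):
    """Compare two metadata snapshots keyed by station code.

    Each snapshot maps a station code to a dict of field -> value.
    Returns a sorted list of (station, field, old_value, new_value) tuples
    for every field whose value differs, including added or removed fields.
    """
    changes = []
    for station in sorted(set(previous) | set(latest)):
        old = previous.get(station, {})
        new = latest.get(station, {})
        for field in sorted(set(old) | set(new)):
            if old.get(field) != new.get(field):
                changes.append((station, field, old.get(field), new.get(field)))
    return changes


# Hypothetical snapshots: the sensor orientation of station "ABC" was corrected.
previous = {"ABC": {"latitude": 34.10, "sensor_azimuth": 0.0}}
latest = {"ABC": {"latitude": 34.10, "sensor_azimuth": 3.5}}

for station, field, old, new in diff_station_metadata(previous, latest):
    print(f"{station}: {field} changed from {old} to {new}")
```

Results computed with the older `sensor_azimuth` value could then be flagged for re-examination.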
Technical Challenges of Enterprise Imaging: HIMSS-SIIM Collaborative White Paper.
Clunie, David A; Dennison, Don K; Cram, Dawn; Persons, Kenneth R; Bronkalla, Mark D; Primo, Henri Rik
2016-10-01
This white paper explores the technical challenges and solutions for acquiring (capturing) and managing enterprise images, particularly those involving visible light applications. The types of acquisition devices used for various general-purpose photography and specialized applications including dermatology, endoscopy, and anatomic pathology are reviewed. The formats and standards used, and the associated metadata requirements and communication protocols for transfer and workflow are considered. Particular emphasis is placed on the importance of metadata capture in both order- and encounter-based workflow. The benefits of using DICOM to provide a standard means of recording and accessing both metadata and image and video data are considered, as is the role of IHE and FHIR.
NASA Astrophysics Data System (ADS)
Peckham, S. D.
2017-12-01
Standardized, deep descriptions of digital resources (e.g. data sets, computational models, software tools and publications) make it possible to develop user-friendly software systems that assist scientists with the discovery and appropriate use of these resources. Semantic metadata makes it possible for machines to take actions on behalf of humans, such as automatically identifying the resources needed to solve a given problem, retrieving them and then automatically connecting them (despite their heterogeneity) into a functioning workflow. Standardized model metadata also helps model users to understand the important details that underpin computational models and to compare the capabilities of different models. These details include simplifying assumptions on the physics, governing equations and the numerical methods used to solve them, discretization of space (the grid) and time (the time-stepping scheme), state variables (input or output), and model configuration parameters. This kind of metadata provides a "deep description" of a computational model that goes well beyond other types of metadata (e.g. author, purpose, scientific domain, programming language, digital rights, provenance, execution) and captures the science that underpins a model. A carefully constructed, unambiguous and rules-based schema to address this problem, called the Geoscience Standard Names ontology, will be presented; it utilizes Semantic Web best practices and technologies. It has also been designed to work across science domains and to be readable by both humans and machines.
Metazen – metadata capture for metagenomes
Bischof, Jared; Harrison, Travis; Paczian, Tobias; ...
2014-12-08
Background: As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. These tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Results: Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Conclusion: Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility.
Metazen – metadata capture for metagenomes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bischof, Jared; Harrison, Travis; Paczian, Tobias
Background: As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. These tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Results: Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Conclusion: Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility.
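The combination of a fixed set of mandatory fields with freely added optional fields can be sketched as a small validation routine. This is an assumption-laden illustration rather than Metazen's actual logic: the field names in `MANDATORY_FIELDS` are hypothetical and do not come from the tool or from any metadata standard.

```python
# Hypothetical mandatory fields; the real tool defines its own set.
MANDATORY_FIELDS = ("project_name", "sample_id", "collection_date")


def validate_record(record):
    """Check one metadata record against the mandatory-field list.

    A mandatory field counts as missing when it is absent, None, or blank.
    Any other keys are accepted and reported as user-added extra fields.
    """
    missing = []
    for field in MANDATORY_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    extra = sorted(key for key in record if key not in MANDATORY_FIELDS)
    return {"valid": not missing, "missing": missing, "extra_fields": extra}
```

For example, a record that supplies `project_name` and `sample_id` plus a user-defined `depth_m` field would be reported as invalid with `collection_date` missing, while `depth_m` is preserved as an extra field.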
Deploying the ATLAS Metadata Interface (AMI) on the cloud with Jenkins
NASA Astrophysics Data System (ADS)
Lambert, F.; Odier, J.; Fulachier, J.; ATLAS Collaboration
2017-10-01
The ATLAS Metadata Interface (AMI) is a mature application of more than 15 years of existence. Mainly used by the ATLAS experiment at CERN, it consists of a very generic tool ecosystem for metadata aggregation and cataloguing. AMI is used by the ATLAS production system, therefore the service must guarantee a high level of availability. We describe our monitoring and administration systems, and the Jenkins-based strategy used to dynamically test and deploy cloud OpenStack nodes on demand.
A Linked Dataset of Medical Educational Resources
ERIC Educational Resources Information Center
Dietze, Stefan; Taibi, Davide; Yu, Hong Qing; Dovrolis, Nikolas
2015-01-01
Reusable educational resources became increasingly important for enhancing learning and teaching experiences, particularly in the medical domain where resources are particularly expensive to produce. While interoperability across educational resources metadata repositories is yet limited to the heterogeneity of metadata standards and interface…
Raising orphans from a metadata morass: A researcher's guide to re-use of public 'omics data.
Bhandary, Priyanka; Seetharam, Arun S; Arendsee, Zebulun W; Hur, Manhoi; Wurtele, Eve Syrkin
2018-02-01
More than 15 petabases of raw RNAseq data is now accessible through public repositories. Acquisition of other 'omics data types is expanding, though most lack a centralized archival repository. Data-reuse provides tremendous opportunity to extract new knowledge from existing experiments, and offers a unique opportunity for robust, multi-'omics analyses by merging metadata (information about experimental design, biological samples, protocols) and data from multiple experiments. We illustrate how predictive research can be accelerated by meta-analysis with a study of orphan (species-specific) genes. Computational predictions are critical to infer orphan function because their coding sequences provide very few clues. The metadata in public databases is often confusing; a test case with Zea mays mRNA seq data reveals a high proportion of missing, misleading or incomplete metadata. This metadata morass significantly diminishes the insight that can be extracted from these data. We provide tips for data submitters and users, including specific recommendations to improve metadata quality by more use of controlled vocabulary and by metadata reviews. Finally, we advocate for a unified, straightforward metadata submission and retrieval system. Copyright © 2017 Elsevier B.V. All rights reserved.
Australia's TERN: Advancing Ecosystem Data Management in Australia
NASA Astrophysics Data System (ADS)
Phinn, S. R.; Christensen, R.; Guru, S.
2013-12-01
Globally, there is a consistent movement towards more open, collaborative and transparent science, where the publication and citation of data is considered standard practice. Australia's Terrestrial Ecosystem Research Network (TERN) is a national research infrastructure investment designed to support the ecosystem science community through all stages of the data lifecycle. TERN has developed and implemented a comprehensive network of 'hard' and 'soft' infrastructure that enables Australia's ecosystem scientists to collect, publish, store, share, discover and re-use data in ways not previously possible. The aim of this poster is to demonstrate how TERN has successfully delivered infrastructure that is enabling a significant cultural and practical shift in Australia's ecosystem science community towards consistent approaches for data collection, meta-data, data licensing, and data publishing. TERN enables multiple disciplines within the ecosystem sciences to more effectively and efficiently collect, store and publish their data. A critical part of TERN's approach has been to build on existing data collection activities, networks and skilled people to enable further coordination and collaboration to build each data collection facility and coordinate data publishing. Data collection in TERN is through discipline based facilities, covering long term collection of: (1) systematic plot based measurements of vegetation structure, composition and faunal biodiversity; (2) instrumented towers making systematic measurements of solar, water and gas fluxes; and (3) satellite and airborne maps of biophysical properties of vegetation, soils and the atmosphere. Several other facilities collect and integrate environmental data to produce national products for fauna and vegetation surveys, soils and coastal data, as well as integrated or synthesised products for modelling applications.
Data management, publishing and sharing in TERN are implemented through a tailored data licensing framework suitable for ecosystem data, national standards for metadata, a DOI-minting service, and context-appropriate data repositories and portals. The TERN Data infrastructure is based on a loosely coupled 'network of networks.' Overall, the data formats used across the TERN facilities range from NetCDF and comma-separated values to descriptive documents. Metadata standards include ISO19115, Ecological Metadata Language and rich semantic enabled contextual information. Data services include Web Mapping Service, Web Feature Service, OPeNDAP, file servers and KNB Metacat. These approaches enable each data collection facility to maintain its discipline based data collection and storage protocols. TERN facility meta-data are harvested regularly for the central TERN Data Discovery Portal and converted to a national standard format. This approach enables centralised discovery, access, and re-use of data simply and effectively, while maintaining disciplinary diversity. Effort is still required to support the cultural shift towards acceptance of effective data management, publication, sharing and re-use as standard practice. To this end TERN's future activities will be directed to supporting this transformation, undertaking 'education' to enable ecosystem scientists to take full advantage of TERN's infrastructure, and providing training and guidance for best practice data management.
FRAMES Metadata Reporting Templates for Ecohydrological Observations, version 1.1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Christianson, Danielle; Varadharajan, Charuleka; Christoffersen, Brad
FRAMES is a set of Excel metadata files and package-level descriptive metadata that are designed to facilitate and improve capture of desired metadata for ecohydrological observations. The metadata are bundled with data files into a data package and submitted to a data repository (e.g. the NGEE Tropics Data Repository) via a web form. FRAMES standardizes reporting of diverse ecohydrological and biogeochemical data for synthesis across a range of spatiotemporal scales and incorporates many best data science practices. This version of FRAMES supports observations for primarily automated measurements collected by permanently located sensors, including sap flow (tree water use), leaf surface temperature, soil water content, dendrometry (stem diameter growth increment), and solar radiation. Version 1.1 extends the controlled vocabulary and incorporates functionality to facilitate programmatic use of data and FRAMES metadata (R code available at NGEE Tropics Data Repository).
Unified Science Information Model for SoilSCAPE using the Mercury Metadata Search System
NASA Astrophysics Data System (ADS)
Devarakonda, Ranjeet; Lu, Kefa; Palanisamy, Giri; Cook, Robert; Santhana Vannan, Suresh; Moghaddam, Mahta; Clewley, Dan; Silva, Agnelo; Akbar, Ruzbeh
2013-12-01
SoilSCAPE (Soil moisture Sensing Controller And oPtimal Estimator) introduces a new concept for a smart wireless sensor web technology for optimal measurements of surface-to-depth profiles of soil moisture using in-situ sensors. The objective is to enable a guided and adaptive sampling strategy for the in-situ sensor network to meet the measurement validation objectives of spaceborne soil moisture sensors such as the Soil Moisture Active Passive (SMAP) mission. This work is being carried out at the University of Michigan, the Massachusetts Institute of Technology, University of Southern California, and Oak Ridge National Laboratory. At Oak Ridge National Laboratory we are using the Mercury metadata search system [1] for building a Unified Information System for the SoilSCAPE project. This unified portal primarily comprises three key pieces: Distributed Search/Discovery; Data Collections and Integration; and Data Dissemination. Mercury, federally funded software for metadata harvesting, indexing, and searching, would be used for this module. Soil moisture data sources identified as part of this activity, such as SoilSCAPE and FLUXNET (in-situ sensors), AirMOSS (airborne retrieval), and SMAP (spaceborne retrieval), are being indexed and maintained by Mercury. Mercury would be the central repository of data sources for cal/val for soil moisture studies and would provide a mechanism to identify additional data sources. Relevant metadata from existing inventories such as ORNL DAAC, USGS Clearinghouse, ARM, NASA ECHO, GCMD etc. would be brought into this soil-moisture data search/discovery module. The SoilSCAPE [2] metadata records will also be published in broader metadata repositories such as GCMD and data.gov. Mercury can be configured to provide a single portal to soil moisture information contained in disparate data management systems located anywhere on the Internet.
Mercury is able to extract metadata systematically from HTML pages or XML files using a variety of methods including OAI-PMH [3]. The Mercury search interface then allows users to perform simple, fielded, spatial and temporal searches across a central harmonized index of metadata. Mercury supports various metadata standards including FGDC, ISO-19115, DIF, Dublin-Core, Darwin-Core, and EML. This poster describes in detail how Mercury implements the Unified Science Information Model for soil moisture data. References: [1] Devarakonda R., et al. Mercury: reusable metadata management, data discovery and access system. Earth Science Informatics (2010), 3(1): 87-94. [2] Devarakonda R., et al. Daymet: Single Pixel Data Extraction Tool. http://daymet.ornl.gov/singlepixel.html (2012). Last accessed 10-01-2013. [3] Devarakonda R., et al. Data sharing and retrieval using OAI-PMH. Earth Science Informatics (2011), 4(1): 1-5.
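As a rough illustration of the OAI-PMH harvesting step mentioned in the abstract above, the following Python sketch builds a `ListRecords` request and pulls identifiers and Dublin Core titles out of a response. It is not Mercury's code: the endpoint `https://example.org/oai`, the sample record, and the function names are assumptions made for the example; only the `verb` and `metadataPrefix` parameters and the XML namespaces come from the OAI-PMH and Dublin Core specifications.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a (hypothetical) endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)

# Hand-written sample response; a real harvester would fetch this over HTTP.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:soil-001</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Hypothetical soil moisture profile</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

def extract_titles(xml_text):
    """Return (identifier, title) pairs for every record in a ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        pairs.append((identifier, title))
    return pairs

print(list_records_url("https://example.org/oai"))
print(extract_titles(SAMPLE_RESPONSE))
```

A production harvester would also follow the `resumptionToken` that OAI-PMH uses for paging, which this sketch leaves out.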
The Global Genome Biodiversity Network (GGBN) Data Standard specification
Droege, G.; Barker, K.; Seberg, O.; Coddington, J.; Benson, E.; Berendsohn, W. G.; Bunk, B.; Butler, C.; Cawsey, E. M.; Deck, J.; Döring, M.; Flemons, P.; Gemeinholzer, B.; Güntsch, A.; Hollowell, T.; Kelbert, P.; Kostadinov, I.; Kottmann, R.; Lawlor, R. T.; Lyal, C.; Mackenzie-Dodds, J.; Meyer, C.; Mulcahy, D.; Nussbeck, S. Y.; O'Tuama, É.; Orrell, T.; Petersen, G.; Robertson, T.; Söhngen, C.; Whitacre, J.; Wieczorek, J.; Yilmaz, P.; Zetzsche, H.; Zhang, Y.; Zhou, X.
2016-01-01
Genomic samples of non-model organisms are becoming increasingly important in a broad range of studies, from developmental biology and biodiversity analyses to conservation. Genomic sample definition, description, quality, voucher information and metadata all need to be digitized and disseminated across scientific communities. This information needs to be concise and consistent in today’s ever-increasing bioinformatic era, for complementary data aggregators to easily map databases to one another. In order to facilitate exchange of information on genomic samples and their derived data, the Global Genome Biodiversity Network (GGBN) Data Standard is intended to provide a platform based on a documented agreement to promote the efficient sharing and usage of genomic sample material and associated specimen information in a consistent way. The new data standard presented here builds upon existing standards commonly used within the community, extending them with the capability to exchange data on tissue, environmental and DNA samples as well as sequences. The GGBN Data Standard will reveal and democratize the hidden contents of biodiversity biobanks, for the convenience of everyone in the wider biobanking community. Technical tools exist for data providers to easily map their databases to the standard. Database URL: http://terms.tdwg.org/wiki/GGBN_Data_Standard PMID:27694206
Overview of FEED, the feeding experiments end-user database.
Wall, Christine E; Vinyard, Christopher J; Williams, Susan H; Gapeyev, Vladimir; Liu, Xianhua; Lapp, Hilmar; German, Rebecca Z
2011-08-01
The Feeding Experiments End-user Database (FEED) is a research tool developed by the Mammalian Feeding Working Group at the National Evolutionary Synthesis Center that permits synthetic, evolutionary analyses of the physiology of mammalian feeding. The tasks of the Working Group are to compile physiologic data sets into a uniform digital format stored at a central source, develop a standardized terminology for describing and organizing the data, and carry out a set of novel analyses using FEED. FEED contains raw physiologic data linked to extensive metadata. It serves as an archive for a large number of existing data sets and a repository for future data sets. The metadata are stored as text and images that describe experimental protocols, research subjects, and anatomical information. The metadata incorporate controlled vocabularies to allow consistent use of the terms used to describe and organize the physiologic data. The planned analyses address long-standing questions concerning the phylogenetic distribution of phenotypes involving muscle anatomy and feeding physiology among mammals, the presence and nature of motor pattern conservation in the mammalian feeding muscles, and the extent to which suckling constrains the evolution of feeding behavior in adult mammals. We expect FEED to be a growing digital archive that will facilitate new research into understanding the evolution of feeding anatomy.
CINERGI: Community Inventory of EarthCube Resources for Geoscience Interoperability
NASA Astrophysics Data System (ADS)
Zaslavsky, Ilya; Bermudez, Luis; Grethe, Jeffrey; Gupta, Amarnath; Hsu, Leslie; Lehnert, Kerstin; Malik, Tanu; Richard, Stephen; Valentine, David; Whitenack, Thomas
2014-05-01
Organizing geoscience data resources to support cross-disciplinary data discovery, interpretation, analysis and integration is challenging because of different information models, semantic frameworks, metadata profiles, catalogs, and services used in different geoscience domains, not to mention different research paradigms and methodologies. The central goal of CINERGI, a new project supported by the US National Science Foundation through its EarthCube Building Blocks program, is to create a methodology and assemble a large inventory of high-quality information resources capable of supporting data discovery needs of researchers in a wide range of geoscience domains. The key characteristics of the inventory are: 1) collaboration with and integration of metadata resources from a number of large data facilities; 2) reliance on international metadata and catalog service standards; 3) assessment of resource "interoperability-readiness"; 4) ability to cross-link and navigate data resources, projects, models, researcher directories, publications, usage information, etc.; 5) efficient inclusion of "long-tail" data, which are not appearing in existing domain repositories; 6) data registration at feature level where appropriate, in addition to common dataset-level registration, and 7) integration with parallel EarthCube efforts, in particular focused on EarthCube governance, information brokering, service-oriented architecture design and management of semantic information. We discuss challenges associated with accomplishing CINERGI goals, including defining the inventory scope; managing different granularity levels of resource registration; interaction with search systems of domain repositories; explicating domain semantics; metadata brokering, harvesting and pruning; managing provenance of the harvested metadata; and cross-linking resources based on the linked open data (LOD) approaches. 
At the higher level of the inventory, we register domain-wide resources such as domain catalogs, vocabularies, information models, data service specifications, identifier systems, and assess their conformance with international standards (such as those adopted by ISO and OGC, and used by INSPIRE) or de facto community standards using, in part, automatic validation techniques. The main level in CINERGI leverages a metadata aggregation platform (currently Geoportal Server) to organize harvested resources from multiple collections and contributed by community members during EarthCube end-user domain workshops or suggested online. The latter mechanism uses the SciCrunch toolkit originally developed within the Neuroscience Information Framework (NIF) project and now being extended to other communities. The inventory is designed to support requests such as "Find resources with theme X in geographic area S", "Find datasets with subject Y using query concept expansion", "Find geographic regions having data of type Z", "Find datasets that contain property P". With the added LOD support, additional types of requests, such as "Find example implementations of specification X", "Find researchers who have worked in Domain X, dataset Y, location L", "Find resources annotated by person X", will be supported. Project's website (http://workspace.earthcube.org/cinergi) provides access to the initial resource inventory, a gallery of EarthCube researchers, collections of geoscience models, metadata entry forms, and other software modules and inventories being integrated into the CINERGI system. Support from the US National Science Foundation under award NSF ICER-1343816 is gratefully acknowledged.
NASA Astrophysics Data System (ADS)
Oggioni, Alessandro; Tagliolato, Paolo; Fugazza, Cristiano; Bastianini, Mauro; Pavesi, Fabio; Pepe, Monica; Menegon, Stefano; Basoni, Anna; Carrara, Paola
2015-04-01
Sensor observation systems for environmental data have become increasingly important in the last years. The EGU's Informatics in Oceanography and Ocean Science track stressed the importance of management tools and solutions for marine infrastructures. We think that full interoperability among sensor systems is still an open issue and that the solution to this involves providing appropriate metadata. Several open source applications implement the SWE specification and, particularly, the Sensor Observation Services (SOS) standard. These applications allow for the exchange of data and metadata in XML format between computer systems. However, there is a lack of metadata editing tools supporting end users in this activity. Generally speaking, it is hard for users to provide sensor metadata in the SensorML format without dedicated tools. In particular, such a tool should ease metadata editing by providing, for standard sensors, all the invariant information to be included in sensor metadata, thus allowing the user to concentrate on the metadata items that are related to the specific deployment. RITMARE, the Italian flagship project on marine research, envisages a subproject, SP7, for the set-up of the project's spatial data infrastructure. SP7 developed EDI, a general purpose, template-driven metadata editor that is composed of a backend web service and an HTML5/javascript client. EDI can be customized for managing the creation of generic metadata encoded as XML. Once tailored to a specific metadata format, EDI presents the users a web form with advanced auto completion and validation capabilities. In the case of sensor metadata (SensorML versions 1.0.1 and 2.0), the EDI client is instructed to send an "insert sensor" request to an SOS endpoint in order to save the metadata in an SOS server. In the first phase of project RITMARE, EDI has been used to simplify the creation from scratch of SensorML metadata by the involved researchers and data managers. 
An interesting by-product of this ongoing work is currently constituting an archive of predefined sensor descriptions. This information is being collected in order to further ease metadata creation in the next phase of the project. Users will be able to choose among a number of sensor and sensor platform prototypes: These will be specific instances on which it will be possible to define, in a bottom-up approach, "sensor profiles". We report on the outcome of this activity.
Web Services as Building Blocks for an Open Coastal Observing System
NASA Astrophysics Data System (ADS)
Breitbach, G.; Krasemann, H.
2012-04-01
In coastal observing systems, different observing methods such as remote sensing, in-situ measurements, and models need to be integrated into a synoptic view of the state of the observed region. This integration can be based solely on web services combining data and metadata. Such an approach is pursued for COSYNA (Coastal Observing System for Northern and Arctic Seas). Data from satellite and radar remote sensing, together with measurements from buoys, stations and Ferryboxes, form the observation part of COSYNA. These data are assimilated into models to create pre-operational forecasts. For discovering data, an OGC Web Feature Service (WFS) is used by the COSYNA data portal. This Web Feature Service holds the metadata needed for finding data and, in addition, the URLs of web services to view and download the data. To make the data from different resources comparable, a common vocabulary is needed. For COSYNA the standard names from the CF conventions are stored within the metadata whenever possible. For the metadata an INSPIRE and ISO19115 compatible data format is used. The WFS is fed from the metadata system using database views. Actual data are stored in two different formats: in NetCDF files for gridded data and in an RDBMS for time-series-like data. The web service URLs are mostly standards-based, and the standards are mainly OGC standards. Maps were created from NetCDF files with the help of the ncWMS tool, whereas a self-developed Java servlet is used for maps of moving measurement platforms. In this case download of data is offered via OGC SOS. For NetCDF files OPeNDAP is used for the data download. The OGC CSW is used for accessing extended metadata. The concept of data management in COSYNA will be presented, which is independent of the special services used in COSYNA. This concept is parameter and data centric and might be useful for other observing systems.
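The discovery step described above, a portal querying an OGC WFS for metadata features, can be sketched in Python as a key-value-pair `GetFeature` request. The endpoint `https://example.org/wfs` and the feature type `cosyna:metadata` are invented for illustration and are not COSYNA's actual service names; the parameter names (`service`, `version`, `request`, `typeName`, `bbox`, `maxFeatures`) follow the WFS 1.1.0 KVP encoding.

```python
from urllib.parse import urlencode, urlparse, parse_qs

def wfs_getfeature_url(endpoint, type_name, bbox=None, max_features=None):
    """Build a WFS 1.1.0 GetFeature URL using key-value-pair encoding.

    endpoint and type_name are placeholders supplied by the caller;
    bbox is (min_x, min_y, max_x, max_y) in the service's default CRS.
    """
    params = {
        "service": "WFS",
        "version": "1.1.0",
        "request": "GetFeature",
        "typeName": type_name,
    }
    if bbox is not None:
        params["bbox"] = ",".join(str(v) for v in bbox)
    if max_features is not None:
        params["maxFeatures"] = str(max_features)
    return endpoint + "?" + urlencode(params)

url = wfs_getfeature_url(
    "https://example.org/wfs",   # hypothetical endpoint
    "cosyna:metadata",           # hypothetical feature type
    bbox=(6.0, 53.0, 9.0, 55.5),
    max_features=10,
)
print(url)
# Decode the query string again to check what the server would receive.
print(parse_qs(urlparse(url).query))
```

In the architecture the abstract describes, each returned feature would carry the view and download URLs (for example OPeNDAP or SOS endpoints), so the client needs no further catalogue lookups.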
Kolker, Eugene; Özdemir, Vural; Martens, Lennart; Hancock, William; Anderson, Gordon; Anderson, Nathaniel; Aynacioglu, Sukru; Baranova, Ancha; Campagna, Shawn R; Chen, Rui; Choiniere, John; Dearth, Stephen P; Feng, Wu-Chun; Ferguson, Lynnette; Fox, Geoffrey; Frishman, Dmitrij; Grossman, Robert; Heath, Allison; Higdon, Roger; Hutz, Mara H; Janko, Imre; Jiang, Lihua; Joshi, Sanjay; Kel, Alexander; Kemnitz, Joseph W; Kohane, Isaac S; Kolker, Natali; Lancet, Doron; Lee, Elaine; Li, Weizhong; Lisitsa, Andrey; Llerena, Adrian; Macnealy-Koch, Courtney; Marshall, Jean-Claude; Masuzzo, Paola; May, Amanda; Mias, George; Monroe, Matthew; Montague, Elizabeth; Mooney, Sean; Nesvizhskii, Alexey; Noronha, Santosh; Omenn, Gilbert; Rajasimha, Harsha; Ramamoorthy, Preveen; Sheehan, Jerry; Smarr, Larry; Smith, Charles V; Smith, Todd; Snyder, Michael; Rapole, Srikanth; Srivastava, Sanjeeva; Stanberry, Larissa; Stewart, Elizabeth; Toppo, Stefano; Uetz, Peter; Verheggen, Kenneth; Voy, Brynn H; Warnich, Louise; Wilhelm, Steven W; Yandl, Gregory
2014-01-01
Biological processes are fundamentally driven by complex interactions between biomolecules. Integrated high-throughput omics studies enable multifaceted views of cells, organisms, or their communities. With the advent of new post-genomics technologies, omics studies are becoming increasingly prevalent; yet the full impact of these studies can only be realized through data harmonization, sharing, meta-analysis, and integrated research. These essential steps require consistent generation, capture, and distribution of metadata. To ensure transparency, facilitate data harmonization, and maximize reproducibility and usability of life sciences studies, we propose a simple common omics metadata checklist. The proposed checklist is built on the rich ontologies and standards already in use by the life sciences community. The checklist will serve as a common denominator to guide experimental design, capture important parameters, and be used as a standard format for stand-alone data publications. The omics metadata checklist and data publications will create efficient linkages between omics data and knowledge-based life sciences innovation and, importantly, allow for appropriate attribution to data generators and infrastructure science builders in the post-genomics era. We ask that the life sciences community test the proposed omics metadata checklist and data publications and provide feedback for their use and improvement.
Kolker, Eugene; Özdemir, Vural; Martens, Lennart; Hancock, William; Anderson, Gordon; Anderson, Nathaniel; Aynacioglu, Sukru; Baranova, Ancha; Campagna, Shawn R; Chen, Rui; Choiniere, John; Dearth, Stephen P; Feng, Wu-Chun; Ferguson, Lynnette; Fox, Geoffrey; Frishman, Dmitrij; Grossman, Robert; Heath, Allison; Higdon, Roger; Hutz, Mara H; Janko, Imre; Jiang, Lihua; Joshi, Sanjay; Kel, Alexander; Kemnitz, Joseph W; Kohane, Isaac S; Kolker, Natali; Lancet, Doron; Lee, Elaine; Li, Weizhong; Lisitsa, Andrey; Llerena, Adrian; MacNealy-Koch, Courtney; Marshall, Jean-Claude; Masuzzo, Paola; May, Amanda; Mias, George; Monroe, Matthew; Montague, Elizabeth; Mooney, Sean; Nesvizhskii, Alexey; Noronha, Santosh; Omenn, Gilbert; Rajasimha, Harsha; Ramamoorthy, Preveen; Sheehan, Jerry; Smarr, Larry; Smith, Charles V; Smith, Todd; Snyder, Michael; Rapole, Srikanth; Srivastava, Sanjeeva; Stanberry, Larissa; Stewart, Elizabeth; Toppo, Stefano; Uetz, Peter; Verheggen, Kenneth; Voy, Brynn H; Warnich, Louise; Wilhelm, Steven W; Yandl, Gregory
2013-12-01
Biological processes are fundamentally driven by complex interactions between biomolecules. Integrated high-throughput omics studies enable multifaceted views of cells, organisms, or their communities. With the advent of new post-genomics technologies, omics studies are becoming increasingly prevalent; yet the full impact of these studies can only be realized through data harmonization, sharing, meta-analysis, and integrated research. These essential steps require consistent generation, capture, and distribution of metadata. To ensure transparency, facilitate data harmonization, and maximize reproducibility and usability of life sciences studies, we propose a simple common omics metadata checklist. The proposed checklist is built on the rich ontologies and standards already in use by the life sciences community. The checklist will serve as a common denominator to guide experimental design, capture important parameters, and be used as a standard format for stand-alone data publications. The omics metadata checklist and data publications will create efficient linkages between omics data and knowledge-based life sciences innovation and, importantly, allow for appropriate attribution to data generators and infrastructure science builders in the post-genomics era. We ask that the life sciences community test the proposed omics metadata checklist and data publications and provide feedback for their use and improvement.
ALE: automated label extraction from GEO metadata.
Giles, Cory B; Brown, Chase A; Ripperger, Michael; Dennis, Zane; Roopnarinesingh, Xiavan; Porter, Hunter; Perz, Aleksandra; Wren, Jonathan D
2017-12-28
NCBI's Gene Expression Omnibus (GEO) is a rich community resource containing millions of gene expression experiments from human, mouse, rat, and other model organisms. However, information about each experiment (metadata) is in the format of an open-ended, non-standardized textual description provided by the depositor. Thus, classification of experiments for meta-analysis by factors such as gender, age of the sample donor, and tissue of origin is not feasible without assigning labels to the experiments. Automated approaches are preferable for this, primarily because of the size and volume of the data to be processed, but also because it ensures standardization and consistency. While some of these labels can be extracted directly from the textual metadata, many of the data available do not contain explicit text informing the researcher about the age and gender of the subjects with the study. To bridge this gap, machine-learning methods can be trained to use the gene expression patterns associated with the text-derived labels to refine label-prediction confidence. Our analysis shows only 26% of metadata text contains information about gender and 21% about age. In order to ameliorate the lack of available labels for these data sets, we first extract labels from the textual metadata for each GEO RNA dataset and evaluate the performance against a gold standard of manually curated labels. We then use machine-learning methods to predict labels, based upon gene expression of the samples and compare this to the text-based method. Here we present an automated method to extract labels for age, gender, and tissue from textual metadata and GEO data using both a heuristic approach as well as machine learning. We show the two methods together improve accuracy of label assignment to GEO samples.
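The heuristic, text-based half of the approach described above can be pictured as pattern matching over free-text sample descriptions. The Python sketch below is an illustrative guess at what such a heuristic might look like, not the ALE implementation: the regular expressions, the function name `extract_labels`, and the sample strings are all assumptions.

```python
import re

# Illustrative patterns only; real GEO descriptions are far more varied.
AGE_RE = re.compile(
    r"\bage\s*[:=]?\s*(\d+(?:\.\d+)?)\s*(years?|yrs?|y|months?|weeks?)?\b",
    re.IGNORECASE,
)
GENDER_RE = re.compile(
    r"\b(?:gender|sex)\s*[:=]?\s*(male|female|m|f)\b",
    re.IGNORECASE,
)
GENDER_MAP = {"m": "male", "male": "male", "f": "female", "female": "female"}

def extract_labels(text):
    """Return age, age unit and gender found in a free-text description.

    Fields that cannot be found are returned as None, mirroring the
    situation where the depositor did not state them explicitly.
    """
    labels = {"age": None, "age_unit": None, "gender": None}
    age = AGE_RE.search(text)
    if age:
        labels["age"] = float(age.group(1))
        labels["age_unit"] = age.group(2).lower() if age.group(2) else None
    gender = GENDER_RE.search(text)
    if gender:
        labels["gender"] = GENDER_MAP[gender.group(1).lower()]
    return labels

print(extract_labels("tissue: liver; Sex: F; age: 54 years"))
print(extract_labels("strain: C57BL/6; tissue: liver"))
```

Records for which a heuristic like this returns `None` are the cases the abstract proposes to fill with expression-based machine-learning predictions.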
New Tools to Document and Manage Data/Metadata: Example NGEE Arctic and ARM
NASA Astrophysics Data System (ADS)
Crow, M. C.; Devarakonda, R.; Killeffer, T.; Hook, L.; Boden, T.; Wullschleger, S.
2017-12-01
Tools used for documenting, archiving, cataloging, and searching data are critical pieces of informatics. This poster describes tools being used in several projects at Oak Ridge National Laboratory (ORNL), with a focus on the U.S. Department of Energy's Next Generation Ecosystem Experiment in the Arctic (NGEE Arctic) and Atmospheric Radiation Measurement (ARM) project, and their usage at different stages of the data lifecycle. The Online Metadata Editor (OME) is used for the documentation and archival stages while a Data Search tool supports indexing, cataloging, and searching. The NGEE Arctic OME Tool [1] provides a method by which researchers can upload their data and provide original metadata with each upload while adhering to standard metadata formats. The tool is built upon a Java Spring framework to parse user input into, and from, XML output. Many aspects of the tool require use of a relational database, including encrypted user login, auto-fill functionality for predefined sites and plots, and file reference storage and sorting. The Data Search Tool displays each data record in a thumbnail containing the title, source, and date range, and features a quick view of the metadata associated with that record, as well as a direct link to the data. The search box incorporates autocomplete capabilities for search terms, and sorted keyword filters are available on the side of the page, including a map for geo-searching. These tools are supported by the Mercury [2] consortium (funded by DOE, NASA, USGS, and ARM) and developed and managed at Oak Ridge National Laboratory. Mercury is a set of tools for collecting, searching, and retrieving metadata and data. Mercury collects metadata from contributing project servers, then indexes the metadata to make it searchable using Apache Solr, and provides access to retrieve it from the web page. Metadata standards that Mercury supports include: XML, Z39.50, FGDC, Dublin-Core, Darwin-Core, EML, and ISO-19115.
An XML-based system for the flexible classification and retrieval of clinical practice guidelines.
Ganslandt, T.; Mueller, M. L.; Krieglstein, C. F.; Senninger, N.; Prokosch, H. U.
2002-01-01
Beneficial effects of clinical practice guidelines (CPGs) have not yet reached expectations due to limited routine adoption. Electronic distribution and reminder systems have the potential to overcome implementation barriers. Existing electronic CPG repositories like the National Guideline Clearinghouse (NGC) provide individual access but lack standardized computer-readable interfaces necessary for automated guideline retrieval. The aim of this paper was to facilitate automated context-based selection and presentation of CPGs. Using attributes from the NGC classification scheme, an XML-based metadata repository was successfully implemented, providing document storage, classification and retrieval functionality. Semi-automated extraction of attributes was implemented for the import of XML guideline documents using XPath. A hospital information system interface was exemplarily implemented for diagnosis-based guideline invocation. Limitations of the implemented system are discussed and possible future work is outlined. Integration of standardized computer-readable search interfaces into existing CPG repositories is proposed. PMID:12463831
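To make the XPath-based attribute extraction and diagnosis-based invocation described above more concrete, here is a small Python sketch using the XPath subset in the standard library's `xml.etree.ElementTree`. The element names, the attribute-to-path mapping, and the sample guideline are hypothetical; the paper's actual schema and the NGC classification attributes are not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical guideline document; the real schema is not given in the abstract.
GUIDELINE_XML = """<guideline id="cpg-001">
  <title>Hypothetical appendicitis guideline</title>
  <classification>
    <disease code="K35">Acute appendicitis</disease>
    <category>Treatment</category>
    <category>Diagnosis</category>
  </classification>
</guideline>"""

# Classification attribute -> XPath expression (illustrative mapping).
ATTRIBUTE_PATHS = {
    "title": "./title",
    "disease": "./classification/disease",
    "category": "./classification/category",
}

def extract_attributes(xml_text):
    """Collect the text of every element matched by each configured path."""
    root = ET.fromstring(xml_text)
    return {
        name: [el.text.strip() for el in root.findall(path) if el.text]
        for name, path in ATTRIBUTE_PATHS.items()
    }

def guidelines_for_diagnosis(documents, code):
    """Return ids of guidelines whose disease element carries the given code."""
    matches = []
    for xml_text in documents:
        root = ET.fromstring(xml_text)
        if root.find(f"./classification/disease[@code='{code}']") is not None:
            matches.append(root.get("id"))
    return matches

print(extract_attributes(GUIDELINE_XML))
print(guidelines_for_diagnosis([GUIDELINE_XML], "K35"))
```

The second function stands in for the hospital information system interface: given a diagnosis code from the clinical context, it selects the guidelines to present.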
A multi-service data management platform for scientific oceanographic products
NASA Astrophysics Data System (ADS)
D'Anca, Alessandro; Conte, Laura; Nassisi, Paola; Palazzo, Cosimo; Lecci, Rita; Cretì, Sergio; Mancini, Marco; Nuzzo, Alessandra; Mirto, Maria; Mannarini, Gianandrea; Coppini, Giovanni; Fiore, Sandro; Aloisio, Giovanni
2017-02-01
An efficient, secure and interoperable data platform solution has been developed in the TESSA project to provide fast navigation and access to the data stored in the data archive, as well as a standard-based metadata management support. The platform mainly targets scientific users and the situational sea awareness high-level services such as the decision support systems (DSS). These datasets are accessible through the following three main components: the Data Access Service (DAS), the Metadata Service and the Complex Data Analysis Module (CDAM). The DAS allows access to data stored in the archive by providing interfaces for different protocols and services for downloading, variables selection, data subsetting or map generation. Metadata Service is the heart of the information system of the TESSA products and completes the overall infrastructure for data and metadata management. This component enables data search and discovery and addresses interoperability by exploiting widely adopted standards for geospatial data. Finally, the CDAM represents the back-end of the TESSA DSS by performing on-demand complex data analysis tasks.
Page, Roderic D M
2011-05-23
The Biodiversity Heritage Library (BHL) is a large digital archive of legacy biological literature, comprising over 31 million pages scanned from books, monographs, and journals. During the digitisation process basic metadata about the scanned items is recorded, but not article-level metadata. Given that the article is the standard unit of citation, this makes it difficult to locate cited literature in BHL. Adding the ability to easily find articles in BHL would greatly enhance the value of the archive. A service was developed to locate articles in BHL based on matching article metadata to BHL metadata using approximate string matching, regular expressions, and string alignment. This article locating service is exposed as a standard OpenURL resolver on the BioStor web site http://biostor.org/openurl/. This resolver can be used on the web, or called by bibliographic tools that support OpenURL. BioStor provides tools for extracting, annotating, and visualising articles from the Biodiversity Heritage Library. BioStor is available from http://biostor.org/.
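The matching step described above relies on approximate string matching between cited metadata and BHL metadata. The Python sketch below shows one way such a comparison could work, using `difflib.SequenceMatcher` from the standard library. This is a stand-in technique chosen for illustration and is not BioStor's actual algorithm; the normalisation rule, the 0.8 threshold, and the function names are assumptions.

```python
import re
from difflib import SequenceMatcher

def normalise(title):
    """Lower-case a title and collapse punctuation and whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", title.lower()).split())

def best_match(query, candidates, threshold=0.8):
    """Return (candidate, score) for the closest title, or None below threshold.

    The score is SequenceMatcher.ratio(), a similarity in the range 0..1.
    """
    scored = [
        (cand, SequenceMatcher(None, normalise(query), normalise(cand)).ratio())
        for cand in candidates
    ]
    if not scored:
        return None
    candidate, score = max(scored, key=lambda pair: pair[1])
    return (candidate, score) if score >= threshold else None

titles = [
    "The Annals and magazine of natural history.",
    "Proceedings of the Zoological Society of London",
]
print(best_match("Annals and Magazine of Natural History", titles))
print(best_match("Unrelated title xyz", titles))
```

A full resolver would combine a title score like this with volume, year and page checks before accepting a match.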
Design and implementation of a fault-tolerant and dynamic metadata database for clinical trials
NASA Astrophysics Data System (ADS)
Lee, J.; Zhou, Z.; Talini, E.; Documet, J.; Liu, B.
2007-03-01
In recent imaging-based clinical trials, quantitative image analysis (QIA) and computer-aided diagnosis (CAD) methods are increasing in productivity due to higher resolution imaging capabilities. A radiology core doing clinical trials has been analyzing more treatment methods, and there is a growing quantity of metadata that needs to be stored and managed. These radiology centers are also collaborating with many off-site imaging field sites and need a way to communicate metadata between one another in a secure infrastructure. Our solution is to implement a data storage grid with a fault-tolerant and dynamic metadata database design to unify metadata from different clinical trial experiments and field sites. Although metadata from images follow the DICOM standard, clinical trials also produce metadata specific to regions-of-interest and quantitative image analysis. We have implemented a data access and integration (DAI) server layer where multiple field sites can access multiple metadata databases in the data grid through a single web-based grid service. The centralization of metadata database management simplifies the task of adding new databases into the grid and also decreases the risk of configuration errors seen in peer-to-peer grids. In this paper, we address the design and implementation of a data grid metadata storage that has fault-tolerance and dynamic integration for imaging-based clinical trials.
Representations of time coordinates in FITS. Time and relative dimension in space
NASA Astrophysics Data System (ADS)
Rots, Arnold H.; Bunclark, Peter S.; Calabretta, Mark R.; Allen, Steven L.; Manchester, Richard N.; Thompson, William T.
2015-02-01
Context. In a series of three previous papers, formulation and specifics of the representation of world coordinate transformations in FITS data have been presented. This fourth paper deals with encoding time. Aims: Time on all scales and precisions known in astronomical datasets is to be described in an unambiguous, complete, and self-consistent manner. Methods: Employing the well-established World Coordinate System (WCS) framework, and maintaining compatibility with the FITS conventions that are currently in use to specify time, the standard is extended to describe rigorously the time coordinate. Results: World coordinate functions are defined for temporal axes sampled linearly and as specified by a lookup table. The resulting standard is consistent with the existing FITS WCS standards and specifies a metadata set that achieves the aims enunciated above.
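The two kinds of temporal axis mentioned above, linearly sampled and lookup-table based, can be illustrated with a short Python sketch. The linear case uses the usual FITS WCS relation, world = CRVAL + CDELT × (pixel − CRPIX). The lookup function is a deliberately simplified stand-in (plain linear interpolation in a 1-based list of times) and does not reproduce the standard's table lookup algorithm; the function names are assumptions.

```python
import math

def linear_time(pixel, crpix, crval, cdelt):
    """Time at a pixel on a linearly sampled axis.

    crpix: reference pixel, crval: time at the reference pixel,
    cdelt: time increment per pixel (same unit as crval).
    """
    return crval + cdelt * (pixel - crpix)

def lookup_time(pixel, table):
    """Simplified lookup: interpolate linearly in a table of times.

    The table is indexed from 1, following the FITS pixel convention.
    This is an illustration only, not the standard's lookup algorithm.
    """
    if not 1 <= pixel <= len(table):
        raise ValueError("pixel outside the tabulated range")
    lower = int(math.floor(pixel))
    if lower == len(table):
        return table[-1]
    fraction = pixel - lower
    return table[lower - 1] + fraction * (table[lower] - table[lower - 1])

# Evenly sampled axis: 0.5 s steps starting at t = 100 s on pixel 1.
print(linear_time(5, crpix=1.0, crval=100.0, cdelt=0.5))
# Unevenly sampled axis described by explicit time stamps.
print(lookup_time(2.5, [0.0, 1.0, 4.0]))
```

In either case the time scale and reference position still have to be recorded in the header for the resulting value to be unambiguous, which is the main concern of the standard.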
Predicting biomedical metadata in CEDAR: A study of Gene Expression Omnibus (GEO).
Panahiazar, Maryam; Dumontier, Michel; Gevaert, Olivier
2017-08-01
A crucial and limiting factor in data reuse is the lack of accurate, structured, and complete descriptions of data, known as metadata. Towards improving the quantity and quality of metadata, we propose a novel metadata prediction framework to learn associations from existing metadata that can be used to predict metadata values. We evaluate our framework in the context of experimental metadata from the Gene Expression Omnibus (GEO). We applied four rule mining algorithms to the most common structured metadata elements (sample type, molecular type, platform, label type and organism) from over 1.3 million GEO records. We examined the quality of well supported rules from each algorithm and visualized the dependencies among metadata elements. Finally, we evaluated the performance of the algorithms in terms of accuracy, precision, recall, and F-measure. We found that PART is the best algorithm, outperforming Apriori, Predictive Apriori, and Decision Table. All algorithms perform significantly better in predicting class values than the majority vote classifier. We found that the performance of the algorithms is related to the dimensionality of the GEO elements. The average performance of all algorithms increases as the dimensionality (the number of unique values) of these elements decreases (2697 platforms, 537 organisms, 454 labels, 9 molecules, and 5 types). Our work suggests that experimental metadata such as that present in GEO can be accurately predicted using rule mining algorithms. Our work has implications for both prospective and retrospective augmentation of metadata quality, which are geared towards making data easier to find and reuse. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
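The evaluation measures named in this abstract, and the majority vote baseline the algorithms are compared against, can be illustrated with a short sketch. This is not the authors' evaluation code: the function names and the toy labels are hypothetical, and the metrics are computed for a single class, whereas the abstract does not say how per-class scores were averaged.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Majority vote classifier: always predict the most frequent training label."""
    return Counter(train_labels).most_common(1)[0][0]


def evaluate(y_true, y_pred, positive):
    """Accuracy over all records; precision, recall and F-measure for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Toy labels standing in for a low-dimensional element such as molecule type.
y_true = ["RNA", "RNA", "DNA", "RNA"]
y_pred = ["RNA", "DNA", "DNA", "RNA"]
scores = evaluate(y_true, y_pred, positive="RNA")
```

A rule-based predictor is only useful if its scores exceed those obtained by predicting `majority_baseline(y_true)` for every record.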
ERIC Educational Resources Information Center
García-Floriano, Andrés; Ferreira-Santiago, Angel; Yáñez-Márquez, Cornelio; Camacho-Nieto, Oscar; Aldape-Pérez, Mario; Villuendas-Rey, Yenny
2017-01-01
Social networking potentially offers improved distance learning environments by enabling the exchange of resources between learners. The existence of properly classified content results in an enhanced distance learning experience in which appropriate materials can be retrieved efficiently; however, for this to happen, metadata needs to be present.…
The Application and Future Direction of the SPASE Metadata Standard in the U.S. and Worldwide
NASA Astrophysics Data System (ADS)
King, Todd; Thieman, James; Roberts, D. Aaron
2013-04-01
The Space Physics Archive Search and Extract (SPASE) Metadata standard for Heliophysics and related data is now an established standard within the NASA-funded space and solar physics community and is spreading to the international groups within that community. Development of SPASE had involved a number of international partners and the current version of the SPASE Metadata Model (version 2.2.2) has been stable since January 2011. The SPASE standard has been adopted by groups such as NASA's Heliophysics division, the Canadian Space Science Data Portal (CSSDP), Canada's AUTUMN network, Japan's Inter-university Upper atmosphere Global Observation NETwork (IUGONET), Centre de Données de la Physique des Plasmas (CDPP), and the near-Earth space data infrastructure for e-Science (ESPAS). In addition, portions of the SPASE dictionary have been modeled in semantic web ontologies for use with reasoners and semantic searches. In development are modifications to accommodate simulation and model data, as well as enhancements to describe data accessibility. These additions will add features to describe a broader range of data types. In keeping with a SPASE principle of back-compatibility, these changes will not affect the data descriptions already generated for instrument-related datasets. We also look at the long term commitment by NASA to support the SPASE effort and how SPASE metadata can enable value-added services.
Evolution of Web Services in EOSDIS: Search and Order Metadata Registry (ECHO)
NASA Technical Reports Server (NTRS)
Mitchell, Andrew; Ramapriyan, Hampapuram; Lowe, Dawn
2009-01-01
From 2005 through 2008, NASA defined and implemented a major evolutionary change in its Earth Observing System Data and Information System (EOSDIS) to modernize its capabilities. This implementation was based on a vision for 2015 developed during 2005. The EOSDIS 2015 Vision emphasizes increased end-to-end data system efficiency and operability; increased data usability; improved support for end users; and decreased operations costs. One key feature of the Evolution plan was achieving higher operational maturity (ingest, reconciliation, search and order, performance, error handling) for NASA's Earth Observing System Clearinghouse (ECHO). The ECHO system is an operational metadata registry through which the scientific community can easily discover and exchange NASA's Earth science data and services. ECHO contains metadata for 2,726 data collections comprising over 87 million individual data granules and 34 million browse images, consisting of NASA's EOSDIS Data Centers and the United States Geological Survey's Landsat Project holdings. ECHO is a middleware component based on a Service Oriented Architecture (SOA). The system comprises a set of infrastructure services that enable the fundamental SOA functions: publish, discover, and access Earth science resources. It also provides additional services such as user management, data access control, and order management. The ECHO system has a data registry and a services registry. The data registry enables organizations to publish EOS and other Earth-science related data holdings to a common metadata model. These holdings are described through metadata in terms of datasets (types of data) and granules (specific data items of those types). ECHO also supports browse images, which provide a visual representation of the data. The published metadata can be mapped to and from existing standards (e.g., FGDC, ISO 19115).
With ECHO, users can find the metadata stored in the data registry and then access the data either directly online or through a brokered order to the data archive organization. ECHO stores metadata from a variety of science disciplines and domains, including Climate Variability and Change, Carbon Cycle and Ecosystems, Earth Surface and Interior, Atmospheric Composition, Weather, and Water and Energy Cycle. ECHO also has a services registry for community-developed search services and data services. ECHO provides a platform for the publication, discovery, understanding and access to NASA's Earth Observation resources (data, service and clients). In their native state, these data, service and client resources are not necessarily targeted for use beyond their original mission. However, with the proper interoperability mechanisms, users of these resources can expand their value by accessing, combining and applying them in unforeseen ways.
EPOS Data and Service Provision
NASA Astrophysics Data System (ADS)
Bailo, Daniele; Jeffery, Keith G.; Atakan, Kuvvet; Harrison, Matt
2017-04-01
EPOS is now in IP (implementation phase) after a successful PP (preparatory phase). EPOS consists of essentially two components, one ICS (Integrated Core Services) representing the integrating ICT (Information and Communication Technology) and many TCS (Thematic Core Services) representing the scientific domains. The architecture developed, demonstrated and agreed within the project during the PP is now being developed utilising co-design with the TCS teams and agile, spiral methods within the ICS team. The 'heart' of EPOS is the metadata catalog. This provides for the ICS a digital representation of the TCS assets (services, data, software, equipment, expertise…) thus facilitating access, interoperation and (re-)use. A major part of the work has been interactions with the TCS. The original intention to harvest information from the TCS required (and still requires) discussions to understand fully the TCS organisational structures linked with rights, security and privacy; their (meta)data syntax (structure) and semantics (meaning); their workflows and methods of working and the services offered. To complicate matters further, the TCS are each at varying stages of development and the ICS design has to accommodate pre-existing, developing and expected future standards for metadata, data, software and processes. Through information documents, questionnaires and interviews/meetings the EPOS ICS team has collected DDSS (Data, Data Products, Software and Services) information from the TCS. The ICS team developed a simplified metadata model for presentation to the TCS, and the ICS team will perform the mapping and conversion from this model to the internal detailed technical metadata model using CERIF (an EU recommendation to Member States maintained, developed and promoted by euroCRIS, www.eurocris.org).
At the time of writing the final modifications of the EPOS metadata model are being made, and the mappings to CERIF designed, prior to the main phase of (meta)data collection into the EPOS metadata catalog. In parallel, work proceeds on the user interface software, the APIs (Application Programming Interfaces) to the TCS services, the harvesting method and software, the AAAI (Authentication, Authorisation, Accounting Infrastructure) and the system manager. The next steps will involve interfaces to ICS-D (Distributed ICS, i.e. facilities and services for computing, data storage, detectors and instruments for data collection etc.) to which requests, software and data will be deployed and from which data will be generated. Associated with this will be the development of the workflow system which will assist the end-user in building a workflow to achieve the scientific objectives.
Serious Games for Health: The Potential of Metadata.
Göbel, Stefan; Maddison, Ralph
2017-02-01
Numerous serious games and health games exist, either as commercial products (typically with a focus on entertaining a broad user group) or smaller games and game prototypes, often resulting from research projects (typically tailored to a smaller user group with a specific health characteristic). A major drawback of existing health games is that they are not very well described and attributed with (machine-readable, quantitative, and qualitative) metadata such as the characterizing goal of the game, the target user group, or expected health effects well proven in scientific studies. This makes it difficult or even impossible for end users to find and select the most appropriate game for a specific situation (e.g., health needs). Therefore, the aim of this article was to motivate the need and potential/benefit of metadata for the description and retrieval of health games and to describe a descriptive model for the qualitative description of games for health. It was not the aim of the article to describe a stable, running system (portal) for health games. This will be addressed in future work. Building on previous work toward a metadata format for serious games, a descriptive model for the formal description of games for health is introduced. For the conceptualization of this model, classification schemata of different existing health game repositories are considered. The classification schema consists of three levels: a core set of mandatory descriptive fields relevant for all games for health application areas, a detailed level with more comprehensive, optional information about the games, and so-called extension as level three with specific descriptive elements relevant for dedicated health games application areas, for example, cardio training. A metadata format provides a technical framework to describe, find, and select appropriate health games matching the needs of the end user. 
Future steps to improve, apply, and promote the metadata format in the health games market are discussed.
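The three-level classification schema described in this abstract (mandatory core fields, optional detail, and area-specific extensions) could be represented along the following lines. This is a sketch only: the class name, the field names, and the example values are hypothetical and are not taken from the authors' metadata format.

```python
from dataclasses import dataclass, field


@dataclass
class HealthGameRecord:
    # Level 1: core set of mandatory descriptive fields (names are hypothetical).
    title: str
    goal: str
    target_group: str
    # Level 2: more comprehensive, optional information about the game.
    details: dict = field(default_factory=dict)
    # Level 3: extensions keyed by application area, e.g. "cardio_training".
    extensions: dict = field(default_factory=dict)

    def missing_core_fields(self):
        """Return the names of mandatory fields that were left blank."""
        core = ("title", "goal", "target_group")
        return [name for name in core if not getattr(self, name).strip()]


# Illustrative record with one mandatory field left empty.
example_record = HealthGameRecord(
    title="Example Game",
    goal="",
    target_group="adults",
    extensions={"cardio_training": {"target_heart_rate": "moderate"}},
)
```

Keeping the extensions in a separate mapping means that a repository can validate the core level for every game while ignoring application areas it does not know about.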
GeoViQua: quality-aware geospatial data discovery and evaluation
NASA Astrophysics Data System (ADS)
Bigagli, L.; Papeschi, F.; Mazzetti, P.; Nativi, S.
2012-04-01
GeoViQua (QUAlity aware VIsualization for the Global Earth Observation System of Systems) is a recently started FP7 project aiming at complementing the Global Earth Observation System of Systems (GEOSS) with rigorous data quality specifications and quality-aware capabilities, in order to improve reliability in scientific studies and policy decision-making. GeoViQua's main scientific and technical objective is to enhance the GEOSS Common Infrastructure (GCI), providing the user community with innovative quality-aware search and evaluation tools, which will be integrated in the GEO-Portal, as well as made available to other end-user interfaces. To this end, GeoViQua will promote the extension of the current standard metadata for geographic information with accurate and expressive quality indicators, also contributing to the definition of a quality label (GEOLabel). GeoViQua's proposed solutions will be assessed in several pilot case studies covering the whole Earth Observation chain, from remote sensing acquisition to data processing, to applications in the main GEOSS Societal Benefit Areas. This work presents the preliminary results of GeoViQua Work Package 4 "Enhanced geo-search tools" (WP4), started in January 2012. Its major anticipated technical innovations are search and evaluation tools that communicate and exploit data quality information from the GCI. In particular, GeoViQua will investigate a graphical search interface featuring a coherent and meaningful aggregation of statistics and metadata summaries (e.g. in the form of tables, charts), thus enabling end users to leverage quality constraints for data discovery and evaluation. Preparatory work on WP4 requirements indicated that users need the "best" data for their purpose, implying a high degree of subjectivity in judgment.
This suggests that the GeoViQua system should exploit a combination of provider-generated metadata (objective indicators such as summary statistics), system-generated metadata (contextual/tracking information such as provenance of data and metadata), and user-generated metadata (informal user comments, usage information, rating, etc.). Moreover, metadata should include sufficiently complete access information, to allow rich data visualization and propagation. The following main enabling components are currently identified within WP4: - Quality-aware access services, e.g. a quality-aware extension of the OGC Sensor Observation Service (SOS-Q) specification, to support quality constraints for sensor data publishing and access; - Quality-aware discovery services, namely a quality-aware extension of the OGC Catalog Service for the Web (CSW-Q), to cope with quality constrained search; - Quality-augmentation broker (GeoViQua Broker), to support the linking and combination of the existing GCI metadata with GeoViQua- and user-generated metadata required to support the users in selecting the "best" data for their intended use. We are currently developing prototypes of the above quality-enabled geo-search components, which will be assessed in a sensor-based pilot case study in the next months. In particular, the GeoViQua Broker will be integrated with the EuroGEOSS Broker, to implement CSW-Q and federate (either via distribution or harvesting schemes) quality-aware data sources. GeoViQua will constitute a valuable test-bed for advancing the current best practices and standards in geospatial quality representation and exploitation. The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under Grant Agreement n° 265178.
The center for expanded data annotation and retrieval
Bean, Carol A; Cheung, Kei-Hoi; Dumontier, Michel; Durante, Kim A; Gevaert, Olivier; Gonzalez-Beltran, Alejandra; Khatri, Purvesh; Kleinstein, Steven H; O’Connor, Martin J; Pouliot, Yannick; Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Wiser, Jeffrey A
2015-01-01
The Center for Expanded Data Annotation and Retrieval is studying the creation of comprehensive and expressive metadata for biomedical datasets to facilitate data discovery, data interpretation, and data reuse. We take advantage of emerging community-based standard templates for describing different kinds of biomedical datasets, and we investigate the use of computational techniques to help investigators to assemble templates and to fill in their values. We are creating a repository of metadata from which we plan to identify metadata patterns that will drive predictive data entry when filling in metadata templates. The metadata repository not only will capture annotations specified when experimental datasets are initially created, but also will incorporate links to the published literature, including secondary analyses and possible refinements or retractions of experimental interpretations. By working initially with the Human Immunology Project Consortium and the developers of the ImmPort data repository, we are developing and evaluating an end-to-end solution to the problems of metadata authoring and management that will generalize to other data-management environments. PMID:26112029
Bruland, Philipp; Doods, Justin; Storck, Michael; Dugas, Martin
2017-01-01
Data dictionaries provide structural meta-information about data definitions in health information technology (HIT) systems. In this regard, reusing healthcare data for secondary purposes offers several advantages (e.g. reduced documentation times or increased data quality). Prerequisites for data reuse are the quality, availability and identical meaning of the data. In diverse projects, research data warehouses serve as core components between heterogeneous clinical databases and various research applications. Given the complexity (high number of data elements) and dynamics (regular updates) of electronic health record (EHR) data structures, we propose a clinical metadata warehouse (CMDW) based on a metadata registry standard. Metadata of two large hospitals were automatically inserted into two CMDWs containing 16,230 forms and 310,519 data elements. Automatic updates of metadata are possible as well as semantic annotations. A CMDW allows metadata discovery, data quality assessment and similarity analyses. Common data models for distributed research networks can be established based on similarity analyses.
Metadata Sets for e-Government Resources: The Extended e-Government Metadata Schema (eGMS+)
NASA Astrophysics Data System (ADS)
Charalabidis, Yannis; Lampathaki, Fenareti; Askounis, Dimitris
In the dawn of the Semantic Web era, metadata appear as a key enabler that assists management of the e-Government resources related to the provision of personalized, efficient and proactive services oriented towards the real citizens’ needs. As different authorities typically use different terms to describe their resources and publish them in various e-Government Registries that may enhance the access to and delivery of governmental knowledge, but also need to communicate seamlessly at a national and pan-European level, the need for a unified e-Government metadata standard emerges. This paper presents the creation of an ontology-based extended metadata set for e-Government Resources that embraces services, documents, XML Schemas, code lists, public bodies and information systems. Such a metadata set formalizes the exchange of information between portals and registries and assists the service transformation and simplification efforts, while it can be further taken into consideration when applying Web 2.0 techniques in e-Government.
[Radiological dose and metadata management].
Walz, M; Kolodziej, M; Madsack, B
2016-12-01
This article describes the features of management systems currently available in Germany for extraction, registration and evaluation of metadata from radiological examinations, particularly in the digital imaging and communications in medicine (DICOM) environment. In addition, the probable relevant developments in this area concerning radiation protection legislation, terminology, standardization and information technology are presented.
Dynamic federations: storage aggregation using open tools and protocols
NASA Astrophysics Data System (ADS)
Furano, Fabrizio; Brito da Rocha, Ricardo; Devresse, Adrien; Keeble, Oliver; Álvarez Ayllón, Alejandro; Fuhrmann, Patrick
2012-12-01
A number of storage elements now offer standard protocol interfaces like NFS 4.1/pNFS and WebDAV, for access to their data repositories, in line with the standardization effort of the European Middleware Initiative (EMI). Also the LCG FileCatalogue (LFC) can offer such features. Here we report on work that seeks to exploit the federation potential of these protocols and build a system that offers a unique view of the storage and metadata ensemble and the possibility of integration of other compatible resources such as those from cloud providers. The challenge, here undertaken by the providers of dCache and DPM, and pragmatically open to other Grid and Cloud storage solutions, is to build such a system while being able to accommodate name translations from existing catalogues (e.g. LFCs), experiment-based metadata catalogues, or stateless algorithmic name translations, also known as “trivial file catalogues”. Such so-called storage federations of standard protocols-based storage elements give a unique view of their content, thus promoting simplicity in accessing the data they contain and offering new possibilities for resilience and data placement strategies. The goal is to consider HTTP and NFS4.1-based storage elements and metadata catalogues and make them able to cooperate through an architecture that properly feeds the redirection mechanisms that they are based upon, thus giving the functionalities of a “loosely coupled” storage federation. One of the key requirements is to use standard clients (provided by OS'es or open source distributions, e.g. Web browsers) to access an already aggregated system; this approach is quite different from aggregating the repositories at the client side through some wrapper API, like for instance GFAL, or by developing new custom clients. 
Other technical challenges that will determine the success of this initiative include performance, latency and scalability, and the ability to create worldwide storage federations that are able to redirect clients to repositories that they can efficiently access, for instance trying to choose the endpoints that are closer or applying other criteria. We believe that the features of a loosely coupled federation of open-protocols-based storage elements will open many possibilities of evolving the current computing models without disrupting them, and, at the same time, will be able to operate with the existing infrastructures, follow their evolution path and add storage centers that can be acquired as a third-party service.
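The two ideas this abstract relies on, stateless algorithmic name translation and redirection of a client to a suitable endpoint, can be sketched as follows. This is an illustration under stated assumptions rather than the dCache or DPM implementation: the function names, prefix rules, host names, and cost values are all hypothetical.

```python
def translate(logical_name, prefix_rules):
    """Stateless name translation: rewrite the longest matching logical prefix."""
    for prefix in sorted(prefix_rules, key=len, reverse=True):
        if logical_name.startswith(prefix):
            return prefix_rules[prefix] + logical_name[len(prefix):]
    raise KeyError(f"no translation rule matches {logical_name!r}")


def pick_redirect(logical_name, replicas, cost):
    """Choose the cheapest endpoint holding the file and build its redirect URL.

    replicas: list of (endpoint_url, prefix_rules) pairs that hold the file.
    cost: mapping from endpoint_url to a number, e.g. a measured latency.
    """
    endpoint, rules = min(replicas, key=lambda replica: cost[replica[0]])
    return endpoint + translate(logical_name, rules)


# Hypothetical endpoints and rules, for illustration only.
replicas = [
    ("https://se1.example.org", {"/grid/": "/data/pool1/"}),
    ("https://se2.example.org", {"/grid/": "/store/"}),
]
cost = {"https://se1.example.org": 40, "https://se2.example.org": 5}
redirect_url = pick_redirect("/grid/exp/run1.root", replicas, cost)
```

Because the translation is a pure function of the logical name, no catalogue lookup is needed for these endpoints, and a standard HTTP client only has to follow the redirect it is given.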
Information Architecture for Interactive Archives at the Community Coordinated Modeling Center
NASA Astrophysics Data System (ADS)
De Zeeuw, D.; Wiegand, C.; Kuznetsova, M.; Mullinix, R.; Boblitt, J. M.
2017-12-01
The Community Coordinated Modeling Center (CCMC) is upgrading its meta-data system for model simulations to be compliant with the SPASE meta-data standard. This work is helping to enhance the SPASE standards for simulations to better describe the wide variety of models and their output. It will enable much more sophisticated and automated metrics and validation efforts at the CCMC, as well as much more robust searches for specific types of output. The new meta-data will also allow much more tailored run submissions, as it will allow some code options to be selected for Run-On-Request models. We will also demonstrate data accessibility through an implementation of the Heliophysics Application Programmer's Interface (HAPI) protocol of data otherwise available through the integrated space weather analysis system (iSWA).
Serving Fisheries and Ocean Metadata to Communities Around the World
NASA Technical Reports Server (NTRS)
Meaux, Melanie
2006-01-01
NASA's Global Change Master Directory (GCMD) assists the oceanographic community in the discovery, access, and sharing of scientific data by serving on-line fisheries and ocean metadata to users around the globe. As of January 2006, the directory holds more than 16,300 Earth Science data descriptions and over 1,300 services descriptions. Of these, nearly 4,000 unique ocean-related metadata records are available to the public, with many having direct links to the data. In 2005, the GCMD averaged over 5 million hits a month, with nearly a half million unique hosts for the year. Through the GCMD portal (http://gcmd.nasa.gov/), users can search vast and growing quantities of data and services using controlled keywords, free-text searches or a combination of both. Users may now refine a search based on topic, location, instrument, platform, project, data center, spatial and temporal coverage. The directory also offers data holders a means to post and search their data through customized portals, i.e. online customized subset metadata directories. The discovery metadata standard used is the Directory Interchange Format (DIF), adopted in 1994. This format has evolved to accommodate other national and international standards such as FGDC and ISO 19115. Users can submit metadata through easy-to-use online and offline authoring tools. The directory, which also serves as a coordinating node of the International Directory Network (IDN), has been active at the international, regional and national level for many years through its involvement with the Committee on Earth Observation Satellites (CEOS), federal agencies (such as NASA, NOAA, and USGS), international agencies (such as IOC/IODE, UN, and JAXA) and partnerships (such as ESIP, IOOS/DMAC, GOSIC, GLOBEC, OBIS, and GoMODP), sharing experience and knowledge related to metadata and/or data management and interoperability.
A Shared Infrastructure for Federated Search Across Distributed Scientific Metadata Catalogs
NASA Astrophysics Data System (ADS)
Reed, S. A.; Truslove, I.; Billingsley, B. W.; Grauch, A.; Harper, D.; Kovarik, J.; Lopez, L.; Liu, M.; Brandt, M.
2013-12-01
The vast amount of science metadata can be overwhelming and highly complex. Comprehensive analysis and sharing of metadata is difficult since institutions often publish to their own repositories. There are many disjoint standards used for publishing scientific data, making it difficult to discover and share information from different sources. Services that publish metadata catalogs often have different protocols, formats, and semantics. The research community is limited by the exclusivity of separate metadata catalogs and thus it is desirable to have federated search interfaces capable of unified search queries across multiple sources. Aggregation of metadata catalogs also enables users to critique metadata more rigorously. With these motivations in mind, the National Snow and Ice Data Center (NSIDC) and Advanced Cooperative Arctic Data and Information Service (ACADIS) implemented two search interfaces for the community. Both the NSIDC Search and ACADIS Arctic Data Explorer (ADE) use a common infrastructure which keeps maintenance costs low. The search clients are designed to make OpenSearch requests against Solr, an Open Source search platform. Solr applies indexes to specific fields of the metadata, which in this instance optimizes queries containing keywords, spatial bounds and temporal ranges. NSIDC metadata is reused by both search interfaces but the ADE also brokers additional sources. Users can quickly find relevant metadata with minimal effort, which ultimately lowers costs for research. This presentation will highlight the reuse of data and code between NSIDC and ACADIS, discuss challenges and milestones for each project, and will identify creation and use of Open Source libraries.
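A query combining keywords, spatial bounds and a temporal range, as described in this abstract, might be assembled as Solr request parameters along the following lines. This sketch is not the NSIDC or ADE code: the field names `lon`, `lat`, `start_date` and `end_date` are hypothetical, each record is treated as a single point rather than a bounding box, and the OpenSearch layer in front of Solr is omitted. Only the standard `q` and `fq` parameters and Solr's range syntax are used.

```python
from urllib.parse import parse_qs, urlencode


def build_solr_params(keywords, bbox=None, time_range=None):
    """Encode a keyword query plus optional spatial and temporal filter queries."""
    params = [("q", keywords or "*:*")]
    if bbox:
        west, south, east, north = bbox
        # Range filters on hypothetical numeric point fields.
        params.append(("fq", f"lon:[{west} TO {east}]"))
        params.append(("fq", f"lat:[{south} TO {north}]"))
    if time_range:
        start, end = time_range
        # A record overlaps the window if it starts before the window ends
        # and ends after the window starts.
        params.append(("fq", f"start_date:[* TO {end}] AND end_date:[{start} TO *]"))
    return urlencode(params)


# Illustrative request: Arctic records about sea ice between 2010 and 2012.
query_string = build_solr_params(
    "sea ice",
    bbox=(-180, 60, 180, 90),
    time_range=("2010-01-01T00:00:00Z", "2012-12-31T23:59:59Z"),
)
decoded = parse_qs(query_string)
```

Putting the spatial and temporal constraints in filter queries rather than in `q` keeps them out of relevance scoring, so ranking depends only on the keywords.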
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kolker, Eugene; Ozdemir, Vural; Martens, Lennart
Biological processes are fundamentally driven by complex interactions between biomolecules. Integrated high-throughput omics studies enable multifaceted views of cells, organisms, or their communities. With the advent of new post-genomics technologies, omics studies are becoming increasingly prevalent, yet the full impact of these studies can only be realized through data harmonization, sharing, meta-analysis, and integrated research. These three essential steps require consistent generation, capture, and distribution of the metadata. To ensure transparency, facilitate data harmonization, and maximize reproducibility and usability of life sciences studies, we propose a simple common omics metadata checklist. The proposed checklist is built on the rich ontologies and standards already in use by the life sciences community. The checklist will serve as a common denominator to guide experimental design, capture important parameters, and be used as a standard format for stand-alone data publications. This omics metadata checklist and data publications will create efficient linkages between omics data and knowledge-based life sciences innovation and, importantly, allow for appropriate attribution to data generators and infrastructure science builders in the post-genomics era. We ask that the life sciences community test the proposed omics metadata checklist and data publications and provide feedback for their use and improvement.
Özdemir, Vural; Martens, Lennart; Hancock, William; Anderson, Gordon; Anderson, Nathaniel; Aynacioglu, Sukru; Baranova, Ancha; Campagna, Shawn R.; Chen, Rui; Choiniere, John; Dearth, Stephen P.; Feng, Wu-Chun; Ferguson, Lynnette; Fox, Geoffrey; Frishman, Dmitrij; Grossman, Robert; Heath, Allison; Higdon, Roger; Hutz, Mara H.; Janko, Imre; Jiang, Lihua; Joshi, Sanjay; Kel, Alexander; Kemnitz, Joseph W.; Kohane, Isaac S.; Kolker, Natali; Lancet, Doron; Lee, Elaine; Li, Weizhong; Lisitsa, Andrey; Llerena, Adrian; MacNealy-Koch, Courtney; Marshall, Jean-Claude; Masuzzo, Paola; May, Amanda; Mias, George; Monroe, Matthew; Montague, Elizabeth; Mooney, Sean; Nesvizhskii, Alexey; Noronha, Santosh; Omenn, Gilbert; Rajasimha, Harsha; Ramamoorthy, Preveen; Sheehan, Jerry; Smarr, Larry; Smith, Charles V.; Smith, Todd; Snyder, Michael; Rapole, Srikanth; Srivastava, Sanjeeva; Stanberry, Larissa; Stewart, Elizabeth; Toppo, Stefano; Uetz, Peter; Verheggen, Kenneth; Voy, Brynn H.; Warnich, Louise; Wilhelm, Steven W.; Yandl, Gregory
2014-01-01
Abstract Biological processes are fundamentally driven by complex interactions between biomolecules. Integrated high-throughput omics studies enable multifaceted views of cells, organisms, or their communities. With the advent of new post-genomics technologies, omics studies are becoming increasingly prevalent; yet the full impact of these studies can only be realized through data harmonization, sharing, meta-analysis, and integrated research. These essential steps require consistent generation, capture, and distribution of metadata. To ensure transparency, facilitate data harmonization, and maximize reproducibility and usability of life sciences studies, we propose a simple common omics metadata checklist. The proposed checklist is built on the rich ontologies and standards already in use by the life sciences community. The checklist will serve as a common denominator to guide experimental design, capture important parameters, and be used as a standard format for stand-alone data publications. The omics metadata checklist and data publications will create efficient linkages between omics data and knowledge-based life sciences innovation and, importantly, allow for appropriate attribution to data generators and infrastructure science builders in the post-genomics era. We ask that the life sciences community test the proposed omics metadata checklist and data publications and provide feedback for their use and improvement. PMID:24456465
Combined use of semantics and metadata to manage Research Data Life Cycle in Environmental Sciences
NASA Astrophysics Data System (ADS)
Aguilar Gómez, Fernando; de Lucas, Jesús Marco; Pertinez, Esther; Palacio, Aida
2017-04-01
The use of metadata to contextualize datasets is widespread in Earth System Sciences. There are initiatives and tools that help data managers choose the metadata standard that best fits their use cases, such as the DCC Metadata Directory (http://www.dcc.ac.uk/resources/metadata-standards). In our use case, we have been gathering physical, chemical and biological data from a water reservoir since 2010. A good metadata definition is crucial not only to contextualize our own data but also to integrate datasets from other sources such as satellites or meteorological agencies. That is why we have chosen EML (Ecological Metadata Language), which integrates many different elements to define a dataset, including the project context, instrumentation and parameter definitions, the software used for processing and quality control, and the publication details. Those metadata elements help both humans and machines to understand and process the dataset. However, metadata alone is not enough to fully support the data life cycle, from the Data Management Plan definition to Publication and Re-use. To do so, we need to define not only metadata and attributes but also the relationships between them, so semantics are needed. Ontologies, being a knowledge representation, can help define the elements of a research data life cycle, including DMP, datasets, software, etc. They can also define how the different elements are related to each other and how they interact. The first advantage of developing an ontology of a knowledge domain is that it provides a common vocabulary hierarchy (i.e. a conceptual schema) that can be used and standardized by all the agents interested in the domain (either humans or machines). This way of using ontologies is one of the bases of the Semantic Web, where ontologies are set to play a key role in establishing a common terminology between agents.
To develop the ontology we are using Protégé, an open-source, freely available graphical ontology-development tool that supports a rich knowledge model. To process and manage the ontology, we are using Semantic MediaWiki, an extension of MediaWiki that can process queries, perform semantic search and export data in RDF. Our final goal is to integrate our data repository portal and the semantic processing engine in order to have a complete system to manage the data life cycle stages and their relationships, including a machine-actionable DMP solution, dataset and software management, computing resources for processing and analysis, and publication features (DOI minting). This way we will be able to reproduce the full data life cycle chain, ensuring the FAIR+R principles.
Omics Metadata Management Software (OMMS)
Perez-Arriaga, Martha O; Wilson, Susan; Williams, Kelly P; Schoeniger, Joseph; Waymire, Russel L; Powell, Amy Jo
2015-01-01
Next-generation sequencing projects have underappreciated information management tasks requiring detailed attention to specimen curation, nucleic acid sample preparation and sequence production methods required for downstream data processing, comparison, interpretation, sharing and reuse. The few existing metadata management tools for genome-based studies provide weak curatorial frameworks for experimentalists to store and manage idiosyncratic, project-specific information, typically offering no automation supporting unified naming and numbering conventions for sequencing production environments that routinely deal with hundreds, if not thousands, of samples at a time. Moreover, existing tools are not readily interfaced with bioinformatics executables (e.g., BLAST, Bowtie2, custom pipelines). Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and perform analyses and information management tasks via an intuitive web-based interface. Several use cases with short-read sequence datasets are provided to validate installation and integrated function, and suggest possible methodological road maps for prospective users. Provided examples highlight possible OMMS workflows for metadata curation, multistep analyses, and results management and downloading. The OMMS can be implemented as a stand-alone package for individual laboratories, or can be configured for web-based deployment supporting geographically dispersed projects. The OMMS was developed using an open-source software base, is flexible, extensible and easily installed and executed. Availability: The OMMS can be obtained at http://omms.sandia.gov. PMID:26124554
Leveraging Metadata to Create Interactive Images... Today!
NASA Astrophysics Data System (ADS)
Hurt, Robert L.; Squires, G. K.; Llamas, J.; Rosenthal, C.; Brinkworth, C.; Fay, J.
2011-01-01
The image gallery for NASA's Spitzer Space Telescope has been newly rebuilt to fully support the Astronomy Visualization Metadata (AVM) standard to create a new user experience both on the website and in other applications. We encapsulate all the key descriptive information for a public image, including color representations and astronomical and sky coordinates and make it accessible in a user-friendly form on the website, but also embed the same metadata within the image files themselves. Thus, images downloaded from the site will carry with them all their descriptive information. Real-world benefits include display of general metadata when such images are imported into image editing software (e.g. Photoshop) or image catalog software (e.g. iPhoto). More advanced support in Microsoft's WorldWide Telescope can open a tagged image after it has been downloaded and display it in its correct sky position, allowing comparison with observations from other observatories. An increasing number of software developers are implementing AVM support in applications and an online image archive for tagged images is under development at the Spitzer Science Center. Tagging images following the AVM offers ever-increasing benefits to public-friendly imagery in all its standard forms (JPEG, TIFF, PNG). The AVM standard is one part of the Virtual Astronomy Multimedia Project (VAMP); http://www.communicatingastronomy.org
NASA Astrophysics Data System (ADS)
Boldrini, Enrico; Schaap, Dick M. A.; Nativi, Stefano
2013-04-01
SeaDataNet implements a distributed pan-European infrastructure for Ocean and Marine Data Management whose nodes are maintained by 40 national oceanographic and marine data centers from 35 countries riparian to all European seas. A unique portal makes possible distributed discovery, visualization and access of the available sea data across all the member nodes. Geographic metadata play an important role in such an infrastructure, enabling efficient documentation and discovery of the resources of interest. In particular:
- Common Data Index (CDI) metadata describe the sea datasets, including identification information (e.g. product title, area of interest), evaluation information (e.g. data resolution, constraints) and distribution information (e.g. download endpoint, download protocol);
- Cruise Summary Reports (CSR) metadata describe cruises and field experiments at sea, including identification information (e.g. cruise title, name of the ship) and acquisition information (e.g. utilized instruments, number of samples taken).
In the context of the second phase of SeaDataNet (SeaDataNet 2 EU FP7 project, grant agreement 283607, started on October 1st, 2011 for a duration of 4 years) a major target is the setting, adoption and promotion of common international standards, to the benefit of outreach and interoperability with the international initiatives and communities (e.g. OGC, INSPIRE, GEOSS, …). A standardization effort conducted by CNR with the support of MARIS, IFREMER, STFC, BODC and ENEA has led to the creation of an ISO 19115 metadata profile of CDI and its XML encoding based on ISO 19139. The CDI profile is now in its stable version and is being implemented and adopted by the SeaDataNet community tools and software. The effort has then continued to produce an ISO-based metadata model and its XML encoding for CSR as well. The metadata elements included in the CSR profile belong to different models:
- ISO 19115: e.g. cruise identification information, including title and area of interest; metadata responsible party information
- ISO 19115-2: e.g. acquisition information, including date of sampling and instruments used
- SeaDataNet: e.g. SeaDataNet community-specific information, including EDMO and EDMERP code lists
Two main guidelines have been followed in drafting the metadata model:
- All the obligations and constraints required by both the ISO standards and the INSPIRE directive had to be satisfied. These include the presence of specific elements with given cardinality (e.g. mandatory metadata date stamp, mandatory lineage information).
- All the content information of the legacy CSR format had to be supported by the new metadata model.
An XML encoding of the CSR profile has been defined as well. Based on the ISO 19139 XML schema and constraints, it adds the new elements specific to the SeaDataNet community. The associated Schematron rules are used to enforce constraints that cannot be enforced with the schema alone and to validate element content against the SeaDataNet code list vocabularies.
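The two kinds of constraint named in this abstract (a mandatory element and content restricted to a code list) can be illustrated with a minimal Python sketch. This is not the SeaDataNet Schematron rule set, which the abstract does not reproduce; it is a stand-in written with the standard library. The `gmd` namespace URI and the `codeListValue` attribute are those of ISO 19139; the sample record, the function name `check_record` and the allowed code set are hypothetical.

```python
import xml.etree.ElementTree as ET

# ISO 19139 "gmd" namespace; everything else in this sketch is illustrative.
NS = {"gmd": "http://www.isotc211.org/2005/gmd"}

# Hypothetical record: it has a code-list element but no gmd:dateStamp.
SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd">
  <gmd:language>
    <gmd:LanguageCode codeList="http://example.org/codelists#LanguageCode"
                      codeListValue="eng">English</gmd:LanguageCode>
  </gmd:language>
</gmd:MD_Metadata>"""


def check_record(xml_text, allowed_codes):
    """Return a list of human-readable problems found in one metadata record."""
    root = ET.fromstring(xml_text)
    problems = []
    # Cardinality-style rule: the metadata date stamp must be present.
    if root.find("gmd:dateStamp", NS) is None:
        problems.append("missing mandatory gmd:dateStamp")
    # Vocabulary-style rule: every codeListValue must be in the allowed set.
    for element in root.iter():
        value = element.get("codeListValue")
        if value is not None and value not in allowed_codes:
            problems.append("unknown code list value: " + value)
    return problems


if __name__ == "__main__":
    for problem in check_record(SAMPLE, {"eng"}):
        print(problem)
```

In a real profile these checks would be expressed as Schematron assertions and the allowed values would come from the community vocabularies rather than a hard-coded set.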
NASA Astrophysics Data System (ADS)
Nass, Andrea; van Gasselt, Stephan; Jaumann, Ralf
2010-05-01
The Helmholtz Alliance and the European Planetary Network are research communities with different main topics. One of the main research topics shared by these communities is the geomorphological evolution of planetary surfaces as well as the geological context of life. This research includes questions like "Is there volcanic activity on a planet?" or "Where are possible landing sites?". In order to help answer such questions, analyses of surface features and morphometric measurements need to be performed. This ultimately leads to the generation of thematic maps (e.g. geological and geomorphologic maps) as a basis for further studies. With modern GIS techniques, the comparative work and generalisation during mapping processes result in new information. These insights are crucial for subsequent investigations. Therefore, the aim is to make these results available to the research community as a secondary data basis. In order to obtain a common and interoperable data collection, the results of different mapping projects have to follow a standardised data infrastructure, metadata definition and map layout. Therefore, we are currently focussing on the generation of a database model arranging all data and processes in a uniform mapping schema. With the help of such a schema, the mapper will be able to utilise a predefined (but customisable) GIS environment with individual tool items as well as a standardised symbolisation and a metadata environment. This environment is based on a data model which is currently on a conceptual level and provides the layout of the data infrastructure including relations and topologies. One of the first tasks towards this data model is the definition of a consistent basis of symbolisation standards developed for planetary mapping. The mapper/geologist will be able to access the pre-built signatures and utilise them, depending on scale, within the mapping project.
The symbolisation will be related to the data model in the next step. As a second task, we designed a concept for describing the digital mapping result. To this end, we are creating a metadata template based on existing standards and adapted to individual needs in planetary sciences. This template is subdivided into (meta)data about the general map content (e.g. on which data the mapping result is based) and metadata for each individual mapping element/layer, comprising information like minimum mapping scale, interpretation hints, etc. The assignment of such a metadata description, in combination with the use of a predefined mapping schema, facilitates the efficient and traceable storage of data information on a network server and enables a subsequent representation, e.g. as a mapserver data structure. Acknowledgement: This work is partly supported by DLR and the Helmholtz Alliance "Planetary Evolution and Life".
NASA Astrophysics Data System (ADS)
Leibovici, D. G.; Pourabdollah, A.; Jackson, M.
2011-12-01
Experts and decision-makers use or develop models to monitor global and local changes of the environment. Their activities require the combination of data and processing services in a flow of operations and spatial data computations: a geospatial scientific workflow. The seamless ability to generate, re-use and modify a geospatial scientific workflow is an important requirement, but the quality of outcomes is equally important [1]. Metadata information attached to the data and processes, and particularly their quality, is essential to assess the reliability of the scientific model that represents a workflow [2]. Management tools that deal with qualitative and quantitative metadata measures of the quality associated with a workflow are therefore required by modellers. To ensure interoperability, ISO and OGC standards [3] are to be adopted, allowing one, for example, to define metadata profiles and to retrieve them via web service interfaces. However, these standards need a few extensions when looking at workflows, particularly in the context of geoprocess metadata. We propose to fill this gap (i) first through the provision of a metadata profile for the quality of processes, and (ii) through providing a framework, based on XPDL [4], to manage the quality information. Web Processing Services are used to implement a range of metadata analyses on the workflow in order to evaluate and present quality information at different levels of the workflow. This generates the metadata quality, stored in the XPDL file. The focus is (a) on the visual representations of the quality, summarizing the retrieved quality information either from the standardized metadata profiles of the components or from non-standard quality information, e.g. Web 2.0 information, and (b) on the estimated qualities of the outputs derived from meta-propagation of uncertainties (a principle that we have introduced [5]).
An a priori validation of the future decision-making supported by the outputs of the workflow, once run, is then provided using the meta-propagated qualities, obtained without running the workflow [6], together with the visualization pointing out, on the workflow graph itself, the need to improve the workflow with better data or better processes. [1] Leibovici, DG, Hobona, G, Stock, K, Jackson, M (2009) Qualifying geospatial workflow models for adaptive controlled validity and accuracy. In: IEEE 17th GeoInformatics, 1-5 [2] Leibovici, DG, Pourabdollah, A (2010a) Workflow Uncertainty using a Metamodel Framework and Metadata for Data and Processes. OGC TC/PC Meetings, September 2010, Toulouse, France [3] OGC (2011) www.opengeospatial.org [4] XPDL (2008) Workflow Process Definition Interface - XML Process Definition Language. Workflow Management Coalition, Document WfMC-TC-1025, 2008 [5] Leibovici, DG, Pourabdollah, A, Jackson, M (2011) Meta-propagation of Uncertainties for Scientific Workflow Management in Interoperable Spatial Data Infrastructures. In: Proceedings of the European Geosciences Union (EGU2011), April 2011, Austria [6] Pourabdollah, A, Leibovici, DG, Jackson, M (2011) MetaPunT: an Open Source tool for Meta-Propagation of uncerTainties in Geospatial Processing. In: Proceedings of OSGIS2011, June 2011, Nottingham, UK
NASA Astrophysics Data System (ADS)
San Gil, Inigo; White, Marshall; Melendez, Eda; Vanderbilt, Kristin
The thirty-year-old United States Long Term Ecological Research Network has developed extensive metadata to document its scientific data. Standard and interoperable metadata is a core component of the data-driven analytical solutions developed by this research network. Content management systems offer an affordable solution for rapid deployment of metadata-centered information management systems. We developed a customized integrative metadata management system based on the Drupal content management system technology. Building on knowledge and experience with the Sevilleta and Luquillo Long Term Ecological Research sites, we successfully deployed the first two medium-scale customized prototypes. In this paper, we describe the vision behind our Drupal-based information management instances, and list the features offered through these Drupal-based systems. We also outline the plans to expand the information services offered through these metadata-centered management systems. We conclude with the growing list of participants deploying similar instances.
NASA Astrophysics Data System (ADS)
Baumann, Peter
2013-04-01
There is a traditional saying that metadata are understandable, semantic-rich, and searchable. Data, on the other hand, are big, with no accessible semantics, and just downloadable. Not only has this led to an imbalance of search support from a user perspective, but also, underneath, to a deep technology divide, often using relational databases for metadata and bespoke archive solutions for data. Our vision is that this barrier will be overcome and that data and metadata will become searchable alike, leveraging the potential of semantic technologies in combination with scalability technologies. Ultimately, in this vision ad-hoc processing and filtering will no longer be distinguished, forming a uniformly accessible data universe. In the European EarthServer initiative, we work towards this vision by federating database-style raster query languages with metadata search and geo broker technology. We present the approach taken, how it can leverage OGC standards, the benefits envisaged, and first results.
The Global Genome Biodiversity Network (GGBN) Data Standard specification.
Droege, G; Barker, K; Seberg, O; Coddington, J; Benson, E; Berendsohn, W G; Bunk, B; Butler, C; Cawsey, E M; Deck, J; Döring, M; Flemons, P; Gemeinholzer, B; Güntsch, A; Hollowell, T; Kelbert, P; Kostadinov, I; Kottmann, R; Lawlor, R T; Lyal, C; Mackenzie-Dodds, J; Meyer, C; Mulcahy, D; Nussbeck, S Y; O'Tuama, É; Orrell, T; Petersen, G; Robertson, T; Söhngen, C; Whitacre, J; Wieczorek, J; Yilmaz, P; Zetzsche, H; Zhang, Y; Zhou, X
2016-01-01
Genomic samples of non-model organisms are becoming increasingly important in a broad range of studies, from developmental biology and biodiversity analyses to conservation. Genomic sample definition, description, quality, voucher information and metadata all need to be digitized and disseminated across scientific communities. This information needs to be concise and consistent in today's ever-increasing bioinformatic era, for complementary data aggregators to easily map databases to one another. In order to facilitate exchange of information on genomic samples and their derived data, the Global Genome Biodiversity Network (GGBN) Data Standard is intended to provide a platform based on a documented agreement to promote the efficient sharing and usage of genomic sample material and associated specimen information in a consistent way. The new data standard presented here builds upon existing standards commonly used within the community, extending them with the capability to exchange data on tissue, environmental and DNA samples as well as sequences. The GGBN Data Standard will reveal and democratize the hidden contents of biodiversity biobanks, for the convenience of everyone in the wider biobanking community. Technical tools exist for data providers to easily map their databases to the standard. Database URL: http://terms.tdwg.org/wiki/GGBN_Data_Standard. © The Author(s) 2016. Published by Oxford University Press.
NASA Astrophysics Data System (ADS)
Leif, Robert C.; Leif, Stephanie H.
2016-04-01
Introduction: The International Society for Advancement of Cytometry (ISAC) has created a standard for the Minimum Information about a Flow Cytometry Experiment (MIFlowCyt 1.0). CytometryML will serve as a common metadata standard for flow and image cytometry (digital microscopy). Methods: The MIFlowCyt data types were created, as is the rest of CytometryML, in the XML Schema Definition Language (XSD 1.1). The data types are primarily based on the Flow Cytometry and the Digital Imaging and Communication (DICOM) standards. A small section of the code was formatted with standard HTML formatting elements (p, h1, h2, etc.). Results: 1) The part of MIFlowCyt that describes the Experimental Overview, including the specimen and substantial parts of several other major elements, has been implemented as CytometryML XML schemas (www.cytometryml.org). 2) The feasibility of using MIFlowCyt to provide the combination of an overview, table of contents, and/or an index of a scientific paper or a report has been demonstrated. Previously, a sample electronic publication, EPUB, was created that could contain both MIFlowCyt metadata and the binary data. Conclusions: The use of CytometryML technology together with XHTML5 and CSS permits the metadata to be directly formatted and, together with the binary data, to be stored in an EPUB container. This will facilitate formatting, data mining, presentation, data verification, and inclusion in structured research, clinical, and regulatory documents. It will also demonstrate a publication's adherence to the MIFlowCyt standard, promote interoperability, and should result in the textual and numeric data being published using web technology without any change in composition.
Determining the Completeness of the Nimbus Meteorological Data Archive
NASA Technical Reports Server (NTRS)
Johnson, James; Moses, John; Kempler, Steven; Zamkoff, Emily; Al-Jazrawi, Atheer; Gerasimov, Irina; Trivedi, Bhagirath
2011-01-01
NASA launched the Nimbus series of meteorological satellites in the 1960s and 70s. These satellites carried instruments for making observations of the Earth in the visible, infrared, ultraviolet, and microwave wavelengths. The original data archive consisted of a combination of digital data written to 7-track computer tapes and on various film media. Many of these data sets are now being migrated from the old media to the GES DISC modern online archive. The process involves recovering the digital data files from tape as well as scanning images of the data from film strips. Some of the challenges of archiving the Nimbus data include the lack of any metadata from these old data sets. Metadata standards and self-describing data files did not exist at that time, and files were written on now obsolete hardware systems and outdated file formats. This requires creating metadata by reading the contents of the old data files. Some digital data files were corrupted over time, or were possibly improperly copied at the time of creation. Thus there are data gaps in the collections. The film strips were stored in boxes and are now being scanned as JPEG-2000 images. The only information describing these images is what was written on them when they were originally created, and sometimes this information is incomplete or missing. We have the ability to cross-reference the scanned images against the digital data files to determine which of these best represents the data set from the various missions, or to see how complete the data sets are. In this presentation we compared data files and scanned images from the Nimbus-2 High-Resolution Infrared Radiometer (HRIR) for September 1966 to determine whether the data and images are properly archived with correct metadata.
Now That We've Found the "Hidden Web," What Can We Do with It?
ERIC Educational Resources Information Center
Cole, Timothy W.; Kaczmarek, Joanne; Marty, Paul F.; Prom, Christopher J.; Sandore, Beth; Shreeves, Sarah
The Open Archives Initiative (OAI) Protocol for Metadata Harvesting (PMH) is designed to facilitate discovery of the "hidden web" of scholarly information, such as that contained in databases, finding aids, and XML documents. OAI-PMH supports standardized exchange of metadata describing items in disparate collections, such as those…
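As an illustration of the harvesting exchange described above, the following minimal Python sketch builds an OAI-PMH `ListRecords` request URL and reads Dublin Core titles out of a response. The `verb` and `metadataPrefix` request parameters and the `oai_dc` prefix are part of the OAI-PMH specification; the base URL, the function names and the sample response are hypothetical, and no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Dublin Core element namespace used inside oai_dc records.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical, heavily trimmed response; a real one carries headers,
# datestamps and possibly a resumptionToken for paging.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Finding aid for a hypothetical collection</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for a repository's OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optional selective harvesting by set
    return base_url + "?" + urlencode(params)


def extract_titles(response_xml):
    """Collect every dc:title found in a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.iter("{%s}title" % DC_NS)]


if __name__ == "__main__":
    print(list_records_url("https://example.org/oai"))
    print(extract_titles(SAMPLE_RESPONSE))
```

A harvester would fetch the URL, parse each page of results in this way, and repeat the request with the returned resumption token until the list is exhausted.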
Automatic Conversion of Metadata from the Study of Health in Pomerania to ODM.
Hegselmann, Stefan; Gessner, Sophia; Neuhaus, Philipp; Henke, Jörg; Schmidt, Carsten Oliver; Dugas, Martin
2017-01-01
Electronic collection and high-quality analysis of medical data are expected to have great potential to improve patient care and medical research. However, the integration of data from different stakeholders poses a crucial problem. The exchange and reuse of medical data models as well as annotations with unique semantic identifiers were proposed as a solution. The objective was to convert metadata from the Study of Health in Pomerania to the standardized CDISC ODM format. The structure of the two data formats is analyzed and a mapping is suggested and implemented. The metadata from the Study of Health in Pomerania was successfully converted to ODM. All relevant information was included in the resulting forms. Three sample forms were evaluated in depth, which demonstrates the feasibility of this conversion. Hundreds of data entry forms with more than 15,000 items can be converted into a standardized format with some limitations, e.g. regarding logical constraints. This enables the integration of the Study of Health in Pomerania metadata into various systems, facilitating the implementation and reuse in different study sites.
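The kind of item-level mapping described above can be sketched as follows. This is an assumption-laden illustration, not the authors' converter: the abstract does not describe the source data dictionary, so the input dict keys (`id`, `name`, `type`, `label`), the type table and the OID prefix are hypothetical. The target element `ItemDef` with its `OID`, `Name` and `DataType` attributes, the nested `Question`/`TranslatedText` elements and the namespace URI follow CDISC ODM 1.3.

```python
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"

# Hypothetical source types mapped to ODM DataType values.
TYPE_MAP = {"number": "float", "integer": "integer", "string": "text", "date": "date"}


def to_item_def(source_item):
    """Map one hypothetical source item (a dict) to an ODM ItemDef element."""
    item = ET.Element("{%s}ItemDef" % ODM_NS, {
        "OID": "I." + source_item["id"],
        "Name": source_item["name"],
        # Unknown source types fall back to free text.
        "DataType": TYPE_MAP.get(source_item["type"], "text"),
    })
    question = ET.SubElement(item, "{%s}Question" % ODM_NS)
    text = ET.SubElement(question, "{%s}TranslatedText" % ODM_NS)
    text.text = source_item["label"]
    return item


if __name__ == "__main__":
    example = {"id": "height", "name": "HEIGHT", "type": "number",
               "label": "Body height (cm)"}
    print(ET.tostring(to_item_def(example), encoding="unicode"))
```

Logical constraints between items, which the abstract names as a limitation, are deliberately left out of this sketch.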
Practical management of heterogeneous neuroimaging metadata by global neuroimaging data repositories
Neu, Scott C.; Crawford, Karen L.; Toga, Arthur W.
2012-01-01
Rapidly evolving neuroimaging techniques are producing unprecedented quantities of digital data at the same time that many research studies are evolving into global, multi-disciplinary collaborations between geographically distributed scientists. While networked computers have made it almost trivial to transmit data across long distances, collecting and analyzing this data requires extensive metadata if the data is to be maximally shared. Though it is typically straightforward to encode text and numerical values into files and send content between different locations, it is often difficult to attach context and implicit assumptions to the content. As the number of and geographic separation between data contributors grows to national and global scales, the heterogeneity of the collected metadata increases and conformance to a single standardization becomes implausible. Neuroimaging data repositories must then not only accumulate data but must also consolidate disparate metadata into an integrated view. In this article, using specific examples from our experiences, we demonstrate how standardization alone cannot achieve full integration of neuroimaging data from multiple heterogeneous sources and why a fundamental change in the architecture of neuroimaging data repositories is needed instead. PMID:22470336
Compatibility Between Metadata Standards: Import Pipeline of CDISC ODM to the Samply.MDR.
Kock-Schoppenhauer, Ann-Kristin; Ulrich, Hannes; Wagen-Zink, Stefanie; Duhm-Harbeck, Petra; Ingenerf, Josef; Neuhaus, Philipp; Dugas, Martin; Bruland, Philipp
2018-01-01
The establishment of a digital healthcare system is a national and community task. The Federal Ministry of Education and Research in Germany is providing funding for consortia consisting of university hospitals, among others, participating in the "Medical Informatics Initiative". Exchange of medical data between research institutions necessitates a place where meta information for this data is made accessible. Within these consortia, different metadata registry solutions were chosen. To promote interoperability between these solutions, we have examined whether the portal of Medical Data Models is eligible for managing and communicating metadata and relevant information across different data integration centres of the Medical Informatics Initiative and beyond. Apart from the MDM-portal, some ISO 11179-based systems such as Samply.MDR as well as openEHR-based solutions are going to be applied. In this paper, we have focused on the creation of a mapping model between the CDISC ODM standard and the Samply.MDR import format. In summary, the mapping model is feasible and promotes exchangeability between different metadata registry approaches.
Content standards for medical image metadata
NASA Astrophysics Data System (ADS)
d'Ornellas, Marcos C.; da Rocha, Rafael P.
2003-12-01
Medical images are at the heart of healthcare diagnostic procedures. They have provided not only a noninvasive means to view anatomical cross-sections of internal organs but also a means for physicians to evaluate the patient's diagnosis and monitor the effects of the treatment. For a Medical Center, the emphasis may shift from the generation of images to post processing and data management, since the medical staff may generate even more processed images and other data from the original image after various analyses and post processing. A medical image data repository for health care information systems is becoming a critical need. This data repository would contain comprehensive patient records, including information such as clinical data and related diagnostic images, and post-processed images. Due to the large volume and complexity of the data as well as the diversified user access requirements, the implementation of the medical image archive system will be a complex and challenging task. This paper discusses content standards for medical image metadata. In addition it also focuses on the image metadata content evaluation and metadata quality management.
2011-01-01
Background The Biodiversity Heritage Library (BHL) is a large digital archive of legacy biological literature, comprising over 31 million pages scanned from books, monographs, and journals. During the digitisation process basic metadata about the scanned items is recorded, but not article-level metadata. Given that the article is the standard unit of citation, this makes it difficult to locate cited literature in BHL. Adding the ability to easily find articles in BHL would greatly enhance the value of the archive. Description A service was developed to locate articles in BHL based on matching article metadata to BHL metadata using approximate string matching, regular expressions, and string alignment. This article locating service is exposed as a standard OpenURL resolver on the BioStor web site http://biostor.org/openurl/. This resolver can be used on the web, or called by bibliographic tools that support OpenURL. Conclusions BioStor provides tools for extracting, annotating, and visualising articles from the Biodiversity Heritage Library. BioStor is available from http://biostor.org/. PMID:21605356
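The approximate title matching mentioned in the abstract above can be illustrated with a minimal sketch. It is not the BioStor implementation: it uses Python's standard difflib module as a stand-in for the approximate string matching the abstract names, and the function names, the 0.8 threshold, and the sample titles are assumptions made for illustration only.

```python
import re
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase a title and collapse punctuation and spacing differences."""
    return " ".join(re.sub(r"[^a-z0-9 ]+", " ", title.lower()).split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(query_title, candidates, threshold=0.8):
    """Return (candidate, score) for the closest title, or None if too dissimilar.

    The threshold is a hypothetical value, not one taken from the abstract.
    """
    scored = [(c, similarity(query_title, c)) for c in candidates]
    if not scored:
        return None
    best = max(scored, key=lambda pair: pair[1])
    return best if best[1] >= threshold else None
```

A call such as best_match("On the birds of Celebes.", scanned_titles) would return the closest scanned title together with its score, or None when no candidate clears the threshold.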
Master Metadata Repository and Metadata-Management System
NASA Technical Reports Server (NTRS)
Armstrong, Edward; Reed, Nate; Zhang, Wen
2007-01-01
A master metadata repository (MMR) software system manages the storage and searching of metadata pertaining to data from national and international satellite sources of the Global Ocean Data Assimilation Experiment (GODAE) High Resolution Sea Surface Temperature Pilot Project [GHRSSTPP]. These sources produce a total of hundreds of data files daily, each file classified as one of more than ten data products representing global sea-surface temperatures. The MMR is a relational database wherein the metadata are divided into granule-level records [denoted file records (FRs)] for individual satellite files and collection-level records [denoted data set descriptions (DSDs)] that describe metadata common to all the files from a specific data product. FRs and DSDs adhere to the NASA Directory Interchange Format (DIF). The FRs and DSDs are contained in separate subdatabases linked by a common field. The MMR is configured in MySQL database software with custom Practical Extraction and Reporting Language (PERL) programs to validate and ingest the metadata records. The database contents are converted into the Federal Geographic Data Committee (FGDC) standard format by use of the Extensible Markup Language (XML). A Web interface enables users to search for availability of data from all sources.
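The split between collection-level and granule-level records linked by a common field can be sketched as follows. This is an illustrative assumption, not the MMR schema: the abstract describes MySQL and Perl, while the sketch uses Python's built-in sqlite3 module so that it runs without a server, and all table names, column names, and sample rows are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical DSD and FR tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Collection-level record: one row per data product (DSD).
        CREATE TABLE dsd (
            product_id TEXT PRIMARY KEY,
            title      TEXT NOT NULL
        );
        -- Granule-level record: one row per satellite file (FR),
        -- linked to its DSD through the shared product_id field.
        CREATE TABLE file_record (
            file_name  TEXT PRIMARY KEY,
            product_id TEXT NOT NULL REFERENCES dsd(product_id),
            start_time TEXT NOT NULL
        );
        """
    )
    conn.execute("INSERT INTO dsd VALUES (?, ?)", ("PRODUCT-A", "Example SST product"))
    conn.executemany(
        "INSERT INTO file_record VALUES (?, ?, ?)",
        [
            ("a_20070102.nc", "PRODUCT-A", "2007-01-02T00:00:00Z"),
            ("a_20070101.nc", "PRODUCT-A", "2007-01-01T00:00:00Z"),
        ],
    )
    return conn


def files_for_product(conn, product_id):
    """List (file name, product title) pairs for one product, oldest file first."""
    rows = conn.execute(
        """
        SELECT f.file_name, d.title
        FROM file_record AS f
        JOIN dsd AS d ON d.product_id = f.product_id
        WHERE d.product_id = ?
        ORDER BY f.start_time
        """,
        (product_id,),
    )
    return rows.fetchall()
```

Keeping the shared descriptive fields in the collection-level table means each file record only carries what is specific to that file, and a join on the common field recovers the full description when a user searches.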
Enabling conformity to international standards within SeaDataNet
NASA Astrophysics Data System (ADS)
Schaap, Dick M. A.; Boldrini, Enrico; de Korte, Arjen; Santoro, Mattia; Manzella, Giuseppe; Nativi, Stefano
2010-05-01
SeaDataNet's objective is to construct a standardized system for managing the large and diverse data sets collected by the oceanographic fleets and the new automatic observation systems. The aim is to network and enhance the currently existing infrastructures, which are the national oceanographic data centres and satellite data centres of 36 countries, active in data collection. The networking of these professional data centres in a unique virtual data management system will provide integrated data sets of standardized quality on-line. The Common Data Index (CDI) is the middleware service adopted by SeaDataNet for discovery and access of the available data. In order to develop an interoperable and effective system, the use of international de facto and de jure standards is required. In particular, the goal of this presentation is to introduce and discuss the solutions for making SeaDataNet compliant with the European Union (EU) INSPIRE directive and in particular with its Implementing Rules (IR). The European INSPIRE directive aims to govern the creation of a European Spatial Data Infrastructure (ESDI). This will enable the sharing of environmental spatial information among public sector organisations and better facilitate public access to spatial information across Europe. To ensure that the spatial data infrastructures of the European Member States are compatible and usable in a community and transboundary context, the directive requires that common IRs are adopted in a number of specific areas (Metadata, Data Specifications, Network Services, Data and Service Sharing and Monitoring and Reporting). Often the use of already approved digital geographic information standards is mandated, drawing from international organizations like the Open Geospatial Consortium (OGC) and the International Organization for Standardization (ISO), the latter by means of its Technical Committee 211 (ISO/TC 211). 
In the context of geographic data discovery a set of mandatory metadata information is identified by INSPIRE metadata regulations and recommended implementations appear in IRs, in particular the use of ISO 19139 Application Profile (ISO AP) of OGC Catalogue Service for the Web 2.0.2 (CSW), as well as the use of ISO 19139 XML schemas (along with additional constraints) to encode and distribute the required INSPIRE metadata. SeaDataNet started its work in 2006, basing its metadata schema upon the ISO 19115 DTD, the available schema at that time. Over time this was replaced with the present CDI v.1 XML schema, based on the ISO 19115 abstract model with community-specific features and constraints. In order to ensure INSPIRE conformity, a GI-cat based solution was developed. GI-cat is a broker service able to mediate between different metadata sources and publish them through a consistent and unified interface. In this case GI-cat is used as a front end to the SeaDataNet portal, publishing the available data, based on CDI v.1, through a CSW AP ISO interface. The first step consisted in the precise definition of a community profile of ISO 19115, containing both INSPIRE and CDI driven constraints and extensions. This abstract model is ready to be implemented both in CDI v.1 and in ISO 19139; to this aim, guidelines were drafted. Then a mapping from the CDI v.1 to the ISO 19139 implementation was ready to be produced. The work resulted in the creation of a new CDI accessor within GI-cat. These types of components play the role of data model mediators within the framework. While a replacement of the CDI v.1 format with the ISO 19139 solution is planned for SeaDataNet in the future, this front-end solution makes data discovery readily effective for clients within the INSPIRE community.
Managing biomedical image metadata for search and retrieval of similar images.
Korenblum, Daniel; Rubin, Daniel; Napel, Sandy; Rodriguez, Cesar; Beaulieu, Chris
2011-08-01
Radiology images are generally disconnected from the metadata describing their contents, such as imaging observations ("semantic" metadata), which are usually described in text reports that are not directly linked to the images. We developed a system, the Biomedical Image Metadata Manager (BIMM), to (1) address the problem of managing biomedical image metadata and (2) facilitate the retrieval of similar images using semantic feature metadata. Our approach allows radiologists, researchers, and students to take advantage of the vast and growing repositories of medical image data by explicitly linking images to their associated metadata in a relational database that is globally accessible through a Web application. BIMM receives input in the form of standard-based metadata files through a Web service and parses and stores the metadata in a relational database allowing efficient data query and maintenance capabilities. Upon querying BIMM for images, 2D regions of interest (ROIs) stored as metadata are automatically rendered onto preview images included in search results. The system's "match observations" function retrieves images with similar ROIs based on specific semantic features describing imaging observation characteristics (IOCs). We demonstrate that the system, using IOCs alone, can accurately retrieve images with diagnoses matching the query images, and we evaluate its performance on a set of annotated liver lesion images. BIMM has several potential applications, e.g., computer-aided detection and diagnosis, content-based image retrieval, automating medical analysis protocols, and gathering population statistics like disease prevalences. The system provides a framework for decision support systems, potentially improving their diagnostic accuracy and selection of appropriate therapies.
The New Online Metadata Editor for Generating Structured Metadata
NASA Astrophysics Data System (ADS)
Devarakonda, R.; Shrestha, B.; Palanisamy, G.; Hook, L.; Killeffer, T.; Boden, T.; Cook, R. B.; Zolly, L.; Hutchison, V.; Frame, M. T.; Cialella, A. T.; Lazer, K.
2014-12-01
Nobody is better suited to "describe" data than the scientist who created it. This "description" of the data is called metadata. In general terms, metadata represents the who, what, when, where, why and how of the dataset. eXtensible Markup Language (XML) is the preferred output format for metadata, as it makes the metadata portable and, more importantly, suitable for system discoverability. The newly developed ORNL Metadata Editor (OME) is a Web-based tool that allows users to create and maintain XML files containing key information, or metadata, about the research. Metadata include information about the specific projects, parameters, time periods, and locations associated with the data. Such information helps put the research findings in context. In addition, the metadata produced using OME will allow other researchers to find these data via metadata clearinghouses like Mercury [1] [2]. Researchers simply use the ORNL Metadata Editor to enter relevant metadata into a Web-based form. How is OME helping Big Data Centers like ORNL DAAC? The ORNL DAAC is one of NASA's Earth Observing System Data and Information System (EOSDIS) data centers managed by the ESDIS Project. The ORNL DAAC archives data produced by NASA's Terrestrial Ecology Program. The DAAC provides data and information relevant to biogeochemical dynamics, ecological data, and environmental processes, critical for understanding the dynamics relating to the biological components of the Earth's environment. Typically the data produced, archived and analyzed are at a scale of multiple petabytes, which makes the discoverability of the data very challenging. Without proper metadata associated with the data, it is difficult to find the data you are looking for and equally difficult to use and understand the data. 
OME will allow data centers like the ORNL DAAC to produce meaningful, high-quality, standards-based, descriptive information about their data products, in turn helping with data discoverability and interoperability. References: [1] Devarakonda, Ranjeet, et al. "Mercury: reusable metadata management, data discovery and access system." Earth Science Informatics 3.1-2 (2010): 87-94. [2] Wilson, Bruce E., et al. "Mercury Toolset for Spatiotemporal Metadata." NASA Technical Reports Server (NTRS) (2010).
Challenges and opportunities of open data in ecology.
Reichman, O J; Jones, Matthew B; Schildhauer, Mark P
2011-02-11
Ecology is a synthetic discipline benefiting from open access to data from the earth, life, and social sciences. Technological challenges exist, however, due to the dispersed and heterogeneous nature of these data. Standardization of methods and development of robust metadata can increase data access but are not sufficient. Reproducibility of analyses is also important, and executable workflows are addressing this issue by capturing data provenance. Sociological challenges, including inadequate rewards for sharing data, must also be resolved. The establishment of well-curated, federated data repositories will provide a means to preserve data while promoting attribution and acknowledgement of its use.
DOIDB: Reusing DataCite's search software as metadata portal for GFZ Data Services
NASA Astrophysics Data System (ADS)
Elger, K.; Ulbricht, D.; Bertelmann, R.
2016-12-01
GFZ Data Services is the central service point for the publication of research data at the Helmholtz Centre Potsdam GFZ German Research Centre for Geosciences (GFZ). It provides data publishing services to scientists of GFZ, associated projects, and associated institutions. The publishing services aim to make research data and physical samples visible and citable, by assigning persistent identifiers (DOI, IGSN) and by complementing existing IT infrastructure. To integrate several research domains, a modular software stack made of free software components has been created to manage data and metadata as well as register persistent identifiers [1]. The pivotal component for the registration of DOIs is the DOIDB. It has been derived from three software components provided by DataCite [2] that moderate the registration of DOIs and the deposition of metadata, allow the dissemination of metadata, and provide a user interface to navigate and discover datasets. The DOIDB acts as a proxy to the DataCite infrastructure and, in addition to the DataCite metadata schema, it allows the deposit and dissemination of metadata following the schemas ISO 19139 and NASA GCMD DIF. The search component has been modified to meet the requirements of a geosciences metadata portal. In particular, the search component has been altered to make use of Apache Solr's capability to index and query spatial coordinates. Furthermore, the user interface has been adjusted to provide a first impression of the data by showing a map, summary information and subjects. DOIDB and its components are available on GitHub [3]. We present a software solution for registration of DOIs that allows integration of existing data systems, keeps track of registered DOIs, and provides a metadata portal to discover datasets [4]. [1] Ulbricht, D.; Elger, K.; Bertelmann, R.; Klump, J. panMetaDocs, eSciDoc, and DOIDB—An Infrastructure for the Curation and Publication of File-Based Datasets for GFZ Data Services. ISPRS Int. J. Geo-Inf. 
2016, 5, 25. http://doi.org/10.3390/ijgi5030025 [2] https://github.com/datacite [3] https://github.com/ulbricht/search/tree/doidb , https://github.com/ulbricht/mds/tree/doidb , https://github.com/ulbricht/oaip/tree/doidb [4] http://doidb.wdc-terra.org
Kuchinke, W; Wiegelmann, S; Verplancke, P; Ohmann, C
2006-01-01
Our objectives were to analyze the possibility of an exchange of an entire clinical study between two different and independent study software solutions. The question addressed was whether a software-independent transfer of study metadata can be performed without programming efforts and with software routinely used for clinical research. Study metadata was transferred with ODM standard (CDISC). Study software systems employed were MACRO (InferMed) and XTrial (XClinical). For the Proof of Concept, a test study was created with MACRO and exported as ODM. For modification and validation of the ODM export file XML-Spy (Altova) and ODM-Checker (XML4Pharma) were used. Through exchange of a complete clinical study between two different study software solutions, a Proof of Concept of the technical feasibility of a system-independent metadata exchange was conducted successfully. The interchange of study metadata between two different systems at different centers was performed with minimal expenditure. A small number of mistakes had to be corrected in order to generate a syntactically correct ODM file and a "vendor extension" had to be inserted. After these modifications, XTrial exhibited the study, including all data fields, correctly. However, the optical appearance of both CRFs (case report forms) was different. ODM can be used as an exchange format for clinical studies between different study software. Thus, new forms of cooperation through exchange of metadata seem possible, for example the joint creation of electronic study protocols or CRFs at different research centers. Although the ODM standard represents a clinical study completely, it contains no information about the representation of data fields in CRFs.
ERIC Educational Resources Information Center
Dietze, Stefan; Gugliotta, Alessio; Domingue, John
2009-01-01
Current E-Learning technologies primarily follow a data and metadata-centric paradigm by providing the learner with composite content containing the learning resources and the learning process description, usually based on specific metadata standards such as ADL SCORM or IMS Learning Design. Due to the design-time binding of learning resources,…
The long-term ecological research community metadata standardisation project: a progress report
Inigo San Gil; Karen Baker; John Campbell; Ellen G. Denny; Kristin Vanderbilt; Brian Riordan; Rebecca Koskela; Jason Downing; Sabine Grabner; Eda Melendez; Jonathan M. Walsh; Masib Kortz; James Conners; Lynn Yarmey; Nicole Kaplan; Emery R. Boose; Linda Powell; Corinna Gries; Robin Schroeder; Todd Ackerman; Ken Ramsey; Barbara Benson; Jonathan Chipman; James Laundre; Hap Garritt; Don Henshaw; Barrie Collins; Christopher Gardner; Sven Bohm; Margaret O' Brien; Jincheng Gao; Wade Sheldon; Stephanie Lyon; Dan Bahauddin; Mark Servilla; Duane Costa; James Brunt
2009-01-01
We describe the process by which the Long-Term Ecological Research (LTER) Network standardized their metadata through the adoption of the Ecological Metadata Language (EML). We describe the strategies developed to improve motivation and to complement the information technology resources available at the LTER sites. EML implementation is presented as a mapping process...
NASA Astrophysics Data System (ADS)
le Roux, J.; Baker, A.; Caltagirone, S.; Bugbee, K.
2017-12-01
The Common Metadata Repository (CMR) is a high-performance, high-quality repository for Earth science metadata records, and serves as the primary way to search NASA's growing 17.5 petabytes of Earth science data holdings. Released in 2015, CMR has the capability to support several different metadata standards already being utilized by NASA's combined network of Earth science data providers, or Distributed Active Archive Centers (DAACs). The Analysis and Review of CMR (ARC) Team located at Marshall Space Flight Center is working to improve the quality of records already in CMR with the goal of making records optimal for search and discovery. This effort entails a combination of automated and manual review, where each NASA record in CMR is checked for completeness, accuracy, and consistency. This effort is highly collaborative in nature, requiring communication and transparency of findings amongst NASA personnel, DAACs, the CMR team and other metadata curation teams. Through the evolution of this project it has become apparent that there is a need to document and report findings, as well as track metadata improvements in a more efficient manner. The ARC team has collaborated with Element 84 in order to develop a metadata curation tool to meet these needs. In this presentation, we will provide an overview of this metadata curation tool and its current capabilities. Challenges and future plans for the tool will also be discussed.
XML — an opportunity for
NASA Astrophysics Data System (ADS)
Houlding, Simon W.
2001-08-01
Extensible markup language (XML) is a recently introduced meta-language standard on the Web. It provides the rules for development of metadata (markup) standards for information transfer in specific fields. XML allows development of markup languages that describe what information is rather than how it should be presented. This allows computer applications to process the information in intelligent ways. In contrast hypertext markup language (HTML), which fuelled the initial growth of the Web, is a metadata standard concerned exclusively with presentation of information. Besides its potential for revolutionizing Web activities, XML provides an opportunity for development of meaningful data standards in specific application fields. The rapid endorsement of XML by science, industry and e-commerce has already spawned new metadata standards in such fields as mathematics, chemistry, astronomy, multi-media and Web micro-payments. Development of XML-based data standards in the geosciences would significantly reduce the effort currently wasted on manipulating and reformatting data between different computer platforms and applications and would ensure compatibility with the new generation of Web browsers. This paper explores the evolution, benefits and status of XML and related standards in the more general context of Web activities and uses this as a platform for discussion of its potential for development of data standards in the geosciences. Some of the advantages of XML are illustrated by a simple, browser-compatible demonstration of XML functionality applied to a borehole log dataset. The XML dataset and the associated stylesheet and schema declarations are available for FTP download.
An Observation Knowledgebase for Hinode Data
NASA Astrophysics Data System (ADS)
Hurlburt, Neal E.; Freeland, S.; Green, S.; Schiff, D.; Seguin, R.; Slater, G.; Cirtain, J.
2007-05-01
We have developed a standards-based system for the Solar Optical and X-Ray Telescopes on the Hinode orbiting solar observatory which can serve as part of a developing Heliophysics informatics system. Our goal is to make the scientific data acquired by Hinode more accessible and useful to scientists by allowing them to do reasoning and flexible searches on observation metadata and to ask higher-level questions of the system than previously allowed. The Hinode Observation Knowledgebase relates the intentions and goals of the observation planners (as-planned metadata) with actual observational data (as-run metadata), along with connections to related models, data products and identified features (follow-up metadata) through a citation system. Summaries of the data (both as image thumbnails and short "film strips") serve to guide researchers to the observations appropriate for their research, and these are linked directly to the data catalog for easy extraction and delivery. The semantic information of the observation (field of view, wavelength, type of observable, average cadence, etc.) is captured through simple user interfaces and encoded using the VOEvent XML standard (with the addition of some solar-related extensions). These interfaces merge metadata acquired automatically during both the mission planning and data analysis phases (see Seguin et al. 2007 at this meeting) with that obtained directly from the planner/analyst and send them to be incorporated into the knowledgebase. The resulting information is automatically rendered into standard categories based on planned and recent observations, as well as by popularity and recommendations by the science team. They are also directly searchable through both web-based searches and direct calls to the API. Observation details can also be rendered as RSS, iTunes and Google Earth interfaces. The resulting system provides a useful tool to researchers and can act as a demonstration for larger, more complex systems.
Principles of metadata organization at the ENCODE data coordination center
Hong, Eurie L.; Sloan, Cricket A.; Chan, Esther T.; Davidson, Jean M.; Malladi, Venkat S.; Strattan, J. Seth; Hitz, Benjamin C.; Gabdank, Idan; Narayanan, Aditi K.; Ho, Marcus; Lee, Brian T.; Rowe, Laurence D.; Dreszer, Timothy R.; Roe, Greg R.; Podduturi, Nikhil R.; Tanaka, Forrest; Hilton, Jason A.; Cherry, J. Michael
2016-01-01
The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center (DCC) is responsible for organizing, describing and providing access to the diverse data generated by the ENCODE project. The description of these data, known as metadata, includes the biological sample used as input, the protocols and assays performed on these samples, the data files generated from the results and the computational methods used to analyze the data. Here, we outline the principles and philosophy used to define the ENCODE metadata in order to create a metadata standard that can be applied to diverse assays and multiple genomic projects. In addition, we present how the data are validated and used by the ENCODE DCC in creating the ENCODE Portal (https://www.encodeproject.org/). Database URL: www.encodeproject.org PMID:26980513
Plugging Into GEOSS - A Data Center Takes the Leap
NASA Astrophysics Data System (ADS)
Khalsa, S. S.; Weaver, R. L.; Duerr, R. E.; Shaw, A.
2008-12-01
The data sets managed and distributed by the National Snow and Ice Data Center in Boulder, Colorado are accessible through a variety of interfaces: custom web services, WIST, which is the NASA EOS Data System interface, and simple FTP. The Global Earth Observation System of Systems, GEOSS, offers the potential to make our data visible and accessible in the context of a much larger and more widely available system. But what does a data center have to do to tie into this larger system? What are the optimal data formats and protocols that should be maintained? What metadata standards and services should we sustain in order to maximize the visibility of our data? How will our holdings in existing catalogs be harvested by GEOSS? We address these questions through a pilot study that we report on in this paper. On June 2, 2008 the Group on Earth Observation, GEO, announced that the GEOSS Common Infrastructure (GCI) was "open for business," and that this Initial Operating Capability (IOC) was beginning a 1-year testing and evaluation period. The purpose of the IOC is two-fold: first, to encourage Earth observation providers to populate GEOSS by registering their data sets, services, and other components; and second, to allow the global community to use, evaluate and thereby improve the GCI. NSIDC is contributing to both objectives. The GEOSS 10-Year Implementation Plan specifies, at a very high level, recommended standards for connectivity for services, data and metadata. GEO has also published Tactical and Strategic Guidance Documents to help data providers like NSIDC decide how they should proceed to become active participants in the GEOSS. GEOSS and NSIDC are both adopting many of the OGC standards as their respective systems evolve. But how well do the OGC implementations of each of these entities mesh? What are the gaps, and which less well developed yet critical-path standards require work? 
We describe our experiences in registering several data sets having differing levels and types of associated services. We review the GEOSS efforts and study their published requirements and standards and see how well they mesh to the NSIDC systems, metadata and data distribution systems, and then describe our experiences in making our data and services available via the GCI.
An integrated view of data quality in Earth observation
Yang, X.; Blower, J. D.; Bastin, L.; Lush, V.; Zabala, A.; Masó, J.; Cornford, D.; Díaz, P.; Lumsden, J.
2013-01-01
Data quality is a difficult notion to define precisely, and different communities have different views and understandings of the subject. This causes confusion, a lack of harmonization of data across communities and omission of vital quality information. For some existing data infrastructures, data quality standards cannot address the problem adequately and cannot fulfil all user needs or cover all concepts of data quality. In this study, we discuss some philosophical issues on data quality. We identify actual user needs on data quality, review existing standards and specifications on data quality, and propose an integrated model for data quality in the field of Earth observation (EO). We also propose a practical mechanism for applying the integrated quality information model to a large number of datasets through metadata inheritance. While our data quality management approach is in the domain of EO, we believe that the ideas and methodologies for data quality management can be applied to wider domains and disciplines to facilitate quality-enabled scientific research. PMID:23230156
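The idea of applying quality information to many datasets through metadata inheritance can be illustrated with a short sketch. The abstract does not describe the mechanism in detail, so this is only one plausible reading: a dataset takes the quality fields of its parent collection unless it states its own. The dictionary layout, the field names, and the function name are all assumptions.

```python
def effective_quality(dataset, collections):
    """Merge collection-level quality metadata with dataset-level overrides.

    `dataset` is a dict with optional "parent" and "quality" keys, and
    `collections` maps a collection identifier to a dict with a "quality" key.
    Both layouts are hypothetical.
    """
    parent = collections.get(dataset.get("parent"), {})
    merged = dict(parent.get("quality", {}))   # start from the inherited values
    merged.update(dataset.get("quality", {}))  # dataset-specific values take precedence
    return merged


# Hypothetical example: one collection-level statement shared by its datasets.
collections = {
    "sst-collection": {"quality": {"uncertainty_k": 0.5, "validated": True}},
}
dataset = {"parent": "sst-collection", "quality": {"uncertainty_k": 0.3}}
```

Here effective_quality(dataset, collections) keeps the dataset's own uncertainty value and inherits the validation flag, so a quality statement recorded once at collection level does not have to be repeated for every dataset.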
Integrated workflows for spiking neuronal network simulations
Antolík, Ján; Davison, Andrew P.
2013-01-01
The increasing availability of computational resources is enabling more detailed, realistic modeling in computational neuroscience, resulting in a shift toward more heterogeneous models of neuronal circuits, and employment of complex experimental protocols. This poses a challenge for existing tool chains, as the set of tools involved in a typical modeler's workflow is expanding concomitantly, with growing complexity in the metadata flowing between them. For many parts of the workflow, a range of tools is available; however, numerous areas lack dedicated tools, while integration of existing tools is limited. This forces modelers to either handle the workflow manually, leading to errors, or to write substantial amounts of code to automate parts of the workflow, in both cases reducing their productivity. To address these issues, we have developed Mozaik: a workflow system for spiking neuronal network simulations written in Python. Mozaik integrates model, experiment and stimulation specification, simulation execution, data storage, data analysis and visualization into a single automated workflow, ensuring that all relevant metadata are available to all workflow components. It is based on several existing tools, including PyNN, Neo, and Matplotlib. It offers a declarative way to specify models and recording configurations using hierarchically organized configuration files. Mozaik automatically records all data together with all relevant metadata about the experimental context, allowing automation of the analysis and visualization stages. Mozaik has a modular architecture, and the existing modules are designed to be extensible with minimal programming effort. Mozaik increases the productivity of running virtual experiments on highly structured neuronal networks by automating the entire experimental cycle, while increasing the reliability of modeling studies by relieving the user from manual handling of the flow of metadata between the individual workflow stages. 
PMID:24368902
Integrated workflows for spiking neuronal network simulations.
Antolík, Ján; Davison, Andrew P
2013-01-01
The increasing availability of computational resources is enabling more detailed, realistic modeling in computational neuroscience, resulting in a shift toward more heterogeneous models of neuronal circuits, and employment of complex experimental protocols. This poses a challenge for existing tool chains, as the set of tools involved in a typical modeler's workflow is expanding concomitantly, with growing complexity in the metadata flowing between them. For many parts of the workflow, a range of tools is available; however, numerous areas lack dedicated tools, while integration of existing tools is limited. This forces modelers to either handle the workflow manually, leading to errors, or to write substantial amounts of code to automate parts of the workflow, in both cases reducing their productivity. To address these issues, we have developed Mozaik: a workflow system for spiking neuronal network simulations written in Python. Mozaik integrates model, experiment and stimulation specification, simulation execution, data storage, data analysis and visualization into a single automated workflow, ensuring that all relevant metadata are available to all workflow components. It is based on several existing tools, including PyNN, Neo, and Matplotlib. It offers a declarative way to specify models and recording configurations using hierarchically organized configuration files. Mozaik automatically records all data together with all relevant metadata about the experimental context, allowing automation of the analysis and visualization stages. Mozaik has a modular architecture, and the existing modules are designed to be extensible with minimal programming effort. Mozaik increases the productivity of running virtual experiments on highly structured neuronal networks by automating the entire experimental cycle, while increasing the reliability of modeling studies by relieving the user from manual handling of the flow of metadata between the individual workflow stages.
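The declarative, hierarchically organized configuration described in this abstract can be illustrated with a small sketch. It is not Mozaik's actual configuration format or API: the function `merge_config` and every key below are hypothetical, chosen only to show how a base model description might be specialized for one experiment by overriding a nested entry.

```python
from copy import deepcopy


def merge_config(base, override):
    """Return a new nested dict in which values from `override` replace those in `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Descend into nested sections instead of replacing them wholesale.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical model description; all names are illustrative assumptions.
default_model = {
    "sheets": {
        "excitatory": {"density": 1000, "record": ["spikes"]},
        "inhibitory": {"density": 250, "record": ["spikes"]},
    },
    "simulation": {"time_step_ms": 0.1},
}

# An experiment changes only what differs from the defaults.
experiment_override = {"sheets": {"excitatory": {"record": ["spikes", "v"]}}}

config = merge_config(default_model, experiment_override)
```

Because only the differing leaves are overridden, the recording settings used in a given run remain fully described by the merged configuration, which is the kind of metadata an automated analysis stage could then consume.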
Weather forecasting with open source software
NASA Astrophysics Data System (ADS)
Rautenhaus, Marc; Dörnbrack, Andreas
2013-04-01
To forecast the weather situation during aircraft-based atmospheric field campaigns, we employ a tool chain of existing and self-developed open source software tools and open standards. Of particular value are the Python programming language with its extension libraries NumPy, SciPy, PyQt4, Matplotlib and the basemap toolkit, the NetCDF standard with the Climate and Forecast (CF) Metadata conventions, and the Open Geospatial Consortium Web Map Service standard. These open source libraries and open standards helped to implement the "Mission Support System", a Web Map Service based tool to support weather forecasting and flight planning during field campaigns. The tool has been implemented in Python and has also been released as open source (Rautenhaus et al., Geosci. Model Dev., 5, 55-71, 2012). In this presentation we discuss the usage of free and open source software for weather forecasting in the context of research flight planning, and highlight how the field campaign work benefits from using open source tools and open standards.
ASDF: An Adaptable Seismic Data Format with Full Provenance
NASA Astrophysics Data System (ADS)
Smith, J. A.; Krischer, L.; Tromp, J.; Lefebvre, M. P.
2015-12-01
In order for seismologists to maximize their knowledge of how the Earth works, they must extract the maximum amount of useful information from all recorded seismic data available for their research. This requires assimilating large sets of waveform data, keeping track of vast amounts of metadata, using validated standards for quality control, and automating the workflow in a careful and efficient manner. In addition, there is a growing gap between CPU/GPU speeds and disk access speeds that leads to an I/O bottleneck in seismic workflows. This is made even worse by existing seismic data formats that were not designed for performance and are limited to a few fixed headers for storing metadata. The Adaptable Seismic Data Format (ASDF) is a new data format for seismology that solves the problems with existing seismic data formats and integrates full provenance into the definition. ASDF is a self-describing format that features parallel I/O using the parallel HDF5 library. This makes it a great choice for use on HPC clusters. The format integrates the standards QuakeML for seismic sources and StationXML for receivers. ASDF is suitable for storing earthquake data sets, where all waveforms for a single earthquake are stored in one file, ambient noise cross-correlations, and adjoint sources. The format comes with a user-friendly Python reader and writer that gives seismologists access to a full set of Python tools for seismology. There is also a faster C/Fortran library for integrating ASDF into performance-focused numerical wave solvers, such as SPECFEM3D_GLOBE. Finally, a GUI tool designed for visually exploring the format exists that provides a flexible interface for both research and educational applications. ASDF is a new seismic data format that offers seismologists high-performance parallel processing, organized and validated contents, and full provenance tracking for automated seismological workflows.
Preservation of Digital Objects.
ERIC Educational Resources Information Center
Galloway, Patricia
2004-01-01
Presents a literature review that covers the following topics related to preservation of digital objects: practical examples; stakeholders; recordkeeping standards; genre-specific problems; trusted repository standards; preservation methods; preservation metadata standards; and future directions. (Contains 82 references.) (MES)
Hume, Samuel; Chow, Anthony; Evans, Julie; Malfait, Frederik; Chason, Julie; Wold, J. Darcy; Kubick, Wayne; Becnel, Lauren B.
2018-01-01
The Clinical Data Interchange Standards Consortium (CDISC) is a global non-profit standards development organization that creates consensus-based standards for clinical and translational research. Several of these standards are now required by regulators for electronic submissions of regulated clinical trials' data and by government funding agencies. These standards are free and open, available for download on the CDISC Website as PDFs. While these documents are human readable, they are not amenable to ready use by electronic systems. CDISC launched the CDISC Shared Health And Research Electronic library (SHARE) to provide the standards metadata in machine-readable formats to facilitate the automated management and implementation of the standards. This paper describes how CDISC SHARE's standards can facilitate collecting, aggregating and analyzing standardized data from early design to end analysis, and its role as a central resource providing information systems with metadata that drives process automation including study setup and data pipelining. PMID:29888049
Description of the U.S. Geological Survey Geo Data Portal data integration framework
Blodgett, David L.; Booth, Nathaniel L.; Kunicki, Thomas C.; Walker, Jordan I.; Lucido, Jessica M.
2012-01-01
The U.S. Geological Survey has developed an open-standard data integration framework for working efficiently and effectively with large collections of climate and other geoscience data. A web interface accesses catalog datasets to find data services. Data resources can then be rendered for mapping and dataset metadata are derived directly from these web services. Algorithm configuration and information needed to retrieve data for processing are passed to a server where all large-volume data access and manipulation takes place. The data integration strategy described here was implemented by leveraging existing free and open source software. Details of the software used are omitted; rather, emphasis is placed on how open-standard web services and data encodings can be used in an architecture that integrates common geographic and atmospheric data.
NASA Astrophysics Data System (ADS)
Patton, E. W.; West, P.; Greer, R.; Jin, B.
2011-12-01
Following on work presented at the 2010 AGU Fall Meeting, we present a number of real-world collections of semantically-enabled scientific metadata ingested into the Tetherless World RDF2HTML system as structured data and presented and edited using that system. Two separate datasets from two different domains (oceanography and solar sciences) are made available using existing web standards and services, e.g. encoded using ontologies represented with the Web Ontology Language (OWL) and stored in a SPARQL endpoint for querying. These datasets are deployed for use in three different web environments, i.e. Drupal, MediaWiki, and a custom web portal written in Java, to highlight the cross-platform nature of the data presentation. Stylesheets used to transform concepts in each domain as well as shared terms into HTML will be presented to show the power of using common ontologies to publish data and support reuse of existing terminologies. In addition, a single domain dataset is shared between two separate portal instances to demonstrate the ability for this system to offer distributed access and modification of content across the Internet. Lastly, we will highlight challenges that arose in the software engineering process, outline the design choices we made in solving those issues, and discuss how future improvements to this and other systems will enable the evolution of distributed, decentralized collaborations for scientific data sharing across multiple research groups.
Gorgolewski, Krzysztof J; Auer, Tibor; Calhoun, Vince D; Craddock, R Cameron; Das, Samir; Duff, Eugene P; Flandin, Guillaume; Ghosh, Satrajit S; Glatard, Tristan; Halchenko, Yaroslav O; Handwerker, Daniel A; Hanke, Michael; Keator, David; Li, Xiangrui; Michael, Zachary; Maumet, Camille; Nichols, B Nolan; Nichols, Thomas E; Pellman, John; Poline, Jean-Baptiste; Rokem, Ariel; Schaefer, Gunnar; Sochat, Vanessa; Triplett, William; Turner, Jessica A; Varoquaux, Gaël; Poldrack, Russell A
2016-06-21
The development of magnetic resonance imaging (MRI) techniques has defined modern neuroimaging. Since its inception, tens of thousands of studies using techniques such as functional MRI and diffusion weighted imaging have allowed for the non-invasive study of the brain. Despite the fact that MRI is routinely used to obtain data for neuroscience research, there has been no widely adopted standard for organizing and describing the data collected in an imaging experiment. This renders sharing and reusing data (within or between labs) difficult if not impossible and unnecessarily complicates the application of automatic pipelines and quality assurance protocols. To solve this problem, we have developed the Brain Imaging Data Structure (BIDS), a standard for organizing and describing MRI datasets. The BIDS standard uses file formats compatible with existing software, unifies the majority of practices already common in the field, and captures the metadata necessary for most common data processing operations.
Gorgolewski, Krzysztof J.; Auer, Tibor; Calhoun, Vince D.; Craddock, R. Cameron; Das, Samir; Duff, Eugene P.; Flandin, Guillaume; Ghosh, Satrajit S.; Glatard, Tristan; Halchenko, Yaroslav O.; Handwerker, Daniel A.; Hanke, Michael; Keator, David; Li, Xiangrui; Michael, Zachary; Maumet, Camille; Nichols, B. Nolan; Nichols, Thomas E.; Pellman, John; Poline, Jean-Baptiste; Rokem, Ariel; Schaefer, Gunnar; Sochat, Vanessa; Triplett, William; Turner, Jessica A.; Varoquaux, Gaël; Poldrack, Russell A.
2016-01-01
The development of magnetic resonance imaging (MRI) techniques has defined modern neuroimaging. Since its inception, tens of thousands of studies using techniques such as functional MRI and diffusion weighted imaging have allowed for the non-invasive study of the brain. Despite the fact that MRI is routinely used to obtain data for neuroscience research, there has been no widely adopted standard for organizing and describing the data collected in an imaging experiment. This renders sharing and reusing data (within or between labs) difficult if not impossible and unnecessarily complicates the application of automatic pipelines and quality assurance protocols. To solve this problem, we have developed the Brain Imaging Data Structure (BIDS), a standard for organizing and describing MRI datasets. The BIDS standard uses file formats compatible with existing software, unifies the majority of practices already common in the field, and captures the metadata necessary for most common data processing operations. PMID:27326542
Serving Fisheries and Ocean Metadata to Communities Around the World
NASA Technical Reports Server (NTRS)
Meaux, Melanie F.
2007-01-01
NASA's Global Change Master Directory (GCMD) assists the oceanographic community in the discovery, access, and sharing of scientific data by serving on-line fisheries and ocean metadata to users around the globe. As of January 2006, the directory holds more than 16,300 Earth Science data descriptions and over 1,300 services descriptions. Of these, nearly 4,000 unique ocean-related metadata records are available to the public, with many having direct links to the data. In 2005, the GCMD averaged over 5 million hits a month, with nearly a half million unique hosts for the year. Through the GCMD portal (http://gcmd.nasa.gov/), users can search vast and growing quantities of data and services using controlled keywords, free-text searches, or a combination of both. Users may now refine a search based on topic, location, instrument, platform, project, data center, spatial and temporal coverage, and data resolution for selected datasets. The directory also offers data holders a means to advertise and search their data through customized portals, which are subset views of the directory. The discovery metadata standard used is the Directory Interchange Format (DIF), adopted in 1988. This format has evolved to accommodate other national and international standards such as FGDC and ISO 19115. Users can submit metadata through easy-to-use online and offline authoring tools. The directory, which also serves as the International Directory Network (IDN), has been providing its services and sharing its experience and knowledge of metadata at the international, national, regional, and local level for many years. Active partners include the Committee on Earth Observation Satellites (CEOS), federal agencies (such as NASA, NOAA, and USGS), international agencies (such as IOC/IODE, UN, and JAXA) and organizations (such as ESIP, IOOS/DMAC, GOSIC, GLOBEC, OBIS, and GoMODP).
The GEOSS Clearinghouse based on the GeoNetwork opensource
NASA Astrophysics Data System (ADS)
Liu, K.; Yang, C.; Wu, H.; Huang, Q.
2010-12-01
The Global Earth Observation System of Systems (GEOSS) is established to support the study of the Earth system in a global community. It provides services for social management, quick response, academic research, and education. The purpose of GEOSS is to achieve comprehensive, coordinated and sustained observations of the Earth system, improve monitoring of the state of the Earth, increase understanding of Earth processes, and enhance prediction of the behavior of the Earth system. In 2009, GEO called for a competition to select an official GEOSS clearinghouse as a source for consolidating catalogs for Earth observations. The Joint Center for Intelligent Spatial Computing at George Mason University worked with USGS to submit a solution based on the open-source platform GeoNetwork. In the spring of 2010, the solution was selected as the product for the GEOSS clearinghouse. The GEOSS Clearinghouse is a common search facility for the Intergovernmental Group on Earth Observation (GEO). By providing a list of harvesting functions in Business Logic, the GEOSS clearinghouse can collect metadata from distributed catalogs including other GeoNetwork native nodes, webDAV/sitemap/WAF, Catalog Services for the Web (CSW) 2.0, the GEOSS Component and Service Registry (http://geossregistries.info/), OGC Web Services (WCS, WFS, WMS and WPS), OAI Protocol for Metadata Harvesting 2.0, ArcSDE Server and Local File System. Metadata in the GEOSS clearinghouse are managed in a database (MySQL, Postgresql, Oracle, or MckoiDB) and an index of the metadata is maintained through the Lucene engine. Thus, EO data, services, and related resources can be discovered and accessed. It supports a variety of geospatial standards including CSW and SRU for search, FGDC and ISO metadata, and WMS related OGC standards for data access and visualization, as linked from the metadata.
Towards Precise Metadata-set for Discovering 3D Geospatial Models in Geo-portals
NASA Astrophysics Data System (ADS)
Zamyadi, A.; Pouliot, J.; Bédard, Y.
2013-09-01
Accessing 3D geospatial models, eventually at no cost and for unrestricted use, is certainly an important issue as they become popular among participatory communities, consultants, and officials. Various geo-portals, mainly established for 2D resources, have tried to provide access to existing 3D resources such as digital elevation models, LIDAR or classic topographic data. Describing the content of data, metadata is a key component of data discovery in geo-portals. An inventory of seven online geo-portals and commercial catalogues shows that the metadata referring to 3D information is very different from one geo-portal to another as well as for similar 3D resources in the same geo-portal. The inventory considered 971 data resources affiliated with elevation. 51% of them were from three geo-portals running at Canadian federal and municipal levels whose metadata resources did not consider 3D models by any definition. Regarding the remaining 49% which refer to 3D models, different definitions of terms and metadata were found, resulting in confusion and misinterpretation. The overall assessment of these geo-portals clearly shows that the provided metadata do not integrate specific and common information about 3D geospatial models. Accordingly, the main objective of this research is to improve 3D geospatial model discovery in geo-portals by adding a specific metadata-set. Based on the knowledge and current practices on 3D modeling, and 3D data acquisition and management, a set of metadata is proposed to increase its suitability for 3D geospatial models. This metadata-set enables the definition of genuine classes, fields, and code-lists for a 3D metadata profile. The main structure of the proposal contains 21 metadata classes. These classes are classified in three packages as General and Complementary on contextual and structural information, and Availability on the transition from storage to delivery format.
The proposed metadata set is compared with Canadian Geospatial Data Infrastructure (CGDI) metadata, which is an implementation of the North American Profile of ISO-19115. The comparison analyzes the two metadata sets against three simulated scenarios about discovering needed 3D geospatial datasets. Considering specific metadata about 3D geospatial models, the proposed metadata-set has six additional classes on geometric dimension, level of detail, geometric modeling, topology, and appearance information. In addition, classes on data acquisition, preparation, and modeling, and physical availability have been specialized for 3D geospatial models.
A Transparently-Scalable Metadata Service for the Ursa Minor Storage System
2010-06-25
provide application-level guarantees. For example, many document editing programs implement atomic updates by writing the new document version into a...operations that could involve multiple servers, how close existing systems come to transparent scalability, how systems that handle multi-server
Electronic Health Records Data and Metadata: Challenges for Big Data in the United States.
Sweet, Lauren E; Moulaison, Heather Lea
2013-12-01
This article, written by researchers studying metadata and standards, represents a fresh perspective on the challenges of electronic health records (EHRs) and serves as a primer for big data researchers new to health-related issues. Primarily, we argue for the importance of the systematic adoption of standards in EHR data and metadata as a way of promoting big data research and benefiting patients. EHRs have the potential to include a vast amount of longitudinal health data, and metadata provides the formal structures to govern that data. In the United States, electronic medical records (EMRs) are part of the larger EHR. EHR data is submitted by a variety of clinical data providers and potentially by the patients themselves. Because data input practices are not necessarily standardized, and because of the multiplicity of current standards, basic interoperability in EHRs is hindered. Some of the issues with EHR interoperability stem from the complexities of the data they include, which can be both structured and unstructured. A number of controlled vocabularies are available to data providers. The continuity of care document standard will provide interoperability in the United States between the EMR and the larger EHR, potentially making data input by providers directly available to other providers. The data involved is nonetheless messy. In particular, the use of competing vocabularies such as the Systematized Nomenclature of Medicine-Clinical Terms, MEDCIN, and locally created vocabularies inhibits large-scale interoperability for structured portions of the records, and unstructured portions, although potentially not machine readable, remain essential. Once EMRs for patients are brought together as EHRs, the EHRs must be managed and stored. Adequate documentation should be created and maintained to assure the secure and accurate use of EHR data. There are currently a few notable international standards initiatives for EHRs. 
Organizations such as Health Level Seven International and Clinical Data Interchange Standards Consortium are developing and overseeing implementation of interoperability standards. Denmark and Singapore are two countries that have successfully implemented national EHR systems. Future work in electronic health information initiatives should underscore the importance of standards and reinforce interoperability of EHRs for big data research and for the sake of patients.
Structure and inference in annotated networks
Newman, M. E. J.; Clauset, Aaron
2016-01-01
For many networks of scientific interest we know both the connections of the network and information about the network nodes, such as the age or gender of individuals in a social network. Here we demonstrate how this 'metadata' can be used to improve our understanding of network structure. We focus in particular on the problem of community detection in networks and develop a mathematically principled approach that combines a network and its metadata to detect communities more accurately than can be done with either alone. Crucially, the method does not assume that the metadata are correlated with the communities we are trying to find. Instead, the method learns whether a correlation exists and correctly uses or ignores the metadata depending on whether they contain useful information. We demonstrate our method on synthetic networks with known structure and on real-world networks, large and small, drawn from social, biological and technological domains. PMID:27306566
Structure and inference in annotated networks
NASA Astrophysics Data System (ADS)
Newman, M. E. J.; Clauset, Aaron
2016-06-01
For many networks of scientific interest we know both the connections of the network and information about the network nodes, such as the age or gender of individuals in a social network. Here we demonstrate how this 'metadata' can be used to improve our understanding of network structure. We focus in particular on the problem of community detection in networks and develop a mathematically principled approach that combines a network and its metadata to detect communities more accurately than can be done with either alone. Crucially, the method does not assume that the metadata are correlated with the communities we are trying to find. Instead, the method learns whether a correlation exists and correctly uses or ignores the metadata depending on whether they contain useful information. We demonstrate our method on synthetic networks with known structure and on real-world networks, large and small, drawn from social, biological and technological domains.
NASA Technical Reports Server (NTRS)
Golden, Keith; Clancy, Dan (Technical Monitor)
2001-01-01
The data management problem comprises data processing and data tracking. Data processing is the creation of new data based on existing data sources. Data tracking consists of storing metadata descriptions of available data. This paper addresses the data management problem by casting it as an AI planning problem. Actions are data-processing commands, plans are dataflow programs and goals are metadata descriptions of desired data products. Data manipulation is simply plan generation and execution, and a key component of data tracking is inferring the effects of an observed plan. We introduce a new action language for data management domains, called ADILM. We discuss the connection between data processing and information integration and show how a language for the latter must be modified to support the former. The paper also discusses information gathering within a data-processing framework, and shows how ADILM metadata expressions are a generalization of Local Completeness.
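The mapping from data management to planning described in this abstract can be made concrete with a toy sketch. It does not use ADILM, whose syntax is not given here: the `Action` tuple, the `plan` function and the product names are all hypothetical, and plain string labels stand in for metadata descriptions. The search is simple greedy forward chaining, so it may include steps a real planner would prune.

```python
from collections import namedtuple

# An action is a data-processing command: it needs some data products
# (identified by their metadata descriptions) and produces others.
Action = namedtuple("Action", ["name", "needs", "produces"])


def plan(available, goal, actions):
    """Forward-chain from the available descriptions until `goal` is produced.

    Returns the ordered list of action names (a linear dataflow program),
    or None if the goal cannot be reached with the given actions.
    """
    have = set(available)
    steps = []
    progress = True
    while goal not in have and progress:
        progress = False
        for action in actions:
            # Apply the first action whose inputs exist and that adds something new.
            if action.needs <= have and not action.produces <= have:
                have |= action.produces
                steps.append(action.name)
                progress = True
                break
    return steps if goal in have else None


# Hypothetical processing chain; the labels are illustrative assumptions.
actions = [
    Action("calibrate", frozenset({"raw_radiance"}), frozenset({"calibrated_radiance"})),
    Action("regrid", frozenset({"calibrated_radiance"}), frozenset({"gridded_radiance"})),
    Action("mask_clouds", frozenset({"gridded_radiance"}), frozenset({"cloud_mask"})),
]

steps = plan({"raw_radiance"}, "cloud_mask", actions)
unreachable = plan({"unrelated_product"}, "cloud_mask", actions)
```

Here the goal is a description of the desired product, the returned list is the plan, and the set of descriptions accumulated while executing it corresponds to the data-tracking side of the problem.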
Operational Interoperability Challenges on the Example of GEOSS and WIS
NASA Astrophysics Data System (ADS)
Heene, M.; Buesselberg, T.; Schroeder, D.; Brotzer, A.; Nativi, S.
2015-12-01
This poster highlights operational interoperability challenges, using the Global Earth Observation System of Systems (GEOSS) and the World Meteorological Organization Information System (WIS) as examples. At the heart of both systems is a catalogue of earth observation data, products and services, but with different metadata management concepts. While WIS has strong governance, with its own metadata profile for its hundreds of thousands of metadata records, GEOSS adopted a more open approach for its ten million records. Furthermore, the development of WIS, as an operational system, follows a roadmap with committed backward compatibility, while the GEOSS development process is more agile. The poster discusses how interoperability can be achieved between the different metadata management concepts and how a proxy concept helps to couple two systems that follow different development methodologies. Furthermore, the poster highlights the importance of monitoring and backup concepts as a verification method for operational interoperability.
Modernization of the Caltech/USGS Southern California Seismic Network
NASA Astrophysics Data System (ADS)
Bhadha, R.; Devora, A.; Hauksson, E.; Johnson, D.; Thomas, V.; Watkins, M.; Yip, R.; Yu, E.; Given, D.; Cone, G.; Koesterer, C.
2009-12-01
The USGS/ANSS/ARRA program is providing Government Furnished Equipment (GFE) and two-year funding for upgrading the Caltech/USGS Southern California Seismic Network (SCSN). The SCSN is the modern digital ground motion seismic network in southern California that monitors seismicity and provides real-time earthquake information products such as rapid notifications, moment tensors, and ShakeMap. The SCSN has evolved through the years and now consists of several well-integrated components such as Short-Period analog, TERRAscope, digital stations, and real-time strong motion stations, or about 300 stations. In addition, the SCSN records data from about 100 stations provided by partner networks. To strengthen the ability of SCSN to meet the ANSS performance standards, we will install GFE and carry out the following upgrades and improvements of the various components of the SCSN: 1) Upgrade of dataloggers at seven TERRAscope stations; 2) Upgrade of dataloggers at 131 digital stations and upgrade broadband sensors at 25 stations; 3) Upgrade of SCSN metadata capabilities; 4) Upgrade of telemetry capabilities for both seismic and GPS data; and 5) Upgrade balers at stations with existing Q330 dataloggers. These upgrades will enable the SCSN to meet the ANSS Performance Standards more consistently than before. The new equipment will improve station uptimes and reduce maintenance costs. The new equipment will also provide improved waveform data quality and consequently superior data products. The data gaps due to various outages will be minimized, and 'late' data will be readily available through retrieval from on-site storage. Compared to the outdated equipment, the new equipment will speed up data delivery by about 10 sec, which is fast enough for earthquake early warning applications. The new equipment also consumes about a factor of ten less power.
We will also upgrade the SCSN data acquisition and data center facilities, which will improve the SCSN performance and metadata availability. We will improve existing software to facilitate the update of metadata, and to improve the interoperability between SeisNetWatch and our database of metadata. The improved software will also be made available to other regional networks as part of the CISN software distribution. These upgrades will greatly improve the robustness of the SCSN, and facilitate higher quality and more reliable earthquake monitoring than was available before in southern California. The modernized SCSN will contribute to more coordinated search and rescue as well as economic resilience following a major earthquake by providing accurate earthquake information, and thus facilitate rapid deployment of field crews and rapid business resumption. Further, advances in seismological research will be facilitated by the high quality seismic data that will be collected in one of the most seismically active areas in the contiguous US.
Metadata Repository for Improved Data Sharing and Reuse Based on HL7 FHIR.
Ulrich, Hannes; Kock, Ann-Kristin; Duhm-Harbeck, Petra; Habermann, Jens K; Ingenerf, Josef
2016-01-01
Unreconciled data structures and formats are a common obstacle to the urgently required sharing and reuse of data within healthcare and medical research. Within the North German Tumor Bank of Colorectal Cancer, clinical and sample data, based on a harmonized data set, is collected and can be pooled by using a hospital-integrated Research Data Management System supporting biobank and study management. Adding further partners who are not using the core data set requires manual adaptations and mapping of data elements. Facing this manual intervention and focusing the reuse of heterogeneous healthcare instance data (value level) and data elements (metadata level), a metadata repository has been developed. The metadata repository is an ISO 11179-3 conformant server application built for annotating and mediating data elements. The implemented architecture includes the translation of metadata information about data elements into the FHIR standard using the FHIR Data Element resource with the ISO 11179 Data Element Extensions. The FHIR-based processing allows exchange of data elements with clinical and research IT systems as well as with other metadata systems. With increasingly annotated and harmonized data elements, data quality and integration can be improved for successfully enabling data analytics and decision support.
Transformation of HDF-EOS metadata from the ECS model to ISO 19115-based XML
NASA Astrophysics Data System (ADS)
Wei, Yaxing; Di, Liping; Zhao, Baohua; Liao, Guangxuan; Chen, Aijun
2007-02-01
Nowadays, geographic data, such as NASA's Earth Observing System (EOS) data, are playing an increasing role in many areas, including academic research, government decisions and even in people's everyday lives. As the quantity of geographic data becomes increasingly large, a major problem is how to fully make use of such data in a distributed, heterogeneous network environment. In order for a user to effectively discover and retrieve the specific information that is useful, the geographic metadata should be described and managed properly. Fortunately, the emergence of XML and Web Services technologies greatly promotes information distribution across the Internet. The research effort discussed in this paper presents a method and its implementation for transforming Hierarchical Data Format (HDF)-EOS metadata from the NASA ECS model to ISO 19115-based XML, which will be managed by the Open Geospatial Consortium (OGC) Catalogue Services - Web Profile (CSW). Using XML and international standards rather than domain-specific models to describe the metadata of those HDF-EOS data, and further using CSW to manage the metadata, can allow metadata information to be searched and interchanged more widely and easily, thus promoting the sharing of HDF-EOS data.
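The kind of transformation described in this abstract can be illustrated with a minimal sketch. It is not the authors' mapping: the ECS-style field names in the input dictionary (`LocalGranuleID` and the four bounding coordinates) and the choice of target elements are assumptions made for illustration, and the output is only a fragment that would not validate against the full ISO 19139 schema, which requires further mandatory elements such as a citation and an abstract.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the ISO 19139 XML encoding of ISO 19115.
GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def _child(parent, namespace, tag):
    """Create a namespaced child element and return it."""
    return ET.SubElement(parent, f"{{{namespace}}}{tag}")


def ecs_to_iso(ecs):
    """Map a small, illustrative subset of ECS-style fields to ISO 19115 XML."""
    root = ET.Element(f"{{{GMD}}}MD_Metadata")

    file_id = _child(root, GMD, "fileIdentifier")
    _child(file_id, GCO, "CharacterString").text = ecs["LocalGranuleID"]

    ident = _child(_child(root, GMD, "identificationInfo"), GMD, "MD_DataIdentification")
    extent = _child(_child(ident, GMD, "extent"), GMD, "EX_Extent")
    box = _child(_child(extent, GMD, "geographicElement"), GMD, "EX_GeographicBoundingBox")

    # Hypothetical field-to-element crosswalk for the bounding box.
    crosswalk = [
        ("westBoundLongitude", "WestBoundingCoordinate"),
        ("eastBoundLongitude", "EastBoundingCoordinate"),
        ("southBoundLatitude", "SouthBoundingCoordinate"),
        ("northBoundLatitude", "NorthBoundingCoordinate"),
    ]
    for iso_tag, ecs_key in crosswalk:
        _child(_child(box, GMD, iso_tag), GCO, "Decimal").text = str(ecs[ecs_key])
    return root


# Made-up granule values, for demonstration only.
example = {
    "LocalGranuleID": "granule-001",
    "WestBoundingCoordinate": -120.0,
    "EastBoundingCoordinate": -110.0,
    "SouthBoundingCoordinate": 30.0,
    "NorthBoundingCoordinate": 40.0,
}
xml_text = ET.tostring(ecs_to_iso(example), encoding="unicode")
```

Keeping the crosswalk as a plain list of pairs makes the mapping easy to review and extend, which matters more in a metadata transformation than the XML plumbing around it.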
Building an Internet of Samples: The Australian Contribution
NASA Astrophysics Data System (ADS)
Wyborn, Lesley; Klump, Jens; Bastrakova, Irina; Devaraju, Anusuriya; McInnes, Brent; Cox, Simon; Karssies, Linda; Martin, Julia; Ross, Shawn; Morrissey, John; Fraser, Ryan
2017-04-01
Physical samples are often the ground truth to research reported in the scientific literature across multiple domains. They are collected by many different entities (individual researchers, laboratories, government agencies, mining companies, citizens, museums, etc.). Samples must be curated over the long-term to ensure both that their existence is known, and to allow any data derived from them through laboratory and field tests to be linked to the physical samples. For example, having unique identifiers that link ground truth data back to the original sample helps calibrate large volumes of remotely sensed data. Access to catalogues of reliably identified samples from several collections promotes collaboration across all Earth Science disciplines. It also increases the cost effectiveness of research by reducing the need to re-collect samples in the field. The assignment of web identifiers to the digital representations of these physical objects allows us to link to data, literature, investigators and institutions, thus creating an "Internet of Samples". An Australian implementation of the "Internet of Samples" is using the IGSN (International Geo Sample Number, http://igsn.github.io) to identify samples in a globally unique and persistent way. IGSN was developed in the solid earth science community and is recommended for sample identification by the Coalition for Publishing Data in the Earth and Space Sciences (COPDESS). IGSN is interoperable with other persistent identifier systems such as DataCite. Furthermore, the basic IGSN description metadata schema is compatible with existing schemas such as OGC Observations and Measurements (O&M) and DataCite Metadata Schema, which makes crosswalks to other metadata schemas easy. IGSN metadata is disseminated through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) allowing it to be aggregated in other applications such as portals (e.g. the Australian IGSN catalogue http://igsn2.csiro.au).
The metadata is available in more than one format. The software for IGSN web services is based on components developed for DataCite and adapted to the specific requirements of IGSN. This cooperation in open source development ensures sustainable implementation and faster turnaround times for updates. IGSN, in particular in its Australian implementation, is characterised by a federated approach to system architecture and organisational governance giving it the necessary flexibility to adapt to particular local practices within multiple domains, whilst maintaining an overarching international standard. The three current IGSN allocation agents in Australia (Geoscience Australia, CSIRO and Curtin University) represent different sectors. Through funding from the Australian Research Data Services Program they have combined to develop a common web portal that allows discovery of physical samples and sample collections at a national level. International governance then ensures we can link to an international community but at the same time act locally to ensure the services offered are relevant to the needs of Australian researchers. This flexibility aids the integration of new disciplines into a global community of a physical samples information network.
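Since the abstract names OAI-PMH as the dissemination mechanism, a short sketch of how a harvester might form its requests may help. `ListRecords`, `metadataPrefix`, `set`, `from` and `resumptionToken` are standard OAI-PMH request arguments, and `oai_dc` is the metadata format every OAI-PMH repository must support. The base URL below is a placeholder, not the address of any IGSN service, and no request is actually sent.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build the first ListRecords request of a harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


def resume_url(base_url, token):
    """Build a follow-up request; resumptionToken is sent with the verb only."""
    params = {"verb": "ListRecords", "resumptionToken": token}
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint, used for illustration only.
first_page = list_records_url("https://example.org/oai", from_date="2017-01-01")
next_page = resume_url("https://example.org/oai", "abc123")
```

Requesting a different `metadataPrefix` is how a harvester would ask for one of the alternative formats the abstract mentions, provided the repository advertises it.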
Standardized Metadata for Education: A Status Report.
ERIC Educational Resources Information Center
Duval, Erik
This paper starts with a brief background to worldwide standardization activities in the field of educational technologies, and identifies three important accredited standardization organizations in the domain of education and training: the Institute of Electrical and Electronics Engineers (IEEE) Learning Technology Standardization Committee…
Visualizing and Validating Metadata Traceability within the CDISC Standards
Hume, Sam; Sarnikar, Surendra; Becnel, Lauren; Bennett, Dorine
2017-01-01
The Food & Drug Administration has begun requiring that electronic submissions of regulated clinical studies utilize the Clinical Data Interchange Standards Consortium data standards. Within regulated clinical research, traceability is a requirement and indicates that the analysis results can be traced back to the original source data. Current solutions for clinical research data traceability are limited in terms of querying, validation and visualization capabilities. This paper describes (1) the development of metadata models to support computable traceability and traceability visualizations that are compatible with industry data standards for the regulated clinical research domain, (2) adaptation of graph traversal algorithms to make them capable of identifying traceability gaps and validating traceability across the clinical research data lifecycle, and (3) development of a traceability query capability for retrieval and visualization of traceability information. PMID:28815125
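The idea of using graph traversal to find traceability gaps can be sketched as follows. This is a generic breadth-first search over "derived from" links, not the algorithm from the paper, and the item names are hypothetical labels loosely styled on clinical dataset and variable names.

```python
from collections import deque


def trace_to_sources(item, derived_from, sources):
    """Walk upstream from an item and return the source items it reaches."""
    seen, queue, found = {item}, deque([item]), set()
    while queue:
        node = queue.popleft()
        if node in sources:
            found.add(node)
        for parent in derived_from.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return found


def traceability_gaps(results, derived_from, sources):
    """Return the analysis results that cannot be traced to any source item."""
    return sorted(r for r in results if not trace_to_sources(r, derived_from, sources))


# Hypothetical lineage: each key lists the items it was derived from.
derived_from = {
    "table_14_1": ["ADSL.AGE"],
    "ADSL.AGE": ["DM.AGE"],
    "DM.AGE": ["CRF.birth_date"],
    "table_14_2": ["ADAE.AESEV"],  # no upstream link recorded for ADAE.AESEV
}
sources = {"CRF.birth_date"}
gaps = traceability_gaps(["table_14_1", "table_14_2"], derived_from, sources)
```

In this toy lineage `table_14_1` reaches the case report form field through two intermediate steps, while `table_14_2` stops at a variable with no recorded origin and is therefore reported as a gap.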
Principles of metadata organization at the ENCODE data coordination center.
Hong, Eurie L; Sloan, Cricket A; Chan, Esther T; Davidson, Jean M; Malladi, Venkat S; Strattan, J Seth; Hitz, Benjamin C; Gabdank, Idan; Narayanan, Aditi K; Ho, Marcus; Lee, Brian T; Rowe, Laurence D; Dreszer, Timothy R; Roe, Greg R; Podduturi, Nikhil R; Tanaka, Forrest; Hilton, Jason A; Cherry, J Michael
2016-01-01
The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center (DCC) is responsible for organizing, describing and providing access to the diverse data generated by the ENCODE project. The description of these data, known as metadata, includes the biological sample used as input, the protocols and assays performed on these samples, the data files generated from the results and the computational methods used to analyze the data. Here, we outline the principles and philosophy used to define the ENCODE metadata in order to create a metadata standard that can be applied to diverse assays and multiple genomic projects. In addition, we present how the data are validated and used by the ENCODE DCC in creating the ENCODE Portal (https://www.encodeproject.org/). Database URL: www.encodeproject.org. © The Author(s) 2016. Published by Oxford University Press.
OpenFlow arbitrated programmable network channels for managing quantum metadata
Dasari, Venkat R.; Humble, Travis S.
2016-10-10
Quantum networks must classically exchange complex metadata between devices in order to carry out information for protocols such as teleportation, super-dense coding, and quantum key distribution. Demonstrating the integration of these new communication methods with existing network protocols, channels, and data forwarding mechanisms remains an open challenge. Software-defined networking (SDN) offers robust and flexible strategies for managing diverse network devices and uses. We adapt the principles of SDN to the deployment of quantum networks, which are composed from unique devices that operate according to the laws of quantum mechanics. We show how quantum metadata can be managed within a software-defined network using the OpenFlow protocol, and we describe how OpenFlow management of classical optical channels is compatible with emerging quantum communication protocols. We next give an example specification of the metadata needed to manage and control quantum physical layer (QPHY) behavior and we extend the OpenFlow interface to accommodate this quantum metadata. Here, we conclude by discussing near-term experimental efforts that can realize SDN’s principles for quantum communication.
OpenFlow arbitrated programmable network channels for managing quantum metadata
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dasari, Venkat R.; Humble, Travis S.
Quantum networks must classically exchange complex metadata between devices in order to carry out information for protocols such as teleportation, super-dense coding, and quantum key distribution. Demonstrating the integration of these new communication methods with existing network protocols, channels, and data forwarding mechanisms remains an open challenge. Software-defined networking (SDN) offers robust and flexible strategies for managing diverse network devices and uses. We adapt the principles of SDN to the deployment of quantum networks, which are composed from unique devices that operate according to the laws of quantum mechanics. We show how quantum metadata can be managed within a software-defined network using the OpenFlow protocol, and we describe how OpenFlow management of classical optical channels is compatible with emerging quantum communication protocols. We next give an example specification of the metadata needed to manage and control quantum physical layer (QPHY) behavior and we extend the OpenFlow interface to accommodate this quantum metadata. Here, we conclude by discussing near-term experimental efforts that can realize SDN’s principles for quantum communication.
NASA Astrophysics Data System (ADS)
Tanner, S.; Schwab, M.; Beam, K.; Skaug, M.
2017-12-01
Operation IceBridge has been flying campaigns in the Arctic and Antarctic for nearly 10 years and will soon be a decadal mission. During that time, the generation and use of file level metadata has evolved from nearly non-existent to robust spatio-temporal support. This evolution has been difficult at times, but the results speak for themselves in the form of production tools for search, discovery, access and analysis. The lessons learned from this experience are now being incorporated into SnowEx, a new mission to measure snow cover using airborne and ground-based measurements. This presentation will focus on techniques for generating metadata for such a diverse set of measurements as well as the resulting tools that utilize this information. This includes the development and deployment of MetGen, a semi-automated metadata generation capability that relies on collaboration between data producers and data archivers, the newly deployed IceBridge data portal which incorporates data browse capabilities and limited in-line analysis, and programmatic access to metadata and data for incorporation into larger automated workflows.
Data management routines for reproducible research using the G-Node Python Client library
Sobolev, Andrey; Stoewer, Adrian; Pereira, Michael; Kellner, Christian J.; Garbers, Christian; Rautenberg, Philipp L.; Wachtler, Thomas
2014-01-01
Structured, efficient, and secure storage of experimental data and associated meta-information constitutes one of the most pressing technical challenges in modern neuroscience, and does so particularly in electrophysiology. The German INCF Node aims to provide open-source solutions for this domain that support the scientific data management and analysis workflow, and thus facilitate future data access and reproducible research. G-Node provides a data management system, accessible through an application interface, that is based on a combination of standardized data representation and flexible data annotation to account for the variety of experimental paradigms in electrophysiology. The G-Node Python Library exposes these services to the Python environment, enabling researchers to organize and access their experimental data using their familiar tools while gaining the advantages that a centralized storage entails. The library provides powerful query features, including data slicing and selection by metadata, as well as fine-grained permission control for collaboration and data sharing. Here we demonstrate key actions in working with experimental neuroscience data, such as building a metadata structure, organizing recorded data in datasets, annotating data, or selecting data regions of interest, that can be automated to large degree using the library. Compliant with existing de-facto standards, the G-Node Python Library is compatible with many Python tools in the field of neurophysiology and thus enables seamless integration of data organization into the scientific data workflow. PMID:24634654
NASA Astrophysics Data System (ADS)
Zaslavsky, I.; Richard, S. M.; Valentine, D. W., Jr.; Grethe, J. S.; Hsu, L.; Malik, T.; Bermudez, L. E.; Gupta, A.; Lehnert, K. A.; Whitenack, T.; Ozyurt, I. B.; Condit, C.; Calderon, R.; Musil, L.
2014-12-01
EarthCube is envisioned as a cyberinfrastructure that fosters new, transformational geoscience by enabling sharing, understanding and scientifically-sound and efficient re-use of formerly unconnected data resources, software, models, repositories, and computational power. Its purpose is to enable science enterprise and workforce development via an extensible and adaptable collaboration and resource integration framework. A key component of this vision is development of comprehensive inventories supporting resource discovery and re-use across geoscience domains. The goal of the EarthCube CINERGI (Community Inventory of EarthCube Resources for Geoscience Interoperability) project is to create a methodology and assemble a large inventory of high-quality information resources with standard metadata descriptions and traceable provenance. The inventory is compiled from metadata catalogs maintained by geoscience data facilities, as well as from user contributions. The latter mechanism relies on community resource viewers: online applications that support update and curation of metadata records. Once harvested into CINERGI, metadata records from domain catalogs and community resource viewers are loaded into a staging database implemented in MongoDB, and validated for compliance with ISO 19139 metadata schema. Several types of metadata defects detected by the validation engine are automatically corrected with help of several information extractors or flagged for manual curation. The metadata harvesting, validation and processing components generate provenance statements using W3C PROV notation, which are stored in a Neo4J database. Thus curated metadata, along with the provenance information, is re-published and accessed programmatically and via a CINERGI online application. This presentation focuses on the role of resource inventories in a scalable and adaptable information infrastructure, and on the CINERGI metadata pipeline and its implementation challenges. 
Key project components are described at the project's website (http://workspace.earthcube.org/cinergi), which also provides access to the initial resource inventory, the inventory metadata model, metadata entry forms and a collection of the community resource viewers.
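The validate, auto-correct or flag, and record-provenance steps described in this abstract can be sketched in a few lines. This is a toy stand-in under stated assumptions: the required-field check replaces real ISO 19139 schema validation, the "information extractor" simply reuses the first sentence of the abstract as a title, the record layout is invented, and the provenance strings only imitate the style of W3C PROV-N statements (`wasGeneratedBy`, `wasDerivedFrom`) without being complete PROV documents.

```python
REQUIRED = ("title", "abstract", "contact")  # hypothetical minimum field set


def curate(record):
    """Return (curated_record, needs_manual_review, provenance_statements)."""
    rec_id = record["id"]
    curated = dict(record)
    provenance = []
    needs_review = False
    for field in REQUIRED:
        value = (curated.get(field) or "").strip()
        if value:
            curated[field] = value
            continue
        if field == "title" and curated.get("abstract"):
            # Toy extractor: use the first sentence of the abstract as the title.
            curated["title"] = curated["abstract"].split(".")[0].strip()
            provenance.append(f"wasGeneratedBy({rec_id}_v2, fill_title, -)")
        else:
            needs_review = True  # cannot be repaired automatically
    if provenance:
        provenance.append(f"wasDerivedFrom({rec_id}_v2, {rec_id})")
    return curated, needs_review, provenance


# Made-up harvested record with a missing title and contact.
harvested = {
    "id": "rec1",
    "title": "",
    "abstract": "Stream gauge data. Collected daily.",
    "contact": None,
}
curated, needs_review, provenance = curate(harvested)
```

Returning the provenance statements alongside the record keeps the correction and its audit trail together, so a later stage can store them in whatever graph or document store it prefers.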
The National Map seamless digital elevation model specifications
Archuleta, Christy-Ann M.; Constance, Eric W.; Arundel, Samantha T.; Lowe, Amanda J.; Mantey, Kimberly S.; Phillips, Lori A.
2017-08-02
This specification documents the requirements and standards used to produce the seamless elevation layers for The National Map of the United States. Seamless elevation data are available for the conterminous United States, Hawaii, Alaska, and the U.S. territories, in three different resolutions—1/3-arc-second, 1-arc-second, and 2-arc-second. These specifications include requirements and standards information about source data requirements, spatial reference system, distribution tiling schemes, horizontal resolution, vertical accuracy, digital elevation model surface treatment, georeferencing, data source and tile dates, distribution and supporting file formats, void areas, metadata, spatial metadata, and quality assurance and control.
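The three resolutions are angular, so their ground spacing depends on an Earth model and on latitude. A rough conversion, assuming a spherical Earth with the mean radius given in the code (an assumption of this sketch, not a value taken from the specification), looks like this:

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; assumed for this estimate


def arcsec_to_meters(arcsec, latitude_deg=0.0, east_west=False):
    """Approximate ground distance spanned by an angle given in arc-seconds.

    North-south spacing is independent of latitude on a sphere; east-west
    spacing shrinks with the cosine of the latitude.
    """
    meters = EARTH_RADIUS_M * math.radians(arcsec / 3600.0)
    if east_west:
        meters *= math.cos(math.radians(latitude_deg))
    return meters


# North-south spacing for the three resolutions named in the abstract.
spacing = {label: arcsec_to_meters(a) for label, a in [("1/3", 1 / 3), ("1", 1), ("2", 2)]}
```

Under these assumptions the three layers correspond to roughly 10 m, 31 m and 62 m in the north-south direction, with east-west spacing at 40 degrees latitude about three quarters of that.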
Tackling the 2nd V: Big Data, Variety and the Need for Representation Consistency
NASA Astrophysics Data System (ADS)
Clune, T.; Kuo, K. S.
2016-12-01
While Big Data technologies are transforming our ability to analyze ever larger volumes of Earth science data, practical constraints continue to limit our ability to compare data across datasets from different sources in an efficient and robust manner. Within a single data collection, invariants such as file format, grid type, and spatial resolution greatly simplify many types of analysis (often implicitly). However, when analysis combines data across multiple data collections, researchers are generally required to implement data transformations (i.e., "data preparation") to provide appropriate invariants. These transformations include changing of file formats, ingesting into a database, and/or regridding to a common spatial representation, and they can either be performed once, statically, or each time the data is accessed. At the very least, this process is inefficient from the perspective of the community as each team selects its own representation and privately implements the appropriate transformations. No doubt there are disadvantages to any "universal" representation, but we posit that major benefits would be obtained if a suitably flexible spatial representation could be standardized along with tools for transforming to/from that representation. We regard this as part of the historic trend in data publishing. Early datasets used ad hoc formats and lacked metadata. As better tools evolved, published data began to use standardized formats (e.g., HDF and netCDF) with attached metadata. We propose that the modern need to perform analysis across data sets should drive a new generation of tools that support a standardized spatial representation. More specifically, we propose the hierarchical triangular mesh (HTM) as a suitable "generic" resolution that permits standard transformations to/from native representations in use today, as well as tools to convert/regrid existing datasets onto that representation.
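To make the hierarchical triangular mesh concrete, here is a minimal sketch of how a latitude/longitude pair can be assigned an HTM-style cell name by recursively subdividing the eight faces of an octahedron. It follows the construction as commonly described in the HTM literature rather than any code from the authors; the root-triangle vertex ordering and the child numbering are assumptions of this sketch, so the names it produces may not match those of a particular HTM library.

```python
import math

# Octahedron vertices on the unit sphere.
V = [(0, 0, 1), (1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0), (0, 0, -1)]
# Eight root triangles, each ordered counterclockwise seen from outside.
ROOTS = {
    "S0": (V[1], V[5], V[2]), "S1": (V[2], V[5], V[3]),
    "S2": (V[3], V[5], V[4]), "S3": (V[4], V[5], V[1]),
    "N0": (V[1], V[0], V[4]), "N1": (V[4], V[0], V[3]),
    "N2": (V[3], V[0], V[2]), "N3": (V[2], V[0], V[1]),
}
EPS = 1e-12  # tolerance so points on shared edges still match a triangle


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _midpoint(a, b):
    """Midpoint of an edge, pushed back onto the unit sphere."""
    s = (a[0] + b[0], a[1] + b[1], a[2] + b[2])
    norm = math.sqrt(_dot(s, s))
    return (s[0] / norm, s[1] / norm, s[2] / norm)


def _contains(tri, p):
    """True if p lies on the inner side of all three edge planes."""
    return all(_dot(_cross(tri[i], tri[(i + 1) % 3]), p) >= -EPS for i in range(3))


def htm_name(lat_deg, lon_deg, depth):
    """Return the cell name after `depth` subdivisions of the root triangle."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    p = (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))
    name, tri = next((n, t) for n, t in ROOTS.items() if _contains(t, p))
    for _ in range(depth):
        v0, v1, v2 = tri
        w0, w1, w2 = _midpoint(v1, v2), _midpoint(v0, v2), _midpoint(v0, v1)
        children = [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0), (w0, w1, w2)]
        for index, child in enumerate(children):
            if _contains(child, p):
                name, tri = name + str(index), child
                break
    return name


# The centre of the north-east octant stays in the central child at every level.
centre_lat = math.degrees(math.asin(1 / math.sqrt(3)))
centre_cell = htm_name(centre_lat, 45.0, 3)
```

Because each extra character in the name refines the cell by a factor of four, two datasets regridded to the same depth can be joined on the name alone, which is the kind of shared invariant the abstract argues for.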
NASA Astrophysics Data System (ADS)
Sheldon, W.
2013-12-01
Managing data for a large, multidisciplinary research program such as a Long Term Ecological Research (LTER) site is a significant challenge, but also presents unique opportunities for data stewardship. LTER research is conducted within multiple organizational frameworks (i.e. a specific LTER site as well as the broader LTER network), and addresses both specific goals defined in an NSF proposal as well as broader goals of the network; therefore, every LTER data set can be linked to rich contextual information to guide interpretation and comparison. The challenge is how to link the data to this wealth of contextual metadata. At the Georgia Coastal Ecosystems LTER we developed an integrated information management system (GCE-IMS) to manage, archive and distribute data, metadata and other research products as well as manage project logistics, administration and governance (figure 1). This system allows us to store all project information in one place, and provide dynamic links through web applications and services to ensure content is always up to date on the web as well as in data set metadata. The database model supports tracking changes over time in personnel roles, projects and governance decisions, allowing these databases to serve as canonical sources of project history. Storing project information in a central database has also allowed us to standardize both the formatting and content of critical project information, including personnel names, roles, keywords, place names, attribute names, units, and instrumentation, providing consistency and improving data and metadata comparability. Lookup services for these standard terms also simplify data entry in web and database interfaces. We have also coupled the GCE-IMS to our MATLAB- and Python-based data processing tools (i.e. through database connections) to automate metadata generation and packaging of tabular and GIS data products for distribution.
Data processing history is automatically tracked throughout the data lifecycle, from initial import through quality control, revision and integration by our data processing system (GCE Data Toolbox for MATLAB), and included in metadata for versioned data products. This high level of automation and system integration has proven very effective in managing the chaos and scalability of our information management program.
Zare-Farashbandi, Firoozeh; Ramezan-Shirazi, Mahtab; Ashrafi-Rizi, Hasan; Nouri, Rasool
2014-01-01
Introduction: Recent progress in providing innovative solutions in the organization of electronic resources and research in this area shows a global trend in the use of new strategies such as metadata to facilitate description, location, organization and retrieval of resources in the web environment. In this context, library metadata standards have a special place; therefore, the purpose of the present study has been a comparative study on the Central Libraries’ Websites of Iran State Universities for Hyper Text Mark-up Language (HTML) and Dublin Core metadata elements usage in 2011. Materials and Methods: The method of this study is applied-descriptive and the data collection tool is the check lists created by the researchers. The statistical community includes 98 websites of the Iranian State Universities of the Ministry of Health and Medical Education and Ministry of Science, Research and Technology, and the method of sampling is the census. Information was collected through observation and direct visits to websites and data analysis was prepared by Microsoft Excel software, 2011. Results: The results of this study indicate that none of the websites use Dublin Core (DC) metadata and that only a few of them have used overlapping elements between HTML meta tags and Dublin Core (DC) elements. The percentage of overlaps of DC elements centralization in the Ministry of Health were 56% for both description and keywords and, in the Ministry of Science, were 45% for the keywords and 39% for the description. But HTML meta tags have a moderate presence in both Ministries, as the most-used elements were keywords and description (56%) and the least-used elements were date and formatter (0%). Conclusion: It was observed that the Ministry of Health and Ministry of Science follow the same path for using the Dublin Core standard on their websites in the future.
Because Central Library Websites are an example of scientific web pages, special attention in designing them can help the researchers to achieve faster and more accurate information resources. Therefore, the influence of librarians’ ideas on the awareness of web designers and developers will be important for using metadata elements in general, and specifically for applying such standards. PMID:24741646
Bridging the gap between Hydrologic and Atmospheric communities through a standard based framework
NASA Astrophysics Data System (ADS)
Boldrini, E.; Salas, F.; Maidment, D. R.; Mazzetti, P.; Santoro, M.; Nativi, S.; Domenico, B.
2012-04-01
Data interoperability in the study of Earth sciences is essential to performing interdisciplinary multi-scale multi-dimensional analyses (e.g. hydrologic impacts of global warming, regional urbanization, global population growth etc.). This research aims to bridge the existing gap between hydrologic and atmospheric communities both at semantic and technological levels. Within the context of hydrology, scientists are usually concerned with data organized as time series: a time series can be seen as a variable measured at a particular point in space over a period of time (e.g. the stream flow values as periodically measured by a buoy sensor in a river); atmospheric scientists instead usually organize their data as coverages: a coverage can be seen as a multidimensional data array (e.g. satellite images acquired through time). These differences make it non-trivial to set up a common framework to perform data discovery and access. A set of web services specifications and implementations is already in place in both the scientific communities to allow data discovery and access in the different domains. The CUAHSI-Hydrologic Information System (HIS) service stack lists different service types and implementations:
- a metacatalog (implemented as a CSW) used to discover metadata services by distributing the query to a set of catalogs
- time series catalogs (implemented as CSW) used to discover datasets published by the feature services
- feature services (implemented as WFS) containing features with data access link
- sensor observation services (implemented as SOS) enabling access to the stream of acquisitions
Within the Unidata framework, there lies a similar service stack for atmospheric data:
- the broker service (implemented as a CSW) distributes a user query to a set of heterogeneous services (i.e. catalog services, but also inventory and access services)
- the catalog service (implemented as a CSW) is able to harvest the available metadata offered by THREDDS services, and executes complex queries against the available metadata
- the inventory service (implemented as a THREDDS) is able to hierarchically organize and publish a local collection of multi-dimensional arrays (e.g. NetCDF, GRIB files), as well as publish auxiliary standard services to realize the actual data access and visualization (e.g. WCS, OPeNDAP, WMS)
The approach followed in this research is to build on top of the existing standards and implementations, by setting up a standard-aware interoperable framework, able to deal with the existing heterogeneity in an organic way. As a methodology, interoperability tests against real services were performed; existing problems were thus highlighted and possibly solved. The use of flexible tools, able to deal in a smart way with heterogeneity, has proven to be successful; in particular, experiments were carried out with both GI-cat broker and ESRI GeoPortal frameworks. GI-cat discovery broker was proven successful at implementing the CSW interface, as well as federating heterogeneous resources, such as THREDDS and WCS services published by Unidata, HydroServer, WFS and SOS services published by CUAHSI. Experiments with ESRI GeoPortal were also successful: the GeoPortal was used to deploy a web interface able to distribute searches amongst catalog implementations from both the hydrologic and the atmospheric communities, including HydroServers and GI-cat, combining results from both the domains in a seamless way.
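The brokering pattern that recurs in this abstract, one query distributed to several catalogs with the results merged, can be sketched without any network code. The catalogs below are stand-in Python functions with invented records; a real broker such as GI-cat would instead translate the query to each service's protocol (CSW, THREDDS and so on), which this sketch does not attempt.

```python
def broker_search(keyword, catalogs):
    """Send one query to every catalog and merge the hits.

    `catalogs` maps a catalog name to a callable that takes a keyword and
    returns a list of records, each a dict with at least an "id" key.
    """
    merged, seen, failures = [], set(), []
    for name, search in catalogs.items():
        try:
            hits = search(keyword)
        except Exception as error:  # one failing catalog should not sink the query
            failures.append((name, str(error)))
            continue
        for hit in hits:
            if hit["id"] not in seen:  # drop records already returned elsewhere
                seen.add(hit["id"])
                merged.append({**hit, "catalog": name})
    return merged, failures


# Hypothetical catalogs standing in for hydrologic and atmospheric services.
def hydro_catalog(keyword):
    records = [{"id": "ts-1", "title": "Stream flow time series"}]
    return [r for r in records if keyword.lower() in r["title"].lower()]


def atmos_catalog(keyword):
    records = [{"id": "cov-7", "title": "Precipitation and stream flow coverage"}]
    return [r for r in records if keyword.lower() in r["title"].lower()]


def offline_catalog(keyword):
    raise TimeoutError("catalog did not respond")


results, failures = broker_search(
    "stream flow",
    {"hydro": hydro_catalog, "atmos": atmos_catalog, "offline": offline_catalog},
)
```

Reporting failures separately, rather than raising, lets a client show partial results from the catalogs that did answer, which is usually the desired behaviour for federated discovery.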
ODISEES: A New Paradigm in Data Access
NASA Astrophysics Data System (ADS)
Huffer, E.; Little, M. M.; Kusterer, J.
2013-12-01
As part of its ongoing efforts to improve access to data, the Atmospheric Science Data Center has developed a high-precision Earth Science domain ontology (the 'ES Ontology') implemented in a graph database ('the Semantic Metadata Repository') that is used to store detailed, semantically-enhanced, parameter-level metadata for ASDC data products. The ES Ontology provides the semantic infrastructure needed to drive the ASDC's Ontology-Driven Interactive Search Environment for Earth Science ('ODISEES'), a data discovery and access tool, and will support additional data services such as analytics and visualization. The ES ontology is designed on the premise that naming conventions alone are not adequate to provide the information needed by prospective data consumers to assess the suitability of a given dataset for their research requirements; nor are current metadata conventions adequate to support seamless machine-to-machine interactions between file servers and end-user applications. Data consumers need information not only about what two data elements have in common, but also about how they are different. End-user applications need consistent, detailed metadata to support real-time data interoperability. The ES ontology is a highly precise, bottom-up, queriable model of the Earth Science domain that focuses on critical details about the measurable phenomena, instrument techniques, data processing methods, and data file structures. Earth Science parameters are described in detail in the ES Ontology and mapped to the corresponding variables that occur in ASDC datasets. Variables are in turn mapped to well-annotated representations of the datasets that they occur in, the instrument(s) used to create them, the instrument platforms, the processing methods, etc., creating a linked-data structure that allows both human and machine users to access a wealth of information critical to understanding and manipulating the data. 
The mappings are recorded in the Semantic Metadata Repository as RDF-triples. An off-the-shelf Ontology Development Environment and a custom Metadata Conversion Tool comprise a human-machine/machine-machine hybrid tool that partially automates the creation of metadata as RDF-triples by interfacing with existing metadata repositories and providing a user interface that solicits input from a human user, when needed. RDF-triples are pushed to the Ontology Development Environment, where a reasoning engine executes a series of inference rules whose antecedent conditions can be satisfied by the initial set of RDF-triples, thereby generating the additional detailed metadata that is missing in existing repositories. A SPARQL Endpoint, a web-based query service and a Graphical User Interface allow prospective data consumers - even those with no familiarity with NASA data products - to search the metadata repository to find and order data products that meet their exact specifications. A web-based API will provide an interface for machine-to-machine transactions.
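The inference step described above (rules whose antecedents are satisfied by the initial triples generate additional triples) can be sketched as simple forward chaining. The rule, the predicate names and the `ex:` identifiers below are invented for illustration; they are not taken from the ES Ontology or from the reasoning engine the authors used.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def infer(triples: Set[Triple]) -> Set[Triple]:
    """Apply one hypothetical rule repeatedly until no new triples appear.

    Rule: (v, measuredBy, i) and (i, mountedOn, p)  =>  (v, observedFrom, p)
    """
    known = set(triples)
    while True:
        derived = {
            (v, "observedFrom", p)
            for (v, pred1, i) in known if pred1 == "measuredBy"
            for (i2, pred2, p) in known if pred2 == "mountedOn" and i2 == i
        }
        if derived <= known:  # fixed point reached: nothing new was derived
            return known
        known |= derived


# Made-up starting metadata.
initial = {
    ("ex:variable1", "measuredBy", "ex:instrumentA"),
    ("ex:instrumentA", "mountedOn", "ex:platformX"),
}

inferred = infer(initial)
for triple in sorted(inferred):
    print(triple)
```

The derived triple links the variable directly to the platform, which is the kind of detail that may be absent from the source repository but recoverable from what is already recorded.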
SIPSMetGen: It's Not Just For Aircraft Data and ECS Anymore.
NASA Astrophysics Data System (ADS)
Schwab, M.
2015-12-01
The SIPSMetGen utility, developed for the NASA EOSDIS project under the EED contract, simplified the creation of file-level metadata for the ECS system. The utility has been enhanced for ease of use, efficiency, speed and increased flexibility. The SIPSMetGen utility was originally created as a means of generating file-level spatial metadata for Operation IceBridge. The first version created only ODL metadata, specific for ingest into ECS. The core strength of the utility was, and continues to be, its ability to take complex shapes and patterns of data collection point clouds from aircraft flights and simplify them to a relatively simple concave hull geo-polygon. It has been found to be a useful and easy-to-use tool for creating file-level metadata for many other missions, both aircraft and satellite. While the original version was useful, it had its limitations. In 2014 Raytheon was tasked to make enhancements to SIPSMetGen. This resulted in a new version of SIPSMetGen that can create ISO-compliant XML metadata, optimizes and streamlines the algorithm for creating the spatial metadata, runs more quickly with more consistent results, and can be configured to run multi-threaded on systems with multiple processors. The utility comes with a Java-based graphical user interface to aid in configuration and running of the utility. The enhanced SIPSMetGen allows more diverse data sets to be archived with file-level metadata. The advantage of archiving data with file-level metadata is that it makes it easier for data users and scientists to find relevant data. File-level metadata unlocks the power of existing archives and metadata repositories such as ECS and CMR and search and discovery utilities like Reverb and Earth Data Search. Current missions now using SIPSMetGen include: Aquarius, Measures, ARISE, and Nimbus.
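To give a feel for reducing a point cloud to a bounding polygon, the sketch below computes a convex hull with Andrew's monotone chain algorithm. This is a deliberately simpler substitute: SIPSMetGen is described as producing a concave hull, and its actual algorithm is not given in the abstract. The function names and sample coordinates are assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Return the hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


# Made-up flight-line samples: four corners plus interior and edge points.
samples = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0), (1.0, 0.0)]
hull = convex_hull(samples)
print(hull)
```

A convex hull can greatly overstate the footprint of a winding flight track, which is presumably why a concave hull is preferred for this kind of spatial metadata.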
DOT National Transportation Integrated Search
1997-07-14
These standards represent a guideline for preparing digital data for inclusion in the National Pipeline Mapping System Repository. The standards were created with input from the pipeline industry and government agencies. They address the submission o...
NASA Astrophysics Data System (ADS)
Yatagai, A. I.; Iyemori, T.; Ritschel, B.; Koyama, Y.; Hori, T.; Abe, S.; Tanaka, Y.; Shinbori, A.; Umemura, N.; Sato, Y.; Yagi, M.; Ueno, S.; Hashiguchi, N. O.; Kaneda, N.; Belehaki, A.; Hapgood, M. A.
2013-12-01
The IUGONET is a Japanese program to build a metadata database for ground-based observations of the upper atmosphere [1]. The project began in 2009 with five Japanese institutions which archive data observed by radars, magnetometers, photometers, radio telescopes and helioscopes, and so on, at various altitudes from the Earth's surface to the Sun. Systems have been developed to allow searching of the above described metadata. We have been updating the system and adding new and updated metadata. The IUGONET development team adopted the SPASE metadata model [2] to describe the upper atmosphere data. This model is used as the common metadata format by the virtual observatories for solar-terrestrial physics. It includes metadata referring to each data file (called a 'Granule'), which enable a search for data files as well as data sets. Further details are described in [2] and [3]. Currently, three additional Japanese institutions are being incorporated in IUGONET. Furthermore, metadata of observations of the troposphere, taken at the observatories of the middle and upper atmosphere radar at Shigaraki and the Meteor radar in Indonesia, have been incorporated. These additions will contribute to efficient interdisciplinary scientific research. In the beginning of 2013, the registration of the 'Observatory' and 'Instrument' metadata was completed, which makes it easy to get an overview of the metadata database. The number of registered metadata records as of the end of July totalled 8.8 million, including 793 observatories and 878 instruments. It is important to promote interoperability and/or metadata exchange between the database development groups. A memorandum of agreement has been signed with the European Near-Earth Space Data Infrastructure for e-Science (ESPAS) project, which has similar objectives to IUGONET with regard to a framework for formal collaboration. 
Furthermore, observations by satellites and the International Space Station are being incorporated with a view for making/linking metadata databases. The development of effective data systems will contribute to the progress of scientific research on solar terrestrial physics, climate and the geophysical environment. Any kind of cooperation, metadata input and feedback, especially for linkage of the databases, is welcomed. References 1. Hayashi, H. et al., Inter-university Upper Atmosphere Global Observation Network (IUGONET), Data Sci. J., 12, WDS179-184, 2013. 2. King, T. et al., SPASE 2.0: A standard data model for space physics. Earth Sci. Inform. 3, 67-73, 2010, doi:10.1007/s12145-010-0053-4. 3. Hori, T., et al., Development of IUGONET metadata format and metadata management system. J. Space Sci. Info. Jpn., 105-111, 2012. (in Japanese)
NASA Astrophysics Data System (ADS)
Moore, J.; Serreze, M. C.; Middleton, D.; Ramamurthy, M. K.; Yarmey, L.
2013-12-01
The NSF funds the Advanced Cooperative Arctic Data and Information System (ACADIS) (http://www.aoncadis.org/). It serves the growing and increasingly diverse data management needs of NSF's Arctic research community. The ACADIS investigator team combines experienced data managers, curators and software engineers from the NSIDC, UCAR and NCAR. ACADIS fosters scientific synthesis and discovery by providing a secure long-term data archive to NSF investigators. The system provides discovery and access to Arctic-related data from this and other archives. This paper updates the technical components of ACADIS, the implementation of best practices, the value of ACADIS to the community and the major challenges facing this archive for the future in handling the diverse data coming from NSF Arctic investigators. ACADIS provides sustainable data management, data stewardship services and leadership for the NSF Arctic research community through open data sharing, adherence to best practices and standards, capitalizing on appropriate evolving technologies, community support and engagement. ACADIS leverages other pertinent projects, capitalizing on appropriate emerging technologies and participating in emerging cyberinfrastructure initiatives. The key elements of ACADIS user services to the NSF Arctic community include: data and metadata upload; support for datasets with special requirements; metadata and documentation generation; interoperability and initiatives with other archives; and science support to investigators and the community. Providing a self-service data publishing platform requiring minimal curation oversight while maintaining rich metadata for discovery, access and preservation is challenging. Implementing metadata standards is a first step towards consistent content. The ACADIS Gateway and ADE offer users choices for data discovery and access with the clear objective of increasing discovery and use of all Arctic data, especially for analysis activities. 
Metadata is at the core of ACADIS activities, from capturing metadata at the point of data submission to ensuring interoperability, providing data citations, and supporting data discovery. ACADIS metadata efforts include: 1) Evolution of the ACADIS metadata profile to increase flexibility in search; 2) Documentation guidelines; and 3) Metadata standardization efforts. A major activity is now underway to ensure consistency in the metadata profile across all archived datasets. ACADIS is embarking on a critical activity to create Digital Object Identifiers (DOI) for all its holdings. The data services offered by ACADIS focus on meeting the needs of the data providers, providing dynamic search capabilities to peruse the ACADIS and related cryospheric data repositories, efficient data download and some special services including dataset reformatting and visualization. The service is built around the following key technical elements: the ACADIS Gateway, housed at NCAR, which has been developed to support NSF Arctic data coming from AON and now broadly across PLR/ARC and related archives; the Arctic Data Explorer (ADE), developed at NSIDC, an integral service of ACADIS bringing the rich archive from NSIDC together with catalogs from ACADIS and international partners in Arctic research; and Rosetta and the Digital Object Identifier (DOI) generation scheme, tools available to the community to help publish and utilize datasets in integration, synthesis and publication.
Exposing Coverage Data to the Semantic Web within the MELODIES project: Challenges and Solutions
NASA Astrophysics Data System (ADS)
Riechert, Maik; Blower, Jon; Griffiths, Guy
2016-04-01
Coverage data, typically big in data volume, assigns values to a given set of spatiotemporal positions, together with metadata on how to interpret those values. Existing storage formats like netCDF, HDF and GeoTIFF all have various restrictions that prevent them from being preferred formats for use over the web, especially the semantic web. Factors that are relevant here are the processing complexity, the semantic richness of the metadata, and the ability to request partial information, such as a subset or just the appropriate metadata. Making coverage data available within web browsers opens the door to new ways for working with such data, including new types of visualization and on-the-fly processing. As part of the European project MELODIES (http://melodiesproject.eu) we look into the challenges of exposing such coverage data in an interoperable and web-friendly way, and propose solutions using a host of emerging technologies like JSON-LD, the DCAT and GeoDCAT-AP ontologies, the CoverageJSON format, and new approaches to REST APIs for coverage data. We developed the CoverageJSON format within the MELODIES project as an additional way to expose coverage data to the web, next to having simple rendered images available using standards like OGC's WMS. CoverageJSON partially incorporates JSON-LD but does not encode individual data values as semantic resources, making use of the technology in a practical manner. The development also focused on it being a potential output format for OGC WCS. We will demonstrate how existing netCDF data can be exposed as CoverageJSON resources on the web together with a REST API that allows users to explore the data and run operations such as spatiotemporal subsetting. We will show various use cases from the MELODIES project, including reclassification of a Land Cover dataset client-side within the browser with the ability for the user to influence the reclassification result by making use of the above technologies.
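The spatiotemporal subsetting mentioned above can be sketched on a small in-memory structure. The dictionary below is only loosely modelled on the CoverageJSON layout (a domain with axes, and ranges holding a flat value array); its field names are an approximation that has not been validated against the specification, and `subset_time` is a hypothetical helper, not part of the MELODIES REST API.

```python
from typing import Any, Dict

Coverage = Dict[str, Any]


def subset_time(coverage: Coverage, param: str, start: str, stop: str) -> Coverage:
    """Keep only the time steps with start <= t <= stop for one parameter.

    Assumes the range values are a flat list in (t, y, x) order and that all
    timestamps share one ISO 8601 format, so string comparison orders them.
    """
    axes = coverage["domain"]["axes"]
    rng = coverage["ranges"][param]
    _, ny, nx = rng["shape"]
    times = axes["t"]["values"]
    keep = [i for i, t in enumerate(times) if start <= t <= stop]
    step = ny * nx  # number of values in one time slice
    values = [v for i in keep for v in rng["values"][i * step:(i + 1) * step]]
    new_axes = {**axes, "t": {"values": [times[i] for i in keep]}}
    new_range = {**rng, "shape": [len(keep), ny, nx], "values": values}
    return {
        **coverage,
        "domain": {**coverage["domain"], "axes": new_axes},
        "ranges": {**coverage["ranges"], param: new_range},
    }


# Made-up coverage: three daily time steps on a 1 x 2 grid.
coverage = {
    "type": "Coverage",
    "domain": {
        "type": "Domain",
        "domainType": "Grid",
        "axes": {
            "x": {"values": [10.0, 11.0]},
            "y": {"values": [50.0]},
            "t": {"values": [
                "2016-01-01T00:00:00Z",
                "2016-01-02T00:00:00Z",
                "2016-01-03T00:00:00Z",
            ]},
        },
    },
    "ranges": {
        "temperature": {
            "type": "NdArray",
            "dataType": "float",
            "axisNames": ["t", "y", "x"],
            "shape": [3, 1, 2],
            "values": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
        }
    },
}

subset = subset_time(
    coverage, "temperature", "2016-01-02T00:00:00Z", "2016-01-03T00:00:00Z"
)
print(subset["ranges"]["temperature"]["shape"])
```

Because the structure is plain JSON-compatible data, the same subsetting could in principle run either behind a web API or client-side in a browser, which is the flexibility the abstract highlights.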
NASA Astrophysics Data System (ADS)
Zaslavsky, I.; Valentine, D.; Richard, S. M.; Gupta, A.; Meier, O.; Peucker-Ehrenbrink, B.; Hudman, G.; Stocks, K. I.; Hsu, L.; Whitenack, T.; Grethe, J. S.; Ozyurt, I. B.
2017-12-01
EarthCube Data Discovery Hub (DDH) is an EarthCube Building Block project using technologies developed in CINERGI (Community Inventory of EarthCube Resources for Geoscience Interoperability) to enable geoscience users to explore a growing portfolio of EarthCube-created and other geoscience-related resources. Over 1 million metadata records are available for discovery through the project portal (cinergi.sdsc.edu). These records are retrieved from data facilities, including federal, state and academic sources, or contributed by geoscientists through workshops, surveys, or other channels. CINERGI metadata augmentation pipeline components 1) provide semantic enhancement based on a large ontology of geoscience terms, using text analytics to generate keywords with references to ontology classes, 2) add spatial extents based on place names found in the metadata record, and 3) add organization identifiers to the metadata. The records are indexed and can be searched via a web portal and standard search APIs. The added metadata content improves discoverability and interoperability of the registered resources. Specifically, the addition of ontology-anchored keywords enables faceted browsing and lets users navigate to datasets related by variables measured, equipment used, science domain, processes described, geospatial features studied, and other dataset characteristics that are generated by the pipeline. DDH also lets data curators access and edit the automatically generated metadata records using the CINERGI metadata editor, accept or reject the enhanced metadata content, and consider it in updating their metadata descriptions. 
We consider several complex data discovery workflows, in environmental seismology (quantifying sediment and water fluxes using seismic data), marine biology (determining available temperature, location, weather and bleaching characteristics of coral reefs related to measurements in a given coral reef survey), and river geochemistry (discovering observations relevant to geochemical measurements outside the tidal zone, given specific discharge conditions).
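The augmentation steps described above (ontology-anchored keywords and spatial extents derived from place names) can be sketched with simple lookups. The tiny `ONTOLOGY` and `GAZETTEER` tables, the `ex:` class names, the fictitious place name and the `augment` function are all assumptions for illustration; the actual CINERGI pipeline relies on text analytics over a large ontology.

```python
from typing import Any, Dict, Tuple

BBox = Tuple[float, float, float, float]  # (west, south, east, north)

# Hypothetical lookup tables with made-up entries.
ONTOLOGY: Dict[str, str] = {
    "seismic": "ex:Seismology",
    "coral": "ex:CoralReef",
    "discharge": "ex:RiverDischarge",
}
GAZETTEER: Dict[str, BBox] = {
    "example bay": (10.0, 20.0, 11.0, 21.0),  # fictitious place
}


def augment(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record with suggested keywords and extents added."""
    text = f"{record.get('title', '')} {record.get('abstract', '')}".lower()
    keywords = sorted({cls for term, cls in ONTOLOGY.items() if term in text})
    extents = [bbox for place, bbox in GAZETTEER.items() if place in text]
    enhanced = dict(record)
    # Stored as suggestions so a curator can accept or reject them.
    enhanced["suggested_keywords"] = keywords
    enhanced["suggested_extents"] = extents
    return enhanced


record = {
    "title": "Coral bleaching survey in Example Bay",
    "abstract": "Temperature and discharge observations.",
}
enhanced = augment(record)
print(enhanced["suggested_keywords"], enhanced["suggested_extents"])
```

Keeping the generated content separate from the original fields mirrors the curator workflow in the abstract, where enhanced metadata is reviewed before it is adopted.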
NASA Astrophysics Data System (ADS)
Stone, N.; Lafuente, B.; Bristow, T.; Keller, R.; Downs, R. T.; Blake, D. F.; Fonda, M.; Pires, A.
2016-12-01
Working primarily with astrobiology researchers at NASA Ames, the Open Data Repository (ODR) has been conducting a software pilot to meet the varying needs of this multidisciplinary community. Astrobiology researchers often have small communities or operate individually with unique data sets that don't easily fit into existing database structures. The ODR constructed its Data Publisher software to allow researchers to create databases with common metadata structures and subsequently extend them to meet their individual needs and data requirements. The software accomplishes these tasks through a web-based interface that allows collaborative creation and revision of common metadata templates and individual extensions to these templates for custom data sets. This allows researchers to search disparate datasets based on common metadata established through the metadata tools, but still facilitates distinct analyses and data that may be stored alongside the required common metadata. The software produces web pages that can be made publicly available at the researcher's discretion so that users may search and browse the data in an effort to make interoperability and data discovery a human-friendly task while also providing semantic data for machine-based discovery. Once relevant data has been identified, researchers can utilize the built-in application programming interface (API) that exposes the data for machine-based consumption and integration with existing data analysis tools (e.g. R, MATLAB, Project Jupyter - http://jupyter.org). The current evolution of the project has created the Astrobiology Habitable Environments Database (AHED)[1] which provides an interface to databases connected through a common metadata core. 
In the next project phase, the goal is for small research teams and groups to be self-sufficient in publishing their research data to meet funding mandates and academic requirements as well as fostering increased data discovery and interoperability through human-readable and machine-readable interfaces. This project is supported by the Science-Enabling Research Activity (SERA) and NASA NNX11AP82A, MSL. [1] B. Lafuente et al. (2016) AGU, submitted.
Assessing Data Quality in Emergent Domains of Earth Sciences
NASA Astrophysics Data System (ADS)
Darch, P. T.; Borgman, C.
2016-12-01
As earth scientists seek to study known phenomena in new ways, and to study new phenomena, they often develop new technologies and new methods such as embedded network sensing, or reapply extant technologies, such as seafloor drilling. Emergent domains are often highly multidisciplinary as researchers from many backgrounds converge on new research questions. They may adapt existing methods, or develop methods de novo. As a result, emerging domains tend to be methodologically heterogeneous. As these domains mature, pressure to standardize methods increases. Standardization promotes trust, reliability, accuracy, and reproducibility, and simplifies data management. However, for standardization to occur, researchers must be able to assess which of the competing methods produces the highest quality data. The exploratory nature of emerging domains discourages standardization. Because competing methods originate in different disciplinary backgrounds, their scientific credibility is difficult to compare. Instead of direct comparison, researchers attempt to conduct meta-analyses. Scientists compare datasets produced by different methods to assess their consistency and efficiency. This paper presents findings from a long-term qualitative case study of research on the deep subseafloor biosphere, an emergent domain. A diverse community converged on the study of microbes in the seafloor and those microbes' interactions with the physical environments they inhabit. Data on this problem are scarce, leading to calls for standardization as a means to acquire and analyze greater volumes of data. Lacking consistent methods, scientists attempted to conduct meta-analyses to determine the most promising methods on which to standardize. Among the factors that inhibited meta-analyses were disparate approaches to metadata and to curating data. Datasets may be deposited in a variety of databases or kept on individual scientists' servers. 
Associated metadata may be inconsistent or hard to interpret. Incentive structures, including prospects for journal publication, often favor new data over reanalyzing extant datasets. Assessing data quality in emergent domains is extremely difficult and will require adaptations in infrastructure, culture, and incentives.
Leveraging Open Standards and Technologies to Search and Display Planetary Image Data
NASA Astrophysics Data System (ADS)
Rose, M.; Schauer, C.; Quinol, M.; Trimble, J.
2011-12-01
Mars and the Moon have both been visited by multiple NASA spacecraft. A large number of images and other data have been gathered by the spacecraft and are publicly available in NASA's Planetary Data System. Through a collaboration with Google, Inc., the User Centered Technologies group at NASA Ames Research Center has developed a tool for searching and browsing among images from multiple Mars and Moon missions. Development of this tool was facilitated by the use of several open technologies and standards. First, an open-source full-text search engine is used both to search place names on the target and to find images matching a geographic region. Second, the published API of the Google Earth browser plugin is used to geolocate the images on a virtual globe and allow the user to navigate on the globe to see related images. The structure of the application also employs standard protocols and services. The back-end is exposed as RESTful APIs, which could be reused by other client systems in the future. Further, the communication between the front- and back-end portions of the system utilizes open data standards including XML and KML (Keyhole Markup Language) for representation of textual and geographic data. The creation of the search index was facilitated by reuse of existing, publicly available metadata, including the Gazetteer of Planetary Nomenclature from the USGS, available in KML format. The image metadata was reused from standards-compliant archives in the Planetary Data System. The system also supports collaboration with other tools by allowing export of search results in KML, and the ability to display those results in the Google Earth desktop application. We will demonstrate the search and visualization capabilities of the system, with emphasis on how the system facilitates reuse of data and services through the adoption of open standards.
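Exporting search results as KML, as mentioned above, might look like the following sketch, which writes one Placemark per result using only the Python standard library. The result fields (`name`, `lon`, `lat`) and the sample values are hypothetical; the abstract does not describe the tool's actual export format beyond its use of KML.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

KML_NS = "http://www.opengis.net/kml/2.2"

Result = Dict[str, Union[str, float]]


def results_to_kml(results: List[Result]) -> str:
    """Serialize search results as a KML document with one Placemark each."""
    ET.register_namespace("", KML_NS)  # emit KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    document = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for result in results:
        placemark = ET.SubElement(document, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = str(result["name"])
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML coordinates are written in longitude,latitude order.
        coordinates = ET.SubElement(point, f"{{{KML_NS}}}coordinates")
        coordinates.text = f"{result['lon']},{result['lat']}"
    return ET.tostring(kml, encoding="unicode")


# Made-up search hit.
kml_text = results_to_kml([{"name": "image-001", "lon": 137.4, "lat": -4.6}])
print(kml_text)
```

Because the output is a standard KML document, any KML-aware client can display it, which is the kind of reuse the abstract attributes to open standards.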
Operational Support for Instrument Stability through ODI-PPA Metadata Visualization and Analysis
NASA Astrophysics Data System (ADS)
Young, M. D.; Hayashi, S.; Gopu, A.; Kotulla, R.; Harbeck, D.; Liu, W.
2015-09-01
Over long time scales, quality assurance metrics taken from calibration and calibrated data products can aid observatory operations in quantifying the performance and stability of the instrument, and identify potential areas of concern or guide troubleshooting and engineering efforts. Such methods traditionally require manual SQL entries, assuming the requisite metadata has even been ingested into a database. With the ODI-PPA system, QA metadata has been harvested and indexed for all data products produced over the life of the instrument. In this paper we will describe how, utilizing the industry-standard Highcharts JavaScript charting package with a customized AngularJS-driven user interface, we have made the process of visualizing the long-term behavior of these QA metadata simple and easily replicated. Operators can easily craft a custom query using the powerful and flexible ODI-PPA search interface and visualize the associated metadata in a variety of ways. These customized visualizations can be bookmarked, shared, or embedded externally, and will be dynamically updated as new data products enter the system, enabling operators to monitor the long-term health of their instrument with ease.
A future Outlook: Web based Simulation of Hydrodynamic models
NASA Astrophysics Data System (ADS)
Islam, A. S.; Piasecki, M.
2003-12-01
Despite recent advances in presenting simulation results as 3D graphs or animation contours, the modeling user community still faces some shortcomings when trying to move around and analyze data. Typical problems include the lack of common platforms with standard vocabulary to exchange simulation results from different numerical models, insufficient descriptions about data (metadata), lack of robust search and retrieval tools for data, and difficulties in reusing simulation domain knowledge. This research demonstrates how to create a shared simulation domain in the WWW and run a number of models through multi-user interfaces. Firstly, meta-datasets have been developed to describe hydrodynamic model data based on the geographic metadata standard (ISO 19115), which has been extended to satisfy the needs of the hydrodynamic modeling community. The Extensible Markup Language (XML) is used to publish this metadata by the Resource Description Framework (RDF). A specific domain ontology for Web Based Simulation (WBS) has been developed to explicitly define vocabulary for the knowledge based simulation system. Subsequently, this knowledge based system is converted into an object model using the Meta Object Facility (MOF). The knowledge based system acts as a meta model for the object oriented system, which aids in reusing the domain knowledge. Specific simulation software has been developed based on the object oriented model. Finally, all model data is stored in an object relational database. Database back-ends help store, retrieve and query information efficiently. This research uses open source software and technology such as Java Servlet and JSP, the Apache web server, the Tomcat Servlet Engine, PostgreSQL databases, the Protégé ontology editor, RDQL and RQL for querying RDF at the semantic level, and the Jena Java API for RDF. Also, we use international standards such as the ISO 19115 metadata standard, and specifications such as XML, RDF, OWL, XMI, and UML. 
The final web based simulation product is deployed as Web Archive (WAR) files, which are platform and OS independent and can be used on Windows, UNIX, or Linux. Keywords: Apache, ISO 19115, Java Servlet, Jena, JSP, Metadata, MOF, Linux, Ontology, OWL, PostgreSQL, Protégé, RDF, RDQL, RQL, Tomcat, UML, UNIX, Windows, WAR, XML
NASA Astrophysics Data System (ADS)
Thomas, R.; Connell, D.; Spears, T.; Leadbetter, A.; Burger, E. F.
2016-12-01
The scientific literature heavily features small-scale studies with the impact of the results extrapolated to regional/global importance. There are on-going initiatives (e.g. OA-ICC, GOA-ON, GEOTRACES, EMODNet Chemistry) aiming to assemble regional to global-scale datasets that are available for trend or meta-analyses. Assessing the quality and comparability of these data requires information about the processing chain from "sampling to spreadsheet". This provenance information needs to be captured and readily available to assess data fitness for purpose. The NOAA Ocean Acidification metadata template was designed in consultation with domain experts for this reason; the core carbonate chemistry variables have 23-37 metadata fields each and for scientists generating these datasets there could appear to be an ever increasing amount of metadata expected to accompany a dataset. While this provenance metadata should be considered essential by those generating or using the data, for those discovering data there is a sliding scale between what is considered discovery metadata (title, abstract, contacts, etc.) versus usage metadata (methodology, environmental setup, lineage, etc.), the split depending on the intended use of data. As part of the OA-ICC's activities, the metadata fields from the NOAA template relevant to the sample processing chain and QA criteria have been factored to develop profiles for, and extensions to, the OM-JSON encoding supported by the PROV ontology. While this work started focused on carbonate chemistry variable specific metadata, the factorization could be applied within the O&M model across other disciplines such as trace metals or contaminants. In a linked data world with a suitable high level model for sample processing and QA available, tools and support can be provided to link reproducible units of metadata (e.g. the standard protocol for a variable as adopted by a community) and simplify the provision of metadata and subsequent discovery.
Reddy, T.B.K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.
2015-01-01
The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface support several new features including the implementation of a four-level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards. PMID:25348402
Streamlining Metadata and Data Management for Evolving Digital Libraries
NASA Astrophysics Data System (ADS)
Clark, D.; Miller, S. P.; Peckman, U.; Smith, J.; Aerni, S.; Helly, J.; Sutton, D.; Chase, A.
2003-12-01
What began two years ago as an effort to stabilize the Scripps Institution of Oceanography (SIO) data archives from more than 700 cruises going back 50 years, has now become the operational fully-searchable "SIOExplorer" digital library, complete with thousands of historic photographs, images, maps, full text documents, binary data files, and 3D visualization experiences, totaling nearly 2 terabytes of digital content. Coping with data diversity and complexity has proven to be more challenging than dealing with large volumes of digital data. SIOExplorer has been built with scalability in mind, so that the addition of new data types and entire new collections may be accomplished with ease. It is a federated system, currently interoperating with three independent data-publishing authorities, each responsible for their own quality control, metadata specifications, and content selection. The IT architecture implemented at the San Diego Supercomputer Center (SDSC) streamlines the integration of additional projects in other disciplines with a suite of metadata management and collection building tools for "arbitrary digital objects." Metadata are automatically harvested from data files into domain-specific metadata blocks, and mapped into various specification standards as needed. Metadata can be browsed and objects can be viewed onscreen or downloaded for further analysis, with automatic proprietary-hold request management.
NASA Astrophysics Data System (ADS)
Thomas, V. I.; Yu, E.; Acharya, P.; Jaramillo, J.; Chowdhury, F.
2015-12-01
Maintaining and archiving accurate site metadata is critical for seismic network operations. The Advanced National Seismic System (ANSS) Station Information System (SIS) is a repository of seismic network field equipment, equipment response, and other site information. Currently, there are 187 different sensor models and 114 data-logger models in SIS. SIS has a web-based user interface that allows network operators to enter information about seismic equipment and assign response parameters to it. It allows users to log entries for sites, equipment, and data streams. Users can also track when equipment is installed, updated, and/or removed from sites. When seismic equipment configurations change for a site, SIS computes the overall gain of a data channel by combining the response parameters of the underlying hardware components. Users can then distribute this metadata in standardized formats such as FDSN StationXML or dataless SEED. One powerful advantage of SIS is that existing data in the repository can be leveraged: e.g., new instruments can be assigned response parameters from the Incorporated Research Institutions for Seismology (IRIS) Nominal Response Library (NRL), or from a similar instrument already in the inventory, thereby reducing the amount of time needed to determine parameters when new equipment or models are introduced into a network. SIS is also useful for managing field equipment that does not produce seismic data (e.g., power systems, telemetry devices or GPS receivers) and gives the network operator a comprehensive view of site field work. SIS allows users to generate field logs to document activities and inventory at sites. Thus, operators can also use SIS reporting capabilities to improve planning and maintenance of the network. Queries such as how many sensors of a certain model are installed or what pieces of equipment have active problem reports are just a few examples of the type of information that is available to SIS users.
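The gain computation described in the abstract above can be illustrated with a minimal sketch. It assumes that each hardware component contributes a single multiplicative gain at one reference frequency and that the overall channel gain is the product of those stage gains; the `Stage` class, the `overall_gain` function, and all numeric values are hypothetical and are not taken from SIS or the NRL.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Stage:
    """One hardware component in a data channel (hypothetical model)."""
    name: str
    gain: float         # multiplicative gain at the reference frequency
    input_units: str
    output_units: str


def overall_gain(stages):
    """Multiply the stage gains after checking that units chain correctly."""
    for upstream, downstream in zip(stages, stages[1:]):
        if upstream.output_units != downstream.input_units:
            raise ValueError(
                f"{upstream.name} outputs {upstream.output_units}, "
                f"but {downstream.name} expects {downstream.input_units}"
            )
    return prod(stage.gain for stage in stages)


# Made-up example values: a sensor followed by a data logger.
channel = [
    Stage("sensor", 1500.0, "m/s", "V"),
    Stage("datalogger", 400000.0, "V", "counts"),
]
total = overall_gain(channel)  # counts per (m/s)
```

A real response description also carries frequency-dependent information such as poles and zeros or filter coefficients, which this sketch deliberately ignores.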
Migration of the ATLAS Metadata Interface (AMI) to Web 2.0 and cloud
NASA Astrophysics Data System (ADS)
Odier, J.; Albrand, S.; Fulachier, J.; Lambert, F.
2015-12-01
The ATLAS Metadata Interface (AMI), a mature application in use for more than 10 years, is currently being adapted to recently available technologies. The web interfaces, which previously manipulated XML documents using XSL transformations, are being migrated to Asynchronous JavaScript and XML (AJAX). Web development is considerably simplified by the introduction of a framework based on jQuery and Twitter Bootstrap. Finally, the AMI services are being migrated to an OpenStack cloud infrastructure.
ESO Advanced Data Products for the Virtual Observatory
NASA Astrophysics Data System (ADS)
Retzlaff, J.; Delmotte, N.; Rite, C.; Rosati, P.; Slijkhuis, R.; Vandame, B.
2006-07-01
Advanced Data Products, that is, completely reduced, fully characterized science-ready data sets, play a crucial role for the success of the Virtual Observatory as a whole. We report on on-going work at ESO towards the creation and publication of Advanced Data Products in compliance with present VO standards on resource metadata. The new deep NIR multi-color mosaic of the GOODS/CDF-S region is used to showcase different aspects of the entire process: data reduction employing our MVM-based reduction pipeline, calibration and data characterization procedures, standardization of metadata content, and, finally, a prospect of the scientific potential illustrated by new results on deep galaxy number counts.
CF Metadata Conventions: Founding Principles, Governance, and Future Directions
NASA Astrophysics Data System (ADS)
Taylor, K. E.
2016-12-01
The CF Metadata Conventions define attributes that promote sharing of climate and forecasting data and facilitate automated processing by computers. The development, maintenance, and evolution of the conventions have mainly been provided by voluntary community contributions. Nevertheless, an organizational framework has been established, which relies on established rules and web-based discussion to ensure smooth (but relatively efficient) evolution of the standard to accommodate new types of data. The CF standard has been essential to the success of high-profile internationally-coordinated modeling activities (e.g., the Coupled Model Intercomparison Project). A summary of CF's founding principles and the prospects for its future evolution will be discussed.
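As a rough illustration of how such attributes support automated processing, the sketch below represents a variable's attributes as a plain dictionary and reports which commonly used CF attributes are absent. The attribute names `standard_name`, `units`, and `long_name` come from the CF conventions; the `RECOMMENDED` tuple and the `missing_attributes` helper are assumptions made for this example and do not amount to a CF compliance check.

```python
# Attributes this sketch looks for (an illustrative choice, not a CF rule set).
RECOMMENDED = ("standard_name", "units")


def missing_attributes(attrs, recommended=RECOMMENDED):
    """Return the recommended attribute names that a variable lacks."""
    return [name for name in recommended if name not in attrs]


# A variable described with CF-style attributes.
tas = {
    "standard_name": "air_temperature",
    "units": "K",
    "long_name": "Near-Surface Air Temperature",
}

# A variable that software cannot interpret without human help.
undocumented = {"long_name": "temperature"}

report = {
    "tas": missing_attributes(tas),
    "undocumented": missing_attributes(undocumented),
}
```

With shared attribute names, a program can select or convert variables by `standard_name` and `units` instead of relying on file-specific variable names.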
NASA Astrophysics Data System (ADS)
Budden, A. E.; Arzayus, K. M.; Baker-Yeboah, S.; Casey, K. S.; Dozier, J.; Jones, C. S.; Jones, M. B.; Schildhauer, M.; Walker, L.
2016-12-01
The newly established NSF Arctic Data Center plays a critical support role in archiving and curating the data and software generated by Arctic researchers from diverse disciplines. The Arctic community, comprising Earth science, archaeology, geography, anthropology, and other social science researchers, is supported through data curation services and domain agnostic tools and infrastructure, ensuring data are accessible in the most transparent and usable way possible. This interoperability across diverse disciplines within the Arctic community facilitates collaborative research and is mirrored by interoperability between the Arctic Data Center infrastructure and other large scale cyberinfrastructure initiatives. The Arctic Data Center leverages the DataONE federation to standardize access to and replication of data and metadata to other repositories, specifically NOAA's National Centers for Environmental Information (NCEI). This approach promotes long-term preservation of the data and metadata, as well as opening the door for other data repositories to leverage this replication infrastructure with NCEI and other DataONE member repositories. The Arctic Data Center uses rich, detailed metadata following widely recognized standards. Particularly, measurement-level and provenance metadata provide scientists the details necessary to integrate datasets across studies and across repositories while enabling a full understanding of the provenance of data used in the system. The Arctic Data Center gains this deep metadata and provenance support by simply adopting DataONE services, which results in significant efficiency gains by eliminating the need to develop systems de novo. Similarly, the advanced search tool developed by the Knowledge Network for Biocomplexity and extended for data submission by the Arctic Data Center, can be used by other DataONE-compliant repositories without further development.
By standardizing interfaces and leveraging the DataONE federation, the Arctic Data Center has advanced rapidly and can itself contribute to raising the capabilities of all members of the federation.
Progress Report on the Airborne Metadata and Time Series Working Groups of the 2016 ESDSWG
NASA Astrophysics Data System (ADS)
Evans, K. D.; Northup, E. A.; Chen, G.; Conover, H.; Ames, D. P.; Teng, W. L.; Olding, S. W.; Krotkov, N. A.
2016-12-01
NASA's Earth Science Data Systems Working Groups (ESDSWG) was created over 10 years ago. The role of the ESDSWG is to make recommendations relevant to NASA's Earth science data systems from users' experiences. Each group works independently focusing on a unique topic. Participation in ESDSWG groups comes from a variety of NASA-funded science and technology projects, including MEaSUREs and ROSS. Participants include NASA information technology experts, affiliated contractor staff and other interested community members from academia and industry. Recommendations from the ESDSWG groups will enhance NASA's efforts to develop long term data products. The Airborne Metadata Working Group is evaluating the suitability of the current Common Metadata Repository (CMR) and Unified Metadata Model (UMM) for airborne data sets and developing new recommendations as necessary. The overarching goal is to enhance the usability, interoperability, discovery and distribution of airborne observational data sets. This will be done by assessing the suitability (gaps) of the current UMM model for airborne data using lessons learned from current and past field campaigns, listening to user needs and community recommendations and assessing the suitability of ISO metadata and other standards to fill the gaps. The Time Series Working Group (TSWG) is a continuation of the 2015 Time Series/WaterML2 Working Group. The TSWG is using a case study-driven approach to test the new Open Geospatial Consortium (OGC) TimeseriesML standard to determine any deficiencies with respect to its ability to fully describe and encode NASA earth observation-derived time series data. To do this, the time series working group is engaging with the OGC TimeseriesML Standards Working Group (SWG) regarding unsatisfied needs and possible solutions. The effort will end with the drafting of an OGC Engineering Report based on the use cases and interactions with the OGC TimeseriesML SWG.
Progress towards finalizing recommendations will be presented at the meeting.
Chandler, A.; Foley, D.; Hafez, A.M.
2000-01-01
The purpose of this article is to raise and address a number of issues related to the conversion of Federal Geographic Data Committee metadata into MARC21 and Dublin Core. We present an analysis of 466 FGDC metadata records housed in the National Biological Information Infrastructure (NBII) node of the FGDC Clearinghouse, with special emphasis on the length of fields and the total length of records in this set. One of our contributions is a 34-element crosswalk, a proposal that takes into consideration the constraints of the MARC21 standard as implemented in OCLC's WorldCat and the realities of user behavior.
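A crosswalk of this kind is essentially a lookup table from source elements to target elements. The sketch below is a minimal illustration only: the three mappings from FGDC element paths to Dublin Core elements are assumptions chosen for the example and are not the article's 34-element crosswalk, the sample record is invented, and `to_dublin_core` and `field_lengths` are hypothetical helper names.

```python
# Illustrative mapping from FGDC element paths to Dublin Core elements.
# These three entries are assumptions for this sketch, not the published crosswalk.
CROSSWALK = {
    "idinfo/citation/citeinfo/title": "title",
    "idinfo/citation/citeinfo/origin": "creator",
    "idinfo/descript/abstract": "description",
}


def to_dublin_core(fgdc_record, crosswalk=CROSSWALK):
    """Copy mapped FGDC values into a Dublin Core style dictionary."""
    dc = {}
    for source_path, target in crosswalk.items():
        value = fgdc_record.get(source_path)
        if value:
            dc.setdefault(target, []).append(value)
    return dc


def field_lengths(record):
    """Length in characters of each field, for length analyses like the one above."""
    return {path: len(value) for path, value in record.items()}


# Invented record, flattened to path/value pairs for simplicity.
record = {
    "idinfo/citation/citeinfo/title": "Example wetlands survey",
    "idinfo/citation/citeinfo/origin": "Example Agency",
    "idinfo/descript/abstract": "A made-up abstract used only for illustration.",
}

dc = to_dublin_core(record)
lengths = field_lengths(record)
```

Measuring field lengths before conversion shows which values might exceed limits imposed by a target format, which is the kind of constraint the article considers for MARC21.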
The need for data standards in zoomorphology.
Vogt, Lars; Nickel, Michael; Jenner, Ronald A; Deans, Andrew R
2013-07-01
eScience is a new approach to research that focuses on data mining and exploration rather than data generation or simulation. This new approach is arguably a driving force for scientific progress and requires data to be openly available, easily accessible via the Internet, and compatible with each other. eScience relies on modern standards for the reporting and documentation of data and metadata. Here, we suggest necessary components (i.e., content, concept, nomenclature, format) of such standards in the context of zoomorphology. We document the need for using data repositories to prevent data loss and how publication practice is currently changing, with the emergence of dynamic publications and the publication of digital datasets. Subsequently, we demonstrate that in zoomorphology the scientific record is still limited to published literature and that zoomorphological data are usually not accessible through data repositories. The underlying problem is that zoomorphology lacks the standards for data and metadata. As a consequence, zoomorphology cannot participate in eScience. We argue that the standardization of morphological data requires i) a standardized framework for terminologies for anatomy and ii) a formalized method of description that allows computer-parsable morphological data to be communicable, compatible, and comparable. The role of controlled vocabularies (e.g., ontologies) for developing respective terminologies and methods of description is discussed, especially in the context of data annotation and semantic enhancement of publications. Finally, we introduce the International Consortium for Zoomorphology Standards, a working group that is open to everyone and whose aim is to stimulate and synthesize dialog about standards. It is the Consortium's ultimate goal to assist the zoomorphology community in developing modern data and metadata standards, including anatomy ontologies, thereby facilitating the participation of zoomorphology in eScience. 
Copyright © 2013 Wiley Periodicals, Inc.
NASA Astrophysics Data System (ADS)
West, Ruth G.; Margolis, Todd; Prudhomme, Andrew; Schulze, Jürgen P.; Mostafavi, Iman; Lewis, J. P.; Gossmann, Joachim; Singh, Rajvikram
2014-02-01
Scalable Metadata Environments (MDEs) are an artistic approach for designing immersive environments for large scale data exploration in which users interact with data by forming multiscale patterns that they alternately disrupt and reform. Developed and prototyped as part of an art-science research collaboration, we define an MDE as a 4D virtual environment structured by quantitative and qualitative metadata describing multidimensional data collections. Entire data sets (e.g., tens of millions of records) can be visualized and sonified at multiple scales and at different levels of detail so they can be explored interactively in real-time within MDEs. They are designed to reflect similarities and differences in the underlying data or metadata such that patterns can be visually/aurally sorted in an exploratory fashion by an observer who is not familiar with the details of the mapping from data to visual, auditory or dynamic attributes. While many approaches for visual and auditory data mining exist, MDEs are distinct in that they utilize qualitative and quantitative data and metadata to construct multiple interrelated conceptual coordinate systems. These "regions" function as conceptual lattices for scalable auditory and visual representations within virtual environments computationally driven by multi-GPU CUDA-enabled fluid dynamics systems.
MMI: Increasing Community Collaboration
NASA Astrophysics Data System (ADS)
Galbraith, N. R.; Stocks, K.; Neiswender, C.; Maffei, A.; Bermudez, L.
2007-12-01
Building community requires a collaborative environment and guidance to help move members towards a common goal. An effective environment for community collaboration is a workspace that fosters participation and cooperation; effective guidance furthers common understanding and promotes best practices. The Marine Metadata Interoperability (MMI) project has developed a community web site to provide a collaborative environment for scientists, technologists, and data managers from around the world to learn about metadata and exchange ideas. Workshops, demonstration projects, and presentations also provide community-building opportunities for MMI. MMI has developed comprehensive online guides to help users understand and work with metadata standards, ontologies, and other controlled vocabularies. Documents such as "The Importance of Metadata Standards", "Usage vs. Discovery Vocabularies" and "Developing Controlled Vocabularies" guide scientists and data managers through a variety of metadata-related concepts. Members from eight organizations involved in marine science and informatics collaborated on this effort. The MMI web site has moved from Plone to Drupal, two content management systems which provide different opportunities for community-based work. Drupal's "organic groups" feature will be used to provide workspace for future teams tasked with content development, outreach, and other MMI mission-critical work. The new site is designed to enable members to easily create working areas, to build communities dedicated to developing consensus on metadata and other interoperability issues. Controlled-vocabulary-driven menus, integrated mailing-lists, member-based content creation and review tools are facets of the new web site architecture. This move provided the challenge of developing a hierarchical vocabulary to describe the resources presented on the site; consistent and logical tagging of web pages is the basis of Drupal site navigation. 
The new MMI web site presents enhanced opportunities for electronic discussions, focused collaborative work, and even greater community participation. The MMI project is beginning a new initiative to comprehensively catalog and document tools for marine metadata. The new MMI community-based web site will be used to support this work and to support the work of other ad-hoc teams in the future. We are seeking broad input from the community on this effort.
The health care and life sciences community profile for dataset descriptions
Alexiev, Vladimir; Ansell, Peter; Bader, Gary; Baran, Joachim; Bolleman, Jerven T.; Callahan, Alison; Cruz-Toledo, José; Gaudet, Pascale; Gombocz, Erich A.; Gonzalez-Beltran, Alejandra N.; Groth, Paul; Haendel, Melissa; Ito, Maori; Jupp, Simon; Juty, Nick; Katayama, Toshiaki; Kobayashi, Norio; Krishnaswami, Kalpana; Laibe, Camille; Le Novère, Nicolas; Lin, Simon; Malone, James; Miller, Michael; Mungall, Christopher J.; Rietveld, Laurens; Wimalaratne, Sarala M.; Yamaguchi, Atsuko
2016-01-01
Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets. PMID:27602295
An Overview of Tools for Creating, Validating and Using PDS Metadata
NASA Astrophysics Data System (ADS)
King, T. A.; Hardman, S. H.; Padams, J.; Mafi, J. N.; Cecconi, B.
2017-12-01
NASA's Planetary Data System (PDS) has defined information models for creating metadata to describe bundles, collections and products for all the assets acquired by planetary science projects. Version 3 of the PDS Information Model (commonly known as "PDS3") is widely used and describes most of the existing planetary archive. Recently PDS has released version 4 of the Information Model (commonly known as "PDS4"), which is designed to improve consistency, efficiency and discoverability of information. To aid in creating, validating and using PDS4 metadata, the PDS and a few associated groups have developed a variety of tools. In addition, some commercial tools, both free and for a fee, can be used to create and work with PDS4 metadata. We present an overview of these tools, describe those tools currently under development and provide guidance as to which tools may be most useful for missions, instrument teams and the individual researcher.
Data Archival and Retrieval Enhancement (DARE) Metadata Modeling and Its User Interface
NASA Technical Reports Server (NTRS)
Hyon, Jason J.; Borgen, Rosana B.
1996-01-01
The Defense Nuclear Agency (DNA) has acquired terabytes of valuable data which need to be archived and effectively distributed to the entire nuclear weapons effects community and others...This paper describes the DARE (Data Archival and Retrieval Enhancement) metadata model and explains how it is used as a source for generating HyperText Markup Language (HTML) or Standard Generalized Markup Language (SGML) documents for access through web browsers such as Netscape.
Enriching public descriptions of marine phages using the Genomic Standards Consortium MIGS standard
Duhaime, Melissa Beth; Kottmann, Renzo; Field, Dawn; Glöckner, Frank Oliver
2011-01-01
In any sequencing project, the possible depth of comparative analysis is determined largely by the amount and quality of the accompanying contextual data. The structure, content, and storage of this contextual data should be standardized to ensure consistent coverage of all sequenced entities and facilitate comparisons. The Genomic Standards Consortium (GSC) has developed the “Minimum Information about Genome/Metagenome Sequences (MIGS/MIMS)” checklist for the description of genomes, and here we annotate all 30 publicly available marine bacteriophage sequences to the MIGS standard. These annotations build on existing International Nucleotide Sequence Database Collaboration (INSDC) records, and confirm, as expected, that current submissions lack most MIGS fields. MIGS fields were manually curated from the literature and placed in XML format as specified by the Genomic Contextual Data Markup Language (GCDML). These “machine-readable” reports were then analyzed to highlight patterns describing this collection of genomes. Completed reports are provided in GCDML. This work represents one step towards the annotation of our complete collection of genome sequences and shows the utility of capturing richer metadata along with raw sequences. PMID:21677864
QuakeML: XML for Seismological Data Exchange and Resource Metadata Description
NASA Astrophysics Data System (ADS)
Euchner, F.; Schorlemmer, D.; Becker, J.; Heinloo, A.; Kästli, P.; Saul, J.; Weber, B.; QuakeML Working Group
2007-12-01
QuakeML is an XML-based data exchange format for seismology that is under development. Current collaborators are from ETH, GFZ, USC, USGS, IRIS DMC, EMSC, ORFEUS, and ISTI. QuakeML development was motivated by the lack of a widely accepted and well-documented data format that is applicable to a broad range of fields in seismology. The development team brings together expertise from communities dealing with analysis and creation of earthquake catalogs, distribution of seismic bulletins, and real-time processing of seismic data. Efforts to merge QuakeML with existing XML dialects are under way. The first release of QuakeML will cover a basic description of seismic events including picks, arrivals, amplitudes, magnitudes, origins, focal mechanisms, and moment tensors. Further extensions are in progress or planned, e.g., for macroseismic information, location probability density functions, slip distributions, and ground motion information. The QuakeML language definition is supplemented by a concept to provide resource metadata and facilitate metadata exchange between distributed data providers. For that purpose, we introduce unique, location-independent identifiers of seismological resources. As an application of QuakeML, ETH Zurich is currently developing a Python-based seismicity analysis toolkit as a contribution to CSEP (Collaboratory for the Study of Earthquake Predictability). We follow a collaborative and transparent development approach along the lines of the procedures of the World Wide Web Consortium (W3C). QuakeML is currently in working draft status. The standard description will be subjected to a public Request for Comments (RFC) process and eventually reach the status of a recommendation. QuakeML can be found at http://www.quakeml.org.
NASA's Earth Observing System Data and Information System - Many Mechanisms for On-Going Evolution
NASA Astrophysics Data System (ADS)
Ramapriyan, H. K.
2012-12-01
NASA's Earth Observing System Data and Information System has been serving a broad user community since August 1994. As a long-lived multi-mission system serving multiple scientific disciplines and a diverse user community, EOSDIS has been evolving continuously. It has had and continues to have many forms of community input to help with this evolution. Early in its history, it had inputs from the EOSDIS Advisory Panel, benefited from the reviews by various external committees and evolved into the present distributed architecture with discipline-based Distributed Active Archive Centers (DAACs), Science Investigator-led Processing Systems and a cross-DAAC search and data access capability. EOSDIS evolution has been helped by advances in computer technology, moving from an initially planned supercomputing environment to SGI workstations to Linux Clusters for computation and from near-line archives of robotic silos with tape cassettes to RAID-disk-based on-line archives for storage. The network capacities have increased steadily over the years, making delivery of data on media almost obsolete. The advances in information systems technologies have been having an even greater impact on the evolution of EOSDIS. In the early days, the advent of the World Wide Web came as a game-changer in the operation of EOSDIS. The metadata model developed for the EOSDIS Core System for representing metadata from EOS standard data products has had an influence on the Federal Geographic Data Committee's metadata content standard and the ISO metadata standards. The influence works both ways. As the ISO 19115 metadata standard has developed in recent years, EOSDIS is reviewing its metadata to ensure compliance with the standard. Improvements have been made in the cross-DAAC search and access of data using the centralized metadata clearing house (EOS Clearing House - ECHO) and the client Reverb.
Given the diversity of the Earth science disciplines served by the DAACs, the DAACs have developed a number of software tools tailored to their respective user communities. Web services play an important part in improved access to data products including some basic analysis and visualization capabilities. A coherent view into all capabilities available from EOSDIS is evolving through the "Coherent Web" effort. Data are being made available in near real-time for scientific research as well as time-critical applications. On-going community inputs for infusion for maintaining vitality of EOSDIS come from technology developments by NASA-sponsored community data system programs - Advancing Collaborative Connections for Earth System Science (ACCESS), Making Earth System Data Records for Use in Research Environments (MEaSUREs) and Applied Information System Technology (AIST), as well as participation in Earth Science Data System Working Groups, the Earth Science Information Partners Federation and other interagency/international activities. An important source of community needs is the annual American Customer Satisfaction Index survey of EOSDIS users. Some of the key areas in which improvements are required and incremental progress is being made are: ease of discovery and access; cross-organizational interoperability; data inter-use; ease of collaboration; ease of citation of datasets; preservation of provenance and context and making them conveniently available to users.
Spatial Data Transfer Standard (SDTS), part 5 : SDTS raster profile and extensions
DOT National Transportation Integrated Search
1999-02-01
The Spatial Data Transfer Standard (SDTS) defines a general mechanism for the transfer of geographically referenced spatial data and its supporting metadata, i.e., attributes, data quality reports, coordinate reference systems, security informati...
Virtual patient repositories--a comparative analysis.
Küfner, Julia; Kononowicz, Andrzej A; Hege, Inga
2014-01-01
Virtual Patients (VPs) are an important component of medical education. One way to reduce the costs for creating VPs is sharing through repositories. We conducted a literature review to identify existing repositories and analyzed the 17 included repositories with regard to the search functions and metadata they provide. Most repositories provided some metadata such as title or description, whereas other data, such as educational objectives, were less frequent. Future research could, in cooperation with the repository provider, investigate user expectations and usage patterns.
An Open Metadata Schema for Clinical Pathway (openCP) in China.
Xu, Wei; Zhu, Yanxin; Wang, Xia
2017-01-01
China has issued and implemented standard clinical pathways (Chinese standard CPs) since 2009; however, they are still paper-based CPs. The aim of the study is to reorganize Chinese standard CPs based on related Chinese medical standards, using an archetype approach, and to develop an Open platform for CP (openCP) in China.
MEMOPS: data modelling and automatic code generation.
Fogh, Rasmus H; Boucher, Wayne; Ionides, John M C; Vranken, Wim F; Stevens, Tim J; Laue, Ernest D
2010-03-25
In recent years the amount of biological data has exploded to the point where much useful information can only be extracted by complex computational analyses. Such analyses are greatly facilitated by metadata standards, both in terms of the ability to compare data originating from different sources, and in terms of exchanging data in standard forms, e.g. when running processes on a distributed computing infrastructure. However, standards thrive on stability whereas science tends to constantly move, with new methods being developed and old ones modified. Therefore maintaining both metadata standards, and all the code that is required to make them useful, is a non-trivial problem. Memops is a framework that uses an abstract definition of the metadata (described in UML) to generate internal data structures and subroutine libraries for data access (application programming interfaces--APIs--currently in Python, C and Java) and data storage (in XML files or databases). For the individual project these libraries obviate the need for writing code for input parsing, validity checking or output. Memops also ensures that the code is always internally consistent, massively reducing the need for code reorganisation. Across a scientific domain a Memops-supported data model makes it easier to support complex standards that can capture all the data produced in a scientific area, share them among all programs in a complex software pipeline, and carry them forward to deposition in an archive. The principles behind the Memops generation code will be presented, along with example applications in Nuclear Magnetic Resonance (NMR) spectroscopy and structural biology.
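The general idea of generating data-access code from an abstract model can be sketched in a few lines. The example below is not Memops and does not use its API or UML input: it assumes a toy model expressed as a Python dictionary, and the names `MODEL`, `ALLOWED_TYPES`, `generate_class_source`, and `build_class` are hypothetical. It shows how validity checking can be emitted automatically rather than written by hand.

```python
# Toy abstract model (an assumption for this sketch; Memops itself uses UML).
MODEL = {
    "class": "Sample",
    "attributes": {"name": "str", "concentration": "float"},
}

ALLOWED_TYPES = {"str", "int", "float", "bool"}


def generate_class_source(model):
    """Emit Python source for a class with type-checked attributes."""
    attributes = model["attributes"]
    for type_name in attributes.values():
        if type_name not in ALLOWED_TYPES:
            raise ValueError(f"unsupported type: {type_name}")
    lines = [f"class {model['class']}:"]
    lines.append(f"    def __init__(self, {', '.join(attributes)}):")
    for attr in attributes:
        # Assignment goes through the generated setter, so it is validated.
        lines.append(f"        self.{attr} = {attr}")
    for attr, type_name in attributes.items():
        lines += [
            "    @property",
            f"    def {attr}(self):",
            f"        return self._{attr}",
            f"    @{attr}.setter",
            f"    def {attr}(self, value):",
            f"        if not isinstance(value, {type_name}):",
            f"            raise TypeError('{attr} must be {type_name}')",
            f"        self._{attr} = value",
        ]
    return "\n".join(lines) + "\n"


def build_class(model):
    """Compile the generated source and return the resulting class."""
    namespace = {}
    exec(generate_class_source(model), namespace)
    return namespace[model["class"]]


Sample = build_class(MODEL)
s = Sample("lysozyme", 1.5)
```

Changing the model and regenerating keeps the access code and its checks consistent with the definition, which is the maintenance benefit the abstract describes.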
The Application of the SPASE Metadata Standard in the U.S. and Worldwide
NASA Astrophysics Data System (ADS)
Thieman, J. R.; King, T. A.; Roberts, D.
2012-12-01
The Space Physics Archive Search and Extract (SPASE) Metadata standard for Heliophysics and related data is now an established standard within the NASA-funded space and solar physics community and is spreading to the international groups within that community. Development of SPASE has involved a number of international partners, and the current version of the SPASE Metadata Model (version 2.2.2) has not needed any structural modifications since January 2011. The SPASE standard has been adopted by groups such as NASA's Heliophysics division, the Canadian Space Science Data Portal (CSSDP), Canada's AUTUMN network, Japan's Inter-university Upper atmosphere Global Observation NETwork (IUGONET), Centre de Données de la Physique des Plasmas (CDPP), and the near-Earth space data infrastructure for e-Science (ESPAS). In addition, portions of the SPASE dictionary have been modeled in semantic web ontologies for use with reasoners and semantic searches. While we anticipate additional modifications to the model in the future to accommodate simulation and model data, these changes will not affect the data descriptions already generated for instrument-related datasets. Examples of SPASE descriptions can be viewed at
OlyMPUS - The Ontology-based Metadata Portal for Unified Semantics
NASA Astrophysics Data System (ADS)
Huffer, E.; Gleason, J. L.
2015-12-01
The Ontology-based Metadata Portal for Unified Semantics (OlyMPUS), funded by the NASA Earth Science Technology Office Advanced Information Systems Technology program, is an end-to-end system designed to support data consumers and data providers, enabling the latter to register their data sets and provision them with the semantically rich metadata that drives the Ontology-Driven Interactive Search Environment for Earth Sciences (ODISEES). OlyMPUS leverages the semantics and reasoning capabilities of ODISEES to provide data producers with a semi-automated interface for producing the semantically rich metadata needed to support ODISEES' data discovery and access services. It integrates the ODISEES metadata search system with multiple NASA data delivery tools to enable data consumers to create customized data sets for download to their computers, or for NASA Advanced Supercomputing (NAS) facility registered users, directly to NAS storage resources for access by applications running on NAS supercomputers. A core function of NASA's Earth Science Division is research and analysis that uses the full spectrum of data products available in NASA archives. Scientists need to perform complex analyses that identify correlations and non-obvious relationships across all types of Earth System phenomena. Comprehensive analytics are hindered, however, by the fact that many Earth science data products are disparate and hard to synthesize. Variations in how data are collected, processed, gridded, and stored, create challenges for data interoperability and synthesis, which are exacerbated by the sheer volume of available data. Robust, semantically rich metadata can support tools for data discovery and facilitate machine-to-machine transactions with services such as data subsetting, regridding, and reformatting. Such capabilities are critical to enabling the research activities integral to NASA's strategic plans. 
However, as metadata requirements increase and competing standards emerge, metadata provisioning becomes increasingly burdensome to data producers. The OlyMPUS system helps data providers produce semantically rich metadata, making their data more accessible to data consumers, and helps data consumers quickly discover and download the right data for their research.
Trippi, Michael H.; Kinney, Scott A.; Gunther, Gregory; Ryder, Robert T.; Ruppert, Leslie F.
2014-01-01
Metadata for these datasets are available in HTML and XML formats. Metadata files contain information about the sources of data used to create the dataset, the creation process steps, the data quality, the geographic coordinate system and horizontal datum used for the dataset, the values of attributes used in the dataset table, information about the publication and the publishing organization, and other information that may be useful to the reader. All links in the metadata were valid at the time of compilation. Some of these links may no longer be valid. No attempt has been made to determine the new online location (if one exists) for the data.
NASA Astrophysics Data System (ADS)
Baynes, K.; Gilman, J.; Pilone, D.; Mitchell, A. E.
2015-12-01
The NASA EOSDIS (Earth Observing System Data and Information System) Common Metadata Repository (CMR) is a continuously evolving metadata system that merges all existing capabilities and metadata from EOS ClearingHOuse (ECHO) and the Global Change Master Directory (GCMD) systems. This flagship catalog has been developed with several key requirements: fast search and ingest performance; ability to integrate heterogeneous external inputs and outputs; high availability and resiliency; scalability; and evolvability and expandability. This talk will focus on the advantages and potential challenges of tackling these requirements using a microservices architecture, which decomposes system functionality into smaller, loosely-coupled, individually-scalable elements that communicate via well-defined APIs. In addition, time will be spent examining specific elements of the CMR architecture and identifying opportunities for future integrations.
National Pipeline Mapping System (NPMS) : repository standards
DOT National Transportation Integrated Search
1997-07-01
This draft document contains 7 sections. They are as follows: 1. General Topics, 2. Data Formats, 3. Metadata, 4. Attribute Data, 5. Data Flow, 6. Descriptive Process, and 7. Validation and Processing of Submitted Data. These standards were created w...
Mercury- Distributed Metadata Management, Data Discovery and Access System
NASA Astrophysics Data System (ADS)
Palanisamy, Giri; Wilson, Bruce E.; Devarakonda, Ranjeet; Green, James M.
2007-12-01
Mercury is a federated metadata harvesting, search and retrieval tool based on both open source and ORNL-developed software. It was originally developed for NASA, and the Mercury development consortium now includes funding from NASA, USGS, and DOE. Mercury supports various metadata standards including XML, Z39.50, FGDC, Dublin-Core, Darwin-Core, EML, and ISO-19115 (under development). Mercury provides a single portal to information contained in disparate data management systems. It collects metadata and key data from contributing project servers distributed around the world and builds a centralized index. The Mercury search interfaces then allow the users to perform simple, fielded, spatial and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury supports various projects including: ORNL DAAC, NBII, DADDI, LBA, NARSTO, CDIAC, OCEAN, I3N, IAI, ESIP and ARM. The new Mercury system is based on a Service Oriented Architecture and supports various services such as Thesaurus Service, Gazetteer Web Service and UDDI Directory Services. This system also provides various search services including: RSS, Geo-RSS, OpenSearch, Web Services and Portlets. Other features include: Filtering and dynamic sorting of search results, book-markable search results, save, retrieve, and modify search criteria.
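The harvest-then-search pattern described in this abstract can be sketched in a few lines. This is a toy illustration only, not Mercury's implementation: the `Record` type, the provider names, and the sample records are all invented, and the spatial test ignores bounding boxes that cross the antimeridian.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    """A hypothetical harvested metadata record."""
    source: str
    title: str
    start: date
    end: date
    bbox: tuple  # (west, south, east, north) in degrees


def harvest(providers):
    """Flatten per-provider record lists into one central index."""
    return [rec for records in providers.values() for rec in records]


def overlaps_time(rec, start, end):
    """True if the record's time range intersects [start, end]."""
    return rec.start <= end and start <= rec.end


def overlaps_bbox(rec, bbox):
    """True if the record's box intersects the query box (no antimeridian handling)."""
    w, s, e, n = rec.bbox
    qw, qs, qe, qn = bbox
    return w <= qe and qw <= e and s <= qn and qs <= n


def search(index, text=None, start=None, end=None, bbox=None):
    """Combine a fielded (title), temporal, and spatial filter over the index."""
    hits = []
    for rec in index:
        if text is not None and text.lower() not in rec.title.lower():
            continue
        if start is not None and end is not None and not overlaps_time(rec, start, end):
            continue
        if bbox is not None and not overlaps_bbox(rec, bbox):
            continue
        hits.append(rec)
    return hits


# Invented sample data standing in for distributed project servers.
PROVIDERS = {
    "provider-a": [
        Record("provider-a", "Soil moisture, site 1",
               date(2001, 1, 1), date(2003, 12, 31), (-90.0, 30.0, -80.0, 40.0)),
    ],
    "provider-b": [
        Record("provider-b", "Sea surface temperature",
               date(2005, 1, 1), date(2006, 12, 31), (10.0, -10.0, 20.0, 0.0)),
        Record("provider-b", "Soil carbon survey",
               date(2010, 1, 1), date(2011, 6, 30), (100.0, 10.0, 110.0, 20.0)),
    ],
}

if __name__ == "__main__":
    index = harvest(PROVIDERS)
    for hit in search(index, text="soil", start=date(2002, 1, 1), end=date(2002, 12, 31)):
        print(hit.source, hit.title)
```

Only the metadata is copied into the index in this sketch; each record keeps a `source` field pointing back to its provider, which mirrors the idea that the data themselves stay under the provider's control.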
Hybrid Multiagent System for Automatic Object Learning Classification
NASA Astrophysics Data System (ADS)
Gil, Ana; de La Prieta, Fernando; López, Vivian F.
The rapid evolution within the context of e-learning is closely linked to international efforts on the standardization of learning object metadata, which provides learners in a web-based educational system with ubiquitous access to multiple distributed repositories. This article presents a hybrid agent-based architecture that enables the recovery of learning objects tagged in Learning Object Metadata (LOM) and provides individualized help with selecting learning materials to make the most suitable choice among many alternatives.
Zhou, Li; Hongsermeier, Tonya; Boxwala, Aziz; Lewis, Janet; Kawamoto, Kensaku; Maviglia, Saverio; Gentile, Douglas; Teich, Jonathan M; Rocha, Roberto; Bell, Douglas; Middleton, Blackford
2013-01-01
At present, there are no widely accepted, standard approaches for representing computer-based clinical decision support (CDS) intervention types and their structural components. This study aimed to identify key requirements for the representation of five widely utilized CDS intervention types: alerts and reminders, order sets, infobuttons, documentation templates/forms, and relevant data presentation. An XML schema was proposed for representing these interventions and their core structural elements (e.g., general metadata, applicable clinical scenarios, CDS inputs, CDS outputs, and CDS logic) in a shareable manner. The schema was validated by building CDS artifacts for 22 different interventions, targeted toward guidelines and clinical conditions called for in the 2011 Meaningful Use criteria. Custom style sheets were developed to render the XML files in human-readable form. The CDS knowledge artifacts were shared via a public web portal. Our experience also identifies gaps in existing standards and informs future development of standards for CDS knowledge representation and sharing.
He, Yongqun; Xiang, Zuoshuang; Zheng, Jie; Lin, Yu; Overton, James A; Ong, Edison
2018-01-12
Ontologies are critical to data/metadata and knowledge standardization, sharing, and analysis. With hundreds of biological and biomedical ontologies developed, it has become critical to ensure ontology interoperability and the usage of interoperable ontologies for standardized data representation and integration. The suite of web-based Ontoanimal tools (e.g., Ontofox, Ontorat, and Ontobee) support different aspects of extensible ontology development. By summarizing the common features of Ontoanimal and other similar tools, we identified and proposed an "eXtensible Ontology Development" (XOD) strategy and its associated four principles. These XOD principles reuse existing terms and semantic relations from reliable ontologies, develop and apply well-established ontology design patterns (ODPs), and involve community efforts to support new ontology development, promoting standardized and interoperable data and knowledge representation and integration. The adoption of the XOD strategy, together with robust XOD tool development, will greatly support ontology interoperability and robust ontology applications to support data to be Findable, Accessible, Interoperable and Reusable (i.e., FAIR).
A New Data Management System for Biological and Chemical Oceanography
NASA Astrophysics Data System (ADS)
Groman, R. C.; Chandler, C.; Allison, D.; Glover, D. M.; Wiebe, P. H.
2007-12-01
The Biological and Chemical Oceanography Data Management Office (BCO-DMO) was created to serve PIs principally funded by NSF to conduct marine chemical and ecological research. The new office is dedicated to providing open access to data and information developed in the course of scientific research on short and intermediate time-frames. The data management system developed in support of U.S. JGOFS and U.S. GLOBEC programs is being modified to support the larger scope of the BCO-DMO effort, which includes ultimately providing a way to exchange data with other data systems. The open access system is based on a philosophy of data stewardship, support for existing and evolving data standards, and use of public domain software. The DMO staff work closely with originating PIs to manage data gathered as part of their individual programs. In the new BCO-DMO data system, project and data set metadata records designed to support re-use of the data are stored in a relational database (MySQL) and the data are stored in or made accessible by the JGOFS/GLOBEC object-oriented, relational, data management system. Data access will be provided via any standard Web browser client user interface through a GIS application (Open Source, OGC-compliant MapServer), a directory listing from the data holdings catalog, or a custom search engine that facilitates data discovery. In an effort to maximize data system interoperability, data will also be available via Web Services; and data set descriptions will be generated to comply with a variety of metadata content standards. The office is located at the Woods Hole Oceanographic Institution and web access is via http://www.bco-dmo.org.
The Planetary Data System Information Model for Geometry Metadata
NASA Astrophysics Data System (ADS)
Guinness, E. A.; Gordon, M. K.
2014-12-01
The NASA Planetary Data System (PDS) has recently developed a new set of archiving standards based on a rigorously defined information model. An important part of the new PDS information model is the model for geometry metadata, which includes, for example, attributes of the lighting and viewing angles of observations, position and velocity vectors of a spacecraft relative to Sun and observing body at the time of observation and the location and orientation of an observation on the target. The PDS geometry model is based on requirements gathered from the planetary research community, data producers, and software engineers who build search tools. A key requirement for the model is that it fully supports the breadth of PDS archives that include a wide range of data types from missions and instruments observing many types of solar system bodies such as planets, ring systems, and smaller bodies (moons, comets, and asteroids). Thus, important design aspects of the geometry model are that it standardizes the definition of the geometry attributes and provides consistency of geometry metadata across planetary science disciplines. The model specification also includes parameters so that the context of values can be unambiguously interpreted. For example, the reference frame used for specifying geographic locations on a planetary body is explicitly included with the other geometry metadata parameters. The structure and content of the new PDS geometry model is designed to enable both science analysis and efficient development of search tools. The geometry model is implemented in XML, as is the main PDS information model, and uses XML schema for validation. The initial version of the geometry model is focused on geometry for remote sensing observations conducted by flyby and orbiting spacecraft. Future releases of the PDS geometry model will be expanded to include metadata for landed and rover spacecraft.
The Modeling and Simulation Catalog for Discovery, Knowledge and Reuse
NASA Technical Reports Server (NTRS)
Stone, George F. III; Greenberg, Brandi; Daehler-Wilking, Richard; Hunt, Steven
2011-01-01
The DoD M&S Steering Committee has noted that the current DoD and Service's modeling and simulation resource repository (MSRR) services are not up to date, limiting their value to the using communities. However, M&S leaders and managers also determined that the Department needs a functional M&S registry card catalog to facilitate M&S tool and data visibility to support M&S activities across the DoD. The M&S Catalog will discover and access M&S metadata maintained at nodes distributed across DoD networks in a centrally managed, decentralized process that employs metadata collection and management. The intent is to link information stores, precluding redundant location updating. The M&S Catalog uses a standard metadata schema based on the DoD's Net-Centric Data Strategy Community of Interest metadata specification. The Air Force, Navy and OSD (CAPE) have provided initial information to participating DoD nodes, but plans are being made to bring in hundreds of source providers.
Automated software system for checking the structure and format of ACM SIG documents
NASA Astrophysics Data System (ADS)
Mirza, Arsalan Rahman; Sah, Melike
2017-04-01
Microsoft (MS) Office Word is one of the most commonly used software tools for creating documents. MS Word 2007 and later use XML to represent the structure of MS Word documents. Metadata about the documents are automatically created using Office Open XML (OOXML) syntax. We develop a new framework called ADFCS (Automated Document Format Checking System) that takes advantage of the OOXML metadata to extract semantic information from MS Office Word documents. In particular, we develop a new ontology for Association for Computing Machinery (ACM) Special Interest Group (SIG) documents for representing the structure and format of these documents by using OWL (Web Ontology Language). Then, the metadata is extracted automatically in RDF (Resource Description Framework) according to this ontology using the developed software. Finally, we generate extensive rules in order to infer whether the documents are formatted according to ACM SIG standards. This paper introduces the ACM SIG ontology, the metadata extraction process, the inference engine, the ADFCS online user interface, the system evaluation, and the user study evaluations.
Towards a semantic PACS: Using Semantic Web technology to represent imaging data.
Van Soest, Johan; Lustberg, Tim; Grittner, Detlef; Marshall, M Scott; Persoon, Lucas; Nijsten, Bas; Feltens, Peter; Dekker, Andre
2014-01-01
The DICOM standard is ubiquitous within medicine. However, improved DICOM semantics would significantly enhance search operations. Furthermore, databases of current PACS systems are not flexible enough for the demands within image analysis research. In this paper, we investigated whether Semantic Web technology can be used to store and represent metadata of DICOM image files, and to link additional computational results to image metadata. We therefore developed a proof of concept containing two applications: one to store commonly used DICOM metadata in an RDF repository, and one to calculate imaging biomarkers based on DICOM images and store the biomarker values in an RDF repository. This enabled us to search for all patients with a gross tumor volume calculated to be larger than 50 cc. We have shown that we can successfully store the DICOM metadata in an RDF repository and are refining our proof of concept with regard to volume naming, value representation, and the applications themselves.
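The kind of query mentioned in this abstract (patients whose gross tumor volume exceeds 50 cc) can be illustrated with a toy triple list standing in for an RDF repository. This is a sketch under stated assumptions, not the authors' applications: every identifier, predicate name (`ex:hasStructure`, `ex:structureName`, `ex:volumeCc`), and value below is invented, and a real system would use an RDF store and a SPARQL query instead of Python lists.

```python
# Toy stand-in for an RDF repository: (subject, predicate, object) triples.
# All identifiers, predicates, and values are invented for illustration.
TRIPLES = [
    ("patient:001", "ex:hasStructure", "structure:001-gtv"),
    ("structure:001-gtv", "ex:structureName", "GTV"),
    ("structure:001-gtv", "ex:volumeCc", 62.4),
    ("patient:002", "ex:hasStructure", "structure:002-gtv"),
    ("structure:002-gtv", "ex:structureName", "GTV"),
    ("structure:002-gtv", "ex:volumeCc", 31.0),
    ("patient:003", "ex:hasStructure", "structure:003-lung"),
    ("structure:003-lung", "ex:structureName", "Lung"),
    ("structure:003-lung", "ex:volumeCc", 3000.0),
    ("patient:003", "ex:hasStructure", "structure:003-gtv"),
    ("structure:003-gtv", "ex:structureName", "GTV"),
    ("structure:003-gtv", "ex:volumeCc", 50.0),
]


def objects(triples, subject, predicate):
    """Return every object stored for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def patients_with_large_gtv(triples, threshold_cc=50.0):
    """Find patients that have a structure named GTV with volume above the threshold."""
    matches = set()
    for patient, predicate, structure in triples:
        if predicate != "ex:hasStructure":
            continue
        names = objects(triples, structure, "ex:structureName")
        volumes = objects(triples, structure, "ex:volumeCc")
        if "GTV" in names and any(v > threshold_cc for v in volumes):
            matches.add(patient)
    return sorted(matches)


if __name__ == "__main__":
    print(patients_with_large_gtv(TRIPLES))
```

Linking the computed volume to the structure node, instead of to the patient directly, is what keeps the large lung volume of the third patient from matching the query.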
Li, Zuofeng; Wen, Jingran; Zhang, Xiaoyan; Wu, Chunxiao; Li, Zuogao; Liu, Lei
2012-01-01
To ease the secondary use of clinical data in clinical research, we introduce a metadata-driven, web-based clinical data management system named ClinData Express. ClinData Express is made up of two parts: 1) m-designer, a standalone software tool for metadata definition; 2) a web-based data warehouse system for data management. With ClinData Express, researchers need only define the metadata and data model in m-designer; the web interface for data collection and a specific database for data storage are then generated automatically. The standards used in the system and the data export module support data reuse. The system has been tested on seven disease-data collections in Chinese and one form from dbGaP. The flexibility of the system gives it great potential for use in clinical research. The system is available at http://code.google.com/p/clindataexpress. PMID:23304327
A Flexible Online Metadata Editing and Management System
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aguilar, Raul; Pan, Jerry Yun; Gries, Corinna
2010-01-01
A metadata editing and management system is being developed employing state-of-the-art XML technologies. A modular and distributed design was chosen for scalability, flexibility, options for customizations, and the possibility to add more functionality at a later stage. The system consists of a desktop design tool or schema walker used to generate code for the actual online editor, a native XML database, and an online user access management application. The design tool is a Java Swing application that reads an XML schema, provides the designer with options to combine input fields into online forms and give the fields user-friendly tags. Based on design decisions, the tool generates code for the online metadata editor. The code generated is an implementation of the XForms standard using the Orbeon Framework. The design tool fulfills two requirements: First, data entry forms based on one schema may be customized at design time and second data entry applications may be generated for any valid XML schema without relying on custom information in the schema. However, the customized information generated at design time is saved in a configuration file which may be re-used and changed again in the design tool. Future developments will add functionality to the design tool to integrate help text, tool tips, project specific keyword lists, and thesaurus services. Additional styling of the finished editor is accomplished via cascading style sheets which may be further customized and different look-and-feels may be accumulated through the community process. The customized editor produces XML files in compliance with the original schema, however, data from the current page is saved into a native XML database whenever the user moves to the next screen or pushes the save button independently of validity. Currently the system uses the open source XML database eXist for storage and management, which comes with third party online and desktop management tools. 
However, access to metadata files in the application introduced here is managed in a custom online module, using a MySQL backend accessed by a simple Java Server Faces front end. A flexible system with three grouping options, organization, group and single editing access is provided. Three levels were chosen to distribute administrative responsibilities and handle the common situation of an information manager entering the bulk of the metadata but leave specifics to the actual data provider.
Pathogen metadata platform: software for accessing and analyzing pathogen strain information.
Chang, Wenling E; Peterson, Matthew W; Garay, Christopher D; Korves, Tonia
2016-09-15
Pathogen metadata includes information about where and when a pathogen was collected and the type of environment it came from. Along with genomic nucleotide sequence data, this metadata is growing rapidly and becoming a valuable resource not only for research but for biosurveillance and public health. However, current freely available tools for analyzing this data are geared towards bioinformaticians and/or do not provide summaries and visualizations needed to readily interpret results. We designed a platform to easily access and summarize data about pathogen samples. The software includes a PostgreSQL database that captures metadata useful for disease outbreak investigations, and scripts for downloading and parsing data from NCBI BioSample and BioProject into the database. The software provides a user interface to query metadata and obtain standardized results in an exportable, tab-delimited format. To visually summarize results, the user interface provides a 2D histogram for user-selected metadata types and mapping of geolocated entries. The software is built on the LabKey data platform, an open-source data management platform, which enables developers to add functionalities. We demonstrate the use of the software in querying for a pathogen serovar and for genome sequence identifiers. This software enables users to create a local database for pathogen metadata, populate it with data from NCBI, easily query the data, and obtain visual summaries. Some of the components, such as the database, are modular and can be incorporated into other data platforms. The source code is freely available for download at https://github.com/wchangmitre/bioattribution .
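The query-and-export step described in this abstract can be sketched as follows. This is an illustration only and not the platform's code: an in-memory SQLite database is swapped in for PostgreSQL so the example is self-contained, and the `samples` table, its columns, and the rows are all invented.

```python
import csv
import io
import sqlite3

COLUMNS = ["sample_id", "serovar", "country", "collection_year"]


def make_demo_db():
    """Build an in-memory database with a few invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE samples "
        "(sample_id TEXT, serovar TEXT, country TEXT, collection_year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO samples VALUES (?, ?, ?, ?)",
        [
            ("S1", "Enteritidis", "US", 2014),
            ("S2", "Typhimurium", "FR", 2015),
            ("S3", "Enteritidis", "BR", 2016),
        ],
    )
    return conn


def export_tsv(conn, serovar):
    """Query samples by serovar and return the result as tab-delimited text."""
    rows = conn.execute(
        "SELECT sample_id, serovar, country, collection_year "
        "FROM samples WHERE serovar = ? ORDER BY sample_id",
        (serovar,),
    )
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)  # header row
    writer.writerows(rows)    # one line per matching sample
    return out.getvalue()


if __name__ == "__main__":
    print(export_tsv(make_demo_db(), "Enteritidis"), end="")
```

The parameterized query (`?` placeholder) keeps user-supplied search terms out of the SQL string, which matters for any interface that lets users type their own filters.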
Mapping the function of neuronal ion channels in model and experiment
Podlaski, William F; Seeholzer, Alexander; Groschner, Lukas N; Miesenböck, Gero; Ranjan, Rajnish; Vogels, Tim P
2017-01-01
Ion channel models are the building blocks of computational neuron models. Their biological fidelity is therefore crucial for the interpretation of simulations. However, the number of published models, and the lack of standardization, make the comparison of ion channel models with one another and with experimental data difficult. Here, we present a framework for the automated large-scale classification of ion channel models. Using annotated metadata and responses to a set of voltage-clamp protocols, we assigned 2378 models of voltage- and calcium-gated ion channels coded in NEURON to 211 clusters. The IonChannelGenealogy (ICGenealogy) web interface provides an interactive resource for the categorization of new and existing models and experimental recordings. It enables quantitative comparisons of simulated and/or measured ion channel kinetics, and facilitates field-wide standardization of experimentally-constrained modeling. DOI: http://dx.doi.org/10.7554/eLife.22152.001 PMID:28267430
NASA Astrophysics Data System (ADS)
Mitchell, A. E.; Lowe, D. R.; Murphy, K. J.; Ramapriyan, H. K.
2011-12-01
Initiated in 1990, NASA's Earth Observing System Data and Information System (EOSDIS) is currently a petabyte-scale archive of data designed to receive, process, distribute and archive several terabytes of science data per day from NASA's Earth science missions. Comprised of 12 discipline specific data centers collocated with centers of science discipline expertise, EOSDIS manages over 6800 data products from many science disciplines and sources. NASA supports global climate change research by providing scalable open application layers to the EOSDIS distributed information framework. This allows many other value-added services to access NASA's vast Earth Science Collection and allows EOSDIS to interoperate with data archives from other domestic and international organizations. EOSDIS is committed to NASA's Data Policy of full and open sharing of Earth science data. As metadata is used in all aspects of NASA's Earth science data lifecycle, EOSDIS provides a spatial and temporal metadata registry and order broker called the EOS Clearing House (ECHO) that allows efficient search and access of cross domain data and services through the Reverb Client and Application Programmer Interfaces (APIs). Another core metadata component of EOSDIS is NASA's Global Change Master Directory (GCMD) which represents more than 25,000 Earth science data set and service descriptions from all over the world, covering subject areas within the Earth and environmental sciences. With inputs from the ECHO, GCMD and Soil Moisture Active Passive (SMAP) mission metadata models, EOSDIS is developing a NASA ISO 19115 Best Practices Convention. Adoption of an international metadata standard enables a far greater level of interoperability among national and international data products. 
NASA recently concluded a 'Metadata Harmony Study' of EOSDIS metadata capabilities/processes of ECHO and NASA's Global Change Master Directory (GCMD), to evaluate opportunities for improved data access and use, reduce efforts by data providers and improve metadata integrity. The result was a recommendation for EOSDIS to develop a 'Common Metadata Repository (CMR)' to manage the evolution of NASA Earth Science metadata in a unified and consistent way by providing a central storage and access capability that streamlines current workflows while increasing overall data quality and anticipating future capabilities. For applications users interested in monitoring and analyzing a wide variety of natural and man-made phenomena, EOSDIS provides access to near real-time products from the MODIS, OMI, AIRS, and MLS instruments in less than 3 hours from observation. To enable interactive exploration of NASA's Earth imagery, EOSDIS is developing a set of standard services to deliver global, full-resolution satellite imagery in a highly responsive manner. EOSDIS is also playing a lead role in the development of the CEOS WGISS Integrated Catalog (CWIC), which provides search and access to holdings of participating international data providers. EOSDIS provides a platform to expose and share information on NASA Earth science tools and data via Earthdata.nasa.gov while offering a coherent and interoperable system for the NASA Earth Science Data System (ESDS) Program.
NASA Astrophysics Data System (ADS)
Mitchell, A. E.; Lowe, D. R.; Murphy, K. J.; Ramapriyan, H. K.
2013-12-01
Initiated in 1990, NASA's Earth Observing System Data and Information System (EOSDIS) is currently a petabyte-scale archive of data designed to receive, process, distribute and archive several terabytes of science data per day from NASA's Earth science missions. Comprised of 12 discipline specific data centers collocated with centers of science discipline expertise, EOSDIS manages over 6800 data products from many science disciplines and sources. NASA supports global climate change research by providing scalable open application layers to the EOSDIS distributed information framework. This allows many other value-added services to access NASA's vast Earth Science Collection and allows EOSDIS to interoperate with data archives from other domestic and international organizations. EOSDIS is committed to NASA's Data Policy of full and open sharing of Earth science data. As metadata is used in all aspects of NASA's Earth science data lifecycle, EOSDIS provides a spatial and temporal metadata registry and order broker called the EOS Clearing House (ECHO) that allows efficient search and access of cross domain data and services through the Reverb Client and Application Programmer Interfaces (APIs). Another core metadata component of EOSDIS is NASA's Global Change Master Directory (GCMD) which represents more than 25,000 Earth science data set and service descriptions from all over the world, covering subject areas within the Earth and environmental sciences. With inputs from the ECHO, GCMD and Soil Moisture Active Passive (SMAP) mission metadata models, EOSDIS is developing a NASA ISO 19115 Best Practices Convention. Adoption of an international metadata standard enables a far greater level of interoperability among national and international data products. 
NASA recently concluded a 'Metadata Harmony Study' of EOSDIS metadata capabilities/processes of ECHO and NASA's Global Change Master Directory (GCMD), to evaluate opportunities for improved data access and use, reduce efforts by data providers and improve metadata integrity. The result was a recommendation for EOSDIS to develop a 'Common Metadata Repository (CMR)' to manage the evolution of NASA Earth Science metadata in a unified and consistent way by providing a central storage and access capability that streamlines current workflows while increasing overall data quality and anticipating future capabilities. For applications users interested in monitoring and analyzing a wide variety of natural and man-made phenomena, EOSDIS provides access to near real-time products from the MODIS, OMI, AIRS, and MLS instruments in less than 3 hours from observation. To enable interactive exploration of NASA's Earth imagery, EOSDIS is developing a set of standard services to deliver global, full-resolution satellite imagery in a highly responsive manner. EOSDIS is also playing a lead role in the development of the CEOS WGISS Integrated Catalog (CWIC), which provides search and access to holdings of participating international data providers. EOSDIS provides a platform to expose and share information on NASA Earth science tools and data via Earthdata.nasa.gov while offering a coherent and interoperable system for the NASA Earth Science Data System (ESDS) Program.
Seeking the Path to Metadata Nirvana
NASA Astrophysics Data System (ADS)
Graybeal, J.
2008-12-01
Scientists have always found reusing other scientists' data challenging. Computers did not fundamentally change the problem, but enabled more and larger instances of it. In fact, by removing human mediation and time delays from the data sharing process, computers emphasize the contextual information that must be exchanged in order to exchange and reuse data. This requirement for contextual information has two faces: "interoperability" when talking about systems, and "the metadata problem" when talking about data. As much as any single organization, the Marine Metadata Interoperability (MMI) project has been tagged with the mission "Solve the metadata problem." Of course, if that goal is achieved, then sustained, interoperable data systems for interdisciplinary observing networks can be easily built -- pesky metadata differences, like which protocol to use for data exchange, or what the data actually measures, will be a thing of the past. Alas, as you might imagine, there will always be complexities and incompatibilities that are not addressed, and data systems that are not interoperable, even within a science discipline. So should we throw up our hands and surrender to the inevitable? Not at all. Rather, we try to minimize metadata problems as much as we can. In this we increasingly progress, despite natural forces that pull in the other direction. Computer systems let us work with more complexity, build community knowledge and collaborations, and preserve and publish our progress and (dis-)agreements. Funding organizations, science communities, and technologists see the importance of interoperable systems and metadata, and direct resources toward them. With the new approaches and resources, projects like IPY and MMI can simultaneously define, display, and promote effective strategies for sustainable, interoperable data systems. This presentation will outline the role metadata plays in durable interoperable data systems, for better or worse. 
It will describe times when "just choosing a standard" can work, and when it probably won't work. And it will point out signs that suggest a metadata storm is coming to your community project, and how you might avoid it. From these lessons we will seek a path to producing interoperable, interdisciplinary, metadata-enlightened environment observing systems.
An open repositories network development for medical teaching resources.
Soula, Gérard; Darmoni, Stefan; Le Beux, Pierre; Renard, Jean-Marie; Dahamna, Badisse; Fieschi, Marius
2010-01-01
The lack of interoperability between repositories of heterogeneous and geographically widespread data is an obstacle to the diffusion, sharing and reutilization of those data. We present the development of an open repositories network taking into account both the syntactic and semantic interoperability of the different repositories and based on international standards in this field. The network is used by the medical community in France for the diffusion and sharing of digital teaching resources. The syntactic interoperability of the repositories is managed using the OAI-PMH protocol for the exchange of metadata describing the resources. Semantic interoperability is based, on one hand, on the LOM standard for the description of resources and on MESH for the indexing of the latter and, on the other hand, on semantic interoperability management designed to optimize compliance with standards and the quality of the metadata.
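As a rough illustration of the syntactic layer described in this abstract, the sketch below builds an OAI-PMH `ListRecords` request and pulls record identifiers out of a response. The base URL, the hand-written sample response, and the function names are hypothetical. Only the `verb`, `metadataPrefix`, and `set` arguments, the mandatory `oai_dc` format, and the OAI 2.0 namespace come from the OAI-PMH specification; the metadata prefix a given repository uses for LOM records is repository-specific and is not assumed here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Return a ListRecords request URL for a (hypothetical) repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def record_identifiers(response_xml):
    """Extract the header identifier of every record in a response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)]


# Hand-written sample response, trimmed to the elements used here.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:res-1</identifier></header></record>
    <record><header><identifier>oai:example.org:res-2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(build_list_records_url("https://example.org/oai"))
    print(record_identifiers(SAMPLE_RESPONSE))
```

A real harvester would also follow the `resumptionToken` element to page through large result sets; that step is left out to keep the sketch short.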
Sun, Shulei; Chen, Jing; Li, Weizhong; Altintas, Ilkay; Lin, Abel; Peltier, Steve; Stocks, Karen; Allen, Eric E.; Ellisman, Mark; Grethe, Jeffrey; Wooley, John
2011-01-01
The Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis (CAMERA, http://camera.calit2.net/) is a database and associated computational infrastructure that provides a single system for depositing, locating, analyzing, visualizing and sharing data about microbial biology through an advanced web-based analysis portal. CAMERA collects and links metadata relevant to environmental metagenome data sets with annotation in a semantically-aware environment allowing users to write expressive semantic queries against the database. To meet the needs of the research community, users are able to query metadata categories such as habitat, sample type, time, location and other environmental physicochemical parameters. CAMERA is compliant with the standards promulgated by the Genomic Standards Consortium (GSC), and sustains a role within the GSC in extending standards for content and format of the metagenomic data and metadata and its submission to the CAMERA repository. To ensure wide, ready access to data and annotation, CAMERA also provides data submission tools to allow researchers to share and forward data to other metagenomics sites and community data archives such as GenBank. It has multiple interfaces for easy submission of large or complex data sets, and supports pre-registration of samples for sequencing. CAMERA integrates a growing list of tools and viewers for querying, analyzing, annotating and comparing metagenome and genome data. PMID:21045053
Java-Library for the Access, Storage and Editing of Calibration Metadata of Optical Sensors
NASA Astrophysics Data System (ADS)
Firlej, M.; Kresse, W.
2016-06-01
The standardization of the calibration of optical sensors in photogrammetry and remote sensing has been discussed for more than a decade. Projects of the German DGPF and the European EuroSDR led to the abstract International Technical Specification ISO/TS 19159-1:2014 "Calibration and validation of remote sensing imagery sensors and data - Part 1: Optical sensors". This article presents the first software interface for a read- and write-access to all metadata elements standardized in the ISO/TS 19159-1. This interface is based on an xml-schema that was automatically derived by ShapeChange from the UML-model of the Specification. The software interface serves two cases. First, the more than 300 standardized metadata elements are stored individually according to the xml-schema. Secondly, the camera manufacturers are using many administrative data that are not a part of the ISO/TS 19159-1. The new software interface provides a mechanism for input, storage, editing, and output of both types of data. Finally, an output channel towards a usual calibration protocol is provided. The interface is written in Java. The article also addresses observations made when analysing the ISO/TS 19159-1 and compiles a list of proposals for maturing the document, i.e. for an updated version of the Specification.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reddy, Tatiparthi B. K.; Thomas, Alex D.; Stamatis, Dimitri
The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Within this paper, we report version 5 (v.5) of the database. The newly designed database schema and web user interface support several new features, including the implementation of a four-level (meta)genome project classification system and a simplified intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19 200 studies, 56 000 Biosamples, 56 000 sequencing projects and 39 400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate and varying quality data into GOLD are briefly highlighted. Lastly, GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.
NASA Astrophysics Data System (ADS)
Strasser, C.; Borda, S.; Cruse, P.; Kunze, J.
2013-12-01
There are many barriers to data management and sharing among earth and environmental scientists; among the most significant are a lack of knowledge about best practices for data management, metadata standards, or appropriate data repositories for archiving and sharing data. Last year we developed an open source web application, DataUp, to help researchers overcome these barriers. DataUp helps scientists to (1) determine whether their file is CSV compatible, (2) generate metadata in a standard format, (3) retrieve an identifier to facilitate data citation, and (4) deposit their data into a repository. With funding from the NSF via a supplemental grant to the DataONE project, we are working to improve upon DataUp. Our main goal for DataUp 2.0 is to ensure organizations and repositories are able to adopt and adapt DataUp to meet their unique needs, including connecting to analytical tools, adding new metadata schema, and expanding the list of connected data repositories. DataUp is a collaborative project between the California Digital Library, DataONE, the San Diego Supercomputing Center, and Microsoft Research Connections.
NASA Technical Reports Server (NTRS)
Podio, Fernando; Vollrath, William; Williams, Joel; Kobler, Ben; Crouse, Don
1998-01-01
Sophisticated network storage management applications are rapidly evolving to satisfy a market demand for highly reliable data storage systems with large data storage capacities and performance requirements. To preserve a high degree of data integrity, these applications must rely on intelligent data storage devices that can provide reliable indicators of data degradation. Error correction activity generally occurs within storage devices without notification to the host. Early indicators of degradation and media error monitoring and reporting (MEMR) techniques implemented in data storage devices allow network storage management applications to notify system administrators of these events and to take appropriate corrective actions before catastrophic errors occur. Although MEMR techniques have been implemented in data storage devices for many years, until 1996 no MEMR standards existed. In 1996 the American National Standards Institute (ANSI) approved the only known (world-wide) industry standard specifying MEMR techniques to verify stored data on optical disks. This industry standard was developed under the auspices of the Association for Information and Image Management (AIIM). A recently formed AIIM Optical Tape Subcommittee initiated the development of another data integrity standard specifying a set of media error monitoring tools and media error monitoring information (MEMRI) to verify stored data on optical tape media. This paper discusses the need for intelligent storage devices that can provide data integrity metadata, the content of the existing data integrity standard for optical disks, and the content of the MEMRI standard being developed by the AIIM Optical Tape Subcommittee.
NASA Astrophysics Data System (ADS)
Car, Nicholas; Cox, Simon; Fitch, Peter
2015-04-01
With earth-science datasets increasingly being published to enable re-use in projects disassociated from the original data acquisition or generation, there is an urgent need for associated metadata to be connected, in order to guide their application. In particular, provenance traces should support the evaluation of data quality and reliability. However, while standards for describing provenance are emerging (e.g. PROV-O), these do not include the necessary statistical descriptors and confidence assessments. UncertML has a mature conceptual model that may be used to record uncertainty metadata. However, by itself UncertML does not support the representation of uncertainty of multi-part datasets, and provides no direct way of associating the uncertainty information - metadata in relation to a dataset - with dataset objects. We present a method to address both these issues by combining UncertML with PROV-O, and delivering resulting uncertainty-enriched provenance traces through the Linked Data API. UncertProv extends the PROV-O provenance ontology with an RDF formulation of the UncertML conceptual model elements, adds further elements to support uncertainty representation without a conceptual model and the integration of UncertML through links to documents. The Linked Data API provides a systematic way of navigating from dataset objects to their UncertProv metadata and back again. The Linked Data API's 'views' capability enables access to UncertML and non-UncertML uncertainty metadata representations for a dataset. With this approach, it is possible to access and navigate the uncertainty metadata associated with a published dataset using standard semantic web tools, such as SPARQL queries. Where the uncertainty data follows the UncertML model it can be automatically interpreted and may also support automatic uncertainty propagation.
Repositories wishing to enable uncertainty propagation for all datasets must ensure that all elements that are associated with uncertainty (PROV-O Entity and Activity classes) have UncertML elements recorded. This methodology is intentionally flexible to allow uncertainty metadata in many forms, not limited to UncertML. While the more formal representation of uncertainty metadata is desirable (using UncertProv elements to implement the UncertML conceptual model), this will not always be possible, and any uncertainty data stored will be better than none. Since the UncertProv ontology contains a superset of UncertML elements to facilitate the representation of non-UncertML uncertainty data, it could easily be extended to include other formal uncertainty conceptual models thus allowing non-UncertML propagation calculations.
NASA Astrophysics Data System (ADS)
Longo, S.; Nativi, S.; Leone, C.; Migliorini, S.; Mazari Villanova, L.
2012-04-01
Italian Polar Metadata System
The Italian Antarctic Research Programme (PNRA) is a government initiative funding and coordinating scientific research activities in polar regions. PNRA manages two scientific Stations in Antarctica - Concordia (Dome C), jointly operated with the French Polar Institute "Paul Emile Victor", and Mario Zucchelli (Terra Nova Bay, Southern Victoria Land). In addition, the National Research Council of Italy (CNR) manages one scientific Station in the Arctic Circle (Ny-Alesund-Svalbard Islands), named Dirigibile Italia. PNRA started in 1985 with the first Italian Expedition in Antarctica. Since then each research group has autonomously collected data, in different formats, regarding biology and medicine, geodetic observatory, geophysics, geology, glaciology, physics and atmospheric chemistry, earth-sun relationships and astrophysics, oceanography and marine environment, chemical contamination, law and geographic science, technology, and multi- and interdisciplinary research. In 2010 the Italian Ministry of Research assigned the scientific coordination of the Programme to CNR, which is in charge of the management and sharing of the scientific results produced in the framework of the PNRA. Therefore, CNR is establishing a new distributed cyber(e)-infrastructure to collect, manage, publish and share polar research results. This is a service-based infrastructure building on Web technologies to implement resources (i.e. data, services and documents) discovery, access and visualization; in addition, semantic-enabled functionalities will be provided. The architecture applies the "System of Systems" principles to build incrementally on the existing systems by supplementing but not supplanting their mandates and governance arrangements. This keeps the existing capacities as autonomous as possible.
This cyber(e)-infrastructure implements multi-disciplinary interoperability following a Brokering approach and supporting the relevant standards recognized by European and international initiatives, including GEO/GEOSS, INSPIRE and SCAR. The Brokering approach is empowered by a technology developed by CNR, advanced by the FP7 EuroGEOSS project, and recently adopted by the GEOSS Common Infrastructure (GCI).
Collaborative Sharing of Multidimensional Space-time Data Using HydroShare
NASA Astrophysics Data System (ADS)
Gan, T.; Tarboton, D. G.; Horsburgh, J. S.; Dash, P. K.; Idaszak, R.; Yi, H.; Blanton, B.
2015-12-01
HydroShare is a collaborative environment being developed for sharing hydrological data and models. It includes capability to upload data in many formats as resources that can be shared. The HydroShare data model for resources uses a specific format for the representation of each type of data and specifies metadata common to all resource types as well as metadata unique to specific resource types. The Network Common Data Form (NetCDF) was chosen as the format for multidimensional space-time data in HydroShare. NetCDF is widely used in hydrological and other geoscience modeling because it contains self-describing metadata and supports the creation of array-oriented datasets that may include three spatial dimensions, a time dimension and other user defined dimensions. For example, NetCDF may be used to represent precipitation or surface air temperature fields that have two dimensions in space and one dimension in time. This presentation will illustrate how NetCDF files are used in HydroShare. When a NetCDF file is loaded into HydroShare, header information is extracted using the "ncdump" utility. Python functions developed for the Django web framework, on which HydroShare is based, extract science metadata present in the NetCDF file, saving the user from having to enter it. Where the file follows the Climate and Forecast (CF) conventions and the Attribute Convention for Dataset Discovery (ACDD), metadata is thus automatically populated. Users also have the ability to add metadata to the resource that may not have been present in the original NetCDF file. HydroShare's metadata editing functionality then writes this science metadata back into the NetCDF file to maintain consistency between the science metadata in HydroShare and the metadata in the NetCDF file. This further helps researchers easily add metadata information following the CF and ACDD conventions.
Additional data inspection and subsetting functions were developed, taking advantage of Python and command line libraries for working with NetCDF files. We describe the design and implementation of these features and illustrate how NetCDF files from a modeling application may be curated in HydroShare and thus enhance reproducibility of the associated research. We also discuss future development planned for multidimensional space-time data in HydroShare.
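The header-extraction step described above can be sketched in Python. This is not HydroShare's code: the function names and the sample header are hypothetical, and the sketch assumes the header text has already been produced by `ncdump -h`. It reads only string-valued global attributes and then selects a few that ACDD defines for discovery (`title`, `summary`, `keywords`).

```python
import re

# Matches string-valued global attributes in CDL text, e.g.
#     :title = "Monthly precipitation" ;
GLOBAL_ATTR = re.compile(r'^\s*:(\w+)\s*=\s*"(.*)"\s*;\s*$', re.MULTILINE)

# A few attribute names that ACDD defines for dataset discovery.
DISCOVERY_FIELDS = ("title", "summary", "keywords")


def global_attributes(cdl_header):
    """Return string-valued global attributes found in an ncdump -h header."""
    return {name: value for name, value in GLOBAL_ATTR.findall(cdl_header)}


def discovery_metadata(cdl_header):
    """Keep only the discovery fields that are present in the header."""
    attrs = global_attributes(cdl_header)
    return {key: attrs[key] for key in DISCOVERY_FIELDS if key in attrs}


# Hand-written sample header, for illustration only.
SAMPLE_HEADER = '''netcdf sample {
dimensions:
	time = UNLIMITED ; // (12 currently)
	lat = 4 ;
	lon = 5 ;
variables:
	float precip(time, lat, lon) ;
		precip:units = "mm" ;

// global attributes:
		:title = "Monthly precipitation" ;
		:Conventions = "CF-1.6, ACDD-1.3" ;
		:keywords = "precipitation, hydrology" ;
}
'''

if __name__ == "__main__":
    print(discovery_metadata(SAMPLE_HEADER))
```

Variable attributes such as `precip:units` are deliberately ignored here because they do not start with a bare colon; a fuller extractor would parse dimensions and variables as well.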
NASA Astrophysics Data System (ADS)
McGibbney, L. J.; Hausman, J.; Laurencelle, J. C.; Toaz, R., Jr.; McAuley, J.; Freeborn, D. J.; Stoner, C.
2016-12-01
The Surface Water & Ocean Topography (SWOT) mission brings together two communities focused on a better understanding of the world's oceans and its terrestrial surface waters. U.S. and French oceanographers and hydrologists and international partners have joined forces to develop this new space mission. At NASA JPL's PO.DAAC, the team is currently engaged in the gathering of SWOT User Stories (access patterns, metadata requirements, primary and value added product requirements, data access protocols, etc.) to better inform the adaptive planning of what will be known as the next generation PO.DAAC Information Architecture (IA). The IA effort acknowledges that missions such as SWOT (and NISAR) have little or no precedent in terms of data volume, hot and cold storage, archival, analysis, existing system engineering complexities, etc. and that the only way we can better understand the projected impacts of such requirements is to interface directly with the User Community. Additionally, it also acknowledges that collective learning has taken place to understand certain limitations in the existing data models (DM) underlying the existing PO.DAAC Data Management and Archival System. This work documents an evolutionary, use case based, standards driven approach to adapting the legacy DM and accompanying knowledge representation infrastructure at NASA JPL's PO.DAAC to address forthcoming DAAC mission requirements presented by missions such as SWOT. Some of the topics covered in this evolution include, but are not limited to: How we are leveraging lessons learned from the development of existing DM (such as that generated for SMAP) in an attempt to map them to SWOT. What is the governance model for the SWOT IA? What are the `governing' entities? What is the hierarchy of the `governed entities'? How are elements grouped? How is the design-working group formed? How is model independence maintained and what choices/requirements do we have for the implementation language?
The use of Standards such as CF Conventions, NetCDF, HDF and ISO Metadata, etc. Beyond SWOT… what choices were made so that the new PO.DAAC IA will be flexible enough and adequately designed to accommodate future missions with even more advanced requirements within PO.DAAC.
NASA Astrophysics Data System (ADS)
Devaraju, Anusuriya; Klump, Jens; Tey, Victor; Fraser, Ryan
2016-04-01
Physical samples such as minerals, soil, rocks, water, air and plants are important observational units for understanding the complexity of our environment and its resources. They are usually collected and curated by different entities, e.g., individual researchers, laboratories, state agencies, or museums. Persistent identifiers may facilitate access to physical samples that are scattered across various repositories. They are essential to locate samples unambiguously and to share their associated metadata and data systematically across the Web. The International Geo Sample Number (IGSN) is a persistent, globally unique label for identifying physical samples. The IGSNs of physical samples are registered by end-users (e.g., individual researchers, data centers and projects) through allocating agents. Allocating agents are the institutions acting on behalf of the implementing organization (IGSN e.V.). The Commonwealth Scientific and Industrial Research Organisation (CSIRO) is one of the allocating agents in Australia. To implement IGSN in our organisation, we developed a RESTful service and a metadata model. The web service enables a client to register sub-namespaces and multiple samples, and retrieve samples' metadata programmatically. The metadata model provides a framework in which different types of samples may be represented. It is generic and extensible, therefore it may be applied in the context of multi-disciplinary projects. The metadata model has been implemented as an XML schema and a PostgreSQL database. The schema is used to handle sample registration requests and to disseminate their metadata, whereas the relational database is used to preserve the metadata records. The metadata schema leverages existing controlled vocabularies to minimize the scope for error and incorporates some simplifications to reduce complexity of the schema implementation.
The solutions developed have been applied and tested in the context of two sample repositories in CSIRO, the Capricorn Distal Footprints project and the Rock Store.
A semantically rich and standardised approach enhancing discovery of sensor data and metadata
NASA Astrophysics Data System (ADS)
Kokkinaki, Alexandra; Buck, Justin; Darroch, Louise
2016-04-01
The marine environment plays an essential role in the earth's climate. To enhance the ability to monitor the health of this important system, innovative sensors are being produced and combined with state of the art sensor technology. As the number of sensors deployed is continually increasing, it is a challenge for data users to find the data that meet their specific needs. Furthermore, users need to integrate diverse ocean datasets originating from the same or even different systems. Standards provide a solution to the above mentioned challenges. The Open Geospatial Consortium (OGC) has created Sensor Web Enablement (SWE) standards that enable different sensor networks to establish syntactic interoperability. When combined with widely accepted controlled vocabularies, they become semantically rich and semantic interoperability is achievable. In addition, Linked Data is the recommended best practice for exposing, sharing and connecting information on the Semantic Web using Uniform Resource Identifiers (URIs), Resource Description Framework (RDF) and RDF Query Language (SPARQL). As part of the EU-funded SenseOCEAN project, the British Oceanographic Data Centre (BODC) is working on the standardisation of sensor metadata enabling 'plug and play' sensor integration. Our approach combines standards, controlled vocabularies and persistent URIs to publish sensor descriptions, their data and associated metadata as 5 star Linked Data and OGC SWE (SensorML, Observations & Measurements) standard. Thus sensors become readily discoverable, accessible and useable via the web. Content and context based searching is also enabled since sensor descriptions are understood by machines. Additionally, sensor data can be combined with other sensor or Linked Data datasets to form knowledge. This presentation will describe the work done in BODC to achieve syntactic and semantic interoperability in the sensor domain.
It will illustrate the reuse and extension of the Semantic Sensor Network (SSN) ontology to Linked Sensor Ontology (LSO) and the steps taken to combine OGC SWE with the Linked Data approach through alignment and embodiment of other ontologies. It will then explain how data and models were annotated with controlled vocabularies to establish unambiguous semantics and interconnect them with data from different sources. Finally, it will introduce the RDF triple store where the sensor descriptions and metadata are stored and can be queried through the standard query language SPARQL. Providing different flavours of machine readable interpretations of sensors, sensor data and metadata enhances discoverability but most importantly allows seamless aggregation of information from different networks that will finally produce knowledge.
Parekh, Ruchi; Armañanzas, Rubén; Ascoli, Giorgio A
2015-04-01
Digital reconstructions of axonal and dendritic arbors provide a powerful representation of neuronal morphology in formats amenable to quantitative analysis, computational modeling, and data mining. Reconstructed files, however, require adequate metadata to identify the appropriate animal species, developmental stage, brain region, and neuron type. Moreover, experimental details about tissue processing, neurite visualization and microscopic imaging are essential to assess the information content of digital morphologies. Typical morphological reconstructions only partially capture the underlying biological reality. Tracings are often limited to certain domains (e.g., dendrites and not axons), may be incomplete due to tissue sectioning, imperfect staining, and limited imaging resolution, or can disregard aspects irrelevant to their specific scientific focus (such as branch thickness or depth). Gauging these factors is critical in subsequent data reuse and comparison. NeuroMorpho.Org is a central repository of reconstructions from many laboratories and experimental conditions. Here, we introduce substantial additions to the existing metadata annotation aimed to describe the completeness of the reconstructed neurons in NeuroMorpho.Org. These expanded metadata form a suitable basis for effective description of neuromorphological data.
NASA Astrophysics Data System (ADS)
Liu, Chunlei; Ding, Wenrui; Li, Hongguang; Li, Jiankun
2017-09-01
Haze removal is a nontrivial work for medium-altitude unmanned aerial vehicle (UAV) image processing because of the effects of light absorption and scattering. The challenges are attributed mainly to image distortion and detail blur during the long-distance and large-scale imaging process. In our work, a metadata-assisted nonuniform atmospheric scattering model is proposed to deal with the aforementioned problems of medium-altitude UAV. First, to better describe the real atmosphere, we propose a nonuniform atmospheric scattering model according to the aerosol distribution, which directly benefits the image distortion correction. Second, considering the characteristics of long-distance imaging, we calculate the depth map, which is an essential clue to modeling, on the basis of UAV metadata information. An accurate depth map reduces the color distortion compared with the depth of field obtained by other existing methods based on priors or assumptions. Furthermore, we use an adaptive median filter to address the problem of fuzzy details caused by the global airlight value. Experimental results on both real flight and synthetic images demonstrate that our proposed method outperforms four other existing haze removal methods.
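For context, the uniform atmospheric scattering model conventionally used in haze removal, which the nonuniform model above is presented as improving on, can be written as follows. The notation is the conventional one and is an assumption here, not the authors' own; the abstract does not state the nonuniform formulation, so it is not reproduced.

```latex
% Conventional (uniform) haze imaging model; symbols are assumed notation:
%   I(x): observed hazy image      J(x): haze-free scene radiance
%   A:    global airlight          t(x): transmission
%   beta: scattering coefficient   d(x): scene depth
\begin{align}
  I(x) &= J(x)\,t(x) + A\,\bigl(1 - t(x)\bigr), \\
  t(x) &= e^{-\beta\, d(x)} .
\end{align}
```

In this form the depth map d(x) enters only through the transmission t(x), which is why a depth map derived from UAV metadata can replace a depth estimate based on priors or assumptions.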
NASA Astrophysics Data System (ADS)
Servilla, M. S.; O'Brien, M.; Costa, D.
2013-12-01
Considerable ecological research performed today occurs through the analysis of data downloaded from various repositories and archives, often resulting in derived or synthetic products generated by automated workflows. These data are only meaningful for research if they are well documented by metadata; otherwise, semantic or data type errors may occur in interpretation or processing. The Long Term Ecological Research (LTER) Network now screens all data packages entering its long-term archive to ensure that each package contains metadata that is complete, of high quality, and accurately describes the structure of its associated data entity, and that the data are structurally congruent to the metadata. Screening occurs prior to the upload of a data package into the Provenance Aware Synthesis Tracking Architecture (PASTA) data management system through a series of quality checks, thus preventing ambiguously or incorrectly documented data packages from entering the system. The quality checks within PASTA are designed to work specifically with the Ecological Metadata Language (EML), the metadata standard adopted by the LTER Network to describe data generated by their 26 research sites. Each quality check is codified in Java as part of the ecological community-supported Data Manager Library, which is a resource of the EML specification and used as a component of the PASTA software stack. Quality checks test for metadata quality, data integrity, or metadata-data congruence. Quality checks are further classified as either conditional or informational. Conditional checks issue a 'valid', 'warning' or 'error' response. Only an 'error' response blocks the data package from upload into PASTA. Informational checks only provide descriptive content pertaining to a particular facet of the data package. Quality checks are designed by a group of LTER information managers and reviewed by the LTER community before deployment into PASTA. A total of 32 quality checks have been deployed to date.
Quality checks can be customized through a configurable template, which includes turning checks 'on' or 'off' and setting the severity of conditional checks. This feature is important to other potential users of the Data Manager Library who wish to configure its quality checks in accordance with the standards of their community. Executing the complete set of quality checks produces a report that describes the result of each check. The report is an XML document that is stored by PASTA for future reference.
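The screening logic described above can be sketched as follows. This is an illustrative Python sketch, not the Java Data Manager Library: the check names, the package layout, and the configuration keys are all hypothetical. It shows conditional checks returning 'valid', 'warning' or 'error', informational checks returning descriptive content only, a configuration that can switch checks off or change their severity, and an upload gate that only an 'error' closes.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Check:
    name: str
    kind: str                                  # "conditional" or "informational"
    run: Callable[[dict], Tuple[bool, str]]    # returns (passed, message)
    severity: str = "error"                    # status used when a conditional check fails


@dataclass
class Result:
    name: str
    status: str                                # "valid", "warning", "error" or "info"
    message: str


def has_title(package):
    ok = bool(package.get("title"))
    return ok, ("title present" if ok else "title missing")


def column_count_matches(package):
    # Metadata-data congruence: declared attributes versus actual columns.
    declared = len(package.get("attributes", []))
    actual = len(package.get("data_header", []))
    return declared == actual, f"{declared} attributes declared, {actual} columns found"


def record_count(package):
    return True, f"{len(package.get('rows', []))} data rows"


CHECKS = [
    Check("titlePresent", "conditional", has_title, severity="warning"),
    Check("columnCount", "conditional", column_count_matches),
    Check("recordCount", "informational", record_count),
]


def run_checks(package, checks, config=None):
    """Run enabled checks; config may set 'enabled' and 'severity' per check."""
    config = config or {}
    report = []
    for check in checks:
        settings = config.get(check.name, {})
        if not settings.get("enabled", True):
            continue
        passed, message = check.run(package)
        if check.kind == "informational":
            status = "info"
        elif passed:
            status = "valid"
        else:
            status = settings.get("severity", check.severity)
        report.append(Result(check.name, status, message))
    return report


def upload_allowed(report):
    """Only an 'error' result blocks the upload."""
    return all(result.status != "error" for result in report)
```

The report here is a plain list; the system described above stores its report as an XML document, which this sketch does not attempt to reproduce.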
Archiving InSight Lander Science Data Using PDS4 Standards
NASA Astrophysics Data System (ADS)
Stein, T.; Guinness, E. A.; Slavney, S.
2017-12-01
The InSight Mars Lander is scheduled for launch in 2018, and science data from the mission will be archived in the NASA Planetary Data System (PDS) using the new PDS4 standards. InSight is a geophysical lander with a science payload that includes a seismometer, a probe to measure subsurface temperatures and heat flow, a suite of meteorology instruments, a magnetometer, an experiment using radio tracking, and a robotic arm that will provide soil physical property information based on interactions with the surface. InSight is not the first science mission to archive its data using PDS4. However, PDS4 archives do not currently contain examples of the kinds of data that several of the InSight instruments will produce. Whereas the existing common PDS4 standards were sufficient for most of the archiving requirements of InSight, the data generated by a few instruments required development of several extensions to the PDS4 information model. For example, the seismometer will deliver a version of its data in SEED format, which is standard for the terrestrial seismology community. This format required the design of a new product type in the PDS4 information model. A local data dictionary has also been developed for InSight that contains attributes that are not part of the common PDS4 dictionary. The local dictionary provides metadata relevant to all InSight data sets, and attributes specific to several of the instruments. Additional classes and attributes were designed for the existing PDS4 geometry dictionary that will capture metadata for the lander position and orientation, along with camera models for stereo image processing. Much of the InSight archive planning and design work has been done by a Data Archiving Working Group (DAWG), which has members from the InSight project and the PDS. The group coordinates archive design, schedules and peer review of the archive documentation and test products.
The InSight DAWG archiving effort for PDS is being led by the PDS Geosciences Node with several other nodes working one-on-one with instruments relevant to their disciplines. Once the InSight mission begins operations, the DAWG will continue to provide oversight on release of InSight data to PDS. Lessons learned from InSight archive work will also feed forward to planning the archives for the Mars 2020 rover.
Cunningham, S G; Carinci, F; Brillante, M; Leese, G P; McAlpine, R R; Azzopardi, J; Beck, P; Bratina, N; Bocquet, V; Doggen, K; Jarosz-Chobot, P K; Jecht, M; Lindblad, U; Moulton, T; Metelko, Ž; Nagy, A; Olympios, G; Pruna, S; Skeie, S; Storms, F; Di Iorio, C T; Massi Benedetti, M
2016-01-01
A set of core diabetes indicators were identified in a clinical review of current evidence for the EUBIROD project. In order to allow accurate comparisons of diabetes indicators, a standardised currency for data storage and aggregation was required. We aimed to define a robust European data dictionary with appropriate clinical definitions that can be used to analyse diabetes outcomes and provide the foundation for data collection from existing electronic health records for diabetes. Existing clinical datasets used by 15 partner institutions across Europe were collated and common data items analysed for consistency in terms of recording, data definition and units of measurement. Where necessary, data mappings and algorithms were specified in order to allow partners to meet the standard definitions. A series of descriptive elements were created to document metadata for each data item, including recording, consistency, completeness and quality. While datasets varied in terms of consistency, it was possible to create a common standard that could be used by all. The minimum dataset defined 53 data items that were classified according to their feasibility and validity. Mappings and standardised definitions were used to create an electronic directory for diabetes care, providing the foundation for the EUBIROD data analysis repository, also used to implement the diabetes registry and model of care for Cyprus. The development of data dictionaries and standards can be used to improve the quality and comparability of health information. A data dictionary has been developed to be compatible with other existing data sources for diabetes, within and beyond Europe.
NASA Astrophysics Data System (ADS)
Stein, Olaf; Schultz, Martin G.; Rambadt, Michael; Saini, Rajveer; Hoffmann, Lars; Mallmann, Daniel
2017-04-01
Global model data of atmospheric composition produced by the Copernicus Atmospheric Monitoring Service (CAMS) has been collected at FZ Jülich since 2010 and serves as boundary conditions for use by Regional Air Quality (RAQ) modellers world-wide. RAQ models need time-resolved meteorological as well as chemical lateral boundary conditions for their individual model domains. While the meteorological data usually come from well-established global forecast systems, the chemical boundary conditions are not always well defined. In the past, many models used 'climatic' boundary conditions for the tracer concentrations, which can lead to significant concentration biases, particularly for tracers with longer lifetimes which can be transported over long distances (e.g. over the whole northern hemisphere) with the mean wind. The Copernicus approach utilizes extensive near-realtime data assimilation of atmospheric composition data observed from space, which gives additional reliability to the global modelling data and is well received by the RAQ communities. An existing Web Coverage Service (WCS) for sharing these individually tailored model results is currently being re-engineered to make use of a modern, scalable database technology in order to improve performance, enhance flexibility, and allow the operation of catalogue services. The new Jülich Atmospheric Data Distributions Server (JADDS) adheres to the Web Coverage Service WCS2.0 standard as defined by the Open Geospatial Consortium (OGC). This enables the user groups to flexibly define the datasets they need by selecting a subset of chemical species or restricting geographical boundaries or the length of the time series. The data is made available in the form of different catalogues stored locally on our server. In addition, the Jülich OWS Interface (JOIN) provides interoperable web services allowing for easy download and visualization of datasets delivered from WCS servers via the internet. 
We will present the prototype JADDS server and address the major issues identified when relocating large four-dimensional datasets into a RASDAMAN raster array database. So far the RASDAMAN support for data available in netCDF format is limited with respect to metadata related to variables and axes. For community-wide accepted solutions, selected data coverages shall result in downloadable netCDF files including metadata complying with the netCDF CF Metadata Conventions standard (http://cfconventions.org/). This can be achieved by adding custom metadata elements for RASDAMAN bands (model levels) on data ingestion. Furthermore, an optimization strategy for ingestion of several TB of 4D model output data will be outlined.
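The subsetting described above (restricting geographical boundaries and requesting netCDF output) can be illustrated with a WCS 2.0 GetCoverage request in key-value-pair encoding. The following is a minimal sketch, not the JADDS client: the endpoint URL, the coverage identifier `cams_ozone`, and the axis labels `Lat` and `Long` are assumptions chosen for illustration, and a real server advertises its own coverage ids and axis names in its capabilities and coverage descriptions.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_getcoverage_url(endpoint, coverage_id, subsets, fmt="application/netcdf"):
    """Build a WCS 2.0 GetCoverage URL using key-value-pair encoding."""
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
        ("format", fmt),
    ]
    # One "subset" parameter per axis:
    # axis(low,high) trims a range, axis(value) slices at a single point.
    for axis, bounds in subsets.items():
        if isinstance(bounds, tuple):
            low, high = bounds
            params.append(("subset", f"{axis}({low},{high})"))
        else:
            params.append(("subset", f"{axis}({bounds})"))
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint, coverage id and axis labels (assumptions).
url = build_getcoverage_url(
    "https://example.org/wcs",
    "cams_ozone",
    {"Lat": (35, 60), "Long": (-10, 30)},
)
print(url)
print(parse_qs(urlparse(url).query)["subset"])
```

The repeated `subset` key is how WCS 2.0 expresses one constraint per axis; parentheses and commas are percent-encoded by `urlencode` and decoded again by the server.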
Semantic Metadata for Heterogeneous Spatial Planning Documents
NASA Astrophysics Data System (ADS)
Iwaniak, A.; Kaczmarek, I.; Łukowicz, J.; Strzelecki, M.; Coetzee, S.; Paluszyński, W.
2016-09-01
Spatial planning documents contain information about the principles and rights of land use in different zones of a local authority. They are the basis for administrative decision making in support of sustainable development. In Poland these documents are published on the Web according to a prescribed non-extendable XML schema, designed for optimum presentation to humans in HTML web pages. There is no document standard, and limited functionality exists for adding references to external resources. The text in these documents is discoverable and searchable by general-purpose web search engines, but the semantics of the content cannot be discovered or queried. The spatial information in these documents is geographically referenced but not machine-readable. Major manual efforts are required to integrate such heterogeneous spatial planning documents from various local authorities for analysis, scenario planning and decision support. This article presents results of an implementation using machine-readable semantic metadata to identify relationships among regulations in the text, spatial objects in the drawings and links to external resources. A spatial planning ontology was used to annotate different sections of spatial planning documents with semantic metadata in the Resource Description Framework in Attributes (RDFa). The semantic interpretation of the content, links between document elements and links to external resources were embedded in XHTML pages. An example and use case from the spatial planning domain in Poland is presented to evaluate its efficiency and applicability. The solution enables the automated integration of spatial planning documents from multiple local authorities to assist decision makers with understanding and interpreting spatial planning information. The approach is equally applicable to legal documents from other countries and domains, such as cultural heritage and environmental management.
Automatic identification of high impact articles in PubMed to support clinical decision making.
Bian, Jiantao; Morid, Mohammad Amin; Jonnalagadda, Siddhartha; Luo, Gang; Del Fiol, Guilherme
2017-09-01
The practice of evidence-based medicine involves integrating the latest best available evidence into patient care decisions. Yet, critical barriers exist for clinicians' retrieval of evidence that is relevant for a particular patient from primary sources such as randomized controlled trials and meta-analyses. To help address those barriers, we investigated machine learning algorithms that find clinical studies with high clinical impact from PubMed®. Our machine learning algorithms use a variety of features including bibliometric features (e.g., citation count), social media attention, journal impact factors, and citation metadata. The algorithms were developed and evaluated with a gold standard composed of 502 high impact clinical studies that are referenced in 11 clinical evidence-based guidelines on the treatment of various diseases. We tested the following hypotheses: (1) our high impact classifier outperforms a state-of-the-art classifier based on citation metadata and citation terms, and PubMed's® relevance sort algorithm; and (2) the performance of our high impact classifier does not decrease significantly after removing proprietary features such as citation count. The mean top 20 precision of our high impact classifier was 34% versus 11% for the state-of-the-art classifier and 4% for PubMed's® relevance sort (p=0.009); and the performance of our high impact classifier did not decrease significantly after removing proprietary features (mean top 20 precision=34% vs. 36%; p=0.085). The high impact classifier, using features such as bibliometrics, social media attention and MEDLINE® metadata, outperformed previous approaches and is a promising alternative to identifying high impact studies for clinical decision support. Copyright © 2017 Elsevier Inc. All rights reserved.
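The evaluation metric reported above, mean top 20 precision, can be sketched as follows. This is an illustrative calculation only, assuming that precision is the fraction of the first k ranked articles that appear in the gold standard and that the mean is taken over queries; the article identifiers in the usage lines are invented.

```python
def precision_at_k(ranked_ids, relevant_ids, k=20):
    """Fraction of the top-k ranked items that are in the relevant set.

    Divides by k even when fewer than k items were returned
    (one common convention; an assumption here).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked_ids[:k] if item in relevant_ids)
    return hits / k


def mean_precision_at_k(runs, k=20):
    """Average precision@k over (ranked_ids, relevant_ids) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(precision_at_k(r, rel, k) for r, rel in runs) / len(runs)


# Invented identifiers for illustration.
ranking = ["a1", "a2", "a3", "a4"]
gold = {"a1", "a4", "a9"}
print(precision_at_k(ranking, gold, k=4))
```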
Amberg, Alexander; Barrett, Dave; Beale, Michael H.; Beger, Richard; Daykin, Clare A.; Fan, Teresa W.-M.; Fiehn, Oliver; Goodacre, Royston; Griffin, Julian L.; Hankemeier, Thomas; Hardy, Nigel; Harnly, James; Higashi, Richard; Kopka, Joachim; Lane, Andrew N.; Lindon, John C.; Marriott, Philip; Nicholls, Andrew W.; Reily, Michael D.; Thaden, John J.; Viant, Mark R.
2013-01-01
There is a general consensus that supports the need for standardized reporting of metadata or information describing large-scale metabolomics and other functional genomics data sets. Reporting of standard metadata provides a biological and empirical context for the data, facilitates experimental replication, and enables the re-interrogation and comparison of data by others. Accordingly, the Metabolomics Standards Initiative is building a general consensus concerning the minimum reporting standards for metabolomics experiments, and the Chemical Analysis Working Group (CAWG) is a member of this community effort. This article proposes the minimum reporting standards related to the chemical analysis aspects of metabolomics experiments including: sample preparation, experimental analysis, quality control, metabolite identification, and data pre-processing. These minimum standards currently focus mostly upon mass spectrometry and nuclear magnetic resonance spectroscopy due to the popularity of these techniques in metabolomics. However, additional input concerning other techniques is welcomed and can be provided via the CAWG on-line discussion forum at http://msi-workgroups.sourceforge.net/ or Msi-workgroups-feedback@lists.sourceforge.net. Further, community input related to this document can also be provided via this electronic forum. PMID:24039616
The ground truth about metadata and community detection in networks.
Peel, Leto; Larremore, Daniel B; Clauset, Aaron
2017-05-01
Across many scientific domains, there is a common need to automatically extract a simplified view or coarse-graining of how a complex system's components interact. This general task is called community detection in networks and is analogous to searching for clusters in independent vector data. It is common to evaluate the performance of community detection algorithms by their ability to find so-called ground truth communities. This works well in synthetic networks with planted communities because these networks' links are formed explicitly based on those known communities. However, there are no planted communities in real-world networks. Instead, it is standard practice to treat some observed discrete-valued node attributes, or metadata, as ground truth. We show that metadata are not the same as ground truth and that treating them as such induces severe theoretical and practical problems. We prove that no algorithm can uniquely solve community detection, and we prove a general No Free Lunch theorem for community detection, which implies that there can be no algorithm that is optimal for all possible community detection tasks. However, community detection remains a powerful tool and node metadata still have value, so a careful exploration of their relationship with network structure can yield insights of genuine worth. We illustrate this point by introducing two statistical techniques that can quantify the relationship between metadata and community structure for a broad class of models. We demonstrate these techniques using both synthetic and real-world networks, and for multiple types of metadata and community structures.
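To make the distinction between metadata and detected communities concrete, the agreement between a metadata labelling and a partition returned by some algorithm can be measured with normalized mutual information (NMI). This is a commonly used comparison measure and is offered here only as a sketch; it is not one of the two statistical techniques introduced in the paper, and in line with the abstract's argument a low score need not mean the algorithm failed.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(a, b):
    """NMI = 2 I(a;b) / (H(a) + H(b)) for two labellings of the same nodes."""
    if len(a) != len(b):
        raise ValueError("labellings must cover the same nodes")
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    mutual_info = sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )
    denom = entropy(a) + entropy(b)
    # Two trivial (single-group) labellings are treated as identical.
    return 1.0 if denom == 0 else 2 * mutual_info / denom


metadata = ["x", "x", "y", "y"]   # observed node attribute (toy data)
detected = [0, 0, 1, 1]           # partition from some algorithm (toy data)
print(normalized_mutual_information(metadata, detected))
```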
Collaborative Data Publication Utilizing the Open Data Repository's (ODR) Data Publisher
NASA Technical Reports Server (NTRS)
Stone, N.; Lafuente, B.; Bristow, T.; Keller, R. M.; Downs, R. T.; Blake, D.; Fonda, M.; Dateo, C.; Pires, A.
2017-01-01
Introduction: For small communities in diverse fields such as astrobiology, publishing and sharing data can be a difficult challenge. While large, homogenous fields often have repositories and existing data standards, small groups of independent researchers have few options for publishing standards and data that can be utilized within their community. In conjunction with teams at NASA Ames and the University of Arizona, the Open Data Repository's (ODR) Data Publisher has been conducting ongoing pilots to assess the needs of diverse research groups and to develop software to allow them to publish and share their data collaboratively. Objectives: The ODR's Data Publisher aims to provide an easy-to-use and implement software tool that will allow researchers to create and publish database templates and related data. The end product will facilitate both human-readable interfaces (web-based with embedded images, files, and charts) and machine-readable interfaces utilizing semantic standards. Characteristics: The Data Publisher software runs on the standard LAMP (Linux, Apache, MySQL, PHP) stack to provide the widest server base available. The software is based on Symfony (www.symfony.com) which provides a robust framework for creating extensible, object-oriented software in PHP. The software interface consists of a template designer where individual or master database templates can be created. A master database template can be shared by many researchers to provide a common metadata standard that will set a compatibility standard for all derivative databases. Individual researchers can then extend their instance of the template with custom fields, file storage, or visualizations that may be unique to their studies. This allows groups to create compatible databases for data discovery and sharing purposes while still providing the flexibility needed to meet the needs of scientists in rapidly evolving areas of research. 
Research: As part of this effort, a number of ongoing pilot and test projects are currently in progress. The Astrobiology Habitable Environments Database Working Group is developing a shared database standard using the ODR's Data Publisher and has a number of example databases where astrobiology data are shared. Soon these databases will be integrated via the template-based standard. Work with this group helps determine what data researchers in these diverse fields need to share and archive. Additionally, this pilot helps determine what standards are viable for sharing these types of data from internally developed standards to existing open standards such as the Dublin Core (http://dublincore.org) and Darwin Core (http://rs.twdg.org) metadata standards. Further studies are ongoing with the University of Arizona Department of Geosciences where a number of mineralogy databases are being constructed within the ODR Data Publisher system. Conclusions: Through the ongoing pilots and discussions with individual researchers and small research teams, a definition of the tools desired by these groups is coming into focus. As the software development moves forward, the goal is to meet the publication and collaboration needs of these scientists in an unobtrusive and functional way.
Architecture for the Interdisciplinary Earth Data Alliance
NASA Astrophysics Data System (ADS)
Richard, S. M.
2016-12-01
The Interdisciplinary Earth Data Alliance (IEDA) is leading an EarthCube (EC) Integrative Activity to develop a governance structure and technology framework that enables partner data systems to share technology, infrastructure, and practice for documenting, curating, and accessing heterogeneous geoscience data. The IEDA data facility provides capabilities in an extensible framework that enables domain-specific requirements for each partner system in the Alliance to be integrated into standardized cross-domain workflows. The shared technology infrastructure includes a data submission hub, a domain-agnostic file-based repository, an integrated Alliance catalog and a Data Browser for data discovery across all partner holdings, as well as services for registering identifiers for datasets (DOI) and samples (IGSN). The submission hub will be a platform that facilitates acquisition of cross-domain resource documentation and channels users into domain and resource-specific workflows tailored for each partner community. We are exploring an event-based message bus architecture with a standardized plug-in interface for adding capabilities. This architecture builds on the EC CINERGI metadata pipeline as well as the message-based architecture of the SEAD project. Plug-in components are planned for file introspection to match entities to a data type registry (extending EC Digital Crust and Research Data Alliance work), for extraction of standardized keywords (using CINERGI components), and for extraction of location, cruise, personnel and other metadata linkage information (building on GeoLink and existing IEDA partner components). The submission hub will feed submissions to appropriate partner repositories and service endpoints targeted by domain and resource type for distribution. The Alliance governance will adopt patterns (vocabularies, operations, resource types) for self-describing data services using the standard HTTP protocol for simplified data access (building on EC GeoWS and other 'RESTful' approaches). 
Exposure of resource descriptions (datasets and service distributions) for harvesting by commercial search engines as well as geoscience-data focused crawlers (like EC B-Cube crawler) will increase discoverability of IEDA resources with minimal effort by curators.
DIRAC File Replica and Metadata Catalog
NASA Astrophysics Data System (ADS)
Tsaregorodtsev, A.; Poss, S.
2012-12-01
File replica and metadata catalogs are essential parts of any distributed data management system and largely determine its functionality and performance. A new File Catalog (DFC) was developed in the framework of the DIRAC Project that combines both replica and metadata catalog functionality. The DFC design is based on practical experience with the data management system of the LHCb Collaboration. It is optimized for the most common patterns of catalog usage in order to achieve maximum performance from the user perspective. The DFC supports bulk operations for replica queries and allows quick analysis of the storage usage globally and for each Storage Element separately. It supports flexible ACL rules with plug-ins for various policies that can be adopted by a particular community. The DFC catalog can store various types of metadata associated with files and directories and perform efficient queries for the data based on complex metadata combinations. Definition of file ancestor-descendant relation chains is also possible. The DFC catalog is implemented in the general DIRAC distributed computing framework following the standard grid security architecture. In this paper we describe the design of the DFC and its implementation details. The performance measurements are compared with other grid file catalog implementations. The experience of DFC usage in the CLIC detector project is discussed.
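The idea of querying files by combinations of metadata attached to files and directories can be sketched with a small in-memory model. This is not the DIRAC API: the function names, the query syntax, the paths and field names, and the assumption that files inherit metadata from their parent directories are all hypothetical and serve only to illustrate the kind of query the abstract describes.

```python
import operator

# Comparison operators accepted in a query condition (illustrative choice).
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt,
       ">=": operator.ge, "<=": operator.le}


def effective_metadata(path, dir_meta, file_meta):
    """Merge metadata inherited from parent directories with the file's own."""
    merged = {}
    parts = path.strip("/").split("/")
    for i in range(1, len(parts)):
        merged.update(dir_meta.get("/" + "/".join(parts[:i]), {}))
    merged.update(file_meta.get(path, {}))
    return merged


def find_files(query, dir_meta, file_meta):
    """Return sorted file paths whose effective metadata satisfy every condition."""
    matches = []
    for path in sorted(file_meta):
        meta = effective_metadata(path, dir_meta, file_meta)
        ok = True
        for key, cond in query.items():
            conds = cond if isinstance(cond, dict) else {"=": cond}
            if key not in meta or not all(
                OPS[op](meta[key], value) for op, value in conds.items()
            ):
                ok = False
                break
        if ok:
            matches.append(path)
    return matches


# Hypothetical catalog contents.
dir_meta = {"/vo/sim": {"detector": "demo"}, "/vo/sim/run1": {"energy": 3000}}
file_meta = {"/vo/sim/run1/a.dat": {"events": 500},
             "/vo/sim/run1/b.dat": {"events": 50}}
print(find_files({"energy": 3000, "events": {">": 100}}, dir_meta, file_meta))
```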
ATLAS Metadata Infrastructure Evolution for Run 2 and Beyond
NASA Astrophysics Data System (ADS)
van Gemmeren, P.; Cranshaw, J.; Malon, D.; Vaniachine, A.
2015-12-01
ATLAS developed and employed for Run 1 of the Large Hadron Collider a sophisticated infrastructure for metadata handling in event processing jobs. This infrastructure profits from a rich feature set provided by the ATLAS execution control framework, including standardized interfaces and invocation mechanisms for tools and services, segregation of transient data stores with concomitant object lifetime management, and mechanisms for handling occurrences asynchronous to the control framework's state machine transitions. This metadata infrastructure is evolving and being extended for Run 2 to allow its use and reuse in downstream physics analyses, analyses that may or may not utilize the ATLAS control framework. At the same time, multiprocessing versions of the control framework and the requirements of future multithreaded frameworks are leading to redesign of components that use an incident-handling approach to asynchrony. The increased use of scatter-gather architectures, both local and distributed, requires further enhancement of metadata infrastructure in order to ensure semantic coherence and robust bookkeeping. This paper describes the evolution of ATLAS metadata infrastructure for Run 2 and beyond, including the transition to dual-use tools—tools that can operate inside or outside the ATLAS control framework—and the implications thereof. It further examines how the design of this infrastructure is changing to accommodate the requirements of future frameworks and emerging event processing architectures.
Standardized Representation of Clinical Study Data Dictionaries with CIMI Archetypes
Sharma, Deepak K.; Solbrig, Harold R.; Prud’hommeaux, Eric; Pathak, Jyotishman; Jiang, Guoqian
2016-01-01
Researchers commonly use a tabular format to describe and represent clinical study data. The lack of standardization of data dictionary’s metadata elements presents challenges for their harmonization for similar studies and impedes interoperability outside the local context. We propose that representing data dictionaries in the form of standardized archetypes can help to overcome this problem. The Archetype Modeling Language (AML) as developed by the Clinical Information Modeling Initiative (CIMI) can serve as a common format for the representation of data dictionary models. We mapped three different data dictionaries (identified from dbGAP, PheKB and TCGA) onto AML archetypes by aligning dictionary variable definitions with the AML archetype elements. The near complete alignment of data dictionaries helped map them into valid AML models that captured all data dictionary model metadata. The outcome of the work would help subject matter experts harmonize data models for quality, semantic interoperability and better downstream data integration. PMID:28269909
XML and E-Journals: The State of Play.
ERIC Educational Resources Information Center
Wusteman, Judith
2003-01-01
Discusses the introduction of XML (Extensible Markup Language) in publishing electronic journals. Topics include standards, including DTDs (Document Type Definitions); aggregator requirements; SGML (Standard Generalized Markup Language); benefits of XML for e-journals; XML metadata; the possibility of…
NASA Astrophysics Data System (ADS)
Benedict, K. K.; Servilla, M. S.; Vanderbilt, K.; Wheeler, J.
2015-12-01
The growing volume, variety and velocity of production of Earth science data magnifies the impact of inefficiencies in data acquisition, processing, analysis, and sharing workflows, potentially to the point of impairing the ability of researchers to accomplish their desired scientific objectives. The adaptation of agile software development principles (http://agilemanifesto.org/principles.html) to data curation processes has significant potential to lower barriers to effective scientific data discovery and reuse - barriers that otherwise may force the development of new data to replace existing but unusable data, or require substantial effort to make data usable in new research contexts. This paper outlines a data curation process that was developed at the University of New Mexico that provides a cross-walk of data and associated documentation between the data archive developed by the Long Term Ecological Research (LTER) Network Office (PASTA - http://lno.lternet.edu/content/network-information-system) and UNM's institutional repository (LoboVault - http://repository.unm.edu). The developed automated workflow enables the replication of versioned data objects and their associated standards-based metadata between the LTER system and LoboVault - providing long-term preservation for those data/metadata packages within LoboVault while maintaining the value-added services that the PASTA platform provides. The relative ease with which this workflow was developed is a product of the capabilities independently developed on both platforms - including the simplicity of providing a well-documented application programming interface (API) for each platform enabling scripted interaction and the use of well-established documentation standards (EML in the case of PASTA, Dublin Core in the case of LoboVault) by both systems. 
These system characteristics, when combined with an iterative process of interaction between the Data Curation Librarian (on the LoboVault side of the process), the Sevilleta LTER Information Manager and the LTER Network Information System developer, yielded a rapid and relatively streamlined process for targeted replication of data and metadata between the two systems - increasing the discoverability and usability of the LTER data assets.
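The kind of metadata cross-walk described above (EML on the PASTA side, Dublin Core on the LoboVault side) can be sketched with the standard-library XML parser. The field mapping below is an assumption made for illustration, not the authors' actual cross-walk, and the sample record, its package identifier, and the `dc:` keys are invented; only a handful of EML elements (`title`, `creator`, `pubDate`, `keyword`) are handled.

```python
import xml.etree.ElementTree as ET

# Invented, heavily simplified EML record for illustration.
SAMPLE_EML = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1" packageId="example.1.1">
  <dataset>
    <title>Example soil moisture record</title>
    <creator><individualName><givenName>A.</givenName><surName>Researcher</surName></individualName></creator>
    <pubDate>2015</pubDate>
    <keywordSet><keyword>soil moisture</keyword><keyword>desert</keyword></keywordSet>
  </dataset>
</eml:eml>"""


def eml_to_dublin_core(eml_xml):
    """Map a few EML dataset fields onto Dublin Core element names."""
    root = ET.fromstring(eml_xml)
    dataset = root.find("dataset")
    creators = []
    for name in dataset.findall("creator/individualName"):
        given = name.findtext("givenName", default="").strip()
        surname = name.findtext("surName", default="").strip()
        creators.append(f"{surname}, {given}".strip(", "))
    return {
        "dc:identifier": [root.get("packageId", "")],
        "dc:title": [dataset.findtext("title", default="").strip()],
        "dc:creator": creators,
        "dc:date": [dataset.findtext("pubDate", default="").strip()],
        "dc:subject": [k.text.strip() for k in dataset.findall("keywordSet/keyword") if k.text],
    }


record = eml_to_dublin_core(SAMPLE_EML)
print(record)
```

Keeping the mapping in one small function makes it easy to revise iteratively, which matches the agile, incremental process the abstract describes.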
Metadata mapping and reuse in caBIG.
Kunz, Isaac; Lin, Ming-Chin; Frey, Lewis
2009-02-05
This paper proposes that interoperability across biomedical databases can be improved by utilizing a repository of Common Data Elements (CDEs), UML model class-attributes and simple lexical algorithms to facilitate the building of domain models. This is examined in the context of an existing system, the National Cancer Institute (NCI)'s cancer Biomedical Informatics Grid (caBIG). The goal is to demonstrate the deployment of open source tools that can be used to effectively map models and enable the reuse of existing information objects and CDEs in the development of new models for translational research applications. This effort is intended to help developers reuse appropriate CDEs to enable interoperability of their systems when developing within the caBIG framework or other frameworks that use metadata repositories. The Dice (di-grams) and Dynamic algorithms are compared, and both perform similarly in matching UML model class-attributes to CDE class object-property pairs. With the algorithms used, the baselines for automatically finding the matches are reasonable for the data models examined. This suggests that automatic mapping of UML models and CDEs is feasible within the caBIG framework and potentially any framework that uses a metadata repository. This work opens up the possibility of using mapping algorithms to reduce the cost and time required to map local data models to a reference data model such as those used within caBIG. This effort contributes to facilitating the development of interoperable systems within caBIG as well as other metadata frameworks. Such efforts are critical to address the need to develop systems to handle enormous amounts of diverse data that can be leveraged from new biomedical methodologies.
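The Dice (di-gram) similarity mentioned above can be sketched as follows. This is a generic illustration of the Dice coefficient over character bigrams, not the paper's implementation: the normalization (lower-casing and dropping non-alphanumeric characters), the use of bigram sets rather than multisets, and the example attribute and CDE names are all assumptions.

```python
def bigrams(text):
    """Set of character bigrams after lower-casing and dropping non-alphanumerics."""
    s = "".join(ch for ch in text.lower() if ch.isalnum())
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a, b):
    """Dice coefficient 2|X & Y| / (|X| + |Y|) over the bigram sets of a and b."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def best_match(attribute, candidates):
    """Return (candidate, score) with the highest Dice score for the attribute."""
    return max(((c, dice(attribute, c)) for c in candidates), key=lambda pair: pair[1])


# Invented UML attribute name and CDE names for illustration.
cdes = ["Patient Birth Date", "Specimen Collection Date"]
print(best_match("patientBirthDate", cdes))
print(dice("night", "nacht"))
```

In practice a score threshold would decide whether the best candidate is accepted as a match or left for manual curation.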
Towards structured sharing of raw and derived neuroimaging data across existing resources
Keator, D.B.; Helmer, K.; Steffener, J.; Turner, J.A.; Van Erp, T.G.M.; Gadde, S.; Ashish, N.; Burns, G.A.; Nichols, B.N.
2013-01-01
Data sharing efforts increasingly contribute to the acceleration of scientific discovery. Neuroimaging data is accumulating in distributed domain-specific databases and there is currently no integrated access mechanism nor an accepted format for the critically important meta-data that is necessary for making use of the combined, available neuroimaging data. In this manuscript, we present work from the Derived Data Working Group, an open-access group sponsored by the Biomedical Informatics Research Network (BIRN) and the International Neuroimaging Coordinating Facility (INCF) focused on practical tools for distributed access to neuroimaging data. The working group develops models and tools facilitating the structured interchange of neuroimaging meta-data and is making progress towards a unified set of tools for such data and meta-data exchange. We report on the key components required for integrated access to raw and derived neuroimaging data as well as associated meta-data and provenance across neuroimaging resources. The components include (1) a structured terminology that provides semantic context to data, (2) a formal data model for neuroimaging with robust tracking of data provenance, (3) a web service-based application programming interface (API) that provides a consistent mechanism to access and query the data model, and (4) a provenance library that can be used for the extraction of provenance data by image analysts and imaging software developers. We believe that the framework and set of tools outlined in this manuscript have great potential for solving many of the issues the neuroimaging community faces when sharing raw and derived neuroimaging data across the various existing database systems for the purpose of accelerating scientific discovery. PMID:23727024
Data Curation for the Exploitation of Large Earth Observation Products Databases - The MEA system
NASA Astrophysics Data System (ADS)
Mantovani, Simone; Natali, Stefano; Barboni, Damiano; Cavicchi, Mario; Della Vecchia, Andrea
2014-05-01
National Space Agencies under the umbrella of the European Space Agency are working intensively to handle, and provide solutions for, the management and exploitation of Big Data and related knowledge (metadata, software tools and services). The continuously increasing amount of long-term and historic data in EO facilities in the form of online datasets and archives, the incoming satellite observation platforms that will generate an impressive amount of new data, and the new EU approach to data distribution policy make it necessary to address technologies for the long-term management of these data sets, including their consolidation, preservation, distribution, continuation and curation across multiple missions. The management of long EO data time series of continuing or historic missions - with more than 20 years of data available already today - requires technical solutions and technologies which differ considerably from the ones exploited by existing systems. Several tools, both open source and commercial, already provide technologies to handle data and metadata preparation, access and visualization via OGC standard interfaces. This study aims at describing the Multi-sensor Evolution Analysis (MEA) system and the Data Curation concept as approached and implemented within the ASIM and EarthServer projects, funded by the European Space Agency and the European Commission, respectively.
Brühschwein, Andreas; Klever, Julius; Wilkinson, Tom; Meyer-Lindenberg, Andrea
2018-02-01
In 2016, the recommendations of the DICOM Standards Committee for the use of veterinary identification DICOM tags had their 10th anniversary. The goal of our study was to survey veterinary DICOM standard conformance in Germany regarding the specific identification tags veterinarians should use in veterinary diagnostic imaging. We hypothesized that most veterinarians in Germany do not follow the guidelines of the DICOM Standards Committee. We analyzed the metadata of 488 imaging studies of referral cases from 115 different veterinary institutions in Germany by computer-aided DICOM header readout. We found that 25 (5.1%) of the imaging studies fully complied with the "veterinary DICOM standard" in this survey. The results confirmed our hypothesis that the recommendations of the DICOM Standards Committee for the consistent and advantageous use of veterinary identification tags have found minimal acceptance amongst German veterinarians. DICOM not only enables connectivity between machines but also improves communication between veterinarians by sharing correct and valuable metadata for better patient care. Therefore, we recommend that lecturers, universities, societies, authorities, vendors, and other stakeholders increase their efforts to improve the spread of the veterinary DICOM standard in the veterinary world.
Salek, Reza M; Neumann, Steffen; Schober, Daniel; Hummel, Jan; Billiau, Kenny; Kopka, Joachim; Correa, Elon; Reijmers, Theo; Rosato, Antonio; Tenori, Leonardo; Turano, Paola; Marin, Silvia; Deborde, Catherine; Jacob, Daniel; Rolin, Dominique; Dartigues, Benjamin; Conesa, Pablo; Haug, Kenneth; Rocca-Serra, Philippe; O'Hagan, Steve; Hao, Jie; van Vliet, Michael; Sysi-Aho, Marko; Ludwig, Christian; Bouwman, Jildau; Cascante, Marta; Ebbels, Timothy; Griffin, Julian L; Moing, Annick; Nikolski, Macha; Oresic, Matej; Sansone, Susanna-Assunta; Viant, Mark R; Goodacre, Royston; Günther, Ulrich L; Hankemeier, Thomas; Luchinat, Claudio; Walther, Dirk; Steinbeck, Christoph
Metabolomics has become a crucial phenotyping technique in a range of research fields including medicine, the life sciences, biotechnology and the environmental sciences. This necessitates the transfer of experimental information between research groups, as well as potentially to publishers and funders. After the initial efforts of the metabolomics standards initiative, minimum reporting standards were proposed which included the concepts for metabolomics databases. Built by the community, standards and infrastructure for metabolomics are still needed to allow storage, exchange, comparison and re-utilization of metabolomics data. The Framework Programme 7 EU Initiative 'coordination of standards in metabolomics' (COSMOS) is developing a robust data infrastructure and exchange standards for metabolomics data and metadata. This is to support workflows for a broad range of metabolomics applications within the European metabolomics community and the wider metabolomics and biomedical communities' participation. Here we announce our concepts and efforts asking for re-engagement of the metabolomics community, academics and industry, journal publishers, software and hardware vendors, as well as those interested in standardisation worldwide (addressing missing metabolomics ontologies, complex-metadata capturing and XML based open source data exchange format), to join and work towards updating and implementing metabolomics standards.
Valdez, Joshua; Rueschman, Michael; Kim, Matthew; Redline, Susan; Sahoo, Satya S
2016-10-01
Extraction of structured information from biomedical literature is a complex and challenging problem due to the complexity of the biomedical domain and the lack of appropriate natural language processing (NLP) techniques. High quality domain ontologies model both data and metadata information at a fine level of granularity, which can be effectively used to accurately extract structured information from biomedical text. Extraction of provenance metadata, which describes the history or source of information, from published articles is an important task to support scientific reproducibility. Reproducibility of results reported by previous research studies is a foundational component of scientific advancement. This is highlighted by the recent initiative by the US National Institutes of Health called "Principles of Rigor and Reproducibility". In this paper, we describe an effective approach to extract provenance metadata from published biomedical research literature using an ontology-enabled NLP platform as part of the Provenance for Clinical and Healthcare Research (ProvCaRe). The ProvCaRe-NLP tool extends the clinical Text Analysis and Knowledge Extraction System (cTAKES) platform using both provenance and biomedical domain ontologies. We demonstrate the effectiveness of the ProvCaRe-NLP tool using a corpus of 20 peer-reviewed publications. The results of our evaluation demonstrate that the ProvCaRe-NLP tool has significantly higher recall in extracting provenance metadata as compared to existing NLP pipelines such as MetaMap.
Discovering Physical Samples Through Identifiers, Metadata, and Brokering
NASA Astrophysics Data System (ADS)
Arctur, D. K.; Hills, D. J.; Jenkyns, R.
2015-12-01
Physical samples, particularly in the geosciences, are key to understanding the Earth system, its history, and its evolution. Our record of the Earth as captured by physical samples is difficult to explain and mine for understanding, due to incomplete, disconnected, and evolving metadata content. This is further complicated by differing ways of classifying, cataloguing, publishing, and searching the metadata, especially when specimens do not fit neatly into a single domain—for example, fossils cross disciplinary boundaries (mineral and biological). Sometimes even the fundamental classification systems evolve, such as the geological time scale, triggering daunting processes to update existing specimen databases. Increasingly, we need to consider ways of leveraging permanent, unique identifiers, as well as advancements in metadata publishing that link digital records with physical samples in a robust, adaptive way. An NSF EarthCube Research Coordination Network (RCN) called the Internet of Samples (iSamples) is now working to bridge the metadata schemas for biological and geological domains. We are leveraging the International Geo Sample Number (IGSN) that provides a versatile system of registering physical samples, and working to harmonize this with the DataCite schema for Digital Object Identifiers (DOI). A brokering approach for linking disparate catalogues and classification systems could help scale discovery and access to the many large collections now being managed (sometimes millions of specimens per collection). This presentation is about our community building efforts, research directions, and insights to date.
Collaborative Movie Annotation
NASA Astrophysics Data System (ADS)
Zad, Damon Daylamani; Agius, Harry
In this paper, we focus on metadata for self-created movies like those found on YouTube and Google Video, the duration of which are increasing in line with falling upload restrictions. While simple tags may have been sufficient for most purposes for traditionally very short video footage that contains a relatively small amount of semantic content, this is not the case for movies of longer duration which embody more intricate semantics. Creating metadata is a time-consuming process that takes a great deal of individual effort; however, this effort can be greatly reduced by harnessing the power of Web 2.0 communities to create, update and maintain it. Consequently, we consider the annotation of movies within Web 2.0 environments, such that users create and share that metadata collaboratively and propose an architecture for collaborative movie annotation. This architecture arises from the results of an empirical experiment where metadata creation tools, YouTube and an MPEG-7 modelling tool, were used by users to create movie metadata. The next section discusses related work in the areas of collaborative retrieval and tagging. Then, we describe the experiments that were undertaken on a sample of 50 users. Next, the results are presented which provide some insight into how users interact with existing tools and systems for annotating movies. Based on these results, the paper then develops an architecture for collaborative movie annotation.
Distributed Learning Metadata Standards
ERIC Educational Resources Information Center
McClelland, Marilyn
2004-01-01
Significant economies can be achieved in distributed learning systems architected with a focus on interoperability and reuse. The key building blocks of an efficient distributed learning architecture are the use of standards and XML technologies. The goal of plug and play capability among various components of a distributed learning system…
OAI and NASA's Scientific and Technical Information
NASA Technical Reports Server (NTRS)
Nelson, Michael L.; Rocker, JoAnne; Harrison, Terry L.
2002-01-01
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is an evolving protocol and philosophy regarding interoperability for digital libraries (DLs). Previously, "distributed searching" models were popular for DL interoperability. However, experience has shown distributed searching systems across large numbers of DLs to be difficult to maintain in an Internet environment. The OAI-PMH is a move away from distributed searching, focusing on the arguably simpler model of "metadata harvesting". We detail NASA's involvement in defining and testing the OAI-PMH and experience to date with adapting existing NASA distributed searching DLs (such as the NASA Technical Report Server) to use the OAI-PMH and metadata harvesting. We discuss some of the entirely new DL projects that the OAI-PMH has made possible, such as the Technical Report Interchange project. We explain the strategic importance of the OAI-PMH to the mission of NASA's Scientific and Technical Information Program.
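The harvesting model described in this abstract can be illustrated with a minimal Python sketch. It is not NASA's implementation: the base URL, the sample response, and the helper names `list_records_url` and `parse_list_records` are assumptions made for illustration. It only relies on the OAI-PMH `ListRecords` verb, the `metadataPrefix` and `resumptionToken` arguments, and the `oai_dc` (Dublin Core) format.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (base_url is a hypothetical endpoint)."""
    if resumption_token:
        # A resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return ([(identifier, title), ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{{{OAI_NS}}}record"):
        ident = rec.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        title_el = next(rec.iter(f"{{{DC_NS}}}title"), None)
        records.append((ident, title_el.text if title_el is not None else None))
    token_el = root.find(f".//{{{OAI_NS}}}resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return records, token


# Illustrative response; a real harvester would fetch this over HTTP.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample report</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>abc123</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("https://example.org/oai")
harvested, next_token = parse_list_records(SAMPLE_RESPONSE)
```

A harvester would repeat the request with the returned token until no token comes back, then index the collected metadata locally. That local index is what removes the need to query every remote library at search time.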
Best Practices for Searchable Collection Pages
Searchable Collection pages are stand-alone documents that do not have any web area navigation. They should not recreate existing content on other sites and should be tagged with quality metadata and taxonomy terms.
Making EPA's PDF documents accessible (by Section 508 standards) and user-friendly includes steps such as adding bookmarks, using electronic conversion rather than scanning pages, and adding metadata.
Standards-based curation of a decade-old digital repository dataset of molecular information.
Harvey, Matthew J; Mason, Nicholas J; McLean, Andrew; Murray-Rust, Peter; Rzepa, Henry S; Stewart, James J P
2015-01-01
The desirable curation of 158,122 molecular geometries derived from the NCI set of reference molecules together with associated properties computed using the MOPAC semi-empirical quantum mechanical method and originally deposited in 2005 into the Cambridge DSpace repository as a data collection is reported. The procedures involved in the curation included annotation of the original data using new MOPAC methods, updating the syntax of the CML documents used to express the data to ensure schema conformance and adding new metadata describing the entries together with an XML schema transformation to map the metadata schema to that used by the DataCite organisation. We have adopted a granularity model in which a DataCite persistent identifier (DOI) is created for each individual molecule to enable data discovery and data metrics at this level using DataCite tools. We recommend that the future research data management (RDM) of the scientific and chemical data components associated with journal articles (the "supporting information") should be conducted in a manner that facilitates automatic periodic curation.
Martin, Erika G; Law, Jennie; Ran, Weijia; Helbig, Natalie; Birkhead, Guthrie S
Government datasets are newly available on open data platforms that are publicly accessible, available in nonproprietary formats, free of charge, and with unlimited use and distribution rights. They provide opportunities for health research, but their quality and usability are unknown. To describe available open health data, identify whether data are presented in a way that is aligned with best practices and usable for researchers, and examine differences across platforms. Two reviewers systematically reviewed a random sample of data offerings on NYC OpenData (New York City, all offerings, n = 37), Health Data NY (New York State, 25% sample, n = 71), and HealthData.gov (US Department of Health and Human Services, 5% sample, n = 75), using a standard coding guide. Three open health data platforms at the federal, New York State, and New York City levels. Data characteristics from the coding guide were aggregated into summary indices for intrinsic data quality, contextual data quality, adherence to the Dublin Core metadata standards, and the 5-star open data deployment scheme. One quarter of the offerings were structured datasets; other presentation styles included charts (14.7%), documents describing data (12.0%), maps (10.9%), and query tools (7.7%). Health Data NY had higher intrinsic data quality (P < .001), contextual data quality (P < .001), and Dublin Core metadata standards adherence (P < .001). All met basic "web availability" open data standards; fewer met higher standards of "hyperlinked to other data." Although all platforms need improvement, they already provide readily available data for health research. Sustained effort on improving open data websites and metadata is necessary for ensuring researchers use these data, thereby increasing their research value.
ISA-TAB-Nano: a specification for sharing nanomaterial research data in spreadsheet-based format.
Thomas, Dennis G; Gaheen, Sharon; Harper, Stacey L; Fritts, Martin; Klaessig, Fred; Hahn-Dantona, Elizabeth; Paik, David; Pan, Sue; Stafford, Grace A; Freund, Elaine T; Klemm, Juli D; Baker, Nathan A
2013-01-14
The high-throughput genomics communities have been successfully using standardized spreadsheet-based formats to capture and share data within labs and among public repositories. The nanomedicine community has yet to adopt similar standards to share the diverse and multi-dimensional types of data (including metadata) pertaining to the description and characterization of nanomaterials. Owing to the lack of standardization in representing and sharing nanomaterial data, most of the data currently shared via publications and data resources are incomplete, poorly-integrated, and not suitable for meaningful interpretation and re-use of the data. Specifically, in its current state, data cannot be effectively utilized for the development of predictive models that will inform the rational design of nanomaterials. We have developed a specification called ISA-TAB-Nano, which comprises four spreadsheet-based file formats for representing and integrating various types of nanomaterial data. Three file formats (Investigation, Study, and Assay files) have been adapted from the established ISA-TAB specification; while the Material file format was developed de novo to more readily describe the complexity of nanomaterials and associated small molecules. In this paper, we have discussed the main features of each file format and how to use them for sharing nanomaterial descriptions and assay metadata. The ISA-TAB-Nano file formats provide a general and flexible framework to record and integrate nanomaterial descriptions, assay data (metadata and endpoint measurements) and protocol information. Like ISA-TAB, ISA-TAB-Nano supports the use of ontology terms to promote standardized descriptions and to facilitate search and integration of the data. The ISA-TAB-Nano specification has been submitted as an ASTM work item to obtain community feedback and to provide a nanotechnology data-sharing standard for public development and adoption.
ISA-TAB-Nano: A Specification for Sharing Nanomaterial Research Data in Spreadsheet-based Format
2013-01-01
Background and motivation The high-throughput genomics communities have been successfully using standardized spreadsheet-based formats to capture and share data within labs and among public repositories. The nanomedicine community has yet to adopt similar standards to share the diverse and multi-dimensional types of data (including metadata) pertaining to the description and characterization of nanomaterials. Owing to the lack of standardization in representing and sharing nanomaterial data, most of the data currently shared via publications and data resources are incomplete, poorly-integrated, and not suitable for meaningful interpretation and re-use of the data. Specifically, in its current state, data cannot be effectively utilized for the development of predictive models that will inform the rational design of nanomaterials. Results We have developed a specification called ISA-TAB-Nano, which comprises four spreadsheet-based file formats for representing and integrating various types of nanomaterial data. Three file formats (Investigation, Study, and Assay files) have been adapted from the established ISA-TAB specification; while the Material file format was developed de novo to more readily describe the complexity of nanomaterials and associated small molecules. In this paper, we have discussed the main features of each file format and how to use them for sharing nanomaterial descriptions and assay metadata. Conclusion The ISA-TAB-Nano file formats provide a general and flexible framework to record and integrate nanomaterial descriptions, assay data (metadata and endpoint measurements) and protocol information. Like ISA-TAB, ISA-TAB-Nano supports the use of ontology terms to promote standardized descriptions and to facilitate search and integration of the data. The ISA-TAB-Nano specification has been submitted as an ASTM work item to obtain community feedback and to provide a nanotechnology data-sharing standard for public development and adoption. 
PMID:23311978
Geophysical Event Casting: Assembling & Broadcasting Data Relevant to Events and Disasters
NASA Astrophysics Data System (ADS)
Manipon, G. M.; Wilson, B. D.
2012-12-01
Broadcast Atom feeds are already being used to publish metadata and support discovery of data collections, granules, and web services. Such data and service casting advertises the existence of new granules in a dataset and available services to access or transform data. Similarly, data and services relevant to studying topical geophysical events (earthquakes, hurricanes, etc.) or periodic/regional structures (El Nino, deep convection) can be broadcast by publishing new entries and links in a feed for that topic. By using the geoRSS conventions, the time and space location of the event (e.g. a moving hurricane track) is specified in the feed, along with science description, images, relevant data granules, and links to useful web services (e.g. OGC/WMS). The topic cast is used to assemble all of the relevant data/images as they come in, and publish the metadata (images, links, services) to a broad group of subscribers. All of the information in the feed is structured using standardized XML tags (e.g. georss for space & time, and tags to point to external data & services), and is thus machine-readable, which is an improvement over collecting ad hoc links on a wiki. We have created a software suite in python to generate such "event casts" when a geophysical event first happens, then update them with more information as it becomes available, and display them as an event album in a web browser. Figure 1 shows a snapshot of our Event Cast Browser displaying information from a set of casts about the hurricanes in the Western Pacific during the year 2011. The 19th cyclone is selected in the left panel, so the top right panels display the entries in that feed with metadata such as maximum wind speed, while the bottom right panel displays the hurricane track (positions every 12 hours) as KML in the Google Earth plug-in, where additional data/image layers from the feed can be turned on or off by the user. 
The software automatically converts (georss) space & time information to KML placemarks, and can also generate various KML visualizations for other data layers that are pointed to in the feed. The user can replay all of the data images as an animation over the several days as the cyclone develops. The goal of "event casting" is to standardize several metadata micro-formats and use them within Atom feeds to create a rich ecosystem of topical event data that can be automatically manipulated by scripts and many interfaces. For our event cast browser, the same code can display all kinds of casts, whether about hurricanes, fire, earthquakes, or even El Nino. The presentation will describe: the event cast format and its standard micro-formats, software to generate and augment casts, and the browser GUI with KML visualizations.
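The conversion from georss points to KML placemarks mentioned above can be sketched in a few lines of Python. This is not the authors' software suite: the feed content and the function name `entries_to_placemarks` are assumptions for illustration. It relies only on the GeoRSS-Simple convention that `georss:point` holds "latitude longitude", and on the KML convention that coordinates are written "longitude,latitude".

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

ATOM = "http://www.w3.org/2005/Atom"
GEORSS = "http://www.georss.org/georss"


def entries_to_placemarks(atom_xml):
    """Turn each Atom entry carrying a georss:point into a KML Placemark string."""
    root = ET.fromstring(atom_xml)
    placemarks = []
    for entry in root.iter(f"{{{ATOM}}}entry"):
        point = entry.findtext(f"{{{GEORSS}}}point")
        if point is None:
            continue  # entry has no location; skip it
        lat, lon = point.split()  # GeoRSS order: latitude first
        title = entry.findtext(f"{{{ATOM}}}title", default="")
        when = entry.findtext(f"{{{ATOM}}}updated", default="")
        placemarks.append(
            "<Placemark>"
            f"<name>{escape(title)}</name>"
            f"<TimeStamp><when>{escape(when)}</when></TimeStamp>"
            # KML order: longitude first
            f"<Point><coordinates>{lon},{lat}</coordinates></Point>"
            "</Placemark>"
        )
    return placemarks


# Illustrative feed with one located entry and one without a location.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:georss="http://www.georss.org/georss">
  <title>Hypothetical cyclone cast</title>
  <entry>
    <title>Track point 1</title>
    <updated>2011-09-01T00:00:00Z</updated>
    <georss:point>18.5 130.25</georss:point>
  </entry>
  <entry>
    <title>Science description</title>
    <updated>2011-09-01T06:00:00Z</updated>
  </entry>
</feed>"""

track = entries_to_placemarks(SAMPLE_FEED)
```

Swapping the coordinate order is the step most easily gotten wrong in such a conversion, which is why the sketch splits the point explicitly instead of copying the text through.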
The ground truth about metadata and community detection in networks
Peel, Leto; Larremore, Daniel B.; Clauset, Aaron
2017-01-01
Across many scientific domains, there is a common need to automatically extract a simplified view or coarse-graining of how a complex system’s components interact. This general task is called community detection in networks and is analogous to searching for clusters in independent vector data. It is common to evaluate the performance of community detection algorithms by their ability to find so-called ground truth communities. This works well in synthetic networks with planted communities because these networks’ links are formed explicitly based on those known communities. However, there are no planted communities in real-world networks. Instead, it is standard practice to treat some observed discrete-valued node attributes, or metadata, as ground truth. We show that metadata are not the same as ground truth and that treating them as such induces severe theoretical and practical problems. We prove that no algorithm can uniquely solve community detection, and we prove a general No Free Lunch theorem for community detection, which implies that there can be no algorithm that is optimal for all possible community detection tasks. However, community detection remains a powerful tool and node metadata still have value, so a careful exploration of their relationship with network structure can yield insights of genuine worth. We illustrate this point by introducing two statistical techniques that can quantify the relationship between metadata and community structure for a broad class of models. We demonstrate these techniques using both synthetic and real-world networks, and for multiple types of metadata and community structures. PMID:28508065
Evolving the Living With a Star Data System Definition
NASA Astrophysics Data System (ADS)
Otranto, J. F.; Dijoseph, M.
2003-12-01
NASA's Living With a Star (LWS) Program is a space weather-focused and applications-driven research program. The LWS Program is soliciting input from the solar, space physics, space weather, and climate science communities to develop a system that enables access to science data associated with these disciplines, and advances the development of discipline and interdisciplinary findings. The LWS Program will implement a data system that builds upon the existing and planned data capture, processing, and storage components put in place by individual spacecraft missions and also inter-project data management systems, including active and deep archives, and multi-mission data repositories. It is technically feasible for the LWS Program to integrate data from a broad set of resources, assuming they are either publicly accessible or allow access by permission. The LWS Program data system will work in coordination with spacecraft mission data systems and science data repositories, integrating their holdings using a common metadata representation. This common representation relies on a robust metadata definition that provides journalistic and technical data descriptions, plus linkages to supporting data products and tools. The LWS Program intends to become an enabling resource to PIs, interdisciplinary scientists, researchers, and students facilitating both access to a broad collection of science data, as well as the necessary supporting components to understand and make productive use of these data. For the LWS Program to represent science data that are physically distributed across various ground system elements, information will be collected about these distributed data products through a series of LWS Program-created agents. These agents will be customized to interface or interact with each one of these data systems, collect information, and forward any new metadata records to a LWS Program-developed metadata library. 
A populated LWS metadata library will function as a single point-of-contact that serves the entire science community as a first stop for data availability, whether or not science data are physically stored in an LWS-operated repository. Further, this metadata library will provide the user access to information for understanding these data including descriptions of the associated spacecraft and instrument, data format, calibration and operations issues, links to ancillary and correlative data products, links to processing tools and models associated with these data, and any corresponding findings produced using these data. The LWS may also support an active archive for solar, space physics, space weather, and climate data when these data would otherwise be discarded or archived off-line. This archive could potentially serve also as a data storage backup facility for LWS missions. The plan for the LWS Program metadata library is developed based upon input received from the solar and geospace science communities; the library's architecture is based on existing systems developed for serving science metadata. The LWS Program continues to seek constructive input from the science community, examples of both successes and failures in dealing with science data systems, and insights regarding the obstacles between the current state-of-the-practice and this vision for the LWS Program metadata library.
The MPO system for automatic workflow documentation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Abla, G.; Coviello, E. N.; Flanagan, S. M.
Data from large-scale experiments and extreme-scale computing is expensive to produce and may be used for critical applications. However, it is not the mere existence of data that is important, but our ability to make use of it. Experience has shown that when metadata is better organized and more complete, the underlying data becomes more useful. Traditionally, capturing the steps of scientific workflows and metadata was the role of the lab notebook, but the digital era has resulted instead in the fragmentation of data, processing, and annotation. Here, this article presents the Metadata, Provenance, and Ontology (MPO) System, the software that can automate the documentation of scientific workflows and associated information. Based on recorded metadata, it provides explicit information about the relationships among the elements of workflows in notebook form augmented with directed acyclic graphs. A set of web-based graphical navigation tools and Application Programming Interface (API) have been created for searching and browsing, as well as programmatically accessing the workflows and data. We describe the MPO concepts and its software architecture. We also report the current status of the software as well as the initial deployment experience.
The MPO system for automatic workflow documentation
Abla, G.; Coviello, E. N.; Flanagan, S. M.; ...
2016-04-18
Data from large-scale experiments and extreme-scale computing is expensive to produce and may be used for critical applications. However, it is not the mere existence of data that is important, but our ability to make use of it. Experience has shown that when metadata is better organized and more complete, the underlying data becomes more useful. Traditionally, capturing the steps of scientific workflows and metadata was the role of the lab notebook, but the digital era has resulted instead in the fragmentation of data, processing, and annotation. Here, this article presents the Metadata, Provenance, and Ontology (MPO) System, the software that can automate the documentation of scientific workflows and associated information. Based on recorded metadata, it provides explicit information about the relationships among the elements of workflows in notebook form augmented with directed acyclic graphs. A set of web-based graphical navigation tools and Application Programming Interface (API) have been created for searching and browsing, as well as programmatically accessing the workflows and data. We describe the MPO concepts and its software architecture. We also report the current status of the software as well as the initial deployment experience.
NASA Astrophysics Data System (ADS)
Baru, C.; Lin, K.
2009-04-01
The Geosciences Network project (www.geongrid.org) has been developing cyberinfrastructure for data sharing in the Earth Science community based on a service-oriented architecture. The project defines a standard "software stack", which includes a standardized set of software modules and corresponding service interfaces. The system employs Grid certificates for distributed user authentication. The GEON Portal provides online access to these services via a set of portlets. This service-oriented approach has enabled the GEON network to easily expand to new sites and deploy the same infrastructure in new projects. To facilitate interoperation with other distributed geoinformatics environments, service standards are being defined and implemented for catalog services and federated search across distributed catalogs. The need arises because there may be multiple metadata catalogs in a distributed system, for example, for each institution, agency, geographic region, and/or country. Ideally, a geoinformatics user should be able to search across all such catalogs by making a single search request. In this paper, we describe our implementation for such a search capability across federated metadata catalogs in the GEON service-oriented architecture. The GEON catalog can be searched using spatial, temporal, and other metadata-based search criteria. The search can be invoked as a Web service and, thus, can be imbedded in any software application. 
The need for federated catalogs in GEON arises because, (i) GEON collaborators at the University of Hyderabad, India have deployed their own catalog, as part of the iGEON-India effort, to register information about local resources for broader access across the network, (ii) GEON collaborators in the GEO Grid (Global Earth Observations Grid) project at AIST, Japan have implemented a catalog for their ASTER data products, and (iii) we have recently deployed a search service to access all data products from the EarthScope project in the US (http://es-portal.geongrid.org), which are distributed across data archives at IRIS in Seattle, Washington, UNAVCO in Boulder, Colorado, and at the ICDP archives in GFZ, Potsdam, Germany. This service implements a "virtual" catalog--the actual/"physical" catalogs and data are stored at each of the remote locations. A federated search across all these catalogs would enable GEON users to discover data across all of these environments with a single search request. Our objective is to implement this search service via the OGC Catalog Services for the Web (CS-W) standard by providing appropriate CSW "wrappers" for each metadata catalog, as necessary. This paper will discuss technical issues in designing and deploying such a multi-catalog search service in GEON and describe an initial prototype of the federated search capability.
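The single-request search across several catalogs described above can be sketched as a fan-out over per-catalog wrappers followed by a merge. The Python below is a simplified stand-in, not the GEON implementation: the catalogs are in-memory lists instead of CSW endpoints, and every name in it (`Record`, `make_catalog`, `federated_search`, the sample records) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    identifier: str
    title: str
    bbox: tuple  # (west, south, east, north) in degrees
    catalog: str


def bbox_intersects(a, b):
    """True when two (west, south, east, north) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def make_catalog(name, rows):
    """Wrap an in-memory catalog behind a common search(bbox, keyword) interface."""
    records = [Record(ident, title, bbox, name) for ident, title, bbox in rows]

    def search(bbox, keyword):
        return [
            r for r in records
            if bbox_intersects(r.bbox, bbox) and keyword.lower() in r.title.lower()
        ]

    return search


def federated_search(catalogs, bbox, keyword):
    """Send one query to every wrapper and merge hits, dropping duplicate identifiers."""
    seen, merged = set(), []
    for search in catalogs:
        try:
            hits = search(bbox, keyword)
        except Exception:
            continue  # an unreachable catalog should not fail the whole search
        for record in hits:
            if record.identifier not in seen:
                seen.add(record.identifier)
                merged.append(record)
    return merged


# Illustrative catalogs; identifiers, titles and extents are made up.
catalog_a = make_catalog("site-a", [
    ("a-1", "Seismic stations", (-125.0, 32.0, -114.0, 42.0)),
    ("a-2", "Gravity survey", (70.0, 8.0, 90.0, 30.0)),
])
catalog_b = make_catalog("site-b", [
    ("b-1", "Seismic reflection profile", (-120.0, 35.0, -118.0, 37.0)),
    ("a-1", "Seismic stations", (-125.0, 32.0, -114.0, 42.0)),  # mirrored entry
])

results = federated_search([catalog_a, catalog_b], (-121.0, 34.0, -117.0, 38.0), "seismic")
```

Keeping each catalog behind the same call signature is the role the abstract assigns to the CSW wrappers: the merging code does not need to know how any individual catalog stores or queries its metadata.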
The Connection Between Solar Coronal Cavities and Solar Filaments
NASA Astrophysics Data System (ADS)
Zawadzki, B.; Karna, N.; Prchlik, J.; Reeves, K.; Kempton, D.; Angryk, R.
2017-12-01
Filaments are structures in the solar corona made up of relatively cool, dense, partially ionized plasma. Coronal cavities, circular or elliptical regions of low plasma density, are observed above prominences on the solar limb when viewed in EUV and white light coronal images. Since most filament/cavity eruptions lead to a coronal mass ejection (CME), determining the likelihood of an eruption event will improve our ability to predict space weather. We examine SDO/AIA cavity metadata and HEK filament metadata to determine which cavities are associated with which filaments from 2012 to 2015. Our study involved 140 cavities and 368 filaments that appeared poleward of ±30 degrees. We categorized the cavities and filaments based on the stability of the structures, defined by whether or not the cavity and filament exist long enough to track fully across the solar disk. Using these categories we perform a statistical study on various filament qualities within the metadata. Our findings indicate that filaments with cavities are observed more often at high latitude compared to filaments without cavities. Moreover, our study indicates that a statistically significant difference exists between the filament length and tilt distributions for certain categories. This work was supported by the NSF-REU solar physics program at SAO, grant number AGS-1560313, and the NSF-DIBBS project, grant number ACI-1443061.
CytometryML binary data standards
NASA Astrophysics Data System (ADS)
Leif, Robert C.
2005-03-01
CytometryML is a proposed new Analytical Cytology (Cytomics) data standard, which is based on a common set of XML schemas for encoding flow cytometry and digital microscopy text based data types (metadata). CytometryML schemas reference both DICOM (Digital Imaging and Communications in Medicine) codes and FCS keywords. Flow Cytometry Standard (FCS) list-mode has been mapped to the DICOM Waveform Information Object. The separation of the large binary data objects (list mode and image data) from the XML description of the metadata permits the metadata to be directly displayed, analyzed, and reported with standard commercial software packages; the direct use of XML languages; and direct interfacing with clinical information systems. The separation of the binary data into its own files simplifies parsing because all extraneous header data has been eliminated. The storage of images as two-dimensional arrays without any extraneous data, such as in the Adobe Photoshop RAW format, facilitates the development by scientists of their own analysis and visualization software. Adobe Photoshop provided the display infrastructure and the translation facility to interconvert between the image data from commercial formats and RAW format. Similarly, storing and parsing list-mode binary data with a group of parameters that are specified at compilation time is straightforward. However, when the user is permitted at run time to select a subset of the parameters and/or specify results of mathematical manipulations, the development of special software was required. The use of CytometryML will permit investigators to create their own interoperable data analysis software and to employ commercially available software to disseminate their data.
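To illustrate why headerless list-mode files are simple to parse, here is a minimal sketch that reads fixed-width event records and then selects a subset of parameters at run time. The layout is an assumption made for illustration only (little-endian unsigned 16-bit values, a known parameter count); the abstract does not specify the CytometryML binary encoding, and all function names are hypothetical.

```python
import struct
from typing import Iterable, List, Sequence, Tuple

Event = Tuple[int, ...]


def pack_events(events: Iterable[Sequence[int]], n_params: int) -> bytes:
    """Write events as back-to-back records with no header (assumed layout)."""
    record = struct.Struct(f"<{n_params}H")  # assumption: little-endian uint16
    return b"".join(record.pack(*event) for event in events)


def read_list_mode(raw: bytes, n_params: int) -> List[Event]:
    """Parse a headerless list-mode buffer into one tuple per event."""
    record = struct.Struct(f"<{n_params}H")
    if len(raw) % record.size != 0:
        raise ValueError("buffer length is not a whole number of events")
    return list(record.iter_unpack(raw))


def select_parameters(events: Iterable[Event], indices: Sequence[int]) -> List[Event]:
    """Run-time selection of a parameter subset, by column index."""
    return [tuple(event[i] for i in indices) for event in events]


raw = pack_events([(100, 200, 300), (110, 210, 310)], n_params=3)
events = read_list_mode(raw, n_params=3)
subset = select_parameters(events, [0, 2])
```

With the parameter count fixed in advance the parser is a single `struct` format; the run-time selection step is where extra software becomes necessary, as the abstract notes.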
Hancock, David; Wilson, Michael; Velarde, Giles; Morrison, Norman; Hayes, Andrew; Hulme, Helen; Wood, A Joseph; Nashar, Karim; Kell, Douglas B; Brass, Andy
2005-11-03
maxdLoad2 is a relational database schema and Java application for microarray experimental annotation and storage. It is compliant with all standards for microarray meta-data capture, including the specification of what data should be recorded, extensive use of standard ontologies and support for data exchange formats. The output from maxdLoad2 is of a form acceptable for submission to the ArrayExpress microarray repository at the European Bioinformatics Institute. maxdBrowse is a PHP web-application that makes contents of maxdLoad2 databases accessible via web-browser, the command-line and web-service environments. It thus acts as both a dissemination and data-mining tool. maxdLoad2 presents an easy-to-use interface to an underlying relational database and provides a full complement of facilities for browsing, searching and editing. There is a tree-based visualization of data connectivity and the ability to explore the links between any pair of data elements, irrespective of how many intermediate links lie between them. Its principal novel features are: the flexibility of the meta-data that can be captured, the tools provided for importing data from spreadsheets and other tabular representations, the tools provided for the automatic creation of structured documents, and the ability to browse and access the data via web and web-services interfaces. Within maxdLoad2 it is very straightforward to customise the meta-data that is being captured or change the definitions of the meta-data. These meta-data definitions are stored within the database itself, allowing client software to connect properly to a modified database without having to be specially configured. The meta-data definitions (configuration file) can also be centralized, allowing changes made in response to revisions of standards or terminologies to be propagated to clients without user intervention. maxdBrowse is hosted on a web-server and presents multiple interfaces to the contents of maxd databases.
maxdBrowse emulates many of the browse and search features available in the maxdLoad2 application via a web-browser. This allows users who are not familiar with maxdLoad2 to browse and export microarray data from the database for their own analysis. The same browse and search features are also available via command-line and SOAP server interfaces. This both enables scripting of data export for use embedded in data repositories and analysis environments, and allows access to the maxd databases via web-service architectures. maxdLoad2 http://www.bioinf.man.ac.uk/microarray/maxd/ and maxdBrowse http://dbk.ch.umist.ac.uk/maxdBrowse are portable and compatible with all common operating systems and major database servers. They provide a powerful, flexible package for annotation of microarray experiments and a convenient dissemination environment. They are available for download and open sourced under the Artistic License.
NASA Astrophysics Data System (ADS)
Palanisamy, Giriprakash; Wilson, Bruce E.; Cook, Robert B.; Lenhardt, Chris W.; Santhana Vannan, Suresh; Pan, Jerry; McMurry, Ben F.; Devarakonda, Ranjeet
2010-12-01
The Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC) is one of the science-oriented data centers in EOSDIS, aligned primarily with terrestrial ecology. The ORNL DAAC archives and serves data from NASA-funded field campaigns (such as BOREAS, FIFE, and LBA), regional and global data sets relevant to biogeochemical cycles, land validation studies for remote sensing, and source code for some terrestrial ecology models. Users of the ORNL DAAC include field ecologists, remote sensing scientists, modelers at various scales, synthesis scientific groups, a range of educational users (particularly baccalaureate and graduate instruction), and decision support analysts. It is clear that the wide range of users served by the ORNL DAAC have differing needs and differing capabilities for accessing and using data. It is also not possible for the ORNL DAAC, or the other data centers in EDSS to develop all of the tools and interfaces to support even most of the potential uses of data directly. As is typical of Information Technology to support a research enterprise, the user needs will continue to evolve rapidly over time and users themselves cannot predict future needs, as those needs depend on the results of current investigation. The ORNL DAAC is addressing these needs by targeted implementation of web services and tools which can be consumed by other applications, so that a modeler can retrieve data in netCDF format with the Climate Forecasting convention and a field ecologist can retrieve subsets of that same data in a comma separated value format, suitable for use in Excel or R. Tools such as our MODIS Subsetting capability, the Spatial Data Access Tool (SDAT; based on OGC web services), and OPeNDAP-compliant servers such as THREDDS particularly enable such diverse means of access. We also seek interoperability of metadata, recognizing that terrestrial ecology is a field where there are a very large number of relevant data repositories. 
ORNL DAAC metadata is published to several metadata repositories using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), to increase the chances that users can find data holdings relevant to their particular scientific problem. ORNL also seeks to leverage technology across these various data projects and encourage standardization of processes and technical architecture. This standardization is behind current efforts involving the use of Drupal and Fedora Commons. This poster describes the current and planned approaches that the ORNL DAAC is taking to enable cost-effective interoperability among data centers, both across the NASA EOSDIS data centers and across the international spectrum of terrestrial ecology-related data centers. The poster will highlight the standards that we are currently using across data formats, metadata formats, and data protocols. References: [1] Devarakonda R., et al. Mercury: reusable metadata management, data discovery and access system. Earth Science Informatics (2010), 3(1): 87-94. [2] Devarakonda R., et al. Data sharing and retrieval using OAI-PMH. Earth Science Informatics (2011), 4(1): 1-5.
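For readers unfamiliar with OAI-PMH, a harvester retrieves records with plain HTTP requests carrying a `verb` argument. The sketch below only builds the request URLs for the standard `ListRecords` verb; the base URL is a placeholder, not an ORNL DAAC endpoint, and the helper names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    set_spec: Optional[str] = None,
    from_date: Optional[str] = None,
) -> str:
    """Build the first ListRecords request for a harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


def resume_url(base_url: str, token: str) -> str:
    """Build a follow-up request; resumptionToken replaces the other arguments."""
    return f"{base_url}?{urlencode({'verb': 'ListRecords', 'resumptionToken': token})}"


# Placeholder endpoint, used only to show the resulting URL shape.
first = list_records_url("https://example.org/oai", from_date="2010-01-01")
next_page = resume_url("https://example.org/oai", "abc123")
```

A harvester would fetch `first`, read any `resumptionToken` in the response, and keep requesting `resume_url(...)` until the repository stops returning a token.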
Data Publishing and Sharing Via the THREDDS Data Repository
NASA Astrophysics Data System (ADS)
Wilson, A.; Caron, J.; Davis, E.; Baltzer, T.
2007-12-01
The terms "Team Science" and "Networked Science" have been coined to describe a virtual organization of researchers tied via some intellectual challenge, but often located in different organizations and locations. A critical component to these endeavors is publishing and sharing of content, including scientific data. Imagine pointing your web browser to a web page that interactively lets you upload data and metadata to a repository residing on a remote server, which can then be accessed by others in a secure fashion via the web. While any content can be added to this repository, it is designed particularly for storing and sharing scientific data and metadata. Server support includes uploading of data files that can subsequently be subsetted, aggregated, and served in NetCDF or other scientific data formats. Metadata can be associated with the data and interactively edited. The THREDDS Data Repository (TDR) is a server that provides client initiated, on demand, location transparent storage for data of any type that can then be served by the THREDDS Data Server (TDS). The TDR provides functionality to: * securely store and "own" data files and associated metadata * upload files via HTTP and gridftp * upload a collection of data as single file * modify and restructure repository contents * incorporate metadata provided by the user * generate additional metadata programmatically * edit individual metadata elements The TDR can exist separately from a TDS, serving content via HTTP.
Also, it can work in conjunction with the TDS, which includes functionality to provide: * access to data in a variety of formats via -- OPeNDAP -- OGC Web Coverage Service (for gridded datasets) -- bulk HTTP file transfer * a NetCDF view of datasets in NetCDF, OPeNDAP, HDF-5, GRIB, and NEXRAD formats * serving of very large volume datasets, such as NEXRAD radar * aggregation into virtual datasets * subsetting via OPeNDAP and NetCDF Subsetting services This talk will discuss TDR/TDS capabilities as well as how users can install this software to create their own repositories.
Kissling, W Daniel; Ahumada, Jorge A; Bowser, Anne; Fernandez, Miguel; Fernández, Néstor; García, Enrique Alonso; Guralnick, Robert P; Isaac, Nick J B; Kelling, Steve; Los, Wouter; McRae, Louise; Mihoub, Jean-Baptiste; Obst, Matthias; Santamaria, Monica; Skidmore, Andrew K; Williams, Kristen J; Agosti, Donat; Amariles, Daniel; Arvanitidis, Christos; Bastin, Lucy; De Leo, Francesca; Egloff, Willi; Elith, Jane; Hobern, Donald; Martin, David; Pereira, Henrique M; Pesole, Graziano; Peterseil, Johannes; Saarenmaa, Hannu; Schigel, Dmitry; Schmeller, Dirk S; Segata, Nicola; Turak, Eren; Uhlir, Paul F; Wee, Brian; Hardisty, Alex R
2018-02-01
Much biodiversity data is collected worldwide, but it remains challenging to assemble the scattered knowledge for assessing biodiversity status and trends. The concept of Essential Biodiversity Variables (EBVs) was introduced to structure biodiversity monitoring globally, and to harmonize and standardize biodiversity data from disparate sources to capture a minimum set of critical variables required to study, report and manage biodiversity change. Here, we assess the challenges of a 'Big Data' approach to building global EBV data products across taxa and spatiotemporal scales, focusing on species distribution and abundance. The majority of currently available data on species distributions derives from incidentally reported observations or from surveys where presence-only or presence-absence data are sampled repeatedly with standardized protocols. Most abundance data come from opportunistic population counts or from population time series using standardized protocols (e.g. repeated surveys of the same population from single or multiple sites). Enormous complexity exists in integrating these heterogeneous, multi-source data sets across space, time, taxa and different sampling methods. Integration of such data into global EBV data products requires correcting biases introduced by imperfect detection and varying sampling effort, dealing with different spatial resolution and extents, harmonizing measurement units from different data sources or sampling methods, applying statistical tools and models for spatial inter- or extrapolation, and quantifying sources of uncertainty and errors in data and models. To support the development of EBVs by the Group on Earth Observations Biodiversity Observation Network (GEO BON), we identify 11 key workflow steps that will operationalize the process of building EBV data products within and across research infrastructures worldwide. 
These workflow steps take multiple sequential activities into account, including identification and aggregation of various raw data sources, data quality control, taxonomic name matching and statistical modelling of integrated data. We illustrate these steps with concrete examples from existing citizen science and professional monitoring projects, including eBird, the Tropical Ecology Assessment and Monitoring network, the Living Planet Index and the Baltic Sea zooplankton monitoring. The identified workflow steps are applicable to both terrestrial and aquatic systems and a broad range of spatial, temporal and taxonomic scales. They depend on clear, findable and accessible metadata, and we provide an overview of current data and metadata standards. Several challenges remain to be solved for building global EBV data products: (i) developing tools and models for combining heterogeneous, multi-source data sets and filling data gaps in geographic, temporal and taxonomic coverage, (ii) integrating emerging methods and technologies for data collection such as citizen science, sensor networks, DNA-based techniques and satellite remote sensing, (iii) solving major technical issues related to data product structure, data storage, execution of workflows and the production process/cycle as well as approaching technical interoperability among research infrastructures, (iv) allowing semantic interoperability by developing and adopting standards and tools for capturing consistent data and metadata, and (v) ensuring legal interoperability by endorsing open data or data that are free from restrictions on use, modification and sharing. Addressing these challenges is critical for biodiversity research and for assessing progress towards conservation policy targets and sustainable development goals. © 2017 The Authors. Biological Reviews published by John Wiley & Sons Ltd on behalf of Cambridge Philosophical Society.
NASA Astrophysics Data System (ADS)
Moroni, D. F.; Armstrong, E. M.; Tauer, E.; Hausman, J.; Huang, T.; Thompson, C. K.; Chung, N.
2013-12-01
The Physical Oceanographic Distributed Active Archive Center (PO.DAAC) is one of 12 data centers sponsored by NASA's Earth Science Data and Information System (ESDIS) project. The PO.DAAC is tasked with archival and distribution of NASA Earth science missions specific to physical oceanography, many of which have interdisciplinary applications for weather forecasting/monitoring, ocean biology, ocean modeling, and climate studies. PO.DAAC has a 20-year history of cross-project and international collaborations with partners in Europe, Japan, Australia, and the UK. Domestically, the PO.DAAC has successfully established lasting partners with non-NASA institutions and projects including the National Oceanic and Atmospheric Administration (NOAA), United States Navy, Remote Sensing Systems, and Unidata. A key component of these partnerships is PO.DAAC's direct involvement with international working groups and science teams, such as the Group for High Resolution Sea Surface Temperature (GHRSST), International Ocean Vector Winds Science Team (IOVWST), Ocean Surface Topography Science Team (OSTST), and the Committee on Earth Observation Satellites (CEOS). To help bolster new and existing collaborations, the PO.DAAC has established a standardized approach to its internal Data Management and Archiving System (DMAS), utilizing a Data Dictionary to provide the baseline standard for entry and capture of dataset and granule metadata. Furthermore, the PO.DAAC has established an end-to-end Dataset Lifecycle Policy, built upon both internal and external recommendations of best practices toward data stewardship. Together, DMAS, the Data Dictionary, and the Dataset Lifecycle Policy provide the infrastructure to enable standardized data and metadata to be fully ingested and harvested to facilitate interoperability and compatibility across data access protocols, tools, and services.
The Dataset Lifecycle Policy provides the checks and balances to help ensure all incoming HDF and netCDF-based datasets meet minimum compliance requirements with the Lawrence Livermore National Laboratory's actively maintained Climate and Forecast (CF) conventions with additional goals toward metadata standards provided by the Attribute Convention for Dataset Discovery (ACDD), the International Organization for Standardization (ISO) 19100-series, and the Federal Geographic Data Committee (FGDC). By default, DMAS ensures all datasets are compliant with NASA's Global Change Master Directory (GCMD) and NASA's Reverb data discovery clearinghouse (also known as ECHO). For data access, PO.DAAC offers several widely-used technologies, including File Transfer Protocol (FTP), Open-source Project for a Network Data Access Protocol (OPeNDAP), and Thematic Realtime Environmental Distributed Data Services (THREDDS). These access technologies are available directly to users or through PO.DAAC's web interfaces, specifically the High-level Tool for Interactive Data Extraction (HiTIDE), Live Access Server (LAS), and PO.DAAC's set of search, image, and Consolidated Web Services (CWS). Lastly, PO.DAAC's newly introduced, standards-based CWS provide singular endpoints for search, imaging, and extraction capabilities, respectively, across L2/L3/L4 datasets. Altogether, these tools, services and policies serve to provide flexible, interoperable functionality for both users and data providers.
Hume, Sam; Aerts, Jozef; Sarnikar, Surendra; Huser, Vojtech
2016-04-01
In order to further advance research and development on the Clinical Data Interchange Standards Consortium (CDISC) Operational Data Model (ODM) standard, the existing research must be well understood. This paper presents a methodological review of the ODM literature. Specifically, it develops a classification schema to categorize the ODM literature according to how the standard has been applied within the clinical research data lifecycle. This paper suggests areas for future research and development that address ODM's limitations and capitalize on its strengths to support new trends in clinical research informatics. A systematic scan of the following databases was performed: (1) ABI/Inform, (2) ACM Digital, (3) AIS eLibrary, (4) Europe Central PubMed, (5) Google Scholar, (6) IEEE Xplore, (7) PubMed, and (8) ScienceDirect. A Web of Science citation analysis was also performed. The search term used on all databases was "CDISC ODM." The two primary inclusion criteria were: (1) the research must examine the use of ODM as an information system solution component, or (2) the research must critically evaluate ODM against a stated solution usage scenario. Out of 2686 articles identified, 266 were included in a title level review, resulting in 183 articles. An abstract review followed, resulting in 121 remaining articles; and after a full text scan 69 articles met the inclusion criteria. As the demand for interoperability has increased, ODM has shown remarkable flexibility and has been extended to cover a broad range of data and metadata requirements that reach well beyond ODM's original use cases. This flexibility has yielded research literature that covers a diverse array of topic areas. A classification schema reflecting the use of ODM within the clinical research data lifecycle was created to provide a categorized and consolidated view of the ODM literature.
The elements of the framework include: (1) EDC (Electronic Data Capture) and EHR (Electronic Health Record) infrastructure; (2) planning; (3) data collection; (4) data tabulations and analysis; and (5) study archival. The analysis reviews the strengths and limitations of ODM as a solution component within each section of the classification schema. This paper also identifies opportunities for future ODM research and development, including improved mechanisms for semantic alignment with external terminologies, better representation of the CDISC standards used end-to-end across the clinical research data lifecycle, improved support for real-time data exchange, the use of EHRs for research, and the inclusion of a complete study design. ODM is being used in ways not originally anticipated, and covers a diverse array of use cases across the clinical research data lifecycle. ODM has been used as much as a study metadata standard as it has for data exchange. A significant portion of the literature addresses integrating EHR and clinical research data. The simplicity and readability of ODM has likely contributed to its success and broad implementation as a data and metadata standard. Keeping the core ODM model focused on the most fundamental use cases, while using extensions to handle edge cases, has kept the standard easy for developers to learn and use. Copyright © 2016 Elsevier Inc. All rights reserved.
Metadata mapping and reuse in caBIG™
Kunz, Isaac; Lin, Ming-Chin; Frey, Lewis
2009-01-01
Background: This paper proposes that interoperability across biomedical databases can be improved by utilizing a repository of Common Data Elements (CDEs), UML model class-attributes and simple lexical algorithms to facilitate the building of domain models. This is examined in the context of an existing system, the National Cancer Institute (NCI)'s cancer Biomedical Informatics Grid (caBIG™). The goal is to demonstrate the deployment of open source tools that can be used to effectively map models and enable the reuse of existing information objects and CDEs in the development of new models for translational research applications. This effort is intended to help developers reuse appropriate CDEs to enable interoperability of their systems when developing within the caBIG™ framework or other frameworks that use metadata repositories. Results: The Dice (di-grams) and Dynamic algorithms are compared, and both algorithms have similar performance matching UML model class-attributes to CDE class object-property pairs. With the algorithms used, the baselines for automatically finding the matches are reasonable for the data models examined. This suggests that automatic mapping of UML models and CDEs is feasible within the caBIG™ framework and potentially any framework that uses a metadata repository. Conclusion: This work opens up the possibility of using mapping algorithms to reduce the cost and time required to map local data models to a reference data model such as those used within caBIG™. This effort contributes to facilitating the development of interoperable systems within caBIG™ as well as other metadata frameworks. Such efforts are critical to address the need to develop systems to handle enormous amounts of diverse data that can be leveraged from new biomedical methodologies. PMID:19208192
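The Dice coefficient over character bigrams (di-grams) mentioned above can be sketched in a few lines. This is a generic textbook formulation, not the code evaluated in the paper: the case folding, the multiset handling of repeated bigrams, and the example attribute and CDE names are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def bigrams(text: str) -> List[str]:
    """Overlapping two-character substrings of the lower-cased text."""
    s = text.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]


def dice(a: str, b: str) -> float:
    """Dice similarity: 2 * shared bigrams / (bigrams in a + bigrams in b)."""
    ca, cb = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(ca.values()) + sum(cb.values())
    if total == 0:
        return 0.0
    shared = sum((ca & cb).values())  # multiset intersection
    return 2 * shared / total


def best_match(attribute: str, candidates: Iterable[str]) -> Tuple[str, float]:
    """Return the candidate name with the highest Dice score."""
    return max(((c, dice(attribute, c)) for c in candidates), key=lambda pair: pair[1])


# Hypothetical UML attribute and CDE names, for illustration only.
match, score = best_match("patientAge", ["patient_age", "specimen_type", "tumor_grade"])
```

A purely lexical score like this needs no ontology lookup, which is why it is cheap to run against a large repository; a threshold on the score would decide which matches go to a curator for review.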
Reviving legacy clay mineralogy data and metadata through the IEDA-CCNY Data Internship Program
NASA Astrophysics Data System (ADS)
Palumbo, R. V.; Randel, C.; Ismail, A.; Block, K. A.; Cai, Y.; Carter, M.; Hemming, S. R.; Lehnert, K.
2016-12-01
Reconstruction of past climate and ocean circulation using ocean sediment cores relies on the use of multiple climate proxies measured on well-studied cores. Preserving all the information collected on a sediment core is crucial for the success of future studies using these unique and important samples. Clay mineralogy is a powerful tool to study weathering processes and sedimentary provenance. In his pioneering dissertation, Pierre Biscaye (1964, Yale University) established the X-Ray Diffraction (XRD) method for quantitative clay mineralogy analyses in ocean sediments and presented data for 500 core-top samples throughout the Atlantic Ocean and its neighboring seas. Unfortunately, the data only exists in analog format, which has discouraged scientists from reusing the data, apart from replication of the published maps. Archiving and preserving this dataset and making it publicly available in a digital format, linked with the metadata from the core repository will allow the scientific community to use these data to generate new findings. Under the supervision of Sidney Hemming and members of the Interdisciplinary Earth Data Alliance (IEDA) team, IEDA-CCNY interns digitized the data and metadata from Biscaye's dissertation and linked them with additional sample metadata using IGSN (International Geo-Sample Number). After compilation and proper documentation of the dataset, it was published in the EarthChem Library where the dataset will be openly accessible, and citable with a persistent DOI (Digital Object Identifier). During this internship, the students read peer-reviewed articles, interacted with active scientists in the field and acquired knowledge about XRD methods and the data generated, as well as its applications. They also learned about existing and emerging best practices in data publication and preservation. Data rescue projects are a fun and interactive way for students to become engaged in the field.
mzML2ISA & nmrML2ISA: generating enriched ISA-Tab metadata files from metabolomics XML data
Larralde, Martin; Lawson, Thomas N.; Weber, Ralf J. M.; Moreno, Pablo; Haug, Kenneth; Rocca-Serra, Philippe; Viant, Mark R.; Steinbeck, Christoph; Salek, Reza M.
2017-01-01
Summary: Submission to the MetaboLights repository for metabolomics data currently places the burden of reporting instrument and acquisition parameters in ISA-Tab format on users, who have to do it manually, a process that is time consuming and prone to user input error. Since the large majority of these parameters are embedded in instrument raw data files, an opportunity exists to capture this metadata more accurately. Here we report a set of Python packages that can automatically generate ISA-Tab metadata file stubs from raw XML metabolomics data files. The parsing packages are separated into mzML2ISA (encompassing mzML and imzML formats) and nmrML2ISA (nmrML format only). Overall, the use of mzML2ISA & nmrML2ISA reduces the time needed to capture metadata substantially (capturing 90% of metadata on assay and sample levels), is much less prone to user input errors, improves compliance with minimum information reporting guidelines and facilitates more finely grained data exploration and querying of datasets. Availability and Implementation: mzML2ISA & nmrML2ISA are available under version 3 of the GNU General Public Licence at https://github.com/ISA-tools. Documentation is available from http://2isa.readthedocs.io/en/latest/. Contact: reza.salek@ebi.ac.uk or isatools@googlegroups.com. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28402395
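The idea of harvesting instrument parameters that are already embedded in the raw XML can be sketched with the standard library alone. This does not use or reproduce the mzML2ISA API; the function name is hypothetical, and the embedded sample is a hand-written fragment in the mzML namespace whose values are illustrative only.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

MZML_NS = {"mz": "http://psi.hupo.org/ms/mzml"}

# Illustrative fragment only; a real mzML file carries far more structure.
SAMPLE = """<mzML xmlns="http://psi.hupo.org/ms/mzml">
  <instrumentConfigurationList count="1">
    <instrumentConfiguration id="IC1">
      <cvParam cvRef="MS" accession="MS:1000449" name="LTQ Orbitrap" value=""/>
      <cvParam cvRef="MS" accession="MS:1000529" name="instrument serial number" value="SN-0001"/>
    </instrumentConfiguration>
  </instrumentConfigurationList>
</mzML>"""


def instrument_params(mzml_text: str) -> List[Dict[str, str]]:
    """Collect the controlled-vocabulary parameters of each instrument configuration."""
    root = ET.fromstring(mzml_text)
    rows: List[Dict[str, str]] = []
    for config in root.iterfind(".//mz:instrumentConfiguration", MZML_NS):
        for param in config.iterfind("mz:cvParam", MZML_NS):
            rows.append({
                "configuration": config.get("id", ""),
                "accession": param.get("accession", ""),
                "name": param.get("name", ""),
                "value": param.get("value", ""),
            })
    return rows


rows = instrument_params(SAMPLE)
```

Each row pairs a controlled-vocabulary accession with its value, which is the kind of information a submitter would otherwise have to retype into an ISA-Tab assay file by hand.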
mzML2ISA & nmrML2ISA: generating enriched ISA-Tab metadata files from metabolomics XML data.
Larralde, Martin; Lawson, Thomas N; Weber, Ralf J M; Moreno, Pablo; Haug, Kenneth; Rocca-Serra, Philippe; Viant, Mark R; Steinbeck, Christoph; Salek, Reza M
2017-08-15
Submission to the MetaboLights repository for metabolomics data currently places the burden of reporting instrument and acquisition parameters in ISA-Tab format on users, who have to do it manually, a process that is time consuming and prone to user input error. Since the large majority of these parameters are embedded in instrument raw data files, an opportunity exists to capture this metadata more accurately. Here we report a set of Python packages that can automatically generate ISA-Tab metadata file stubs from raw XML metabolomics data files. The parsing packages are separated into mzML2ISA (encompassing mzML and imzML formats) and nmrML2ISA (nmrML format only). Overall, the use of mzML2ISA & nmrML2ISA reduces the time needed to capture metadata substantially (capturing 90% of metadata on assay and sample levels), is much less prone to user input errors, improves compliance with minimum information reporting guidelines and facilitates more finely grained data exploration and querying of datasets. mzML2ISA & nmrML2ISA are available under version 3 of the GNU General Public Licence at https://github.com/ISA-tools. Documentation is available from http://2isa.readthedocs.io/en/latest/. Contact: reza.salek@ebi.ac.uk or isatools@googlegroups.com. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
Applying Content Management to Automated Provenance Capture
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schuchardt, Karen L.; Gibson, Tara D.; Stephan, Eric G.
2008-04-10
Workflows and data pipelines are becoming increasingly valuable in both computational and experimental sciences. These automated systems are capable of generating significantly more data within the same amount of time than their manual counterparts. Automatically capturing and recording data provenance and annotation as part of these workflows is critical for data management, verification, and dissemination. Our goal in addressing the provenance challenge was to develop an end-to-end system that demonstrates real-time capture, persistent content management, and ad-hoc searches of both provenance and metadata using open source software and standard protocols. We describe our prototype, which extends the Kepler workflow tools for the execution environment, the Scientific Annotation Middleware (SAM) content management software for data services, and an existing HTTP-based query protocol. Our implementation offers several unique capabilities, and through the use of standards, is able to provide access to the provenance record to a variety of commonly available client tools.
Combining the CIDOC CRM and MPEG-7 to Describe Multimedia in Museums.
ERIC Educational Resources Information Center
Hunter, Jane
This paper describes a proposal for an interoperable metadata model, based on international standards, that has been designed to enable the description, exchange and sharing of multimedia resources both within and between cultural institutions. Domain-specific ontologies have been developed by two different ISO Working Groups to standardize the…
The LOM Approach -- A CALL for Concern?
ERIC Educational Resources Information Center
Armitage, Nicholas; Bowerman, Chris
2005-01-01
The LOM (Learning Object Model) approach to courseware design seems to be driven by a desire to increase access to education as well as use technology to enable a higher staff-student ratio than is currently possible. The LOM standard involves the use of standard metadata descriptions of content and adaptive content engines to deliver the…
Sustainable Software Decisions for Long-term Projects (Invited)
NASA Astrophysics Data System (ADS)
Shepherd, A.; Groman, R. C.; Chandler, C. L.; Gaylord, D.; Sun, M.
2013-12-01
Adopting new, emerging technologies can be difficult for established projects that are positioned to exist for years to come. In some cases the challenge lies in the pre-existing software architecture. In others, the challenge lies in the fluctuation of resources like people, time and funding. The Biological and Chemical Oceanography Data Management Office (BCO-DMO) was created in late 2006 by combining the data management offices for the U.S. GLOBEC and U.S. JGOFS programs to publish data for researchers funded by the National Science Foundation (NSF). Since its inception, BCO-DMO has been supporting access and discovery of these data through web-accessible software systems, and the office has worked through many of the challenges of incorporating new technologies into its software systems. From migrating human readable, flat file metadata storage into a relational database, and now, into a content management system (Drupal) to incorporating controlled vocabularies, new technologies can radically affect the existing software architecture. However, through the use of science-driven use cases, effective resource management, and loosely coupled software components, BCO-DMO has been able to adapt its existing software architecture to adopt new technologies. One of the latest efforts at BCO-DMO revolves around applying metadata semantics for publishing linked data in support of data discovery. This effort primarily affects the metadata web interface software at http://bco-dmo.org and the geospatial interface software at http://mapservice.bco-dmo.org/. With guidance from science-driven use cases and consideration of our resources, implementation decisions are made using a strategy to loosely couple the existing software systems to the new technologies. 
The results of this process led to the use of REST web services and a combination of contributed and custom Drupal modules for publishing BCO-DMO's content using the Resource Description Framework (RDF) via an instance of the Virtuoso Open-Source triplestore.
A Domain Description Language for Data Processing
NASA Technical Reports Server (NTRS)
Golden, Keith
2003-01-01
We discuss an application of planning to data processing, a planning problem which poses unique challenges for domain description languages. We discuss these challenges and why the current PDDL standard does not meet them. We discuss DPADL (Data Processing Action Description Language), a language for describing planning domains that involve data processing. DPADL is a declarative, object-oriented language that supports constraints and embedded Java code, object creation and copying, explicit inputs and outputs for actions, and metadata descriptions of existing and desired data. DPADL is supported by the IMAGEbot system, which we are using to provide automation for an ecological forecasting application. We compare DPADL to PDDL and discuss changes that could be made to PDDL to make it more suitable for representing planning domains that involve data processing actions.
Innovating Data Discovery In NOAA OneStop By Integrating With Social Media
NASA Astrophysics Data System (ADS)
Jakositz, A.; McQuinn, E.; Delk, Z.; Shapiro, J.; Partee, R.; Richerson, E.
2017-12-01
Tasked with improving discovery of and access to NOAA data, the OneStop project has to consider a broad array of data types and end-users in the overall design. While work on the OneStop web interface and backend API is of utmost importance for enabling a variety of users to explore available NOAA data, the challenge of bringing those users to the OneStop portal in the first place remains. In this presentation, we highlight the benefits of using social media - namely YouTube - to attract users to both the data and tools existing in the NOAA realm. Furthermore, we discuss the ways in which varying data types can be discovered from the same portal, triggering different views (for instance, a streaming video), based on maintaining consistent metadata standards.
Advancements in Large-Scale Data/Metadata Management for Scientific Data.
NASA Astrophysics Data System (ADS)
Guntupally, K.; Devarakonda, R.; Palanisamy, G.; Frame, M. T.
2017-12-01
Scientific data often comes with complex and diverse metadata which are critical for data discovery and users. The Online Metadata Editor (OME) tool, which was developed by an Oak Ridge National Laboratory team, effectively manages diverse scientific datasets across several federal data centers, such as DOE's Atmospheric Radiation Measurement (ARM) Data Center and USGS's Core Science Analytics, Synthesis, and Libraries (CSAS&L) project. This presentation will focus mainly on recent developments and future strategies for refining the OME tool within these centers. The ARM OME is a standards-based tool (https://www.archive.arm.gov/armome) that allows scientists to create and maintain metadata about their data products. The tool has been improved with new workflows that help metadata coordinators and submitting investigators to submit and review their data more efficiently. The ARM Data Center's newly upgraded Data Discovery Tool (http://www.archive.arm.gov/discovery) uses rich metadata generated by the OME to enable search and discovery of thousands of datasets, while also providing a citation generator and modern order-delivery techniques like Globus (using GridFTP), Dropbox and THREDDS. The Data Discovery Tool also supports incremental indexing, which allows users to find new data as they are added. The USGS CSAS&L search catalog employs a custom version of the OME (https://www1.usgs.gov/csas/ome), which has been upgraded with high-level Federal Geographic Data Committee (FGDC) validations and the ability to reserve and mint Digital Object Identifiers (DOIs). The USGS's Science Data Catalog (SDC) (https://data.usgs.gov/datacatalog) allows users to discover a myriad of science data holdings through a web portal. Recent major upgrades to the SDC and ARM Data Discovery Tool include improved harvesting performance and migration using new search software, such as Apache Solr 6.0 for serving up data/metadata to scientific communities.
Our presentation will highlight the future enhancements of these tools which enable users to retrieve fast search results, along with parallelizing the retrieval process from online and High Performance Storage Systems. In addition, these improvements to the tools will support additional metadata formats like the Large-Eddy Simulation (LES) ARM Symbiotic and Observation (LASSO) bundle data.
Cleaning by clustering: methodology for addressing data quality issues in biomedical metadata.
Hu, Wei; Zaveri, Amrapali; Qiu, Honglei; Dumontier, Michel
2017-09-18
The ability to efficiently search and filter datasets depends on access to high quality metadata. While most biomedical repositories require data submitters to provide a minimal set of metadata, some, such as the Gene Expression Omnibus (GEO), allow users to specify additional metadata in the form of textual key-value pairs (e.g. sex: female). However, since there is no structured vocabulary to guide the submitter regarding the metadata terms to use, the 44,000,000+ key-value pairs in GEO suffer from numerous quality issues including redundancy, heterogeneity, inconsistency, and incompleteness. Such issues hinder the ability of scientists to home in on datasets that meet their requirements and point to a need for accurate, structured and complete description of the data. In this study, we propose a clustering-based approach to address data quality issues in biomedical, specifically gene expression, metadata. First, we present three different kinds of similarity measures to compare metadata keys. Second, we design a scalable agglomerative clustering algorithm to cluster similar keys together. Our agglomerative cluster algorithm identified metadata keys that were similar, based on (i) name, (ii) core concept and (iii) value similarities, to each other and grouped them together. We evaluated our method using a manually created gold standard in which 359 keys were grouped into 27 clusters based on six types of characteristics: (i) age, (ii) cell line, (iii) disease, (iv) strain, (v) tissue and (vi) treatment. As a result, the algorithm generated 18 clusters containing 355 keys (four clusters with only one key were excluded). In the 18 clusters, there were keys that were identified correctly to be related to that cluster, but there were 13 keys which were not related to that cluster. We compared our approach with four other published methods. Our approach significantly outperformed them for most metadata keys and achieved the best average F-Score (0.63).
Our algorithm identified keys that were similar to each other and grouped them together. Our intuition that underpins cleaning by clustering is that, dividing keys into different clusters resolves the scalability issues for data observation and cleaning, and keys in the same cluster with duplicates and errors can easily be found. Our algorithm can also be applied to other biomedical data types.
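As a rough illustration of clustering metadata keys, the Python sketch below performs single-linkage agglomerative clustering using name similarity only. It is not the authors' algorithm: the core-concept and value similarities are omitted, `difflib.SequenceMatcher` stands in for whatever string measure the study used, and the 0.8 threshold and all function names are assumptions.

```python
from difflib import SequenceMatcher


def normalize(key):
    """Lower-case a key and treat underscores and repeated spaces as one space."""
    return " ".join(key.lower().replace("_", " ").split())


def name_similarity(a, b):
    """Similarity in [0, 1] between two normalized key names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def cluster_keys(keys, threshold=0.8):
    """Merge the most similar pair of clusters until no pair reaches the threshold."""
    clusters = [[k] for k in keys]
    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: cluster similarity is the best pairwise similarity.
                sim = max(name_similarity(a, b)
                          for a in clusters[i] for b in clusters[j])
                if sim >= threshold and (best is None or sim > best[0]):
                    best = (sim, i, j)
        if best is None:
            return sorted(sorted(c) for c in clusters)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))


if __name__ == "__main__":
    keys = ["cell line", "cell_line", "Cell Line", "tissue", "tissues", "age", "ages"]
    for cluster in cluster_keys(keys):
        print(cluster)
```

Once keys are grouped this way, duplicates and misspellings sit next to each other in the same cluster, which is the inspection benefit the abstract attributes to cleaning by clustering. The pairwise loop is quadratic per merge and is meant for small examples only.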
Applications of the LBA-ECO Metadata Warehouse
NASA Astrophysics Data System (ADS)
Wilcox, L.; Morrell, A.; Griffith, P. C.
2006-05-01
The LBA-ECO Project Office has developed a system to harvest and warehouse metadata resulting from the Large-Scale Biosphere Atmosphere Experiment in Amazonia. The harvested metadata is used to create dynamically generated reports, available at www.lbaeco.org, which facilitate access to LBA-ECO datasets. The reports are generated for specific controlled vocabulary terms (such as an investigation team or a geospatial region), and are cross-linked with one another via these terms. This approach creates a rich contextual framework enabling researchers to find datasets relevant to their research. It maximizes data discovery by association and provides a greater understanding of the scientific and social context of each dataset. For example, our website provides a profile (e.g. participants, abstract(s), study sites, and publications) for each LBA-ECO investigation. Linked from each profile is a list of associated registered dataset titles, each of which link to a dataset profile that describes the metadata in a user-friendly way. The dataset profiles are generated from the harvested metadata, and are cross-linked with associated reports via controlled vocabulary terms such as geospatial region. The region name appears on the dataset profile as a hyperlinked term. When researchers click on this link, they find a list of reports relevant to that region, including a list of dataset titles associated with that region. Each dataset title in this list is hyperlinked to its corresponding dataset profile. Moreover, each dataset profile contains hyperlinks to each associated data file at its home data repository and to publications that have used the dataset. We also use the harvested metadata in administrative applications to assist quality assurance efforts. 
These include processes to check for broken hyperlinks to data files, automated emails that inform our administrators when critical metadata fields are updated, dynamically generated reports of metadata records that link to datasets with questionable file formats, and dynamically generated region/site coordinate quality assurance reports. These applications are as important as those that facilitate access to information because they help ensure a high standard of quality for the information. This presentation will discuss reports currently in use, provide a technical overview of the system, and discuss plans to extend this system to harvest metadata resulting from the North American Carbon Program by drawing on datasets in many different formats, residing in many thematic data centers and also distributed among hundreds of investigators.
Next-Generation Search Engines for Information Retrieval
DOE Office of Scientific and Technical Information (OSTI.GOV)
Devarakonda, Ranjeet; Hook, Leslie A; Palanisamy, Giri
In recent years, there have been significant advancements in the areas of scientific data management and retrieval techniques, particularly in terms of standards and protocols for archiving data and metadata. Scientific data is rich, and spread across different places. In order to integrate these pieces together, a data archive and associated metadata should be generated. Data should be stored in a retrievable format and, more importantly, in a format that will continue to be accessible as technology changes, such as XML. While general-purpose search engines (such as Google or Bing) are useful for finding many things on the Internet, they are often of limited usefulness for locating Earth Science data relevant (for example) to a specific spatiotemporal extent. By contrast, tools that search repositories of structured metadata can locate relevant datasets with fairly high precision, but the search is limited to that particular repository. Federated searches (such as Z39.50) have been used, but can be slow and the comprehensiveness can be limited by downtime in any search partner. An alternative approach to improve comprehensiveness is for a repository to harvest metadata from other repositories, possibly with limits based on subject matter or access permissions. Searches through harvested metadata can be extremely responsive, and the search tool can be customized with semantic augmentation appropriate to the community of practice being served. One such system is Mercury, a metadata harvesting, data discovery, and access system built for researchers to search, share, and obtain spatiotemporal data used across a range of climate and ecological sciences. Mercury is an open-source toolset with a Java backend; its search capability is supported by popular open-source search libraries such as Solr and Lucene.
Mercury harvests the structured metadata and key data from several data-providing servers around the world and builds a centralized index. The harvested files are indexed consistently against the Solr search API, so that Mercury can offer simple, fielded, spatial, and temporal searches across projects spanning land, atmosphere, and ocean ecology. Mercury also provides data sharing capabilities using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). In this paper we discuss best practices for archiving data and metadata, new searching techniques, and efficient ways of data retrieval and information display.
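The harvest-then-search pattern described above can be sketched in a few lines of Python. This is a toy in-memory stand-in, not Mercury's code and not the Solr API; the `Record` fields, the `CentralIndex` class, and the sample records are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    provider: str      # repository the record was harvested from
    identifier: str
    title: str
    bbox: tuple        # (west, south, east, north) in degrees
    start: date
    end: date


def _boxes_overlap(a, b):
    # Axis-aligned overlap test; boxes crossing the antimeridian are not handled.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class CentralIndex:
    """Toy stand-in for a centralized index built from harvested metadata."""

    def __init__(self):
        self._records = {}

    def harvest(self, records):
        # Re-harvesting the same (provider, identifier) overwrites the old copy.
        for rec in records:
            self._records[(rec.provider, rec.identifier)] = rec

    def search(self, text=None, bbox=None, period=None):
        """Combine a title keyword, a bounding box, and a (start, end) date range."""
        hits = []
        for rec in self._records.values():
            if text and text.lower() not in rec.title.lower():
                continue
            if bbox and not _boxes_overlap(rec.bbox, bbox):
                continue
            if period and not (rec.start <= period[1] and period[0] <= rec.end):
                continue
            hits.append(rec)
        return sorted(hits, key=lambda r: (r.provider, r.identifier))


def demo_index():
    index = CentralIndex()
    index.harvest([
        Record("provider_a", "1", "Amazon soil carbon", (-70, -10, -50, 0),
               date(2000, 1, 1), date(2002, 12, 31)),
        Record("provider_b", "7", "Arctic sea ice extent", (-180, 60, 180, 90),
               date(1990, 1, 1), date(2010, 12, 31)),
        Record("provider_b", "8", "Boreal soil moisture", (-130, 50, -60, 70),
               date(2005, 1, 1), date(2006, 1, 1)),
    ])
    return index
```

Because every query runs against the local copy, a slow or offline provider does not delay the search, which is the responsiveness advantage over federated search that the abstract points out.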
Interoperability Across the Stewardship Spectrum in the DataONE Repository Federation
NASA Astrophysics Data System (ADS)
Jones, M. B.; Vieglais, D.; Wilson, B. E.
2016-12-01
Thousands of earth and environmental science repositories serve many researchers and communities, each with their own community and legal mandates, sustainability models, and historical infrastructure. These repositories span the stewardship spectrum from highly curated collections that employ large numbers of staff members to review and improve data, to small, minimal budget repositories that accept data caveat emptor and where all responsibility for quality lies with the submitter. Each repository fills a niche, providing services that meet the stewardship tradeoffs of one or more communities. We have reviewed these stewardship tradeoffs for several DataONE member repositories ranging from minimally (KNB) to highly curated (Arctic Data Center), as well as general purpose (Dryad) to highly discipline or project specific (NEON). The rationale behind different levels of stewardship reflect resolution of these tradeoffs. Some repositories aim to encourage extensive uptake by keeping processes simple and minimizing the amount of information collected, but this limits the long-term utility of the data and the search, discovery, and integration systems that are possible. Other repositories require extensive metadata input, review, and assessment, allowing for excellent preservation, discovery, and integration but at the cost of significant time for submitters and expense for curatorial staff. DataONE recognizes these different levels of curation, and attempts to embrace them to create a federation that is useful across the stewardship spectrum. DataONE provides a tiered model for repositories with growing utility of DataONE services at higher tiers of curation. The lowest tier supports read-only access to data and requires little more than title and contact metadata. Repositories can gradually phase in support for higher levels of metadata and services as needed. 
These tiered capabilities are possible through flexible support for multiple metadata standards and services, where repositories can incrementally increase their requirements as they want to satisfy more use cases. Within DataONE, metadata search services support minimal metadata models, but significantly expanded precision and recall become possible when repositories provide more extensively curated metadata.
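A tiered model of this kind can be pictured as a set of cumulative metadata requirements. In the Python sketch below, only the lowest tier (title and contact) comes from the text above; the higher tiers, the field names, and both functions are invented for illustration and are not DataONE's actual tier definitions.

```python
# Hypothetical tier definitions. Only tier 1 (title and contact) is taken from
# the abstract; tiers 2 and 3 are placeholders for "more curated" metadata.
TIER_REQUIREMENTS = [
    (1, {"title", "contact"}),
    (2, {"abstract", "keywords"}),
    (3, {"spatial_coverage", "temporal_coverage", "methods"}),
]


def _present_fields(record):
    """Names of fields that carry a non-empty value."""
    return {k for k, v in record.items() if v not in (None, "", [], {})}


def metadata_tier(record):
    """Return the highest tier whose cumulative requirements are met (0 if none)."""
    present = _present_fields(record)
    tier = 0
    for level, fields in TIER_REQUIREMENTS:
        if not fields <= present:
            break
        tier = level
    return tier


def missing_for_next_tier(record):
    """List the fields a repository would still need to supply to move up one tier."""
    present = _present_fields(record)
    for _, fields in TIER_REQUIREMENTS:
        if not fields <= present:
            return sorted(fields - present)
    return []
```

A repository could use a check like `missing_for_next_tier` to phase in richer metadata gradually, which mirrors the incremental path the abstract describes.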
MyOcean Internal Information System (Dial-P)
NASA Astrophysics Data System (ADS)
Blanc, Frederique; Jolibois, Tony; Loubrieu, Thomas; Manzella, Giuseppe; Mazzetti, Paolo; Nativi, Stefano
2010-05-01
MyOcean is a three-year project (2008-2011) whose goal is the development and pre-operational validation of the GMES Marine Core Service for ocean monitoring and forecasting. It is a transition project that will lead the European "operational oceanography" community towards the operational phase of a GMES European service, which demands more European integration, more operationality, and more service. Observations, model-based data, and added-value products will be generated - and enhanced thanks to dedicated expertise - by the following production units: • Five Thematic Assembly Centers, each of them dealing with a specific set of observation data: Sea Level, Ocean colour, Sea Surface Temperature, Sea Ice & Wind, and In Situ data, • Seven Monitoring and Forecasting Centers to serve the Global Ocean, the Arctic area, the Baltic Sea, the Atlantic North-West shelves area, the Atlantic Iberian-Biscay-Ireland area, the Mediterranean Sea and the Black Sea. Intermediate and final users will discover, view and get the products by means of a central web desk, a central reactive manned service desk and thematic experts distributed across Europe. The MyOcean Information System (MIS) considers the various aspects of an interoperable, federated information system. Data models support data and computer systems by providing the definition and format of data. Whether the information can be included in the data file depends on the data model adopted. In general there is little effort in the actual project to develop a 'generic' data model. A strong push to develop a common model is provided by the EU Directive INSPIRE. At present, there is no single de facto data format for storing observational data. Data formats are still evolving, with their underlying data models moving towards the concept of Feature Types based on ISO/TC211 standards.
For example, Unidata is developing the Common Data Model that can represent scientific data types such as point, trajectory, station, grid, etc., which will be implemented in netCDF format. SeaDataNet is recommending ODV and NetCDF formats. Another problem related to data curation and interoperability is the use of common vocabularies. Common vocabularies are developed in many international initiatives, such as GEMET (promoted by INSPIRE as a multilingual thesaurus), UNIDATA, SeaDataNet, and Marine Metadata Interoperability (MMI). MIS is considering the SeaDataNet vocabulary as a base for interoperability. Four layers of different abstraction levels of interoperability can be defined: - Technical/basic: this layer is implemented at each TAC or MFC through internet connection and basic services for data transfer and browsing (e.g. FTP, HTTP). - Syntactic: allowing the interchange of metadata and protocol elements. This layer corresponds to the definition of a Core Metadata Set, the exchange/delivery format for the data and associated metadata, and possibly software. This layer is implemented by the DIAL-P logical interface (e.g. adoption of an INSPIRE-compliant metadata set and common data formats). - Functional/pragmatic: based on a common set of functional primitives or on a common set of service definitions. This layer refers to the definition of services based on Web services standards. This layer is implemented by the DIAL-P logical interface (e.g. adoption of INSPIRE-compliant network services). - Semantic: allowing access to similar classes of objects and services across multiple sites, with multilinguality of content as one specific aspect. This layer corresponds to the MIS interface, terminology and thesaurus. Given the above requirements, the proposed solution is a federation of systems, where the individual participants are self-contained autonomous systems, but together form a consistent wider picture.
A mid-tier integration layer mediates between existing systems, adapting their data and service model schema to the MIS. The developed MIS is a read-only system, i.e. does not allow updating (or inserting) data into the participant resource systems. The main advantages of the proposed approach are: • to enable information sources to join the MIS and publish their data and metadata in a secure way, without any modification to their existing resources and procedures and without any restriction to their autonomy; • to enable users to browse and query the MIS, receiving an aggregated result incorporating relevant data and metadata from across different sources; • to accommodate the growth of such a MIS, either in terms of its clients or of its information resources, as well as the evolution of the underlying data model.
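The mediation idea, in which each autonomous participant keeps its own schema and a mid-tier layer adapts it to a common model, can be sketched as a read-only adapter pattern in Python. The class, the field names, and the sample catalogues below are hypothetical and do not represent the MIS or DIAL-P interfaces.

```python
class SourceAdapter:
    """Read-only view of one participant system, translated to common field names."""

    def __init__(self, name, records, field_map):
        self.name = name
        self._records = records      # owned by the participant, never modified here
        self._field_map = field_map  # common field name -> local field name

    def query(self, **criteria):
        for local in self._records:
            common = {c: local.get(l) for c, l in self._field_map.items()}
            if all(common.get(k) == v for k, v in criteria.items()):
                yield {**common, "source": self.name}


def federated_query(adapters, **criteria):
    """Send the same query to every adapter and aggregate the answers."""
    results = []
    for adapter in adapters:
        results.extend(adapter.query(**criteria))
    return results


def demo_adapters():
    # Two hypothetical centres that describe the same things with different field names.
    centre_a = SourceAdapter(
        "centre_a",
        [{"param": "sea_level", "zone": "Baltic Sea"},
         {"param": "sea_ice", "zone": "Arctic"}],
        {"parameter": "param", "region": "zone"},
    )
    centre_b = SourceAdapter(
        "centre_b",
        [{"variable": "sea_level", "area": "Black Sea"}],
        {"parameter": "variable", "region": "area"},
    )
    return [centre_a, centre_b]
```

Adding a new participant only requires a new field map, so existing sources keep their resources and procedures unchanged, in line with the advantages listed in the abstract.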
Stapleton, Jo Anne; Sonenshein, Roy
2004-01-01
Beginning in 1995 the U.S. Geological Survey (USGS) funded scientific research to support the restoration of the Greater Everglades area and to supply decision makers and resource managers with sound data on which to base their actions. However, none of the research and resulting data is useful if it can't be discovered, can't be assessed for utility in an application, can't be accessed, or is in an undetermined format. The decision was made early in the USGS Place-Based Studies (PBS) program to create a 'one-stop' entry for information and data about USGS research results. To facilitate the discovery process some mechanism was needed to allow standardized queries about data. The FGDC metadata standard has been used to document the South Florida PBS data from the beginning.
Pfaff, Claas-Thido; Eichenberg, David; Liebergesell, Mario; König-Ries, Birgitta; Wirth, Christian
2017-01-01
Over the last decades, ecology has become a data-intensive science that often relies on the reuse of data in cross-experimental analyses. However, finding data that qualifies for reuse in a specific context can be challenging. It requires good quality metadata and annotations as well as efficient search strategies. To date, full text search (often on the metadata only) is the most widely used search strategy although it is known to be inaccurate. Faceted navigation provides a filter mechanism based on fine-grained metadata, categorizing search objects along numeric and categorical parameters relevant for their discovery. Selecting from these parameters during a full text search creates a system of filters that lets users refine the results toward greater relevance. We developed a framework for the efficient annotation and faceted navigation in ecology. It consists of an XML schema for storing the annotation of search objects and is accompanied by a vocabulary focused on ecology to support the annotation process. The framework consolidates ideas which originate from widely accepted metadata standards, textbooks, scientific literature, and vocabularies as well as from expert knowledge contributed by researchers from ecology and adjacent disciplines.
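Faceted navigation layered on a full text search can be illustrated with a short Python sketch. The sample datasets, the facet names (`biome`, `organism`, `year`), and the functions are hypothetical; the framework's XML schema and vocabulary are not reproduced here.

```python
from collections import Counter

# Hypothetical annotated search objects.
DATASETS = [
    {"title": "Leaf traits of temperate trees", "biome": "temperate forest",
     "organism": "plants", "year": 2012},
    {"title": "Soil fauna in temperate grassland", "biome": "grassland",
     "organism": "invertebrates", "year": 2015},
    {"title": "Tree growth in boreal stands", "biome": "boreal forest",
     "organism": "plants", "year": 2015},
]


def search(datasets, text="", **facets):
    """Keep datasets whose title contains every word and whose facets all match."""
    words = text.lower().split()
    hits = []
    for d in datasets:
        if not all(w in d["title"].lower() for w in words):
            continue
        if any(d.get(name) != value for name, value in facets.items()):
            continue
        hits.append(d)
    return hits


def facet_counts(hits, facet):
    """Count how many current hits fall under each value of one facet."""
    return Counter(d[facet] for d in hits)
```

A search interface would typically show `facet_counts` next to the result list, so that choosing a value (for example `organism="plants"`) narrows an imprecise full text result step by step.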
EPA Office of Water (OW): 12-digit Hydrologic Unit Boundaries of the United States
The Watershed Boundary Dataset (WBD) is a complete digital hydrologic unit national boundary layer that is at the Subwatershed (12-digit) level. It is composed of the watershed boundaries delineated by state agencies at the 1:24,000 scale. Please refer to the individual state metadata as the primary reference source. To access state-specific metadata, go to the following link to view documentation created by agencies that performed the watershed delineation. This data set is a complete digital hydrologic unit boundary layer to the Subwatershed (12-digit) 6th level. This data set consists of geo-referenced digital data and associated attributes created in accordance with the FGDC Proposal, Version 1.0 - Federal Standards For Delineation of Hydrologic Unit Boundaries 3/01/02. Polygons are attributed with hydrologic unit codes for 4th level sub-basins, 5th level watersheds, 6th level subwatersheds, name, size, downstream hydrologic unit, type of watershed, non-contributing areas and flow modification. Arcs are attributed with the highest hydrologic unit code for each watershed, linesource and a metadata reference file. Please refer to the Metadata contact if you want access to the WBD national data set.
A metadata approach for clinical data management in translational genomics studies in breast cancer.
Papatheodorou, Irene; Crichton, Charles; Morris, Lorna; Maccallum, Peter; Davies, Jim; Brenton, James D; Caldas, Carlos
2009-11-30
In molecular profiling studies of cancer patients, experimental and clinical data are combined in order to understand the clinical heterogeneity of the disease: clinical information for each subject needs to be linked to tumour samples, macromolecules extracted, and experimental results. This may involve the integration of clinical data sets from several different sources: these data sets may employ different data definitions and some may be incomplete. In this work we employ semantic web techniques developed within the CancerGrid project, in particular the use of metadata elements and logic-based inference to annotate heterogeneous clinical information, integrate and query it. We show how this integration can be achieved automatically, following the declaration of appropriate metadata elements for each clinical data set; we demonstrate the practicality of this approach through application to experimental results and clinical data from five hospitals in the UK and Canada, undertaken as part of the METABRIC project (Molecular Taxonomy of Breast Cancer International Consortium). We describe a metadata approach for managing similarities and differences in clinical datasets in a standardized way that uses Common Data Elements (CDEs). We apply and evaluate the approach by integrating the five different clinical datasets of METABRIC.
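The role of declared metadata elements in integrating heterogeneous clinical data can be sketched as a simple mapping step in Python. This is only an illustration of mapping local fields to Common Data Elements; it does not reproduce the CancerGrid tooling or its logic-based inference, and the hospital names, field names, and value codes are hypothetical.

```python
# Hypothetical declarations: for each source, local field -> (common data
# element, value map). A value map of None means the value is copied unchanged.
MAPPINGS = {
    "hospital_a": {
        "ER": ("er_status", {"pos": "positive", "neg": "negative"}),
        "AgeDx": ("age_at_diagnosis", None),
    },
    "hospital_b": {
        "oestrogen_receptor": ("er_status", {"1": "positive", "0": "negative", "9": "unknown"}),
        "age": ("age_at_diagnosis", None),
    },
}


def to_common(source, row):
    """Translate one local record into common data elements."""
    out = {"source": source}
    for local_field, value in row.items():
        if local_field not in MAPPINGS[source]:
            continue  # undeclared fields are left out rather than guessed
        cde, value_map = MAPPINGS[source][local_field]
        if value_map is None:
            out[cde] = value
        else:
            out[cde] = value_map.get(str(value), "unknown")
    return out


def integrate(datasets):
    """Merge rows from several sources into one list that can be queried uniformly."""
    return [to_common(source, row) for source, rows in datasets.items() for row in rows]
```

Once the mappings are declared, integration itself needs no manual work, and a single query such as "all subjects with `er_status == 'positive'`" runs across every source.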
ESGF and WDCC: The Double Structure of the Digital Data Storage at DKRZ
NASA Astrophysics Data System (ADS)
Toussaint, F.; Höck, H.
2016-12-01
For the past several years, digital repositories of climate science have faced new challenges: international projects are global collaborations, and data storage has in parallel moved to federated, distributed storage systems like ESGF. For long-term archival storage (LTA), on the other hand, communities, funders, and data users make stronger demands for data and metadata quality to facilitate data use and reuse. At DKRZ, this situation led to a twofold data dissemination system, which influences administration, workflows, and sustainability of the data. The ESGF system is focused on the needs of users as partners in global projects. It includes replication tools, detailed global project standards, and efficient search for the data to download. In contrast, DKRZ's classical CERA LTA storage aims for long-term data holding and data curation as well as for data reuse requiring high metadata quality standards. In addition, for LTA data a Digital Object Identifier publication service for the direct integration of research data in scientific publications has been implemented. The editorial process at DKRZ-LTA ensures the quality of metadata and research data. The DOI and a citation code are provided and afterwards registered under DataCite's (datacite.org) regulations. In the overall data life cycle, continuous reliability of the data and metadata quality is essential to allow for data handling at the petabyte level, long-term data usability, and adequate publication of the results. These considerations lead to the question "What is quality?" with respect to the data, the repository itself, the publisher, and the user. Global consensus is needed for these assessments as the phases of the end-to-end workflow interlock: for data and metadata, checks need to go hand in hand with the processes of production and storage. The results can be judged following a Quality Maturity Matrix (QMM). Repositories can be certified according to their trustworthiness.
For the publication of any scientific conclusions, scientific community, funders, media, and policy makers ask for the publisher's impact in terms of readers' credit, run, and presentation quality. The paper describes the data life cycle. Emphasis is put on the different levels of quality assessment which at DKRZ ensure the data and metadata quality.
Modernized Techniques for Dealing with Quality Data and Derived Products
NASA Astrophysics Data System (ADS)
Neiswender, C.; Miller, S. P.; Clark, D.
2008-12-01
"I just want a picture of the ocean floor in this area" is expressed all too often by researchers, educators, and students in the marine geosciences. As more sophisticated systems are developed to handle data collection and processing, the demand for quality data and standardized products continues to grow. Data management is an invisible bridge between science and researchers/educators. The SIOExplorer digital library presents more than 50 years of ocean-going research. Prior to publication, all data is checked for quality using standardized criteria developed for each data stream. Despite the evolution of data formats and processing systems, SIOExplorer continues to present derived products in well-established formats. Standardized products are published for each cruise, and include a cruise report, MGD77 merged data, multi-beam flipbook, and underway profiles. Creation of these products is made possible by processing scripts, which continue to change with ever-evolving data formats. We continue to explore the potential of database-enabled creation of standardized products, such as the metadata-rich MGD77 header file. Database-enabled, automated processing produces standards-compliant metadata for each data and derived product. Metadata facilitates discovery and interpretation of published products. This descriptive information is stored both in an ASCII file and in a searchable digital library database. SIOExplorer's underlying technology allows focused search and retrieval of data and products. For example, users can initiate a search of only multi-beam data, which includes data-specific parameters. This customization is made possible with a synthesis of database, XML, and PHP technology. The combination of standardized products and digital library technology puts quality data and derived products in the hands of scientists. Interoperable systems enable distribution of these published resources using technology such as web services. 
By developing modernized strategies to deal with data, Scripps Institution of Oceanography is able to produce and distribute well-formed and quality-tested derived products, which aid research, understanding, and education.
Database integration in a multimedia-modeling environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dorow, Kevin E.
2002-09-02
Integration of data from disparate remote sources has direct applicability to modeling, which can support Brownfield assessments. To accomplish this task, a data integration framework needs to be established. A key element in this framework is the metadata that creates the relationship between the pieces of information that are important in the multimedia modeling environment and the information that is stored in the remote data source. The design philosophy is to allow modelers and database owners to collaborate by defining this metadata in such a way that allows interaction between their components. The main parts of this framework include tools to facilitate metadata definition, database extraction plan creation, automated extraction plan execution / data retrieval, and a central clearing house for metadata and modeling / database resources. Cross-platform compatibility (using Java) and standard communications protocols (http / https) allow these parts to run in a wide variety of computing environments (Local Area Networks, Internet, etc.), and, therefore, this framework provides many benefits. Because of the specific data relationships described in the metadata, the amount of data that have to be transferred is kept to a minimum (only the data that fulfill a specific request are provided as opposed to transferring the complete contents of a data source). This allows for real-time data extraction from the actual source. Also, the framework sets up collaborative responsibilities such that the different types of participants have control over the areas in which they have domain knowledge: the modelers are responsible for defining the data relevant to their models, while the database owners are responsible for mapping the contents of the database using the metadata definitions. Finally, the data extraction mechanism allows for the ability to control access to the data and what data are made available.
Data System Architectures: Recent Experiences from Data Intensive Projects
NASA Astrophysics Data System (ADS)
Palanisamy, G.; Frame, M. T.; Boden, T.; Devarakonda, R.; Zolly, L.; Hutchison, V.; Latysh, N.; Krassovski, M.; Killeffer, T.; Hook, L.
2014-12-01
U.S. Federal agencies are frequently trying to address new data-intensive projects that require next-generation data system architectures. This presentation will focus on two such new architectures: USGS's Science Data Catalog (SDC) and DOE's Next Generation Ecological Experiments - Arctic Data System. The U.S. Geological Survey (USGS) developed a Science Data Catalog (data.usgs.gov) to include records describing datasets, data collections, and observational or remotely-sensed data. The system was built using a service-oriented architecture and allows USGS scientists and data providers to create and register their data using either a standards-based metadata creation form or simply to register their already-created metadata records with the USGS SDC Dashboard. This dashboard then compiles the harvested metadata records and sends them to the post-processing and indexing service using the JSON format. The post-processing service, with the help of various ontologies and other geo-spatial validation services, auto-enhances these harvested metadata records and creates a Lucene index using the Solr enterprise search platform. Ultimately, metadata is made available via the SDC search interface. DOE's Next Generation Ecological Experiments (NGEE) Arctic project deployed a data system that allows scientists to prepare, publish, archive, and distribute data from field collections, lab experiments, sensors, and simulated model outputs. This architecture includes a metadata registration form, data uploading and sharing tool, a Digital Object Identifier (DOI) tool, a Drupal-based content management tool (http://ngee-arctic.ornl.gov), and a data search and access tool based on ORNL's Mercury software (http://mercury.ornl.gov). The team also developed Web-metric tools and a data ingest service to visualize geo-spatial and temporal observations.
Semantic technologies improving the recall and precision of the Mercury metadata search engine
NASA Astrophysics Data System (ADS)
Pouchard, L. C.; Cook, R. B.; Green, J.; Palanisamy, G.; Noy, N.
2011-12-01
The Mercury federated metadata system [1] was developed at the Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC), a NASA-sponsored effort holding datasets about biogeochemical dynamics, ecological data, and environmental processes. Mercury currently indexes over 100,000 records from several data providers conforming to community standards, e.g. EML, FGDC, FGDC Biological Profile, ISO 19115 and DIF. With the breadth of sciences represented in Mercury, the potential exists to address some key interdisciplinary scientific challenges related to climate change, its environmental and ecological impacts, and mitigation of these impacts. However, this wealth of metadata also hinders pinpointing datasets relevant to a particular inquiry. We implemented a semantic solution after concluding that traditional search approaches cannot improve the accuracy of the search results in this domain because: a) unlike everyday queries, scientific queries seek to return specific datasets with numerous parameters that may or may not be exposed to search (Deep Web queries); b) the relevance of a dataset cannot be judged by its popularity, as each scientific inquiry tends to be unique; and c) each domain science has its own terminology, more or less curated, consensual, and standardized depending on the domain. The same terms may refer to different concepts across domains (homonyms), while different terms may mean the same thing (synonyms). Interdisciplinary research is arduous because an expert in a domain must become fluent in the language of another, just to find relevant datasets. Thus, we decided to use scientific ontologies because they can provide a context for a free-text search, in a way that string-based keywords never will. With added context, relevant datasets are more easily discoverable. To enable search and programmatic access to ontology entities in Mercury, we are using an instance of the BioPortal ontology repository. 
Mercury accesses ontology entities using the BioPortal REST API by passing a search parameter to BioPortal that may return domain context, parameter attribute, or entity annotations depending on the entity's associated ontological relationships. As Mercury's faceted search is popular with users, the results are displayed as facets. Unlike a faceted search, however, the ontology-based solution implements both restrictions (improving precision) and expansions (improving recall) on the results of the initial search. For instance, "carbon" acquires a scientific context and additional key terms or phrases for discovering domain-specific datasets. A limitation of our solution is that the user must perform an additional step. Another limitation is that the quality of the newly discovered metadata is contingent upon the quality of the ontologies we use. Our solution leverages Mercury's federated capabilities to collect records from heterogeneous domains, and BioPortal's storage, curation and access capabilities for ontology entities. With minimal additional development, our approach builds on two mature systems for finding relevant datasets for interdisciplinary inquiries. We thus indicate a path forward for linking environmental, ecological and biological sciences. References: [1] Devarakonda, R., Palanisamy, G., Wilson, B. E., & Green, J. M. (2010). Mercury: reusable metadata management, data discovery and access system. Earth Science Informatics, 3(1-2), 87-94.
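The restriction and expansion idea described above can be illustrated with a minimal sketch. It does not call BioPortal or Mercury; the ontology entry, the related term, the domain labels, and the sample records are all hypothetical and exist only to show how expansion can raise recall while restriction raises precision.

```python
# Toy ontology (hypothetical): each term lists related terms used for
# expansion and a domain context used for restriction.
ONTOLOGY = {
    "carbon": {"related": ["co2"], "domain": "biogeochemistry"},
}

# Hypothetical metadata records standing in for an indexed catalog.
RECORDS = [
    {"id": 1, "title": "Soil carbon stocks", "domain": "biogeochemistry"},
    {"id": 2, "title": "Carbon fiber tensile strength", "domain": "materials"},
    {"id": 3, "title": "CO2 flux tower measurements", "domain": "biogeochemistry"},
    {"id": 4, "title": "Leaf area index", "domain": "ecology"},
]


def plain_search(term, records):
    """String-based keyword search with no added context."""
    return [r["id"] for r in records if term.lower() in r["title"].lower()]


def ontology_search(term, records):
    """Expand the query with related terms, then restrict by domain context."""
    entry = ONTOLOGY.get(term, {"related": [], "domain": None})
    terms = [term] + entry["related"]
    # Expansion: a record matches if any of the terms appears in its title.
    hits = [
        r for r in records
        if any(t.lower() in r["title"].lower() for t in terms)
    ]
    # Restriction: keep only records from the term's domain context.
    if entry["domain"] is not None:
        hits = [r for r in hits if r["domain"] == entry["domain"]]
    return [r["id"] for r in hits]
```

In this toy catalog the plain search returns the off-topic materials record and misses the CO2 record, whereas the ontology-assisted search drops the former and recovers the latter.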
Distributed Multi-interface Catalogue for Geospatial Data
NASA Astrophysics Data System (ADS)
Nativi, S.; Bigagli, L.; Mazzetti, P.; Mattia, U.; Boldrini, E.
2007-12-01
Several geosciences communities (e.g. atmospheric science, oceanography, hydrology) have developed tailored data and metadata models and service protocol specifications for enabling online data discovery, inventory, evaluation, access and download. These specifications are conceived either by profiling geospatial information standards or by extending the well-accepted geosciences data models and protocols in order to capture more semantics. These artifacts have generated a set of related catalog and inventory services characterizing different communities, initiatives and projects. In fact, these geospatial data catalogs are discovery and access systems that use metadata as the target for query on geospatial information. The indexed and searchable metadata provide a disciplined vocabulary against which intelligent geospatial search can be performed within or among communities. There exists a clear need to conceive and achieve solutions to implement interoperability among geosciences communities, in the context of the more general geospatial information interoperability framework. Such solutions should provide search and access capabilities across catalogs, inventory lists and their registered resources. Thus, the development of catalog clearinghouse solutions is a near-term challenge in support of fully functional and useful infrastructures for spatial data (e.g. INSPIRE, GMES, NSDI, GEOSS). This implies the implementation of components for query distribution and virtual resource aggregation. These solutions must implement distributed discovery functionalities in a heterogeneous environment, requiring metadata profile harmonization as well as protocol adaptation and mediation. We present a catalog clearinghouse solution for the interoperability of several well-known cataloguing systems (e.g. OGC CSW, THREDDS catalog and data services). 
The solution implements consistent resource discovery and evaluation over a dynamic federation of several well-known cataloguing and inventory systems. Prominent features include: 1) Support for distributed queries over a hierarchical data model, supporting incremental queries (i.e. query over collections, to be subsequently refined) and opaque/translucent chaining; 2) Support for several client protocols, through a compound front-end interface module. This makes it possible to accommodate a (growing) number of cataloguing standards, or profiles thereof, including the OGC CSW interface, ebRIM Application Profile (for Core ISO Metadata and other data models), and the ISO Application Profile. The presented catalog clearinghouse supports both the opaque and translucent patterns for service chaining. In fact, the clearinghouse catalog may be configured either to completely hide the underlying federated services or to provide clients with services information. In both cases, the clearinghouse solution presents a higher level interface (i.e. OGC CSW) which harmonizes multiple lower level services (e.g. OGC CSW, WMS and WCS, THREDDS, etc.), and handles all control and interaction with them. In the translucent case, the client has the option to directly access the lower level services (e.g. to improve performance). In the GEOSS context, the solution has been tested both as a stand-alone user application and as a service framework. The first scenario allows users to download multi-platform client software and query a federation of cataloguing systems that they can customize at will. The second scenario supports server-side deployment and can be flexibly adapted to several use-cases, such as intranet proxy, catalog broker, etc.
Seeing and Reading Red: Hue and Color-word Correlation in Images and Attendant Text on the WWW
DOE Office of Scientific and Technical Information (OSTI.GOV)
Newsam, S
2004-07-12
This work represents an initial investigation into determining whether correlations actually exist between metadata and content descriptors in multimedia datasets. We provide a quantitative method for evaluating whether the hue of images on the WWW is correlated with the occurrence of color-words in metadata such as URLs, image names, and attendant text. It turns out that such a correlation does exist: the likelihood that a particular color appears in an image whose URL, name, and/or attendant text contains the corresponding color-word is generally at least twice the likelihood that the color appears in a randomly chosen image on the WWW. While this finding might not be significant in and of itself, it represents an initial step towards quantitatively establishing that other, perhaps more useful correlations exist. These correlations form the basis for exciting novel approaches that leverage semi-supervised datasets, such as the WWW, to overcome the semantic gap that has hampered progress in multimedia information retrieval for some time now.
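The comparison described above is a ratio of two likelihoods: the chance that a color appears in an image whose text contains the color-word, divided by the chance that it appears in any image. The following is a minimal sketch of that arithmetic on a hand-made toy collection; the data layout (sets of words and hues per image) is an assumption for illustration and is not the descriptor used in the original study.

```python
def likelihood_ratio(images, color):
    """Return P(color in image | color-word in text) / P(color in image)."""
    with_word = [img for img in images if color in img["words"]]
    if not images or not with_word:
        raise ValueError("need at least one image and one image with the color-word")
    # Conditional likelihood: restrict to images whose text has the color-word.
    p_given_word = sum(color in img["hues"] for img in with_word) / len(with_word)
    # Baseline likelihood: any image in the collection.
    p_baseline = sum(color in img["hues"] for img in images) / len(images)
    if p_baseline == 0:
        raise ValueError("color never appears in the collection")
    return p_given_word / p_baseline


# Hypothetical toy collection: 'words' come from URL, name, or attendant text.
IMAGES = [
    {"words": {"red", "car"}, "hues": {"red"}},
    {"words": {"red"}, "hues": {"red", "blue"}},
    {"words": {"sky"}, "hues": {"blue"}},
    {"words": {"rose"}, "hues": {"red"}},
    {"words": {"sunset"}, "hues": {"red", "yellow"}},
    {"words": {"grass"}, "hues": {"green"}},
    {"words": {"sea"}, "hues": {"blue"}},
    {"words": {"lemon"}, "hues": {"yellow"}},
]
```

In this toy collection red appears in 4 of 8 images overall but in both images tagged with "red", so the ratio is 1.0 / 0.5 = 2.0.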
The STP (Solar-Terrestrial Physics) Semantic Web based on the RSS1.0 and the RDF
NASA Astrophysics Data System (ADS)
Kubo, T.; Murata, K. T.; Kimura, E.; Ishikura, S.; Shinohara, I.; Kasaba, Y.; Watari, S.; Matsuoka, D.
2006-12-01
In Solar-Terrestrial Physics (STP), it has been pointed out that circulation and utilization of observation data among researchers are insufficient. To achieve interdisciplinary research, we need to overcome these circulation and utilization problems. Against this background, the authors' group has developed a world-wide database that manages meta-data of satellite and ground-based observation data files. Retrieving meta-data from the observation data and registering them in the database have so far been carried out by hand. Our goal is to establish the STP Semantic Web. The Semantic Web provides a common framework that allows a variety of data to be shared and reused across applications, enterprises, and communities. We also expect that secondary information related to observations, such as event information and associated news, will also be shared over the networks. The most fundamental issue in establishing it is who generates, manages and provides meta-data in the Semantic Web. We developed an automatic meta-data collection system for the observation data using RSS (RDF Site Summary) 1.0. RSS 1.0 is an XML-based markup language built on RDF (Resource Description Framework) and designed for syndicating news and the contents of news-like sites. RSS 1.0 is used to describe the STP meta-data, such as data file name, file server address and observation date. To describe STP meta-data beyond the RSS 1.0 vocabulary, we defined original vocabularies for the STP resources using RDF Schema. The RDF describes technical terms on the STP along with the Dublin Core Metadata Element Set, which is a standard for cross-domain information resource descriptions. Researchers' information on the STP is described with FOAF, an RDF/XML vocabulary for creating machine-readable metadata describing people. Using RSS 1.0 as a meta-data distribution method, the workflow from retrieving meta-data to registering them in the database is automated. 
This technique is applied to several database systems, such as the DARTS database system and the NICT Space Weather Report Service. DARTS is a science database managed by ISAS/JAXA in Japan. We succeeded in generating and collecting the meta-data automatically for the CDF (Common Data Format) data, such as Reimei satellite data, provided by DARTS. We also created an RDF service for the space weather report and real-time global MHD simulation 3D data provided by NICT. Our Semantic Web system works as follows: the RSS 1.0 documents generated on the data sites (ISAS and NICT) are automatically collected by a meta-data collection agent. The RDF documents are registered and the agent extracts meta-data to store them in Sesame, which is an open source RDF database with support for RDF Schema inferencing and querying. The RDF database provides advanced retrieval processing that takes properties and relations into account. Finally, the STP Semantic Web provides automatic processing or high-level search not only for observation data but also for space weather news, physical events, technical terms and researcher information related to the STP.
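As a rough illustration of the workflow above, the sketch below builds an RSS 1.0 document carrying a data file name, file server address, and observation date, and then extracts those fields the way a collection agent might. It uses only Python's standard `xml.etree.ElementTree` module with the standard RDF, RSS 1.0, and Dublin Core namespaces. The function names, the choice of `dc:date` for the observation date, and the example URL are assumptions; the authors' own STP vocabulary and agent are not reproduced here.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"
DC = "http://purl.org/dc/elements/1.1/"

# Register readable prefixes for serialization.
ET.register_namespace("rdf", RDF)
ET.register_namespace("rss", RSS)
ET.register_namespace("dc", DC)


def build_feed(files):
    """Serialize (url, name, date) tuples as RSS 1.0 items (data-site side)."""
    root = ET.Element(f"{{{RDF}}}RDF")
    for url, name, date in files:
        item = ET.SubElement(root, f"{{{RSS}}}item", {f"{{{RDF}}}about": url})
        ET.SubElement(item, f"{{{RSS}}}title").text = name   # data file name
        ET.SubElement(item, f"{{{RSS}}}link").text = url     # file server address
        ET.SubElement(item, f"{{{DC}}}date").text = date     # observation date
    return ET.tostring(root, encoding="unicode")


def collect_metadata(xml_text):
    """Extract meta-data records from an RSS 1.0 document (agent side)."""
    root = ET.fromstring(xml_text)
    return [
        {
            "url": item.get(f"{{{RDF}}}about"),
            "name": item.findtext(f"{{{RSS}}}title"),
            "date": item.findtext(f"{{{DC}}}date"),
        }
        for item in root.findall(f"{{{RSS}}}item")
    ]
```

A collected record could then be registered in an RDF store; that step is omitted because it depends on the store's own interface.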
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bent, John M.; Faibish, Sorin; Pedone, Jr., James M.
A cluster file system is provided having a plurality of distributed metadata servers with shared access to one or more shared low latency persistent key-value metadata stores. A metadata server comprises an abstract storage interface comprising a software interface module that communicates with at least one shared persistent key-value metadata store providing a key-value interface for persistent storage of key-value metadata. The software interface module provides the key-value metadata to the at least one shared persistent key-value metadata store in a key-value format. The shared persistent key-value metadata store is accessed by a plurality of metadata servers. A metadata request can be processed by a given metadata server independently of other metadata servers in the cluster file system. A distributed metadata storage environment is also disclosed that comprises a plurality of metadata servers having an abstract storage interface to at least one shared persistent key-value metadata store.
Seven recommendations to make your invasive alien species data more useful
Groom, Quentin J.; Adriaens, Tim; Desmet, Peter; Simpson, Annie; De Wever, Aaike; Bazos, Ioannis; Cardoso, Ana Cristina; Charles, Lucinda; Christopoulou, Anastasia; Gazda, Anna; Helmisaari, Harry; Hobern, Donald; Josefsson, Melanie; Lucy, Frances; Marisavljevic, Dragana; Oszako, Tomasz; Pergl, Jan; Petrovic-Obradovic, Olivera; Prévot, Céline; Ravn, Hans Peter; Richards, Gareth; Roques, Alain; Roy, Helen; Rozenberg, Marie-Anne A.; Scalera, Riccardo; Tricarico, Elena; Trichkova, Teodora; Vercayie, Diemer; Zenetos, Argyro; Vanderhoeven, Sonia
2017-01-01
Science-based strategies to tackle biological invasions depend on recent, accurate, well-documented, standardized and openly accessible information on alien species. Currently and historically, biodiversity data are scattered in numerous disconnected data silos that lack interoperability. The situation is no different for alien species data, and this obstructs efficient retrieval, combination, and use of these kinds of information for research and policy-making. Standardization and interoperability are particularly important as many alien species related research and policy activities require pooling data. We describe seven ways that data on alien species can be made more accessible and useful, based on the results of a European Cooperation in Science and Technology (COST) workshop: (1) Create data management plans; (2) Increase interoperability of information sources; (3) Document data through metadata; (4) Format data using existing standards; (5) Adopt controlled vocabularies; (6) Increase data availability; and (7) Ensure long-term data preservation. We identify four properties specific and integral to alien species data (species status, introduction pathway, degree of establishment, and impact mechanism) that are either missing from existing data standards or lack a recommended controlled vocabulary. Improved access to accurate, real-time and historical data will repay the long-term investment in data management infrastructure, by providing more accurate, timely and realistic assessments and analyses. If we improve core biodiversity data standards by developing their relevance to alien species, it will allow the automation of common activities regarding data processing in support of environmental policy. Furthermore, we call for considerable effort to maintain, update, standardize, archive, and aggregate datasets, to ensure proper valorization of alien species data and information before they become obsolete or lost.
Hill, Jon; Davis, Katie E
2014-01-01
Building large supertrees involves the collection, storage, and processing of thousands of individual phylogenies to create large phylogenies with thousands to tens of thousands of taxa. Such large phylogenies are useful for macroevolutionary studies, comparative biology and in conservation and biodiversity. No easy-to-use and fully integrated software package currently exists to carry out this task. Here, we present a new Python-based software package that uses a well-defined XML schema to manage both data and metadata. It builds on previous versions by 1) including new processing steps, such as Safe Taxonomic Reduction, 2) using a user-friendly GUI that guides the user to complete at least the minimum information required and includes context-sensitive documentation, and 3) adopting a revised storage format that integrates both tree- and meta-data into a single file. These data can then be manipulated according to a well-defined, but flexible, processing pipeline using either the GUI or a command-line based tool. Processing steps include standardising names, deleting or replacing taxa, ensuring adequate taxonomic overlap, ensuring data independence, and safe taxonomic reduction. This software has been successfully used to store and process data consisting of over 1000 trees ready for analyses using standard supertree methods. This software makes large supertree creation a much easier task and provides far greater flexibility for further work.
HELIOGate, a Portal for the Heliophysics Community
NASA Astrophysics Data System (ADS)
Pierantoni, Gabriele; Carley, Eoin
2014-10-01
Heliophysics is the branch of physics that investigates the interactions between the Sun and the other bodies of the solar system. Heliophysicists rely on data collected from numerous sources scattered across the Solar System. The data collected from these sources is processed to extract metadata and the metadata extracted in this fashion is then used to build indexes of features and events called catalogues. Heliophysicists also develop conceptual and mathematical models of the phenomena and the environment of the Solar System. More specifically, they investigate the physical characteristics of the phenomena and they simulate how they propagate throughout the Solar System with mathematical and physical abstractions called propagation models. HELIOGate aims at addressing the need to combine and orchestrate existing web services in a flexible and easily configurable fashion to tackle different scientific questions. HELIOGate also offers a tool capable of connecting to sizeable computation and storage infrastructures to execute data processing codes that are needed to calibrate raw data and to extract metadata.
Vempati, Uma D; Chung, Caty; Mader, Chris; Koleti, Amar; Datar, Nakul; Vidović, Dušica; Wrobel, David; Erickson, Sean; Muhlich, Jeremy L; Berriz, Gabriel; Benes, Cyril H; Subramanian, Aravind; Pillai, Ajay; Shamu, Caroline E; Schürer, Stephan C
2014-06-01
The National Institutes of Health Library of Integrated Network-based Cellular Signatures (LINCS) program is generating extensive multidimensional data sets, including biochemical, genome-wide transcriptional, and phenotypic cellular response signatures to a variety of small-molecule and genetic perturbations with the goal of creating a sustainable, widely applicable, and readily accessible systems biology knowledge resource. Integration and analysis of diverse LINCS data sets depend on the availability of sufficient metadata to describe the assays and screening results and on their syntactic, structural, and semantic consistency. Here we report metadata specifications for the most important molecular and cellular components and recommend them for adoption beyond the LINCS project. We focus on the minimum required information to model LINCS assays and results based on a number of use cases, and we recommend controlled terminologies and ontologies to annotate assays with syntactic consistency and semantic integrity. We also report specifications for a simple annotation format (SAF) to describe assays and screening results based on our metadata specifications with explicit controlled vocabularies. SAF specifically serves to programmatically access and exchange LINCS data as a prerequisite for a distributed information management infrastructure. We applied the metadata specifications to annotate large numbers of LINCS cell lines, proteins, and small molecules. The resources generated and presented here are freely available. © 2014 Society for Laboratory Automation and Screening.
MetaSRA: normalized human sample-specific metadata for the Sequence Read Archive.
Bernstein, Matthew N; Doan, AnHai; Dewey, Colin N
2017-09-15
The NCBI's Sequence Read Archive (SRA) promises great biological insight if one could analyze the data in the aggregate; however, the data remain largely underutilized, in part, due to the poor structure of the metadata associated with each sample. The rules governing submissions to the SRA do not dictate a standardized set of terms that should be used to describe the biological samples from which the sequencing data are derived. As a result, the metadata include many synonyms, spelling variants and references to outside sources of information. Furthermore, manual annotation of the data remains intractable due to the large number of samples in the archive. For these reasons, it has been difficult to perform large-scale analyses that study the relationships between biomolecular processes and phenotype across diverse diseases, tissues and cell types present in the SRA. We present MetaSRA, a database of normalized SRA human sample-specific metadata following a schema inspired by the metadata organization of the ENCODE project. This schema involves mapping samples to terms in biomedical ontologies, labeling each sample with a sample-type category, and extracting real-valued properties. We automated these tasks via a novel computational pipeline. The MetaSRA is available at metasra.biostat.wisc.edu via both a searchable web interface and bulk downloads. Software implementing our computational pipeline is available at http://github.com/deweylab/metasra-pipeline. Contact: cdewey@biostat.wisc.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
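Two of the normalization tasks named above, mapping free-text values to ontology terms and extracting real-valued properties, can be sketched in a few lines. This is not the MetaSRA pipeline: the synonym table, the placeholder term identifiers (`TERM:0001`, `TERM:0002`), and the simple number-plus-unit pattern are all hypothetical and only show the general shape of the task.

```python
import re

# Hypothetical synonym table: spelling variants mapped to placeholder term IDs.
SYNONYMS = {
    "hela": "TERM:0001",
    "hela cells": "TERM:0001",
    "liver": "TERM:0002",
    "hepatic tissue": "TERM:0002",
}

# A value such as "34 years" or "2.5 mg": a number followed by a unit word.
NUMERIC = re.compile(r"(\d+(?:\.\d+)?)\s*([a-z]+)")


def normalize_sample(raw):
    """Map raw key-value sample metadata to term IDs and numeric properties."""
    terms = set()
    properties = {}
    for key, value in raw.items():
        text = value.strip().lower()
        if text in SYNONYMS:
            terms.add(SYNONYMS[text])  # synonyms collapse to one term ID
        match = NUMERIC.fullmatch(text)
        if match:
            properties[key] = (float(match.group(1)), match.group(2))
    return {"terms": sorted(terms), "properties": properties}
```

Exact-match lookup is the simplest possible strategy; a real pipeline would also need to handle misspellings and values that mention several entities at once.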
The use of advanced web-based survey design in Delphi research.
Helms, Christopher; Gardner, Anne; McInnes, Elizabeth
2017-12-01
A discussion of the application of metadata, paradata and embedded data in web-based survey research, using two completed Delphi surveys as examples. Metadata, paradata and embedded data use in web-based Delphi surveys has not been described in the literature. The rapid evolution and widespread use of online survey methods imply that paper-based Delphi methods will likely become obsolete. Commercially available web-based survey tools offer a convenient and affordable means of conducting Delphi research. Researchers and ethics committees may be unaware of the benefits and risks of using metadata in web-based surveys. Discussion paper. Two web-based, three-round Delphi surveys were conducted sequentially, the first between August 2014 and January 2015 and the second between April and May 2016. Their aims were to validate the Australian nurse practitioner metaspecialties and their respective clinical practice standards. Our discussion paper is supported by researcher experience and data obtained from conducting both web-based Delphi surveys. Researchers and ethics committees should consider the benefits and risks of metadata use in web-based survey methods. Web-based Delphi research using paradata and embedded data may introduce efficiencies that improve individual participant survey experiences and reduce attrition across iterations. Use of embedded data allows the efficient conduct of multiple simultaneous Delphi surveys across a shorter timeframe than traditional survey methods. The use of metadata, paradata and embedded data appears to improve response rates, identify bias and give possible explanation for apparent outlier responses, providing an efficient method of conducting web-based Delphi surveys. © 2017 John Wiley & Sons Ltd.
Development of DKB ETL module in case of data conversion
NASA Astrophysics Data System (ADS)
Kaida, A. Y.; Golosova, M. V.; Grigorieva, M. A.; Gubin, M. Y.
2018-05-01
Modern scientific experiments produce huge volumes of data, which require new approaches to data processing and storage. These data themselves, as well as their processing and storage, are accompanied by a considerable amount of additional information, called metadata, distributed over multiple informational systems and repositories, and having a complicated, heterogeneous structure. Gathering these metadata for experiments in the field of high energy nuclear physics (HENP) is a complex issue that calls for unconventional solutions. One of the tasks is to integrate metadata from different repositories into some kind of central storage. During the integration process, metadata taken from the original source repositories go through several processing steps: aggregation, transformation according to the current data model, and loading into the general storage in a standardized form. Data Knowledge Base (DKB), an R&D project of the ATLAS experiment at the LHC, aims to provide the scientific community with fast and easy access to significant information about LHC experiments. The data integration subsystem being developed for the DKB project can be represented as a number of particular pipelines, arranging data flow from data sources to the main DKB storage. The data transformation process, represented by a single pipeline, can be considered as a number of successive data transformation steps, where each step is implemented as an individual program module. This article outlines the specifics of the program modules used in the dataflow and describes one of the modules developed and integrated into the data integration subsystem of DKB.
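The pipeline structure described above, successive steps with each step implemented as its own module, can be sketched as a chain of functions in which every stage consumes the previous stage's output. The stage names follow the three steps listed in the abstract (aggregation, transformation, loading); the record layout, field names, and the in-memory dictionary used as storage are hypothetical and do not reflect the actual DKB modules.

```python
from functools import reduce


def aggregate(sources):
    """Merge records from several source repositories by their 'id' field."""
    merged = {}
    for source in sources:
        for record in source:
            merged.setdefault(record["id"], {}).update(record)
    return list(merged.values())


def transform(records):
    """Bring records to one data model; here, simply lower-case the field names."""
    return [{key.lower(): value for key, value in r.items()} for r in records]


def make_loader(storage):
    """Return a loading stage that writes records into the given storage."""
    def load(records):
        for record in records:
            storage[record["id"]] = record
        return storage
    return load


def run_pipeline(data, stages):
    """Apply each stage in order, feeding each one the previous stage's output."""
    return reduce(lambda current, stage: stage(current), stages, data)


# Hypothetical usage: two repositories describing the same dataset.
repo_a = [{"id": "d1", "Events": 1000}]
repo_b = [{"id": "d1", "Campaign": "c16"}]
central = {}
run_pipeline([repo_a, repo_b], [aggregate, transform, make_loader(central)])
```

Because each stage only depends on the shape of its input, a stage can be replaced or a new one inserted without touching the others, which is the main appeal of the modular design.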
Java Library for Input and Output of Image Data and Metadata
NASA Technical Reports Server (NTRS)
Deen, Robert; Levoe, Steven
2003-01-01
A Java-language library supports input and output (I/O) of image data and metadata (label data) in the format of the Video Image Communication and Retrieval (VICAR) image-processing software and in several similar formats, including a subset of the Planetary Data System (PDS) image file format. The library does the following: It provides a low-level, direct-access layer, enabling an application subprogram to read and write specific image files, lines, or pixels, and manipulate metadata directly. Two coding/decoding subprograms ("codecs" for short) based on the Java Advanced Imaging (JAI) software provide access to VICAR and PDS images in a file-format-independent manner. The VICAR and PDS codecs enable any program that conforms to the specification of the JAI codec to use VICAR or PDS images automatically, without specific knowledge of the VICAR or PDS format. The library also includes Image I/O plug-in subprograms for VICAR and PDS formats. Application programs that conform to the Image I/O specification of Java version 1.4 can utilize any image format for which such a plug-in subprogram exists, without specific knowledge of the format itself. Like the aforementioned codecs, the VICAR and PDS Image I/O plug-in subprograms support reading and writing of metadata.
Razick, Sabry; Močnik, Rok; Thomas, Laurent F.; Ryeng, Einar; Drabløs, Finn; Sætrom, Pål
2014-01-01
Systematic data management and controlled data sharing aim at increasing reproducibility, reducing redundancy in work, and providing a way to efficiently locate complementing or contradicting information. One method of achieving this is collecting data in a central repository or in a location that is part of a federated system and providing interfaces to the data. However, certain data, such as data from biobanks or clinical studies, may, for legal and privacy reasons, often not be stored in public repositories. Instead, we describe a metadata cataloguing system and a software suite for reporting the presence of data from the life sciences domain. The system stores three types of metadata: file information, file provenance and data lineage, and content descriptions. Our software suite includes both graphical and command line interfaces that allow users to report and tag files with these different metadata types. Importantly, the files remain in their original locations with their existing access-control mechanisms in place, while our system provides descriptions of their contents and relationships. Our system and software suite thereby provide a common framework for cataloguing and sharing both public and private data. Database URL: http://bigr.medisin.ntnu.no/data/eGenVar/ PMID:24682735
Data Citation Concept for CMIP6
NASA Astrophysics Data System (ADS)
Stockhause, M.; Toussaint, F.; Lautenschlager, M.; Lawrence, B.
2015-12-01
There is a broad consensus among data centers and scientific publishers on Force 11's 'Joint Declaration of Data Citation Principles'. Putting these principles into operation is not always straightforward. The focus for CMIP6 data citations lies on the citation of data created by others and used in an analysis underlying the article. For such source data, usually no article by the data creators is available ('stand-alone data publication'). The planned data citation granularities are model data (data collections containing all datasets provided for the project by a single model) and experiment data (data collections containing all datasets for a scientific experiment run by a single model). In the case of large international projects or activities like CMIP, the data are commonly stored and disseminated by multiple repositories in a federated data infrastructure such as the Earth System Grid Federation (ESGF). The individual repositories are subject to different institutional and national policies. A Data Management Plan (DMP) will define a certain standard for the repositories, including data handling procedures. Another aspect of CMIP data relevant for data citations is its dynamic nature. For such large data collections, datasets are added, revised and retracted for years before the data collection becomes stable enough to serve as a data citation entity including all model or simulation data. Thus, a critical issue for ESGF is data consistency, requiring thorough dataset versioning to enable the identification of the data collection in the cited version. Currently, the ESGF is designed for accessing the latest dataset versions. Data citation introduces the necessity to support older and retracted dataset versions by storing metadata even beyond data availability (data unpublished in ESGF). Apart from ESGF, other infrastructure components exist for CMIP which provide information that has to be connected to the CMIP6 data, e.g. ES-DOC, providing information on models and simulations, and the IPCC Data Distribution Centre (DDC), storing a subset of data together with available metadata (ES-DOC) for long-term reuse by the interdisciplinary community. Other connections exist to standard project vocabularies, to personal identifiers (e.g. ORCID), or to data products (including provenance information).
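The versioning requirement described in this abstract (serve the latest version by default, yet keep metadata for older and retracted versions so that a citation still resolves) can be sketched as follows. This is an illustrative sketch under stated assumptions: the class names, the dataset identifier and the date-like version strings are hypothetical and do not reflect the ESGF implementation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DatasetVersion:
    dataset_id: str
    version: str              # hypothetical date-like label, e.g. "v20150101"
    retracted: bool = False
    data_available: bool = True


class VersionRegistry:
    """Keeps metadata for every version, including retracted ones."""

    def __init__(self):
        self._versions = {}

    def register(self, record):
        self._versions[(record.dataset_id, record.version)] = record

    def retract(self, dataset_id, version):
        # The data are unpublished, but the metadata record is kept.
        old = self._versions[(dataset_id, version)]
        self._versions[(dataset_id, version)] = replace(
            old, retracted=True, data_available=False)

    def latest(self, dataset_id):
        """Newest non-retracted version, as a data portal would serve it."""
        live = [r for (d, _), r in self._versions.items()
                if d == dataset_id and not r.retracted]
        return max(live, key=lambda r: r.version) if live else None

    def resolve(self, dataset_id, version):
        """Exact version named in a citation, even if retracted."""
        return self._versions.get((dataset_id, version))


registry = VersionRegistry()
registry.register(DatasetVersion("cmip6.modelA.exp1.tas", "v20150101"))
registry.register(DatasetVersion("cmip6.modelA.exp1.tas", "v20150601"))
registry.retract("cmip6.modelA.exp1.tas", "v20150101")

latest = registry.latest("cmip6.modelA.exp1.tas")
cited = registry.resolve("cmip6.modelA.exp1.tas", "v20150101")
```

The point of separating `latest` from `resolve` is that access and citation ask different questions: the first wants current data, the second wants a stable description of exactly what was used.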
NASA Astrophysics Data System (ADS)
Schaap, D. M. A.; Maudire, G.
2009-04-01
SeaDataNet is an Integrated research Infrastructure Initiative (I3) in EU FP6 (2006 - 2011) to provide a data management system adapted both to the fragmented observation system and to the users' need for integrated access to data, meta-data, products and services. To this end, SeaDataNet ensures the long-term archiving of the large number of multidisciplinary data (i.e. temperature, salinity, current, sea level, chemical, physical and biological properties) collected by many different sensors installed on board research vessels, satellites and the various platforms of the marine observing system. The SeaDataNet project started in 2006, but builds upon earlier data management infrastructure projects, undertaken over a period of 20 years by an expanding network of oceanographic data centres from the countries around all European seas. Its predecessor project Sea-Search had a strict focus on metadata. SeaDataNet maintains significant interest in the further development of the metadata infrastructure, but its primary objective is the provision of easy data access and generic data products. SeaDataNet is a distributed infrastructure that provides transnational access to marine data, meta-data, products and services through 40 interconnected Trans National Data Access Platforms (TAP) from 35 countries around the Black Sea, Mediterranean, North East Atlantic, North Sea, Baltic and Arctic regions. These include National Oceanographic Data Centres (NODCs) and Satellite Data Centres. Furthermore, the SeaDataNet consortium comprises a number of expert modelling centres, SMEs expert in IT, and 3 international bodies (ICES, IOC and JRC). Planning: The SeaDataNet project is delivering and operating the infrastructure in 3 versions: Version 0: maintenance and further development of the metadata systems developed by the Sea-Search project plus the development of a new metadata system for indexing and accessing individual data objects managed by the SeaDataNet data centres. 
This is known as the Common Data Index (CDI) V0 system. Version 1: harmonisation and upgrading of the metadatabases through adoption of the ISO 19115 metadata standard and provision of transparent data access and download services from all partner data centres through upgrading the Common Data Index and deployment of a data object delivery service. Version 2: adding data product services and OGC compliant viewing services and further virtualisation of data access. SeaDataNet Version 0: The SeaDataNet portal has been set up at http://www.seadatanet.org and it provides a platform for all SeaDataNet services and standards as well as background information about the project and its partners. It includes discovery services via the following catalogues: CSR - Cruise Summary Reports of research vessels; EDIOS - Locations and details of monitoring stations and networks / programmes; EDMED - High level inventory of Marine Environmental Data sets collected and managed by research institutes and organisations; EDMERP - Marine Environmental Research Projects; EDMO - Marine Organisations. These catalogues are interrelated, where possible, to facilitate cross searching and context searching. These catalogues connect to the Common Data Index (CDI). Common Data Index (CDI): The CDI gives detailed insight into the datasets available in partners' databases and paves the way to direct online data access or direct online requests for data access / data delivery. The CDI V0 metadatabase contains more than 340,000 individual data entries from 36 CDI partners from 29 countries across Europe, covering a broad scope and range of data held by these organisations. For purposes of standardisation and international exchange the ISO 19115 metadata standard has been adopted. The CDI format is defined as a dedicated subset of this standard. A CDI XML format supports the exchange between CDI partners and the central CDI manager, and ensures interoperability with other systems and networks. 
CDI XML entries are generated by participating data centres, directly from their databases. CDI partners can make use of dedicated SeaDataNet Tools to generate CDI XML files automatically. Approach for SeaDataNet V1 and V2: The approach for SeaDataNet V1 and V2, which is in line with the INSPIRE Directive, comprises the following services: Discovery services = Metadata directories; Security services = Authentication, Authorization & Accounting (AAA); Delivery services = Data access & downloading of datasets; Viewing services = Visualisation of metadata, data and data products; Product services = Generic and standard products; Monitoring services = Statistics on usage and performance of the system; Maintenance services = Updating of metadata by SeaDataNet partners. The services will be operated over a distributed network of interconnected Data Centres accessed through a central Portal. In addition to service access the portal will provide information on data management standards, tools and protocols. The architecture has been designed to provide a coherent system based on V1 services, whilst leaving the pathway open for later extension with V2 services. For the implementation, a range of technical components have been defined. Some are already operational with the remainder in the final stages of development and testing. These make use of recent web technologies, and also comprise Java components, to provide multi-platform support and syntactic interoperability. To facilitate sharing of resources and interoperability, SeaDataNet has adopted SOAP Web Service technology. The SeaDataNet architecture and components have been designed to handle all kinds of oceanographic and marine environmental data including both in-situ measurements and remote sensing observations. The V1 technical development is ready and the V1 system is now being implemented and adopted by all participating data centres in SeaDataNet. 
Interoperability: Interoperability is the key to the success of a distributed data management system, and it is achieved in SeaDataNet V1 by: using common quality control protocols and a common flag scale; using controlled vocabularies from a single source that have been developed using international content governance; adopting the ISO 19115 metadata standard for all metadata directories; providing XML Validation Services to quality control the metadata maintenance, including field content verification based on Schematron; providing standard metadata entry tools; using harmonised Data Transport Formats (NetCDF, ODV ASCII and MedAtlas ASCII) for data set delivery; adopting OGC standards for mapping and viewing services; using SOAP Web Services in the SeaDataNet architecture. SeaDataNet V1 Delivery Services: An important objective of the V1 system is to provide transparent access to the distributed data sets via a unique user interface at the SeaDataNet portal and download service. In the SeaDataNet V1 architecture the Common Data Index (CDI) V1 provides the link between discovery and delivery. The CDI user interface gives users a detailed insight into the availability and geographical distribution of marine data archived at the connected data centres, and it provides the means for downloading data sets in common formats via a transaction mechanism. The SeaDataNet portal provides registered users access to these distributed data sets via the CDI V1 Directory and a shopping basket mechanism. This allows registered users to locate data of interest and submit their data requests. The requests are forwarded automatically from the portal to the relevant SeaDataNet data centres. This process is controlled via the Request Status Manager (RSM) Web Service at the portal and a Download Manager (DM) Java software module, implemented at each of the data centres. 
The RSM also enables registered users to check regularly the status of their requests and download data sets, after access has been granted. Data centres can follow all transactions for their data sets online and can handle requests which require their consent. The actual delivery of data sets is done between the user and the selected data centre. The CDI V1 system is now being populated by all participating data centres in SeaDataNet, thereby phasing out CDI V0. SeaDataNet Partners: IFREMER (France), MARIS (Netherlands), HCMR/HNODC (Greece), ULg (Belgium), OGS (Italy), NERC/BODC (UK), BSH/DOD (Germany), SMHI (Sweden), IEO (Spain), RIHMI/WDC (Russia), IOC (International), ENEA (Italy), INGV (Italy), METU (Turkey), CLS (France), AWI (Germany), IMR (Norway), NERI (Denmark), ICES (International), EC-DG JRC (International), MI (Ireland), IHPT (Portugal), RIKZ (Netherlands), RBINS/MUMM (Belgium), VLIZ (Belgium), MRI (Iceland), FIMR (Finland), IMGW (Poland), MSI (Estonia), IAE/UL (Latvia), CMR (Lithuania), SIO/RAS (Russia), MHI/DMIST (Ukraine), IO/BAS (Bulgaria), NIMRD (Romania), TSU (Georgia), INRH (Morocco), IOF (Croatia), PUT (Albania), NIB (Slovenia), UoM (Malta), OC/UCY (Cyprus), IOLR (Israel), NCSR/NCMS (Lebanon), CNR-ISAC (Italy), ISMAL (Algeria), INSTM (Tunisia)
A general concept for consistent documentation of computational analyses
Müller, Fabian; Nordström, Karl; Lengauer, Thomas; Schulz, Marcel H.
2015-01-01
The ever-growing amount of data in the field of life sciences demands standardized ways of high-throughput computational analysis. This standardization requires a thorough documentation of each step in the computational analysis to enable researchers to understand and reproduce the results. However, due to the heterogeneity in software setups and the high rate of change during tool development, reproducibility is hard to achieve. One reason is that there is no common agreement in the research community on how to document computational studies. In many cases, simple flat files or other unstructured text documents are provided by researchers as documentation, which are often missing software dependencies, versions and sufficient documentation to understand the workflow and parameter settings. As a solution we suggest a simple and modest approach for documenting and verifying computational analysis pipelines. We propose a two-part scheme that defines a computational analysis using a Process and an Analysis metadata document, which jointly describe all necessary details to reproduce the results. In this design we separate the metadata specifying the process from the metadata describing an actual analysis run, thereby reducing the effort of manual documentation to an absolute minimum. Our approach is independent of a specific software environment, results in human readable XML documents that can easily be shared with other researchers and allows an automated validation to ensure consistency of the metadata. Because our approach has been designed with little to no assumptions concerning the workflow of an analysis, we expect it to be applicable in a wide range of computational research fields. Database URL: http://deep.mpi-inf.mpg.de/DAC/cmds/pub/pyvalid.zip PMID:26055099
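The two-part scheme in this abstract, a Process document that declares what a pipeline expects and an Analysis document that records one concrete run, lends itself to automated consistency checks. The sketch below illustrates the idea only: the XML element and attribute names (`process`, `parameter`, `analysis`, `setting`) and the `validate` function are assumptions made for this example and are not the schema or validator published by the authors.

```python
import xml.etree.ElementTree as ET

# Hypothetical Process document: what the pipeline needs.
PROCESS_XML = """
<process name="align-and-count" version="1.0">
  <tool name="aligner" version="2.1"/>
  <parameter name="threads" type="int"/>
  <parameter name="reference" type="str"/>
</process>
"""

# Hypothetical Analysis document: one concrete run of that process.
ANALYSIS_XML = """
<analysis process="align-and-count" processVersion="1.0">
  <setting name="threads" value="8"/>
  <setting name="reference" value="hg19.fa"/>
</analysis>
"""

CASTS = {"int": int, "float": float, "str": str}


def validate(process_xml, analysis_xml):
    """Return a sorted list of inconsistencies between the two documents."""
    process = ET.fromstring(process_xml)
    analysis = ET.fromstring(analysis_xml)
    errors = []
    if analysis.get("process") != process.get("name"):
        errors.append("analysis refers to a different process")
    if analysis.get("processVersion") != process.get("version"):
        errors.append("process version mismatch")
    declared = {p.get("name"): p.get("type") for p in process.findall("parameter")}
    given = {s.get("name"): s.get("value") for s in analysis.findall("setting")}
    for name in declared.keys() - given.keys():
        errors.append(f"missing setting: {name}")
    for name in given.keys() - declared.keys():
        errors.append(f"undeclared setting: {name}")
    for name in declared.keys() & given.keys():
        try:
            CASTS[declared[name]](given[name])   # does the value parse as the declared type?
        except (KeyError, ValueError):
            errors.append(f"bad value for {name}")
    return sorted(errors)


errors_ok = validate(PROCESS_XML, ANALYSIS_XML)
errors_bad = validate(PROCESS_XML, ANALYSIS_XML.replace('value="8"', 'value="eight"'))
```

Keeping the process description separate means it is written once, while each run only needs a short analysis document that can be checked against it.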
Spatial Data Transfer Standard (SDTS)
1999-01-01
The American National Standards Institute's (ANSI) Spatial Data Transfer Standard (SDTS) is a mechanism for archiving and transferring spatial data (including metadata) between dissimilar computer systems. The SDTS specifies exchange constructs, such as format, structure, and content, for spatially referenced vector and raster (including gridded) data. The SDTS includes a flexible conceptual model, specifications for a quality report, transfer module specifications, data dictionary specifications, and definitions of spatial features and attributes.
NASA Astrophysics Data System (ADS)
Davey, Christopher A.; Pielke, Roger A., Sr.
2005-04-01
The U.S. Historical Climate Network is a subset of surface weather observation stations selected from the National Weather Service cooperative station network. The criteria used to select these stations do not sufficiently address station exposure characteristics. In addition, the current metadata available for cooperative network stations generally do not describe site exposure characteristics in sufficient detail. This paper focuses on site exposures with respect to air temperature measurements. A total of 57 stations were photographically surveyed in eastern Colorado, comparing existing exposures to the standards endorsed by the World Meteorological Organization. The exposures of most sites surveyed, including U.S. Historical Climate Network sites, were observed to fall short of these standards. This raises a critical question about the use of many Historical Climate Network sites in the development of long-term climate records and the detection of climate trends. Some of these sites clearly have poor exposures and therefore should be considered for removal from the Historical Climate Network. Candidate replacement sites do exist and should be considered for addition into the network to replace the removed sites. Documentation as performed for this study should be conducted worldwide in order to determine the extent of spatially nonrepresentative exposures and possible temperature biases.
CDGP, the data center for deep geothermal data from Alsace
NASA Astrophysics Data System (ADS)
Schaming, Marc; Grunberg, Marc; Jahn, Markus; Schmittbuhl, Jean; Cuenot, Nicolas; Genter, Albert; Dalmais, Eléonore
2016-04-01
CDGP (Centre de données de géothermie profonde, deep geothermal data center, http://cdgp.u-strasbg.fr) was set up by the LabEX G-EAU-THERMIE PROFONDE to archive the high-quality data collected at the Upper Rhine Graben geothermal sites and to distribute them to the scientific community for R&D activities, taking IPR (Intellectual Property Rights) into account. Collected datasets cover the whole life of geothermal projects, from exploration to drilling, stimulation, circulation and production. They originate from the Soultz-sous-Forêts pilot plant but also include more recent projects like the ECOGI project at Rittershoffen, Alsace, France. They are historically separated into two rather independent categories: geophysical datasets, mostly related to the industrial management of the geothermal reservoir, and seismological data from the seismic monitoring during both stimulations and circulations. Up to now, the geophysical datasets come mainly from the Soultz-sous-Forêts project and were stored on office shelves and old digital media. Some inventories have been done recently, and a first step of the integration of these reservoir data into a PostgreSQL/PostGIS database (ISO 19107 compatible) has been performed. The database links depths, temperatures, pressures and flows to periods (times) and locations (geometries). Other geophysical data are still stored in structured directories as a data bank and need to be included in the database. Seismological datasets are of two kinds: seismological waveforms and seismicity bulletins; the former are stored in a standardized way both in format (miniSEED) and in file and directory structure (SDS), following the international standards of the seismological community (FDSN), and the latter in a database following the open standard QuakeML. CDGP uses a cataloging application (GeoNetwork) to manage the metadata resources. It provides metadata editing and search functions as well as a web map viewer. 
The metadata editor supports the ISO 19115/19119/19110 standards used for spatial resources. A step forward will be to add specific metadata records as defined by the Open Geospatial Consortium to provide geophysical / geologic / reservoir information: Observations and Measurements (O&M) to describe the acquisition of information from a primary source, and SensorML to describe the sensors. Seismological metadata, which describe the full instrumental response, use the dataless SEED standard. Access to data will be handled in an additional step using the geOrchestra spatial data infrastructure (SDI). Direct access will be granted after registration and validation using the single sign-on authentication system. Access to the data will also be granted via the EPOS-IP Anthropogenic Hazards project. Access to episodes (time-correlated collections of geophysical, technological and other relevant geo-data over a geothermal area) and application of analysis (time- and technology-dependent probabilistic seismic hazard analysis, multi-hazard and multi-risk assessment) are services accessible via a portal and will require AAAI (Authentication, Authorization, Accounting and Identification).
Papež, Václav; Mouček, Roman
2017-01-01
The purpose of this study is to investigate the feasibility of applying openEHR (an archetype-based approach for electronic health records representation) to modeling data stored in EEGBase, a portal for experimental electroencephalography/event-related potential (EEG/ERP) data management. The study evaluates re-usage of existing openEHR archetypes and proposes a set of new archetypes together with the openEHR templates covering the domain. The main goals of the study are to (i) link existing EEGBase data/metadata and openEHR archetype structures and (ii) propose a new openEHR archetype set describing the EEG/ERP domain since this set of archetypes currently does not exist in public repositories. The main methodology is based on the determination of the concepts obtained from EEGBase experimental data and metadata that are expressible structurally by the openEHR reference model and semantically by openEHR archetypes. In addition, templates as the third openEHR resource allow us to define constraints over archetypes. Clinical Knowledge Manager (CKM), a public openEHR archetype repository, was searched for the archetypes matching the determined concepts. According to the search results, the archetypes already existing in CKM were applied and the archetypes not existing in the CKM were newly developed. openEHR archetypes support linkage to external terminologies. To increase semantic interoperability of the new archetypes, binding with the existing odML electrophysiological terminology was assured. Further, to increase structural interoperability, also other current solutions besides EEGBase were considered during the development phase. Finally, a set of templates using the selected archetypes was created to meet EEGBase requirements. A set of eleven archetypes that encompassed the domain of experimental EEG/ERP measurements were identified. Of these, six were reused without changes, one was extended, and four were newly created. 
All archetypes were arranged in the templates reflecting the EEGBase metadata structure. A mechanism of odML terminology referencing was proposed to assure semantic interoperability of the archetypes. The openEHR approach was found to be useful not only for clinical purposes but also for experimental data modeling.
NASA Astrophysics Data System (ADS)
Yamagishi, Y.; Yanaka, H.; Tsuboi, S.
2009-12-01
We have developed a tool, called the KML generator, for converting seismic tomography data into KML, and have made it available on the web (http://www.jamstec.go.jp/pacific21/google_earth). The KML generator enables us to display vertical and horizontal cross sections of a model on Google Earth in three dimensions, which is useful for understanding the Earth's interior. The previous generator accepts text files of grid-point data giving longitude, latitude, and seismic velocity anomaly. Each data file contains the data for one depth. Metadata, such as the bibliographic reference, grid-point interval, and depth, are described in a separate information file. We did not allow users to upload their own tomographic models to the web application, because there was no standard format for representing a tomographic model. Recently the European seismology research project NERIES (Network of Research Infrastructures for European Seismology) has advocated that seismic tomography data should be standardized. It proposes a new format based on JSON (JavaScript Object Notation), a data-interchange format, as a standard for tomography. This format consists of two parts: metadata and grid-point data values. The JSON format appears well suited to handling and analyzing tomographic models, because its structure is fully defined by JavaScript objects, so the elements are directly accessible by a script. In addition, JSON libraries exist for several programming languages. The International Federation of Digital Seismograph Networks (FDSN) adopted this format as an FDSN standard format for seismic tomographic models. This format may therefore be accepted not only by European seismologists but also as the world standard. We have therefore improved our KML generator for seismic tomography so that it also accepts data files in JSON format. 
We have also improved the web application of the generator so that JSON-formatted data files can be uploaded. Users can convert any tomographic model data to KML. The KML obtained through the new generator should provide an arena in which to compare various tomographic models and other geophysical observations on Google Earth, which may act as a common platform for geoscience browsing.
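A conversion of the kind described in this abstract, from a JSON file with a metadata part and a grid-point part to KML, can be sketched in a few lines. The JSON layout used here (`metadata.reference`, `points` with `lon`, `lat`, `depth_km`, `dv`) is a hypothetical stand-in, not the NERIES/FDSN schema, and the sketch writes one KML Placemark per grid point rather than the cross-section imagery that the actual generator produces.

```python
import json
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def tomography_json_to_kml(text):
    """Turn a (hypothetical) JSON tomography model into a KML document string."""
    model = json.loads(text)
    ET.register_namespace("", KML_NS)            # serialise KML as the default namespace
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    ET.SubElement(doc, f"{{{KML_NS}}}name").text = model["metadata"]["reference"]
    for p in model["points"]:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}description").text = (
            f"velocity anomaly {p['dv']} %")
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML coordinates are lon,lat,altitude in metres; depth becomes negative altitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = (
            f"{p['lon']},{p['lat']},{-1000.0 * p['depth_km']}")
    return ET.tostring(kml, encoding="unicode")


# Hypothetical two-point model at 100 km depth.
SAMPLE = json.dumps({
    "metadata": {"reference": "Hypothetical model", "grid_interval_deg": 1.0},
    "points": [
        {"lon": 135.0, "lat": 35.0, "depth_km": 100, "dv": -0.8},
        {"lon": 140.5, "lat": 38.0, "depth_km": 100, "dv": 1.2},
    ],
})
kml_text = tomography_json_to_kml(SAMPLE)
```

Because the metadata and the grid values travel in one JSON file, the converter needs no separate information file, which is the practical gain over the earlier text-file input.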
Online, interactive assessment of geothermal energy potential in the U.S
NASA Astrophysics Data System (ADS)
Allison, M. L.; Richard, S. M.; Clark, R.; Coleman, C.; Love, D.; Pape, E.; Musil, L.
2011-12-01
Geothermal-relevant geosciences data from all 50 states (www.stategeothermaldata.org), federal agencies, national labs, and academic centers are being digitized and linked in a distributed network via the U.S. Department of Energy-funded National Geothermal Data System (NGDS) to foster geothermal energy exploration and development through use of interactive online 'mashups,' data integration, and applications. Emphasis is first to make as much information as possible accessible, with a long range goal to make data interoperable through standardized services and interchange formats. Resources may be made available as documents (files) in whatever format they are currently in, converted to tabular files using standard content models, or published as Open Geospatial Consortium or ESRI Web services using the standard xml schema. An initial set of thirty geoscience data content models are in use or under development to define standardized interchange format: aqueous chemistry, borehole temperature data, direct use feature, drill stem test, earthquake hypocenter, fault feature, geologic contact feature, geologic unit feature, thermal/hot spring description, metadata, quaternary fault, volcanic vent description, well header feature, borehole lithology log, crustal stress, gravity, heat flow/temperature gradient, permeability, and feature description data like developed geothermal systems, geologic unit geothermal properties, permeability, production data, rock alteration description, rock chemistry, and thermal conductivity. Map services are also being developed for isopach maps (depth to bedrock), aquifer temperature maps, and several states are working on geothermal resource overview maps. Content models are developed preferentially from existing community use in order to encourage widespread adoption and promulgate minimum metadata quality standards. 
Geoscience data and maps from NGDS participating institutions (USGS, Southern Methodist University, Boise State University Geothermal Data Coalition) are being supplemented with extensive land management and land use resources from the Western Regional Partnership (15 federal agencies and 5 Western states) to provide access to a comprehensive, holistic set of data critical to geothermal energy development. As of August 2011, over 33,000 data resources have been registered in the system catalog, along with scores of Web services to deliver integrated data to the desktop for free downloading or online use. The data exchange mechanism is built on the U.S. Geoscience Information Network (USGIN, http://lab.usgin.org) protocols and standards developed in partnership with the U.S. Geological Survey.
A Common Metadata System for Marine Data Portals
NASA Astrophysics Data System (ADS)
Wosniok, C.; Breitbach, G.; Lehfeldt, R.
2012-04-01
Processing and allocation of marine datasets depend on the nature of the data resulting from field campaigns, continuous monitoring and numerical modeling. Two research and development projects in northern Germany manage different types of marine data. Due to different data characteristics and institutional frameworks separate data portals are required. This paper describes the integration of distributed marine data in Germany. The Marine Data Infrastructure of Germany (MDI-DE) supports public authorities in the German coastal zone with the implementation of European directives like INSPIRE or the Marine Strategy Framework Directive. This is carried out through setting up standardized web services within a network of participating coastal agencies and the installation of a common data portal (http://www.mdi-de.org), which integrates distributed marine data concerning coastal engineering, coastal water protection and nature conservation in an interoperable and harmonized manner for administrative and scientific purposes as well as for information of the general public. The Coastal Observation System for Northern and Arctic Seas (COSYNA) aims at developing and testing analysis systems for the operational synoptic description of the environmental status of the North Sea and of Arctic coastal waters. This is done by establishing a network of monitoring facilities and the provision of its data in near-real-time. In situ measurements with poles, ferry boxes, and buoys, together with remote sensing measurements, and the data assimilation of these data into simulation results enables COSYNA to provide pre-operational 'products', that are beyond the present routinely applied techniques in observation and modelling. The data allocation in near-real-time requires thoroughly executed data validation, which is processed on the fly before data is passed on to the COSYNA portal (http://kofserver2.hzg.de/codm/). 
Both projects apply OGC standards such as Web Map Service (WMS), Web Feature Service (WFS) and Sensor Observation Service (SOS), which ensures interoperability and extensibility. In addition, metadata, a crucial component for searching and finding information in large data infrastructures, are provided via the Catalogue Service for the Web (CSW). MDI-DE and COSYNA rely on NOKIS, the metadata information system for marine metadata, which reflects a metadata profile tailored for marine data according to the specifications of German coastal authorities. In spite of this common software base, interoperability between the two data collections requires constant alignment of the diverse data processed by the two portals. While monitoring data in the MDI-DE are currently rather campaign-based, COSYNA has to fit constantly evolving time series into metadata sets. With all data following the same metadata profile, we now reach full interoperability between the different data collections. The distributed marine information system provides options to search, find and visualise the harmonised results from continuous monitoring, field campaigns, numerical modeling and other data in one web client.
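Much of the interoperability that this abstract attributes to OGC standards comes from every WMS server answering the same key-value request. The sketch below builds a WMS 1.3.0 GetMap URL with the standard parameters; the endpoint `https://example.org/wms` and the layer name `water_temperature` are placeholders and do not refer to the MDI-DE or COSYNA services.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def wms_getmap_url(endpoint, layer, bbox, width=800, height=600):
    """Build a WMS 1.3.0 GetMap request.

    bbox is (min_lon, min_lat, max_lon, max_lat); CRS:84 is used so that
    the axis order is longitude, latitude.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",                      # empty value requests the default style
        "CRS": "CRS:84",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return f"{endpoint}?{urlencode(params)}"


# Placeholder endpoint and layer; the bounding box roughly covers the German Bight.
url = wms_getmap_url("https://example.org/wms", "water_temperature",
                     (6.0, 53.0, 9.0, 55.5))
query = parse_qs(urlparse(url).query, keep_blank_values=True)
```

A client that can form this request for one portal can form it for the other by changing only the endpoint and layer name, which is what lets a single web client draw on both data collections.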
NASA Astrophysics Data System (ADS)
Yen, Y. N.; Weng, K. H.; Huang, H. Y.
2013-07-01
After over 30 years of practice and development, Taiwan's architectural conservation field is moving rapidly into digitalization and its applications. Compared to modern buildings, traditional Chinese architecture has considerably more complex elements and forms. Documenting and digitizing this unique heritage throughout its conservation lifecycle is a new and important issue. This article takes the caisson ceiling of the Taipei Confucius Temple, octagonal with 333 elements in 8 types, as a case study for digitization practice. The application of metadata representation and 3D modelling are the two key issues discussed. Both Revit and SketchUp were applied in this research to compare their effectiveness for metadata representation. Due to limitations of the Revit database, the final 3D models were built with SketchUp. The research found that, firstly, cultural heritage databases must convey that while many elements are similar in appearance, they are unique in value; although 3D simulations help the general understanding of architectural heritage, software such as Revit and SketchUp, at this stage, can only be used to model basic visual representations and is ineffective in documenting additional critical data of individually unique elements. Secondly, when establishing conservation lifecycle information for application in management systems, a full and detailed presentation of the metadata must also be implemented; the existing applications of BIM in managing conservation lifecycles are still insufficient. The results of the research recommend SketchUp as a tool for present modelling needs and BIM for sharing data between users, but the implementation of metadata representation is of the utmost importance.
ERIC Educational Resources Information Center
Proceedings of the ASIST Annual Meeting, 2003
2003-01-01
Forty-six panels address topics including women in information science; users and usability; information studies; reference services; information policies; standards; interface design; information retrieval; information networks; metadata; shared access; e-commerce in libraries; knowledge organization; information science theories; digitization;…
Developing Interoperable Air Quality Community Portals
NASA Astrophysics Data System (ADS)
Falke, S. R.; Husar, R. B.; Yang, C. P.; Robinson, E. M.; Fialkowski, W. E.
2009-04-01
Web portals are intended to provide consolidated discovery, filtering and aggregation of content from multiple, distributed web sources targeted at particular user communities. This paper presents a standards-based information architectural approach to developing portals aimed at air quality community collaboration in data access and analysis. An important characteristic of the approach is to advance beyond the present stand-alone design of most portals to achieve interoperability with other portals and information sources. We show how using metadata standards, web services, RSS feeds and other Web 2.0 technologies, such as Yahoo! Pipes and del.icio.us, helps increase interoperability among portals. The approach is illustrated within the context of the GEOSS Architecture Implementation Pilot where an air quality community portal is being developed to provide a user interface between the portals and clearinghouse of the GEOSS Common Infrastructure and the air quality community catalog of metadata and data services.
NASA Astrophysics Data System (ADS)
Tsontos, V. M.; Arms, S. C.; Thompson, C. K.; Quach, N.; Lam, T.
2016-12-01
Earth science applications increasingly rely on the integration of multivariate data from diverse observational platforms. Whether for satellite mission cal/val, science or decision support, the coupling of remote sensing and in-situ field data is also integral to oceanographic workflows. This has prompted archives such as the PO.DAAC, NASA's physical oceanographic data archive, which historically has had a remote sensing focus, to adapt to better accommodate complex field campaign datasets. However, the inherent heterogeneity of in-situ datasets and their variable adherence to meta/data standards pose a significant impediment to interoperability, a problem originating early in the data lifecycle and significantly impacting stewardship and usability of these data long-term. Here we introduce a new initiative underway at PO.DAAC that seeks to catalyze efforts to address these challenges. It involves the enhancement and integration of available high TRL (Technology Readiness Level) components for improved interoperability and support of in-situ data with a focus on a novel yet representative class of oceanographic field data: data from electronic tags deployed on a variety of marine species as biological sampling platforms in support of fisheries management and ocean observation efforts. This project seeks to demonstrate, deliver and ultimately sustain operationally a reusable and accessible set of tools to: 1) mediate reconciliation of heterogeneous source data into a tractable number of standardized formats consistent with earth science data standards; 2) harmonize existing metadata models for satellite and field datasets; 3) demonstrate the value added of integrated data access via a range of available tools and services hosted at the PO.DAAC, including a web-based visualization tool for comprehensive mapping of satellite and in-situ data.
An innovative part of our project plan involves partnering with the leading electronic tag manufacturer to promote the adoption of appropriate data standards in their processing software. The proposed project thus adopts a model lifecycle approach complemented by broadly applicable technologies to address key data management and interoperability issues for in-situ data