Sample records for metadata identification information

  1. Definition of an ISO 19115 metadata profile for SeaDataNet II Cruise Summary Reports and its XML encoding

    NASA Astrophysics Data System (ADS)

    Boldrini, Enrico; Schaap, Dick M. A.; Nativi, Stefano

    2013-04-01

    SeaDataNet implements a distributed pan-European infrastructure for Ocean and Marine Data Management whose nodes are maintained by 40 national oceanographic and marine data centers from 35 countries riparian to all European seas. A unique portal makes possible distributed discovery, visualization and access of the available sea data across all the member nodes. Geographic metadata play an important role in such an infrastructure, enabling efficient documentation and discovery of the resources of interest. In particular: Common Data Index (CDI) metadata describe the sea datasets, including identification information (e.g. product title, area of interest), evaluation information (e.g. data resolution, constraints) and distribution information (e.g. download endpoint, download protocol); Cruise Summary Reports (CSR) metadata describe cruises and field experiments at sea, including identification information (e.g. cruise title, name of the ship) and acquisition information (e.g. instruments used, number of samples taken). In the context of the second phase of SeaDataNet (SeaDataNet 2 EU FP7 project, grant agreement 283607, started on October 1st, 2011 for a duration of 4 years), a major target is the setting, adoption and promotion of common international standards, to the benefit of outreach and interoperability with international initiatives and communities (e.g. OGC, INSPIRE, GEOSS, …). A standardization effort conducted by CNR with the support of MARIS, IFREMER, STFC, BODC and ENEA has led to the creation of an ISO 19115 metadata profile of CDI and its XML encoding based on ISO 19139. The CDI profile is now in its stable version and is being implemented and adopted by the SeaDataNet community tools and software. The effort has since continued to produce an ISO-based metadata model and its XML encoding for CSR as well. The metadata elements included in the CSR profile belong to different models: ISO 19115 (e.g. cruise identification information, including title and area of interest; metadata responsible party information), ISO 19115-2 (e.g. acquisition information, including date of sampling and instruments used) and SeaDataNet (community-specific elements, including the EDMO and EDMERP code lists). Two main guidelines have been followed in drafting the metadata model: all the obligations and constraints required by both the ISO standards and the INSPIRE directive had to be satisfied, including the presence of specific elements with given cardinality (e.g. mandatory metadata date stamp, mandatory lineage information); and all the content of the legacy CSR format had to be supported by the new metadata model. An XML encoding of the CSR profile has been defined as well. Based on the ISO 19139 XML schema and constraints, it adds the new elements specific to the SeaDataNet community. The associated Schematron rules are used to enforce constraints not enforceable with the schema alone and to validate element content against the SeaDataNet code list vocabularies.
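
    The validation approach this abstract describes, XML Schema for structural constraints plus Schematron for rules the schema cannot express, can be exercised with standard tooling. Below is a minimal sketch using lxml; the file names are hypothetical placeholders, not the actual SeaDataNet artifacts.

    ```python
    # Sketch: validate a CSR-style ISO 19139 record against an XML Schema,
    # then against Schematron rules (e.g. code-list membership checks).
    # File names are hypothetical placeholders.
    from lxml import etree
    from lxml.isoschematron import Schematron

    xsd = etree.XMLSchema(etree.parse("iso19139_csr_profile.xsd"))
    sch = Schematron(etree.parse("csr_codelist_rules.sch"))

    record = etree.parse("example_csr_record.xml")

    # Structural constraints (element presence, cardinality) via XSD ...
    if not xsd.validate(record):
        print(xsd.error_log)

    # ... and profile constraints not expressible in XSD, such as checking
    # element content against SeaDataNet code-list vocabularies, via Schematron.
    if not sch.validate(record):
        print(sch.validation_report)
    ```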

  2. What Information Does Your EHR Contain? Automatic Generation of a Clinical Metadata Warehouse (CMDW) to Support Identification and Data Access Within Distributed Clinical Research Networks.

    PubMed

    Bruland, Philipp; Doods, Justin; Storck, Michael; Dugas, Martin

    2017-01-01

    Data dictionaries provide structural meta-information about data definitions in health information technology (HIT) systems. In this regard, reusing healthcare data for secondary purposes offers several advantages (e.g. reduced documentation times or increased data quality). Prerequisites for data reuse are data quality, availability and identical meaning. In diverse projects, research data warehouses serve as core components between heterogeneous clinical databases and various research applications. Given the complexity (high number of data elements) and dynamics (regular updates) of electronic health record (EHR) data structures, we propose a clinical metadata warehouse (CMDW) based on a metadata registry standard. Metadata of two large hospitals were automatically inserted into two CMDWs containing 16,230 forms and 310,519 data elements. Automatic updates of metadata are possible, as are semantic annotations. A CMDW allows metadata discovery, data quality assessment and similarity analyses. Common data models for distributed research networks can be established based on similarity analyses.

  3. Roogle: an information retrieval engine for clinical data warehouse.

    PubMed

    Cuggia, Marc; Garcelon, Nicolas; Campillo-Gimenez, Boris; Bernicot, Thomas; Laurent, Jean-François; Garin, Etienne; Happe, André; Duvauferrier, Régis

    2011-01-01

    A large amount of relevant information is contained in the reports stored in electronic patient records and their associated metadata. Roogle is a project aiming at developing information retrieval engines adapted to these reports and designed for clinicians. The system consists of a data warehouse (full-text reports and structured data) imported from two different hospital information systems. Information retrieval is performed using metadata-based semantic and full-text search methods (as in Google). Applications include biomarker identification in a translational approach, searching for specific cases, constitution of cohorts, professional practice evaluation, and quality control assessment.

  4. The ATLAS EventIndex: data flow and inclusion of other metadata

    NASA Astrophysics Data System (ADS)

    Barberis, D.; Cárdenas Zárate, S. E.; Favareto, A.; Fernandez Casani, A.; Gallas, E. J.; Garcia Montoro, C.; Gonzalez de la Hoz, S.; Hrivnac, J.; Malon, D.; Prokoshin, F.; Salt, J.; Sanchez, J.; Toebbicke, R.; Yuan, R.; ATLAS Collaboration

    2016-10-01

    The ATLAS EventIndex is the catalogue of the event-related metadata for the information collected from the ATLAS detector. The basic unit of this information is the event record, containing the event identification parameters, pointers to the files containing this event as well as trigger decision information. The main use case for the EventIndex is event picking, as well as data consistency checks for large production campaigns. The EventIndex employs the Hadoop platform for data storage and handling, as well as a messaging system for the collection of information. The information for the EventIndex is collected both at Tier-0, when the data are first produced, and from the Grid, when various types of derived data are produced. The EventIndex uses various types of auxiliary information from other ATLAS sources for data collection and processing: trigger tables from the condition metadata database (COMA), dataset information from the data catalogue AMI and the Rucio data management system and information on production jobs from the ATLAS production system. The ATLAS production system is also used for the collection of event information from the Grid jobs. EventIndex developments started in 2012 and in the middle of 2015 the system was commissioned and started collecting event metadata, as a part of ATLAS Distributed Computing operations.
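
    The event-picking use case lends itself to a small sketch: one record per event, keyed by its identification parameters and pointing at the files that contain it. A plain Python dict stands in here for the Hadoop-based store, and all field names are illustrative assumptions rather than the actual EventIndex schema.

    ```python
    # Sketch of "event picking": given run and event numbers, return the
    # files that contain the event. A dict stands in for the Hadoop store.
    from typing import List, NamedTuple

    class EventRecord(NamedTuple):
        run_number: int
        event_number: int
        file_guids: List[str]   # pointers to the files holding this event
        trigger_info: bytes     # packed trigger decision information

    index = {}  # (run_number, event_number) -> EventRecord

    def add_record(rec: EventRecord) -> None:
        index[(rec.run_number, rec.event_number)] = rec

    def pick_event(run_number: int, event_number: int) -> List[str]:
        """Return the GUIDs of the files containing the requested event."""
        rec = index.get((run_number, event_number))
        return rec.file_guids if rec else []

    add_record(EventRecord(358031, 1234567, ["guid-aod-01", "guid-raw-07"], b"\x01\x00"))
    print(pick_event(358031, 1234567))  # ['guid-aod-01', 'guid-raw-07']
    ```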

  5. The Self-Organized Archive: SPASE, PDS and Archive Cooperatives

    NASA Astrophysics Data System (ADS)

    King, T. A.; Hughes, J. S.; Roberts, D. A.; Walker, R. J.; Joy, S. P.

    2005-05-01

    Information systems with high quality metadata enable uses and services which often go beyond the original purpose. There are two types of metadata: annotations, which are items that comment on or describe the content of a resource, and identification attributes, which describe the external properties of the resource itself. For example, annotations may indicate which columns are present in a table of data, whereas an identification attribute would indicate the source of the table, such as the observatory, instrument, organization, and data type. When the identification attributes are collected and used as the basis of a search engine, a user can constrain on an attribute, and the archive can then self-organize around the constraint, presenting the user with a particular view of the archive. In an archive cooperative where each participating data system or archive may have its own metadata standards, providing a multi-system search engine requires that individual archive metadata be mapped to a broad-based standard. To explore how cooperative archives can form a larger self-organized archive, we will show how the Space Physics Archive Search and Extract (SPASE) data model will allow different systems to create a cooperative, and will use the Planetary Data System (PDS) plus existing space physics activities as a demonstration.
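
    A minimal sketch of this self-organization idea follows: constrain the resource set on one identification attribute, then group what remains by another attribute to present a view. The attribute names echo the abstract; the records themselves are invented examples.

    ```python
    # Sketch: an archive "self-organizes" around identification attributes.
    from collections import defaultdict

    resources = [
        {"id": "ds1", "observatory": "Galileo", "instrument": "MAG", "data_type": "TimeSeries"},
        {"id": "ds2", "observatory": "Galileo", "instrument": "PLS", "data_type": "Spectrum"},
        {"id": "ds3", "observatory": "Cassini", "instrument": "MAG", "data_type": "TimeSeries"},
    ]

    def organize(items, facet):
        """Group resources by one identification attribute to form a view."""
        view = defaultdict(list)
        for item in items:
            view[item[facet]].append(item["id"])
        return dict(view)

    # Constrain on instrument=MAG, then view the archive by observatory.
    constrained = [r for r in resources if r["instrument"] == "MAG"]
    print(organize(constrained, "observatory"))  # {'Galileo': ['ds1'], 'Cassini': ['ds3']}
    ```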

  6. Incorporating clinical metadata with digital image features for automated identification of cutaneous melanoma.

    PubMed

    Liu, Z; Sun, J; Smith, M; Smith, L; Warr, R

    2013-11-01

    Computer-assisted diagnosis (CAD) of malignant melanoma (MM) has been advocated to help clinicians achieve a more objective and reliable assessment. However, conventional CAD systems examine only the features extracted from digital photographs of lesions; failure to incorporate patients' personal information constrains their applicability in clinical settings. The aim was to develop a new CAD system to improve the performance of automatic diagnosis of melanoma which, for the first time, incorporates digital features of lesions together with important patient metadata into a learning process. Thirty-two features were extracted from digital photographs to characterize skin lesions. Patients' personal information, such as age, gender and lesion site, and their combinations, was quantified as metadata. The integration of digital features and metadata was realized through an extended Laplacian eigenmap, a dimensionality-reduction method grouping lesions with similar digital features and metadata into the same classes. The diagnosis reached 82.1% sensitivity and 86.1% specificity when only multidimensional digital features were used, but improved to 95.2% sensitivity and 91.0% specificity after metadata were incorporated appropriately. The proposed system achieves a level of sensitivity comparable with experienced dermatologists aided by conventional dermoscopes. This demonstrates the potential of our method for assisting clinicians in diagnosing melanoma, and the benefit it could provide to patients and hospitals by greatly reducing unnecessary excisions of benign naevi. This paper proposes an enhanced CAD system incorporating clinical metadata into the learning process for automatic classification of melanoma. Results demonstrate that the additional metadata, and the mechanism to incorporate them, are useful for improving the CAD of melanoma. © 2013 British Association of Dermatologists.
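
    The paper's extended Laplacian eigenmap is not reproduced in the abstract, but the general recipe it describes, quantify the metadata, concatenate it with the digital image features, and reduce dimensionality with a Laplacian eigenmap before classifying, can be sketched with scikit-learn's standard SpectralEmbedding as a stand-in. All data below are synthetic.

    ```python
    # Sketch of the general idea only (NOT the paper's extended method):
    # concatenate quantified metadata with image features, embed with a
    # standard Laplacian eigenmap, then classify in the low-dim space.
    import numpy as np
    from sklearn.manifold import SpectralEmbedding
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    n = 200
    image_features = rng.normal(size=(n, 32))      # 32 digital features per lesion
    age = rng.uniform(20, 90, size=(n, 1)) / 90.0  # metadata quantified to [0, 1]
    gender = rng.integers(0, 2, size=(n, 1)).astype(float)
    site = rng.integers(0, 4, size=(n, 1)) / 3.0   # coded lesion site
    labels = rng.integers(0, 2, size=n)            # toy benign/malignant labels

    X = np.hstack([image_features, age, gender, site])
    X_low = SpectralEmbedding(n_components=5, n_neighbors=10).fit_transform(X)

    clf = KNeighborsClassifier(n_neighbors=5).fit(X_low[:150], labels[:150])
    print("toy accuracy:", clf.score(X_low[150:], labels[150:]))
    ```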

  7. Metadata and Service at the GFZ ISDC Portal

    NASA Astrophysics Data System (ADS)

    Ritschel, B.

    2008-05-01

    The online service portal of the GFZ Potsdam Information System and Data Center (ISDC) is an access point for all manner of geoscientific geodata, its corresponding metadata, scientific documentation and software tools. At present almost 2000 national and international users and user groups have the opportunity to request Earth science data from a portfolio of 275 different product types and more than 20 million single data files with a total volume of approximately 12 TBytes. The majority of the data and information the portal currently offers to the public are global geomonitoring products such as satellite orbit and Earth gravity field data as well as geomagnetic and atmospheric data. These products for Earth's changing system are provided via state-of-the-art retrieval techniques. The data product catalog system behind these techniques is based on the extensive usage of standardized metadata, which describe the different geoscientific product types and data products in a uniform way. Whereas all ISDC product types are specified by NASA's Directory Interchange Format (DIF) Version 9.0 parent XML DIF metadata files, the individual data files are described by extended DIF metadata documents. Depending on when the scientific project began, some data files are described by extended DIF Version 6 metadata documents, while the others are specified by child XML DIF metadata documents. Both the product-type-dependent parent DIF metadata documents and the data-file-dependent child DIF metadata documents are derived from a base-DIF.xsd XML schema file. The ISDC metadata philosophy defines a geoscientific product as a package consisting of usually one, sometimes more than one, data file plus one extended DIF metadata file. Because NASA's DIF metadata standard was developed to specify only collections of data, the extension of the DIF standard consists of new, specific attributes that are necessary for the explicit identification of single data files and the set-up of a comprehensive Earth science data catalog. The huge ISDC data catalog is realized by product-type-dependent tables filled with data-file-related metadata, which have relations to corresponding metadata tables. The product-type-describing parent DIF XML metadata documents are stored and managed in ORACLE's XML storage structures. In order to improve the interoperability of the ISDC service portal, the existing proprietary catalog system will be extended by an ISO 19115 based web catalog service. In addition to this development, an ISDC-related semantic network of different kinds of metadata resources is planned, covering standardized and non-standardized metadata documents and literature as well as Web 2.0 user-generated information derived from tagging activities and social navigation data.

  8. Air Quality uFIND: User-oriented Tool Set for Air Quality Data Discovery and Access

    NASA Astrophysics Data System (ADS)

    Hoijarvi, K.; Robinson, E. M.; Husar, R. B.; Falke, S. R.; Schultz, M. G.; Keating, T. J.

    2012-12-01

    Historically, there have been major impediments to seamless and effective data usage encountered by both data providers and users. Over the last five years, the international Air Quality (AQ) Community has worked through forums such as the Group on Earth Observations AQ Community of Practice, the ESIP AQ Working Group, and the Task Force on Hemispheric Transport of Air Pollution to converge on data format standards (e.g., netCDF), data access standards (e.g., Open Geospatial Consortium Web Coverage Services), metadata standards (e.g., ISO 19115), as well as other conventions (e.g., the CF Naming Convention) in order to build an Air Quality Data Network. The centerpiece of the AQ Data Network is the web service-based tool set user-oriented Filtering and Identification of Networked Data (uFIND). The purpose of uFIND is to provide rich and powerful facilities for the user to: a) discover and choose a desired dataset by navigating the multi-dimensional metadata space using faceted search, b) seamlessly access and browse datasets, and c) use uFIND's facilities as a web service for mashups with other AQ applications and portals. In a user-centric information system such as uFIND, the user experience is improved by metadata that include the general fields for discovery as well as community-specific metadata to narrow the search beyond space, time and generic keyword searches. Even with the community-specific additions, the ISO 19115 records were formed in compliance with the standard, so that other standards-based search interfaces could leverage this additional information. To identify the fields necessary for metadata discovery we started with the ISO 19115 Core Metadata fields and the fields needed for a Catalog Service for the Web (CSW) record. This fulfilled two goals: to create valid ISO 19115 records, and to be able to retrieve the records through a Catalog Service for the Web query. Beyond the required set of fields, the AQ Community added additional fields using a combination of keywords and ISO 19115 fields. These extensions allow discovery by measurement platform or observed phenomena. Beyond discovery metadata, the AQ records include service identification objects that allow standards-based clients, such as some brokers, to access the data found via OGC WCS or WMS data access protocols. uFIND is one such smart client: the combination of discovery and access metadata allows the user to preview each registered dataset through spatial and temporal views, observe data access and usage patterns, and find links to dataset-specific metadata directly in uFIND. The AQ data providers also benefit from this architecture, since their data products are easier to find and re-use, enhancing the relevance and importance of their products. Finally, the earth science community at large benefits from the Service Oriented Architecture of uFIND, since uFIND is itself a service and allows service-based interfacing with providers and users of the metadata, allowing uFIND facets to be further refined for a particular AQ application or completely repurposed for other Earth Science domains that use the same set of data access and metadata standards.
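
    The faceted-search mechanism at the core of uFIND can be illustrated with a small sketch: filter the record set on one facet, then count the remaining values of the other facets to drive the navigation. The facet names mirror the community extensions mentioned above (platform, phenomenon, service); the records are invented stand-ins for harvested ISO 19115 entries.

    ```python
    # Sketch of uFIND-style faceted filtering over discovery metadata.
    records = [
        {"title": "Surface ozone, EU", "platform": "surface station",
         "phenomenon": "ozone", "service": "OGC:WCS"},
        {"title": "MODIS AOD", "platform": "satellite",
         "phenomenon": "aerosol optical depth", "service": "OGC:WMS"},
        {"title": "Ozonesonde profiles", "platform": "sonde",
         "phenomenon": "ozone", "service": "OGC:WCS"},
    ]

    def facet_counts(items, facet):
        """Counts shown next to each facet value in a faceted-search UI."""
        counts = {}
        for r in items:
            counts[r[facet]] = counts.get(r[facet], 0) + 1
        return counts

    def select(items, **constraints):
        """Narrow the record set, one facet constraint at a time."""
        return [r for r in items if all(r[k] == v for k, v in constraints.items())]

    ozone = select(records, phenomenon="ozone")
    print(facet_counts(ozone, "platform"))  # {'surface station': 1, 'sonde': 1}
    print(facet_counts(ozone, "service"))   # {'OGC:WCS': 2}
    ```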

  9. Development of Health Information Search Engine Based on Metadata and Ontology

    PubMed Central

    Song, Tae-Min; Jin, Dal-Lae

    2014-01-01

    Objectives: The aim of the study was to develop a metadata and ontology-based health information search engine ensuring semantic interoperability to collect and provide health information using different application programs. Methods: Health information metadata ontology was developed using a distributed semantic Web content publishing model based on vocabularies used to index the contents generated by the information producers as well as those used to search the contents by the users. Vocabulary for the health information ontology was mapped to the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), and a list of about 1,500 terms was proposed. The metadata schema used in this study was developed by adding an element describing the target audience to the Dublin Core Metadata Element Set. Results: A metadata schema and an ontology ensuring interoperability of health information available on the internet were developed. The metadata and ontology-based health information search engine developed in this study produced a better search result compared to existing search engines. Conclusions: A health information search engine based on metadata and ontology will provide reliable health information to both information producers and information consumers. PMID:24872907

  10. Development of health information search engine based on metadata and ontology.

    PubMed

    Song, Tae-Min; Park, Hyeoun-Ae; Jin, Dal-Lae

    2014-04-01

    The aim of the study was to develop a metadata and ontology-based health information search engine ensuring semantic interoperability to collect and provide health information using different application programs. Health information metadata ontology was developed using a distributed semantic Web content publishing model based on vocabularies used to index the contents generated by the information producers as well as those used to search the contents by the users. Vocabulary for the health information ontology was mapped to the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), and a list of about 1,500 terms was proposed. The metadata schema used in this study was developed by adding an element describing the target audience to the Dublin Core Metadata Element Set. A metadata schema and an ontology ensuring interoperability of health information available on the internet were developed. The metadata and ontology-based health information search engine developed in this study produced a better search result compared to existing search engines. A health information search engine based on metadata and ontology will provide reliable health information to both information producers and information consumers.
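
    The schema extension described in both records, the Dublin Core element set plus one added element for the target audience, is easy to sketch. The extension namespace and element name below are assumptions for illustration; the abstracts do not specify the exact encoding.

    ```python
    # Sketch: a Dublin Core record extended with a target-audience element.
    # The EXT namespace and "audience" element name are assumed, not taken
    # from the study.
    from xml.etree.ElementTree import Element, SubElement, tostring

    DC = "http://purl.org/dc/elements/1.1/"
    EXT = "http://example.org/health-metadata/"  # hypothetical extension namespace

    record = Element("record")
    for term, value in [
        ("title", "Managing type 2 diabetes"),
        ("subject", "diabetes mellitus, type 2"),  # vocabulary term mapped to SNOMED CT
        ("publisher", "Health information producer"),
        ("language", "en"),
    ]:
        SubElement(record, f"{{{DC}}}{term}").text = value

    # The element added by the study: who the content is written for.
    SubElement(record, f"{{{EXT}}}audience").text = "patients and caregivers"

    print(tostring(record, encoding="unicode"))
    ```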

  11. Government information resource catalog and its service system realization

    NASA Astrophysics Data System (ADS)

    Gui, Sheng; Li, Lin; Wang, Hong; Peng, Zifeng

    2007-06-01

    The process of informatization produces a great deal of information resources. In order to manage these information resources and use them to serve business management, government decision making and public life, it is necessary to establish a transparent and dynamic information resource catalog and its service system. This paper takes land-house management information resources as an example. Addressing the characteristics of this kind of information, the paper classifies, identifies and describes land-house information with a uniform specification and method, and establishes a land-house information resource catalog classification system, metadata standard, identification standard and land-house thematic thesaurus, so that users can conveniently search for and obtain the information they are interested in over the internet. Moreover, under the network environment, the system achieves speedy positioning, querying, exploration and acquisition of various types of land-house management information, and satisfies the needs of sharing, exchanging, applying and maintaining land-house management information resources.

  12. Turning Data into Information: Assessing and Reporting GIS Metadata Integrity Using Integrated Computing Technologies

    ERIC Educational Resources Information Center

    Mulrooney, Timothy J.

    2009-01-01

    A Geographic Information System (GIS) serves as the tangible and intangible means by which spatially related phenomena can be created, analyzed and rendered. GIS metadata serves as the formal framework to catalog information about a GIS data set. Metadata is independent of the encoded spatial and attribute information. GIS metadata is a subset of…

  13. Metadata squared: enhancing its usability for volunteered geographic information and the GeoWeb

    USGS Publications Warehouse

    Poore, Barbara S.; Wolf, Eric B.; Sui, Daniel Z.; Elwood, Sarah; Goodchild, Michael F.

    2013-01-01

    The Internet has brought many changes to the way geographic information is created and shared. One aspect that has not changed is metadata. Static spatial data quality descriptions were standardized in the mid-1990s and cannot accommodate the current climate of data creation where nonexperts are using mobile phones and other location-based devices on a continuous basis to contribute data to Internet mapping platforms. The usability of standard geospatial metadata is being questioned by academics and neogeographers alike. This chapter analyzes current discussions of metadata to demonstrate how the media shift that is occurring has affected requirements for metadata. Two case studies of metadata use are presented—online sharing of environmental information through a regional spatial data infrastructure in the early 2000s, and new types of metadata that are being used today in OpenStreetMap, a map of the world created entirely by volunteers. Changes in metadata requirements are examined for usability, the ease with which metadata supports coproduction of data by communities of users, how metadata enhances findability, and how the relationship between metadata and data has changed. We argue that traditional metadata associated with spatial data infrastructures is inadequate and suggest several research avenues to make this type of metadata more interactive and effective in the GeoWeb.

  14. The New Online Metadata Editor for Generating Structured Metadata

    NASA Astrophysics Data System (ADS)

    Devarakonda, R.; Shrestha, B.; Palanisamy, G.; Hook, L.; Killeffer, T.; Boden, T.; Cook, R. B.; Zolly, L.; Hutchison, V.; Frame, M. T.; Cialella, A. T.; Lazer, K.

    2014-12-01

    Nobody is better suited to "describe" data than the scientist who created it. This "description" of data is called metadata. In general terms, metadata represent the who, what, when, where, why and how of a dataset. eXtensible Markup Language (XML) is the preferred output format for metadata, as it makes metadata portable and, more importantly, suitable for system discoverability. The newly developed ORNL Metadata Editor (OME) is a Web-based tool that allows users to create and maintain XML files containing key information, or metadata, about their research. Metadata include information about the specific projects, parameters, time periods, and locations associated with the data. Such information helps put the research findings in context. In addition, the metadata produced using OME will allow other researchers to find these data via metadata clearinghouses like Mercury [1][2]. Researchers simply use the ORNL Metadata Editor to enter relevant metadata into a Web-based form. How is OME helping big data centers like the ORNL DAAC? The ORNL DAAC is one of NASA's Earth Observing System Data and Information System (EOSDIS) data centers managed by the ESDIS Project. The ORNL DAAC archives data produced by NASA's Terrestrial Ecology Program. The DAAC provides data and information relevant to biogeochemical dynamics, ecological data, and environmental processes, critical for understanding the dynamics relating to the biological components of the Earth's environment. Typically the data produced, archived and analyzed are at a scale of multiple petabytes, which makes discoverability very challenging. Without proper metadata associated with the data, it is difficult to find the data you are looking for, and equally difficult to use and understand the data. OME will allow data centers like the ORNL DAAC to produce meaningful, high-quality, standards-based, descriptive information about their data products, in turn helping with data discoverability and interoperability. References: [1] Devarakonda, Ranjeet, et al. "Mercury: reusable metadata management, data discovery and access system." Earth Science Informatics 3.1-2 (2010): 87-94. [2] Wilson, Bruce E., et al. "Mercury Toolset for Spatiotemporal Metadata." NASA Technical Reports Server (NTRS) (2010).

  15. Assessing Metadata Quality of a Federally Sponsored Health Data Repository.

    PubMed

    Marc, David T; Beattie, James; Herasevich, Vitaly; Gatewood, Laël; Zhang, Rui

    2016-01-01

    The U.S. Federal Government developed HealthData.gov to disseminate healthcare datasets to the public. Metadata are provided for each dataset and are the sole source of information for finding and retrieving the data. This study employed automated quality assessments of the HealthData.gov metadata published from 2012 to 2014 to measure completeness, accuracy, and consistency in applying standards. The results demonstrated that metadata published in earlier years had lower completeness, accuracy, and consistency. Also, metadata that underwent modifications following their original creation were of higher quality. HealthData.gov did not uniformly apply the Dublin Core Metadata Initiative, a widely accepted metadata standard, to the metadata. These findings suggested that the HealthData.gov metadata suffered from quality issues, particularly for information that was not frequently updated. The results supported the need for policies to standardize metadata and contributed to the development of automated measures of metadata quality.
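
    A minimal sketch of such automated assessments: completeness as the fraction of required fields present, and consistency as conformance of field values to expected formats. The field names follow Dublin Core; the specific rules and required fields are illustrative assumptions, not the study's actual criteria.

    ```python
    # Sketch: automated completeness and consistency checks on one record.
    import re

    REQUIRED = ["title", "description", "publisher", "modified", "identifier"]
    DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # ISO 8601 calendar date

    def completeness(record):
        present = [f for f in REQUIRED if record.get(f)]
        return len(present) / len(REQUIRED)

    def consistency(record):
        checks = []
        if record.get("modified"):
            checks.append(bool(DATE_RE.match(record["modified"])))
        if record.get("identifier"):
            checks.append(record["identifier"].startswith("http"))
        return sum(checks) / len(checks) if checks else 0.0

    rec = {"title": "Hospital readmissions", "description": "Readmission rates",
           "publisher": "CMS", "modified": "2014-07-01"}
    print(completeness(rec), consistency(rec))  # 0.8 1.0
    ```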

  16. Assessing Metadata Quality of a Federally Sponsored Health Data Repository

    PubMed Central

    Marc, David T.; Beattie, James; Herasevich, Vitaly; Gatewood, Laël; Zhang, Rui

    2016-01-01

    The U.S. Federal Government developed HealthData.gov to disseminate healthcare datasets to the public. Metadata are provided for each dataset and are the sole source of information for finding and retrieving the data. This study employed automated quality assessments of the HealthData.gov metadata published from 2012 to 2014 to measure completeness, accuracy, and consistency in applying standards. The results demonstrated that metadata published in earlier years had lower completeness, accuracy, and consistency. Also, metadata that underwent modifications following their original creation were of higher quality. HealthData.gov did not uniformly apply the Dublin Core Metadata Initiative, a widely accepted metadata standard, to the metadata. These findings suggested that the HealthData.gov metadata suffered from quality issues, particularly for information that was not frequently updated. The results supported the need for policies to standardize metadata and contributed to the development of automated measures of metadata quality. PMID:28269883

  17. Habitat-Lite: A GSC case study based on free text terms for environmental metadata

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kyrpides, Nikos; Hirschman, Lynette; Clark, Cheryl

    2008-04-01

    There is an urgent need to capture metadata on the rapidly growing number of genomic, metagenomic and related sequences, such as 16S ribosomal genes. This need is a major focus within the Genomic Standards Consortium (GSC), and Habitat is a key metadata descriptor in the proposed 'Minimum Information about a Genome Sequence' (MIGS) specification. The goal of the work described here is to provide a light-weight, easy-to-use (small) set of terms ('Habitat-Lite') that captures high-level information about habitat while preserving a mapping to the recently launched Environment Ontology (EnvO). Our motivation for building Habitat-Lite is to meet the needs of multiple users, such as annotators curating these data, database providers hosting the data, and biologists and bioinformaticians alike who need to search and employ such data in comparative analyses. Here, we report a case study based on semi-automated identification of terms from GenBank and GOLD. We estimate that the terms in the initial version of Habitat-Lite would provide useful labels for over 60% of the kinds of information found in the GenBank isolation-source field, and around 85% of the terms in the GOLD habitat field. We present a revised version of Habitat-Lite and invite the community's feedback on its further development in order to provide a minimum list of terms to capture high-level habitat information and to provide classification bins needed for future studies.
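
    The semi-automated identification of terms can be sketched as keyword-based binning of free-text isolation-source values into a small controlled vocabulary. The bins and keyword lists below are illustrative assumptions, not the published Habitat-Lite term set.

    ```python
    # Sketch: bin free-text GenBank isolation-source values into a few
    # high-level habitat terms. Terms and keywords are illustrative only.
    HABITAT_KEYWORDS = {
        "soil": ["soil", "rhizosphere", "compost"],
        "marine": ["seawater", "ocean", "marine sediment"],
        "freshwater": ["lake", "river", "groundwater"],
        "host-associated": ["gut", "feces", "oral", "skin"],
    }

    def map_to_habitat(isolation_source: str) -> str:
        text = isolation_source.lower()
        for habitat, keywords in HABITAT_KEYWORDS.items():
            if any(k in text for k in keywords):
                return habitat
        return "unclassified"  # left for manual curation

    for s in ["rhizosphere of rice", "human gut biopsy", "deep ocean seawater"]:
        print(s, "->", map_to_habitat(s))
    ```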

  18. Semantic metadata application for information resources systematization in water spectroscopy

    NASA Astrophysics Data System (ADS)

    Fazliev, A.

    2009-04-01

    The information and knowledge layers of an information-computational system for water spectroscopy are described. Semantic metadata for all the tasks of the domain information model that form the basis of these layers have been studied. The principle of semantic metadata determination and the mechanisms of its usage during information systematization in molecular spectroscopy are revealed. The software developed for working with semantic metadata is described as well. Formation of the domain model in the framework of the Semantic Web is based on the use of an explicit specification of its conceptualization or, in other words, its ontologies. Formation of the conceptualization for molecular spectroscopy was described in Refs. [1, 2]. In these works, two chains of tasks are selected as a zeroth approximation for the knowledge domain description: a chain of direct tasks and a chain of inverse tasks. The solution schemes of these tasks define the approximation of the data layer for the knowledge domain conceptualization. The properties of spectroscopy task solutions lead to a step-by-step extension of the molecular spectroscopy conceptualization; the information layer of the information system corresponds to this extension. An advantage of a molecular spectroscopy model designed in the form of task chains is that one can explicitly define data and metadata at each step of the solution of these chained tasks. The metadata structure (task solution properties) in the knowledge domain also has the form of a chain in which the input data and metadata of the previous task become metadata of the following tasks. The term metadata is used here in its narrow sense: metadata are the properties of spectroscopy task solutions. Semantic metadata, represented with the help of OWL [3], are formed automatically and are individuals of classes (the A-box). The union of a T-box and an A-box is an ontology that can be processed with the help of an inference engine. In this work we analyze the formation of individuals of molecular spectroscopy applied ontologies as well as the software used for their creation by means of the OWL DL language. The results of this work are presented in the form of an information layer and a knowledge layer in the W@DIS information system [4].

    1 FORMATION OF INDIVIDUALS OF THE WATER SPECTROSCOPY APPLIED ONTOLOGY

    The applied task ontology contains an explicit description of the input and output data of the physical tasks solved in the two chains of molecular spectroscopy tasks. Besides the physical concepts related to spectroscopy task solutions, an information source, a key concept of the knowledge domain information model, is also used. Each solution of a knowledge domain task is linked to an information source, which contains a reference to the published task solution, the molecule, and the task solution properties. Each information source thus allows us to identify a certain knowledge domain task solution contained in the information system. The water spectroscopy applied ontology classes are formed on the basis of the molecular spectroscopy concept taxonomy; they are defined by constraints on properties of the selected conceptualization. Extension of the applied ontology in the W@DIS information system proceeds according to two scenarios: individuals (ontology facts, or axioms) are formed when a task solution is uploaded into the information system, while ontology operations involving the molecular spectroscopy taxonomy and individuals are performed solely by the user (for this purpose the Protege ontology editor was used). For the formation, processing and visualization of knowledge domain task individuals, dedicated software was designed and implemented. The method of individual formation determines the sequence of steps for generating the created ontology individuals. Task solution properties (metadata) have qualitative and quantitative values. Qualitative metadata describe the qualitative side of a task, such as the solution method or other information that can be explicitly specified by object properties of the OWL DL language. Quantitative metadata describe quantitative properties of a task solution, such as minimal and maximal data values or other information that can be obtained by programmed algorithmic operations; these metadata correspond to DatatypeProperty properties of the OWL specification language and can be obtained automatically during data upload into the information system. Since ObjectProperty values are objects, processing of qualitative metadata requires logical constraints. In the case of tasks solved in the W@DIS ICS, qualitative metadata can be formed automatically (for example, in the spectral functions calculation task). The method used to translate qualitative metadata into quantitative ones can be characterized as a roughened representation of knowledge in the knowledge domain. The existence of two ways of obtaining data is a key point in the formation of the applied ontology of a molecular spectroscopy task: the experimental method (metadata for experimental data contain a description of the equipment, experiment conditions and so on) at the initial stage and inverse task solutions at the following stages; and the calculation method (metadata for calculated data are closely related to the metadata used for the description of the physical and mathematical models of molecular spectroscopy).

    2 SOFTWARE FOR ONTOLOGY OPERATION

    Data collection in the water spectroscopy information system is organized in the form of a workflow that contains such operations as information source creation, entry of bibliographic data on publications, formation of the uploaded data schema, and so on. Metadata are generated in the information source as well. Two methods are used for their formation: automatic metadata generation and manual metadata generation (performed by the user). Software support of the actions related to metadata formation is provided by the META+ module. The functions of the META+ module can be divided into two groups: those needed by the software developer and those needed by a user of the information system. The META+ module functions needed by the developer are:
    1. creation of the taxonomy (T-boxes) of applied ontology classes of knowledge domain tasks;
    2. creation of instances of task classes;
    3. creation of data schemes of tasks in the form of XML patterns based on XML syntax (an XML pattern is developed for the instance generator and created according to certain rules imposed by the software generator implementation);
    4. implementation of metadata value calculation algorithms;
    5. creation of a request interface and additional knowledge processing functions for the solution of these tasks;
    6. unification of the created functions and interfaces into one information system.
    The following sequence is universal for the generation of individuals of task classes that form chains. Special interfaces for managing user operations are provided for the software developer in the META+ module. There are also means for updating qualitative metadata values when data are re-uploaded to an information source. The list of functions needed by the end user contains:
    - visualization and editing of data sets, taking their metadata into account, e.g. display of the unique number of bands in transitions for a certain data source;
    - export of OWL/RDF models from the information system to the environment in XML syntax;
    - visualization of instances of classes of the applied ontology tasks on molecular spectroscopy;
    - import of OWL/RDF models into the information system and their integration with the domain vocabulary;
    - formation of additional knowledge of the knowledge domain for the construction of ontological instances of task classes using GTML formats and their processing;
    - formation of additional knowledge in the knowledge domain for the construction of instances of task classes, using software algorithms for data set processing;
    - a semantic search function implemented via an interface that formulates questions in the form of related triplets in order to get an adequate answer.

    3 STRUCTURE OF THE META+ MODULE

    The META+ software module that provides the above functions contains the following components:
    - a knowledge base that stores the semantic metadata and taxonomies of the information system;
    - the software libraries POWL and RAP [5], created by third-party developers, which provide access to the ontological storage;
    - function classes and libraries that form the core of the module and perform the tasks of formation, storage and visualization of class instances;
    - configuration files and module patterns that allow one to adjust and organize the operation of the different functional blocks.
    The META+ module also contains scripts and patterns implemented according to the rules of the W@DIS information system development environment:
    - scripts for interaction with the environment by means of the software core of the information system, providing web-oriented interactive communication;
    - patterns for the visualization of the functionality realized by the scripts.
    The software core of the scientific information-computational system W@DIS is created with the help of the MVC (Model-View-Controller) design pattern, which allows us to separate the application logic from its representation. It realizes the interaction of three logical components, actualizing interactivity with the environment via the Web and performing its preprocessing. The functions of the "Controller" logical component are realized with the help of scripts designed according to the rules imposed by the software core of the information system; each script is a definite object-oriented class with an obligatory initiation method called "start". The functions representing the results of domain application operation (the "View" component) are sets of HTML patterns that allow one to visualize those results with the help of additional constructions processed by the software core of the system. Besides interacting with the software core of the scientific information system, this module also deals with the configuration files of the software core and its database. Such an organization provides closer integration with the software core and a deeper and more adequate connection in operating system support.

    4 CONCLUSION

    In this work the problems of semantic metadata creation in an information system oriented to information representation in the area of molecular spectroscopy have been discussed. The method of semantic metadata and function formation, as well as the realization and structure of the META+ module, have been described. The architecture of the META+ module is closely related to the existing software of the "Molecular spectroscopy" scientific information system. The module is realized using modern approaches to the development of Web-oriented applications and uses existing application interfaces. The developed software allows us to:
    - perform automatic metadata annotation of calculated task solutions directly in the information system;
    - perform automatic metadata annotation of task solutions computed outside the information system upon upload of their results, forming an instance of the solved task on the basis of the entry data;
    - use ontological instances of task solutions for the identification of data in the viewing, comparison and search tasks solved by the information system;
    - export applied task ontologies for use by external tools;
    - solve the task of semantic search according to a pattern and using a question-answer type interface.

    5 ACKNOWLEDGEMENT

    The authors are grateful to RFBR for the financial support of the development of a distributed information system for molecular spectroscopy.

    REFERENCES

    [1] A.D. Bykov, A.Z. Fazliev, N.N. Filippov, A.V. Kozodoev, A.I. Privezentsev, L.N. Sinitsa, M.V. Tonkov and M.Yu. Tretyakov, Distributed information system on atmospheric spectroscopy, Geophysical Research Abstracts, SRef-ID: 1607-7962/gra/EGU2007-A-01906, 2007, v. 9, p. 01906.
    [2] A.I. Privezentsev, A.Z. Fazliev, Applied task ontology for molecular spectroscopy information resources systematization, Proceedings of the 9th Russian scientific conference "Electronic libraries: advanced methods and technologies, electronic collections" (RCDL'2007), Pereslavl-Zalesskii, 2007, part 1, pp. 201-210.
    [3] OWL Web Ontology Language Semantics and Abstract Syntax, W3C Recommendation, 10 February 2004, http://www.w3.org/TR/2004/REC-owl-semantics-20040210/
    [4] W@DIS information system, http://wadis.saga.iao.ru
    [5] RAP library, http://www4.wiwiss.fu-berlin.de/bizer/rdfapi/
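
    A hedged sketch of what forming such A-box individuals can look like with rdflib is given below. The namespace, class and property names are hypothetical stand-ins; the actual W@DIS applied ontology is not reproduced in the text above.

    ```python
    # Sketch: create ontology individuals (A-box facts) for one task
    # solution, with qualitative metadata as object properties and
    # quantitative metadata as datatype properties. Names are hypothetical.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import OWL, RDF, XSD

    SPEC = Namespace("http://example.org/water-spectroscopy#")  # hypothetical

    g = Graph()
    g.bind("spec", SPEC)

    solution = URIRef(SPEC["TaskSolution_001"])
    g.add((solution, RDF.type, OWL.NamedIndividual))
    g.add((solution, RDF.type, SPEC.SpectralFunctionCalculation))

    # Qualitative metadata: an object property pointing at another individual.
    g.add((solution, SPEC.solutionMethod, SPEC.EffectiveHamiltonianFit))
    # Quantitative metadata: datatype properties computed during data upload.
    g.add((solution, SPEC.minWavenumber, Literal(590.0, datatype=XSD.double)))
    g.add((solution, SPEC.maxWavenumber, Literal(612.5, datatype=XSD.double)))

    print(g.serialize(format="turtle"))
    ```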

  19. Image processing tool for automatic feature recognition and quantification

    DOEpatents

    Chen, Xing; Stoddard, Ryan J.

    2017-05-02

    A system for defining structures within an image is described. The system includes reading of an input file, preprocessing the input file while preserving metadata such as scale information and then detecting features of the input file. In one version the detection first uses an edge detector followed by identification of features using a Hough transform. The output of the process is identified elements within the image.
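
    The named pipeline, edge detection followed by a Hough transform, with scale metadata preserved so detected features can be reported in physical units, can be sketched with OpenCV. The thresholds, file name and scale value are illustrative assumptions, not the patented implementation.

    ```python
    # Sketch: Canny edges -> probabilistic Hough transform, with preserved
    # scale metadata used to report feature lengths in physical units.
    import cv2
    import numpy as np

    image = cv2.imread("micrograph.png", cv2.IMREAD_GRAYSCALE)
    scale_um_per_px = 0.8  # carried over from the input file's metadata (assumed)

    edges = cv2.Canny(image, threshold1=50, threshold2=150)
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                            minLineLength=30, maxLineGap=5)

    for line in (lines if lines is not None else []):
        x1, y1, x2, y2 = line[0]
        length_um = np.hypot(x2 - x1, y2 - y1) * scale_um_per_px
        print(f"feature ({x1},{y1})-({x2},{y2}), length {length_um:.1f} um")
    ```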

  20. Automatic Content Recommendation and Aggregation According to SCORM

    ERIC Educational Resources Information Center

    Neves, Daniel Eugênio; Brandão, Wladmir Cardoso; Ishitani, Lucila

    2017-01-01

    Although widely used, the SCORM metadata model for content aggregation is difficult to be used by educators, content developers and instructional designers. Particularly, the identification of contents related with each other, in large repositories, and their aggregation using metadata as defined in SCORM, has been demanding efforts of computer…

  1. Extraction of CT dose information from DICOM metadata: automated Matlab-based approach.

    PubMed

    Dave, Jaydev K; Gingold, Eric L

    2013-01-01

    The purpose of this study was to extract exposure parameters and dose-relevant indexes of CT examinations from information embedded in DICOM metadata. DICOM dose report files were identified and retrieved from a PACS. An automated software program was used to extract, from the structured elements of the DICOM metadata in these files, the information relevant to exposure. Extracting information from DICOM metadata eliminated potential errors inherent in techniques based on optical character recognition, yielding 100% accuracy.
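
    The study used an automated Matlab program; the same kind of extraction can be sketched in Python with pydicom by walking the structured-report content tree for named numeric values. The file name is a placeholder, and the concept names being matched are assumptions based on typical CT dose-report content, not the study's code.

    ```python
    # Sketch: pull named numeric values out of a DICOM dose structured
    # report by recursing through its content tree.
    import pydicom

    def find_numeric(items, wanted):
        """Collect (name, value) pairs for matching SR content items."""
        results = []
        for item in items:
            name = ""
            if "ConceptNameCodeSequence" in item:
                name = item.ConceptNameCodeSequence[0].CodeMeaning
            if name in wanted and "MeasuredValueSequence" in item:
                results.append((name, float(item.MeasuredValueSequence[0].NumericValue)))
            if "ContentSequence" in item:
                results.extend(find_numeric(item.ContentSequence, wanted))
        return results

    ds = pydicom.dcmread("ct_dose_report.dcm")  # placeholder file name
    wanted = {"CT Dose Length Product Total", "Mean CTDIvol", "DLP"}
    for name, value in find_numeric(ds.ContentSequence, wanted):
        print(name, value)
    ```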

  2. Tools for proactive collection and use of quality metadata in GEOSS

    NASA Astrophysics Data System (ADS)

    Bastin, L.; Thum, S.; Maso, J.; Yang, K. X.; Nüst, D.; Van den Broek, M.; Lush, V.; Papeschi, F.; Riverola, A.

    2012-12-01

    The GEOSS Common Infrastructure allows interactive evaluation and selection of Earth Observation datasets by the scientific community and decision makers, but the data quality information needed to assess fitness for use is often patchy and hard to visualise when comparing candidate datasets. In a number of studies over the past decade, users repeatedly identified the same types of gaps in quality metadata, specifying the need for enhancements such as peer and expert review, better traceability and provenance information, information on citations and usage of a dataset, warning about problems identified with a dataset and potential workarounds, and 'soft knowledge' from data producers (e.g. recommendations for use which are not easily encoded using the existing standards). Despite clear identification of these issues in a number of recommendations, the gaps persist in practice and are highlighted once more in our own, more recent, surveys. This continuing deficit may well be the result of a historic paucity of tools to support the easy documentation and continual review of dataset quality. However, more recent developments in tools and standards, as well as more general technological advances, present the opportunity for a community of scientific users to adopt a more proactive attitude by commenting on their uses of data, and for that feedback to be federated with more traditional and static forms of metadata, allowing a user to more accurately assess the suitability of a dataset for their own specific context and reliability thresholds. The EU FP7 GeoViQua project aims to exploit this opportunity by adding data quality representations to the existing search and visualisation functionalities of the GEO Portal. Subsequently we will help to close the gap by providing tools to easily create quality information, and to permit user-friendly exploration of that information as the ultimate incentive for improved data quality documentation. Quality information is derived from producer metadata, from the data themselves, from validation of in-situ sensor data, from provenance information and from user feedback, and will be aggregated to produce clear and useful summaries of quality, including a GEO Label. GeoViQua's conceptual quality information models for users and producers are specifically described and illustrated in this presentation. These models (which have been encoded as XML schemas and can be accessed at http://schemas.geoviqua.org/) are designed to satisfy the identified user needs while remaining consistent with current standards such as ISO 19115 and advanced drafts such as ISO 19157. The resulting components being developed for the GEO Portal are designed to lower the entry barrier to users who wish to help to generate and explore rich and useful metadata. This metadata will include reviews, comments and ratings, reports of usage in specific domains and specification of datasets used for benchmarking, as well as rich quantitative information encoded in more traditional data quality elements such as thematic correctness and positional accuracy. The value of the enriched metadata will also be enhanced by graphical tools for visualizing spatially distributed uncertainties. We demonstrate practical example applications in selected environmental application domains.

  3. Department of the Interior metadata implementation guide—Framework for developing the metadata component for data resource management

    USGS Publications Warehouse

    Obuch, Raymond C.; Carlino, Jennifer; Zhang, Lin; Blythe, Jonathan; Dietrich, Christopher; Hawkinson, Christine

    2018-04-12

    The Department of the Interior (DOI) is a Federal agency with over 90,000 employees across 10 bureaus and 8 agency offices. Its primary mission is to protect and manage the Nation's natural resources and cultural heritage; provide scientific and other information about those resources; and honor its trust responsibilities or special commitments to American Indians, Alaska Natives, and affiliated island communities. Data and information are critical in day-to-day operational decision making and scientific research. DOI is committed to creating, documenting, managing, and sharing high-quality data and metadata in and across its various programs that support its mission. Documenting data through metadata is essential in realizing the value of data as an enterprise asset. The completeness, consistency, and timeliness of metadata affect users' ability to search for and discover the most relevant data for the intended purpose, and facilitate the interoperability and usability of these data among DOI bureaus and offices. Fully documented metadata describe data usability, quality, accuracy, provenance, and meaning. Across DOI, there are different maturity levels and phases of information and metadata management implementations. The Department has organized a committee consisting of bureau-level points-of-contact to collaborate on the development of more consistent, standardized, and more effective metadata management practices and guidance to support this shared mission and the information needs of the Department. DOI's metadata implementation plans establish key roles and responsibilities associated with metadata management processes and procedures, and a series of actions defined in three major metadata implementation phases: (1) Getting Started—Planning Phase, (2) Implementing and Maintaining Operational Metadata Management Phase, and (3) Next Steps towards Improving Metadata Management Phase. DOI's phased approach for metadata management addresses some of the major data and metadata management challenges that exist across the diverse missions of the bureaus and offices. All employees who create, modify, or use data are involved with data and metadata management. Identifying, establishing, and formalizing the roles and responsibilities associated with metadata management are key to institutionalizing a framework of best practices, methodologies, processes, and common approaches throughout all levels of the organization; these are the foundation for effective data resource management. For executives and managers, metadata management strengthens their overarching views of data assets, holdings, and data interoperability, and clarifies how metadata management can help accelerate compliance with multiple policy mandates. For employees, data stewards, and data professionals, formalized metadata management will help with the consistency of definitions and approaches addressing data discoverability, data quality, and data lineage. In addition to data professionals and others associated with information technology, data stewards and program subject matter experts take on important metadata management roles and responsibilities as data flow through their respective business and science-related workflows. The responsibilities of establishing, practicing, and governing the actions associated with their specific metadata management roles are critical to successful metadata implementation.

  4. Managing Complex Change in Clinical Study Metadata

    PubMed Central

    Brandt, Cynthia A.; Gadagkar, Rohit; Rodriguez, Cesar; Nadkarni, Prakash M.

    2004-01-01

    In highly functional metadata-driven software, the interrelationships within the metadata become complex, and maintenance becomes challenging. We describe an approach to metadata management that uses a knowledge-base subschema to store centralized information about metadata dependencies and use cases involving specific types of metadata modification. Our system borrows ideas from production-rule systems in that some of this information is a high-level specification that is interpreted and executed dynamically by a middleware engine. Our approach is implemented in TrialDB, a generic clinical study data management system. We review approaches that have been used for metadata management in other contexts and describe the features, capabilities, and limitations of our system. PMID:15187070

  5. Visualization of JPEG Metadata

    NASA Astrophysics Data System (ADS)

    Malik Mohamad, Kamaruddin; Deris, Mustafa Mat

    There is much more information embedded in a JPEG image than just the graphics. Visualization of its metadata would benefit digital forensic investigators, allowing them to view embedded data, including in corrupted images where no graphics can be displayed, in order to assist in evidence collection for cases such as child pornography or steganography. Tools such as metadata readers, editors and extraction tools are already available, but they mostly focus on visualizing the attribute information of the JPEG Exif segment. None consolidates the markers summary, header structure, Huffman tables and quantization tables in a single program. In this paper, metadata visualization is done by developing a program able to summarize all existing markers, the header structure, the Huffman tables and the quantization tables of a JPEG file. The result shows that visualization of metadata helps in viewing the hidden information within a JPEG file more easily.
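
    The markers-summary part of such a tool can be sketched in a few lines: walk the JPEG byte stream and report each marker's offset and segment length, stopping at the entropy-coded data. This works even when the image payload is too corrupted to render. The marker-name table below is abbreviated.

    ```python
    # Sketch: summarize JPEG marker segments by offset and length.
    import struct

    MARKER_NAMES = {0xD8: "SOI", 0xE0: "APP0", 0xE1: "APP1/Exif", 0xDB: "DQT",
                    0xC0: "SOF0", 0xC4: "DHT", 0xDA: "SOS", 0xD9: "EOI"}

    def summarize_markers(path):
        data = open(path, "rb").read()
        i = 0
        while i + 1 < len(data):
            if data[i] != 0xFF or data[i + 1] in (0x00, 0xFF):
                i += 1  # skip fill bytes and stuffed 0xFF00 sequences
                continue
            marker = data[i + 1]
            name = MARKER_NAMES.get(marker, f"0x{marker:02X}")
            if marker in (0xD8, 0xD9) or 0xD0 <= marker <= 0xD7:
                print(f"offset {i:6d}: {name}")  # standalone marker, no length
                i += 2
            elif i + 3 < len(data):
                (length,) = struct.unpack(">H", data[i + 2:i + 4])
                print(f"offset {i:6d}: {name}, segment length {length}")
                if marker == 0xDA:  # entropy-coded image data follows SOS
                    break
                i += 2 + length
            else:
                break

    summarize_markers("evidence.jpg")  # placeholder file name
    ```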

  6. Manifestations of Metadata: From Alexandria to the Web--Old is New Again

    ERIC Educational Resources Information Center

    Kennedy, Patricia

    2008-01-01

    This paper is a discussion of the use of metadata, in its various manifestations, to access information. Information management standards are discussed. The connection between the ancient world and the modern world is highlighted. Individual perspectives are paramount in fulfilling information seeking. Metadata is interpreted and reflected upon in…

  7. Towards Data Value-Level Metadata for Clinical Studies.

    PubMed

    Zozus, Meredith Nahm; Bonner, Joseph

    2017-01-01

    While several standards for metadata describing clinical studies exist, comprehensive metadata to support traceability of data from clinical studies has not been articulated. We examine uses of metadata in clinical studies. We examine and enumerate seven sources of data value-level metadata in clinical studies inclusive of research designs across the spectrum of the National Institutes of Health definition of clinical research. The sources of metadata inform categorization in terms of metadata describing the origin of a data value, the definition of a data value, and operations to which the data value was subjected. The latter is further categorized into information about changes to a data value, movement of a data value, retrieval of a data value, and data quality checks, constraints or assessments to which the data value was subjected. The implications of tracking and managing data value-level metadata are explored.

  8. The Digital Sample: Metadata, Unique Identification, and Links to Data and Publications

    NASA Astrophysics Data System (ADS)

    Lehnert, K. A.; Vinayagamoorthy, S.; Djapic, B.; Klump, J.

    2006-12-01

    A significant part of digital data in the Geosciences refers to physical samples of Earth materials, from igneous rocks to sediment cores to water or gas samples. The application and long-term utility of these sample-based data in research are critically dependent on (a) the availability of information (metadata) about the samples, such as geographical location and time of sampling, or sampling method, (b) links between the different data types available for individual samples that are dispersed in the literature and in digital data repositories, and (c) access to the samples themselves. Major problems for achieving this include incomplete documentation of samples in publications, use of ambiguous sample names, and the lack of a central catalog that allows one to find a sample's archiving location. The International Geo Sample Number IGSN, managed by the System for Earth Sample Registration SESAR, provides solutions for these problems. The IGSN is a unique persistent identifier for samples and other GeoObjects that can be obtained by submitting sample metadata to SESAR (www.geosamples.org). If data in a publication are referenced to an IGSN (rather than an ambiguous sample name), sample metadata can readily be extracted from the SESAR database, which evolves into a Global Sample Catalog that also allows one to locate the owner or curator of the sample. Use of the IGSN in digital data systems allows building linkages between distributed data. SESAR is contributing to the development of sample metadata standards. SESAR will integrate the IGSN in persistent, resolvable identifiers based on the handle.net service to advance direct linkages between the digital representation of samples in SESAR (sample profiles) and their related data in the literature and in web-accessible digital data repositories. Technologies outlined by Klump et al. (this session), such as the automatic creation of ontologies by text mining applications, will be explored for harvesting identifiers of publications and datasets that contain information about a specific sample in order to establish comprehensive data profiles for samples.

  9. Metabolonote: A Wiki-Based Database for Managing Hierarchical Metadata of Metabolome Analyses

    PubMed Central

    Ara, Takeshi; Enomoto, Mitsuo; Arita, Masanori; Ikeda, Chiaki; Kera, Kota; Yamada, Manabu; Nishioka, Takaaki; Ikeda, Tasuku; Nihei, Yoshito; Shibata, Daisuke; Kanaya, Shigehiko; Sakurai, Nozomu

    2015-01-01

    Metabolomics – technology for comprehensive detection of small molecules in an organism – lags behind the other “omics” in terms of publication and dissemination of experimental data. Among the reasons for this are the difficulty of precisely recording information about complicated analytical experiments (metadata), the existence of various databases with their own metadata descriptions, and the low reusability of the published data, resulting in submitters (the researchers who generate the data) being insufficiently motivated. To tackle these issues, we developed Metabolonote, a Semantic MediaWiki-based database designed specifically for managing metabolomic metadata. We also defined a metadata and data description format, called “Togo Metabolome Data” (TogoMD), with an ID system that is required for unique access to each level of the tree-structured metadata, such as study purpose, sample, analytical method, and data analysis. Separation of the management of metadata from that of data, and permission to attach related information to the metadata, provide advantages for submitters, readers, and database developers. The metadata are enriched with information such as links to comparable data, thereby functioning as a hub of related data resources. They also enhance not only readers’ understanding and use of data but also submitters’ motivation to publish the data. The metadata are computationally shared among other systems via APIs, which facilitate the construction of novel databases by database developers. A permission system that allows publication of immature metadata, and feedback from readers, also helps submitters to improve their metadata. Hence, this aspect of Metabolonote, as a metadata preparation tool, is complementary to high-quality and persistent data repositories such as MetaboLights. A total of 808 metadata records for analyzed data obtained from 35 biological species are currently published. Metabolonote and related tools are available free of charge at http://metabolonote.kazusa.or.jp/. PMID:25905099
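
    The abstract notes that the metadata are shared with other systems via APIs. A hedged sketch of what a client-side call might look like, assuming a hypothetical JSON endpoint under the site URL given above (the real API path and response shape are not specified in the abstract):

        import json
        import urllib.request

        BASE = "http://metabolonote.kazusa.or.jp"  # site URL from the abstract

        def fetch_metadata(entry_id: str) -> dict:
            """Retrieve one TogoMD metadata entry as JSON.
            The /api/ path and the JSON response shape are hypothetical."""
            with urllib.request.urlopen(f"{BASE}/api/{entry_id}") as resp:
                return json.load(resp)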

  10. Metabolonote: a wiki-based database for managing hierarchical metadata of metabolome analyses.

    PubMed

    Ara, Takeshi; Enomoto, Mitsuo; Arita, Masanori; Ikeda, Chiaki; Kera, Kota; Yamada, Manabu; Nishioka, Takaaki; Ikeda, Tasuku; Nihei, Yoshito; Shibata, Daisuke; Kanaya, Shigehiko; Sakurai, Nozomu

    2015-01-01

    Metabolomics - technology for comprehensive detection of small molecules in an organism - lags behind the other "omics" in terms of publication and dissemination of experimental data. Among the reasons for this are the difficulty of precisely recording information about complicated analytical experiments (metadata), the existence of various databases with their own metadata descriptions, and the low reusability of the published data, resulting in submitters (the researchers who generate the data) being insufficiently motivated. To tackle these issues, we developed Metabolonote, a Semantic MediaWiki-based database designed specifically for managing metabolomic metadata. We also defined a metadata and data description format, called "Togo Metabolome Data" (TogoMD), with an ID system that is required for unique access to each level of the tree-structured metadata, such as study purpose, sample, analytical method, and data analysis. Separation of the management of metadata from that of data, and permission to attach related information to the metadata, provide advantages for submitters, readers, and database developers. The metadata are enriched with information such as links to comparable data, thereby functioning as a hub of related data resources. They also enhance not only readers' understanding and use of data but also submitters' motivation to publish the data. The metadata are computationally shared among other systems via APIs, which facilitate the construction of novel databases by database developers. A permission system that allows publication of immature metadata, and feedback from readers, also helps submitters to improve their metadata. Hence, this aspect of Metabolonote, as a metadata preparation tool, is complementary to high-quality and persistent data repositories such as MetaboLights. A total of 808 metadata records for analyzed data obtained from 35 biological species are currently published. Metabolonote and related tools are available free of charge at http://metabolonote.kazusa.or.jp/.

  11. Case Studies of Ecological Integrative Information Systems: The Luquillo and Sevilleta Information Management Systems

    NASA Astrophysics Data System (ADS)

    San Gil, Inigo; White, Marshall; Melendez, Eda; Vanderbilt, Kristin

    The thirty-year-old United States Long Term Ecological Research Network has developed extensive metadata to document its scientific data. Standard and interoperable metadata are a core component of the data-driven analytical solutions developed by this research network. Content management systems offer an affordable solution for rapid deployment of metadata-centered information management systems. We developed a customized integrative metadata management system based on the Drupal content management system technology. Building on knowledge and experience with the Sevilleta and Luquillo Long Term Ecological Research sites, we successfully deployed the first two medium-scale customized prototypes. In this paper, we describe the vision behind our Drupal-based information management instances and list the features offered through these Drupal-based systems. We also outline the plans to expand the information services offered through these metadata-centered management systems. We conclude with the growing list of participants deploying similar instances.

  12. Metadata (MD)

    Treesearch

    Robert E. Keane

    2006-01-01

    The Metadata (MD) table in the FIREMON database is used to record any information about the sampling strategy or data collected using the FIREMON sampling procedures. The MD method records metadata pertaining to a group of FIREMON plots, such as all plots in a specific FIREMON project. FIREMON plots are linked to metadata using a unique metadata identifier that is...

  13. A metadata template for ocean acidification data

    NASA Astrophysics Data System (ADS)

    Jiang, L.

    2014-12-01

    Metadata is structured information that describes, explains, and locates an information resource (e.g., data). It is often coarsely described as data about data, and documents information such as what was measured, by whom, when, where, and how it was sampled and analyzed, and with what instruments. Metadata is essential to ensuring the survivability and accessibility of data into the future. With the rapid expansion of biological-response ocean acidification (OA) studies, the lack of a common metadata template to document such data has become a significant gap for ocean acidification data management efforts. In this paper, we present a metadata template that can be applied to a broad spectrum of OA studies, including those studying the biological responses of organisms to ocean acidification. The "variable metadata section", which includes the variable name, observation type, whether the variable is a manipulation condition or response variable, and the biological subject on which the variable is studied, forms the core of this metadata template. Additional metadata elements, such as principal investigators, temporal and spatial coverage, platforms for the sampling, and data citation, are essential components that complete the template. We explain the structure of the template and define many metadata elements that may be unfamiliar to researchers; for that reason, this paper can serve as a user's manual for the template.
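
    The core of the template, the "variable metadata section," maps naturally onto a small record type. A minimal sketch using only the four elements named above (the field names are illustrative, not the template's actual element names):

        from dataclasses import dataclass

        @dataclass
        class VariableMetadata:
            """One entry in the 'variable metadata section' of the OA template."""
            variable_name: str        # e.g., "calcification rate"
            observation_type: str     # e.g., "measured" or "calculated"
            is_manipulation: bool     # manipulation condition vs. response variable
            biological_subject: str   # organism on which the variable is studied

        var = VariableMetadata(
            variable_name="calcification rate",
            observation_type="measured",
            is_manipulation=False,    # a response variable
            biological_subject="Mytilus edulis",
        )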

  14. The Energy Industry Profile of ISO/DIS 19115-1: Facilitating Discovery and Evaluation of, and Access to Distributed Information Resources

    NASA Astrophysics Data System (ADS)

    Hills, S. J.; Richard, S. M.; Doniger, A.; Danko, D. M.; Derenthal, L.; Energistics Metadata Work Group

    2011-12-01

    A diverse group of organizations representative of the international community involved in disciplines relevant to the upstream petroleum industry - energy companies, suppliers and publishers of information to the energy industry, vendors of software applications used by the industry, and partner government and academic organizations - has engaged in the Energy Industry Metadata Standards Initiative. This Initiative envisions the use of standard metadata within the community to enable significant improvements in the efficiency with which users discover, evaluate, and access distributed information resources. The metadata standard needed to realize this vision is the initiative's primary deliverable. In addition to developing the metadata standard, the initiative is promoting its adoption to accelerate realization of the vision, and publishing metadata exemplars conformant with the standard. Implementation of the standard by community members, in the form of published metadata which document the information resources each organization manages, will allow use of tools requiring consistent metadata for efficient discovery and evaluation of, and access to, information resources. While metadata are expected to be widely accessible, access to associated information resources may be more constrained. The initiative is being conducted by Energistics' Metadata Work Group, in collaboration with the USGIN Project. Energistics is a global standards group in the oil and natural gas industry. The Work Group determined early in the initiative, based on input solicited from 40+ organizations and on an assessment of existing metadata standards, to develop the target metadata standard as a profile of a revised version of ISO 19115, formally the "Energy Industry Profile of ISO/DIS 19115-1 v1.0" (EIP). The Work Group is participating in the ISO/TC 211 project team responsible for the revision of ISO 19115, now ready for "Draft International Standard" (DIS) status. With ISO 19115 an established, capability-rich, open standard for geographic metadata, EIP v1 is expected to be widely acceptable within the community and readily sustainable over the long term. The EIP design, also per community requirements, will enable discovery, evaluation, and access for the types of information resources considered important to the community, including structured and unstructured digital resources, and physical assets such as hardcopy documents and material samples. This presentation will briefly review the development of this initiative as well as the current and planned Work Group activities. More time will be spent providing an overview of EIP v1, including the requirements it prescribes, the design efforts made to enable automated metadata capture and processing, and the structure and content of its documentation, which was written to minimize ambiguity and facilitate implementation. The Work Group considers EIP v1 a solid initial design for interoperable metadata, and a first step toward the vision of the Initiative.

  15. Metadata Design in the New PDS4 Standards - Something for Everybody

    NASA Astrophysics Data System (ADS)

    Raugh, Anne C.; Hughes, John S.

    2015-11-01

    The Planetary Data System (PDS) archives, supports, and distributes data of diverse targets, from diverse sources, to diverse users. One of the core problems addressed by the PDS4 data standard redesign was that of metadata - how to accommodate the increasingly sophisticated demands of search interfaces, analytical software, and observational documentation into label standards without imposing limits and constraints that would impinge on the quality or quantity of metadata that any particular observer or team could supply. And yet, as an archive, PDS must have detailed documentation for the metadata in the labels it supports, or the institutional knowledge encoded into those attributes will be lost - putting the data at risk. The PDS4 metadata solution is based on a three-step approach. First, it is built on two key ISO standards: ISO 11179 "Information Technology - Metadata Registries", which provides a common framework and vocabulary for defining metadata attributes; and ISO 14721 "Space Data and Information Transfer Systems - Open Archival Information System (OAIS) Reference Model", which provides the framework for the information architecture that enforces the object-oriented paradigm for metadata modeling. Second, PDS has defined a hierarchical system that allows it to divide its metadata universe into namespaces ("data dictionaries", conceptually), and more importantly to delegate stewardship for a single namespace to a local authority. This means that a mission can develop its own data model with a high degree of autonomy and effectively extend the PDS model to accommodate its own metadata needs within the common ISO 11179 framework. Finally, within a single namespace - even the core PDS namespace - existing metadata structures can be extended and new structures added to the model as new needs are identified. This poster illustrates the PDS4 approach to metadata management and highlights the expected return on the development investment for PDS, users, and data preparers.

  16. Forum Guide to Metadata: The Meaning behind Education Data. NFES 2009-805

    ERIC Educational Resources Information Center

    National Forum on Education Statistics, 2009

    2009-01-01

    The purpose of this guide is to empower people to more effectively use data as information. To accomplish this, the publication explains what metadata are; why metadata are critical to the development of sound education data systems; what components comprise a metadata system; what value metadata bring to data management and use; and how to…

  17. Metadata, PICS and Quality.

    ERIC Educational Resources Information Center

    Armstrong, C. J.

    1997-01-01

    Discusses PICS (Platform for Internet Content Selection), the Centre for Information Quality Management (CIQM), and metadata. Highlights include filtering networked information; the quality of information; and standardizing search engines. (LRW)

  18. The Use of Metadata Visualisation to Assist Information Retrieval

    DTIC Science & Technology

    2007-10-01

    album title, the track length and the genre of music. Again, any of these pieces of information can be used to quickly search and locate specific...that person. Music files also have metadata tags, in a format called ID3. This usually contains information such as the artist, the song title, the...tracks, to provide more information about the entire music collection, or to find similar or diverse tracks within the collection. Metadata is
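
    Reading such ID3 tags programmatically is straightforward. A minimal sketch using the third-party mutagen library (an assumption for illustration; the report itself does not name a library):

        # pip install mutagen
        from mutagen.easyid3 import EasyID3

        def track_metadata(path: str) -> dict:
            """Return the ID3 fields commonly used to search and group tracks."""
            tags = EasyID3(path)
            return {key: tags.get(key, [""])[0]
                    for key in ("artist", "title", "album", "genre")}

        # Usage: print(track_metadata("song.mp3"))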

  19. Metadata registry and management system based on ISO 11179 for cancer clinical trials information system

    PubMed Central

    Park, Yu Rang; Kim, Ju Han

    2006-01-01

    Standardized management of data elements (DEs) for Case Report Forms (CRFs) is crucial in a Clinical Trials Information System (CTIS). Traditional CTISs utilize organization-specific definitions and storage methods for DEs and CRFs. We developed a metadata-based DE management system for clinical trials, the Clinical and Histopathological Metadata Registry (CHMR), using the international standard for metadata registries (ISO 11179) for the management of cancer clinical trials information. CHMR was evaluated in cancer clinical trials with 1625 DEs extracted from the College of American Pathologists Cancer Protocols for 20 major cancers. PMID:17238675
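
    An ISO 11179 registry item ties a named concept to a definition and a value domain. A minimal sketch of how one CRF data element might be registered, with illustrative fields (the CHMR schema itself is not given in the abstract):

        from dataclasses import dataclass, field
        from typing import List

        @dataclass
        class DataElement:
            """Simplified ISO 11179-style data element registration."""
            name: str                      # e.g., "Histologic Grade"
            definition: str                # the data element concept
            datatype: str                  # datatype of the value domain
            permissible_values: List[str] = field(default_factory=list)
            registration_status: str = "candidate"

        de = DataElement(
            name="Histologic Grade",
            definition="Degree of differentiation of the tumor",
            datatype="string",
            permissible_values=["G1", "G2", "G3", "G4", "GX"],
        )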

  20. A Tsunami-Focused Tide Station Data Sharing Framework

    NASA Astrophysics Data System (ADS)

    Kari, U. S.; Marra, J. J.; Weinstein, S. A.

    2006-12-01

    The Indian Ocean Tsunami of 26 December 2004 made it clear that information about tide stations that could be used to support detection and warning (such as location, collection and transmission capabilities, and operator identification) is insufficiently known or not readily accessible. Parties interested in addressing this problem united under the Pacific Region Integrated Data Enterprise (PRIDE), and in 2005 began a multiyear effort to develop a distributed metadata system describing tide stations, starting with pilot activities in a regional framework and focusing on tsunami detection and warning systems being developed by various agencies. First, a plain semantic description of the tsunami-focused tide station metadata was developed. The semantic metadata description was, in turn, developed into a formal metadata schema championed by the International Tsunami Information Centre (ITIC) as part of a larger effort to develop a prototype web service under the PRIDE program in 2005. Under the 2006 PRIDE program the formal metadata schema was then expanded to corral input parameters for the TideTool application used by the Pacific Tsunami Warning Center (PTWC) to drill down into wave activity at a tide station that is located using a web service developed on this metadata schema. This effort contributed to the formalization of web service dissemination of PTWC watch and warning tsunami bulletins. During this time, the data content and sharing issues embodied in this schema have been discussed at various forums. The result is that the various stakeholders have different data provider and user perspectives (semantic content) and also exchange formats (not limited to just XML). The challenge, then, is not only to capture all data requirements, but also to have a formal representation that is easily transformed into any specified format. The latest revision of the tide gauge schema (Version 0.3) begins to address this challenge. It encompasses a broader range of provider and user perspectives, such as station operators, warning system managers, disaster managers, and other marine hazard warning systems (such as storm surge and sea level change monitoring and research). In the next revision(s), we hope to take into account various relevant standards, including specifically the Open Geospatial Consortium (OGC) Sensor Web Enablement (SWE) Framework, so as to serve all prospective stakeholders in the most useful (extensible, scalable) manner. SensorML has already addressed many of the challenges we face, through very useful fundamental modeling considerations and data types that are particular to sensors in general, with perhaps some extension needed for tide gauges. As a result of developing this schema, and associated client application architectures, we hope to have a much more distributed network of data providers who are able to contribute to global tide station metadata from the comfort of their own Information Technology (IT) departments.

  1. Metadata Means Communication: The Challenges of Producing Useful Metadata

    NASA Astrophysics Data System (ADS)

    Edwards, P. N.; Batcheller, A. L.

    2010-12-01

    Metadata are increasingly perceived as an important component of data sharing systems. For instance, metadata accompanying atmospheric model output may indicate the grid size, grid type, and parameter settings used in the model configuration. We conducted a case study of a data portal in the atmospheric sciences using in-depth interviews, document review, and observation. Our analysis revealed a number of challenges in producing useful metadata. First, creating and managing metadata required considerable effort and expertise, yet responsibility for these tasks was ill-defined and diffused among many individuals, leading to errors, failure to capture metadata, and uncertainty about the quality of the primary data. Second, metadata ended up stored in many different forms and software tools, making it hard to manage versions and transfer between formats. Third, the exact meanings of metadata categories remained unsettled and misunderstood even among a small community of domain experts -- an effect we expect to be exacerbated when scientists from other disciplines wish to use these data. In practice, we found that metadata problems due to these obstacles are often overcome through informal, personal communication, such as conversations or email. We conclude that metadata serve to communicate the context of data production from the people who produce data to those who wish to use them. Thus while formal metadata systems are often public, critical elements of metadata (those embodied in informal communication) may never be recorded. Therefore, efforts to increase data sharing should include ways to facilitate inter-investigator communication. Instead of tackling metadata challenges only on the formal level, we can improve data usability for broader communities by better supporting metadata communication.

  2. ISO, FGDC, DIF and Dublin Core - Making Sense of Metadata Standards for Earth Science Data

    NASA Astrophysics Data System (ADS)

    Jones, P. R.; Ritchey, N. A.; Peng, G.; Toner, V. A.; Brown, H.

    2014-12-01

    Metadata standards provide common definitions of metadata fields for information exchange across user communities. Despite the broad adoption of metadata standards for Earth science data, there are still heterogeneous and incompatible representations of information due to differences between the many standards in use and how each standard is applied. Federal agencies are required to manage and publish metadata in different metadata standards and formats for various data catalogs. In 2014, the NOAA National Climatic Data Center (NCDC) managed metadata for its scientific datasets in ISO 19115-2 in XML, GCMD Directory Interchange Format (DIF) in XML, DataCite Schema in XML, Dublin Core in XML, and Data Catalog Vocabulary (DCAT) in JSON, with more standards and profiles of standards planned. Of these standards, the ISO 19115-series metadata is the most complete and feature-rich, and for this reason it is used by NCDC as the source for the other metadata standards. We will discuss the capabilities of metadata standards and how these standards are being implemented to document datasets. Successful implementations include developing translations and displays using XSLTs, creating links to related data and resources, documenting dataset lineage, and establishing best practices. Benefits, gaps, and challenges will be highlighted with suggestions for improved approaches to metadata storage and maintenance.
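
    The single-source approach described here (maintain ISO 19115-2, derive the other standards from it) reduces to applying one stylesheet per target format. A minimal sketch using lxml, with a hypothetical iso2dif.xsl stylesheet standing in for the real translation:

        from lxml import etree  # pip install lxml

        def translate(record_path: str, stylesheet_path: str) -> bytes:
            """Apply an XSLT (e.g., ISO 19115-2 -> DIF) to one metadata record."""
            transform = etree.XSLT(etree.parse(stylesheet_path))
            return etree.tostring(transform(etree.parse(record_path)),
                                  pretty_print=True)

        # Usage:
        # open("record.dif.xml", "wb").write(translate("record.iso.xml", "iso2dif.xsl"))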

  3. Syntactic and Semantic Validation without a Metadata Management System

    NASA Technical Reports Server (NTRS)

    Pollack, Janine; Gokey, Christopher D.; Kendig, David; Olsen, Lola; Wharton, Stephen W. (Technical Monitor)

    2001-01-01

    The ability to maintain quality information is essential to securing confidence in any system for which the information serves as a data source. NASA's Global Change Master Directory (GCMD), an online Earth science data locator, holds over 9000 data set descriptions and is in a constant state of flux as metadata are created and updated on a daily basis. In such a system, maintaining the consistency and integrity of these metadata is crucial. The GCMD has developed a metadata management system utilizing XML, controlled vocabulary, and Java technologies to ensure the metadata not only adhere to valid syntax, but also exhibit proper semantics.
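
    The two layers named here (valid syntax, proper semantics) can be approximated as schema validation followed by controlled-vocabulary checks. A minimal sketch with lxml; the element name and keyword list below are assumptions for illustration, not the GCMD's actual implementation:

        from lxml import etree  # pip install lxml

        VALID_KEYWORDS = {"ATMOSPHERE", "OCEANS", "LAND SURFACE"}  # illustrative subset

        def validate(record_path: str, schema_path: str) -> list:
            """Return problems: XSD violations first, then vocabulary misses."""
            problems = []
            schema = etree.XMLSchema(etree.parse(schema_path))
            doc = etree.parse(record_path)
            if not schema.validate(doc):
                problems.extend(str(e) for e in schema.error_log)
            for kw in doc.iter("Keyword"):  # element name is illustrative
                if (kw.text or "").strip().upper() not in VALID_KEYWORDS:
                    problems.append(f"uncontrolled keyword: {kw.text}")
            return problems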

  4. Creating context for the experiment record. User-defined metadata: investigations into metadata usage in the LabTrove ELN.

    PubMed

    Willoughby, Cerys; Bird, Colin L; Coles, Simon J; Frey, Jeremy G

    2014-12-22

    The drive toward more transparency in research, the growing willingness to make data openly available, and the reuse of data to maximize the return on research investment all increase the importance of being able to find information and make links to the underlying data. The use of metadata in Electronic Laboratory Notebooks (ELNs) to curate experiment data is an essential ingredient for facilitating discovery. The University of Southampton has developed a Web browser-based ELN that enables users to add their own metadata to notebook entries. A survey of these notebooks was completed to assess user behavior and patterns of metadata usage within ELNs, while user perceptions and expectations were gathered through interviews and user-testing activities within the community. The findings, together with feedback from the user community, indicate that while a few groups are comfortable with metadata and are able to design a metadata structure that works effectively, many users adopt a "minimum required" approach to metadata or make little attempt to use it at all, thereby endangering their ability to recover data in the future. To investigate whether the patterns of metadata use in LabTrove were unusual, a series of surveys was undertaken to investigate metadata usage in a variety of platforms supporting user-defined metadata. These surveys also provided the opportunity to investigate whether interface designs in these other environments might inform strategies for encouraging metadata creation and more effective use of metadata in LabTrove.

  5. Towards Precise Metadata-set for Discovering 3D Geospatial Models in Geo-portals

    NASA Astrophysics Data System (ADS)

    Zamyadi, A.; Pouliot, J.; Bédard, Y.

    2013-09-01

    Accessing 3D geospatial models, possibly at no cost and for unrestricted use, is certainly an important issue as they become popular among participatory communities, consultants, and officials. Various geo-portals, mainly established for 2D resources, have tried to provide access to existing 3D resources such as digital elevation models, LIDAR, or classic topographic data. Describing the content of data, metadata is a key component of data discovery in geo-portals. An inventory of seven online geo-portals and commercial catalogues shows that the metadata referring to 3D information is very different from one geo-portal to another, as well as for similar 3D resources within the same geo-portal. The inventory considered 971 data resources affiliated with elevation. 51% of them were from three geo-portals running at Canadian federal and municipal levels whose metadata resources did not consider 3D models by any definition. Regarding the remaining 49%, which refer to 3D models, different definitions of terms and metadata were found, resulting in confusion and misinterpretation. The overall assessment of these geo-portals clearly shows that the provided metadata do not integrate specific and common information about 3D geospatial models. Accordingly, the main objective of this research is to improve 3D geospatial model discovery in geo-portals by adding a specific metadata-set. Based on the knowledge and current practices on 3D modeling, and 3D data acquisition and management, a set of metadata is proposed to increase its suitability for 3D geospatial models. This metadata-set enables the definition of genuine classes, fields, and code-lists for a 3D metadata profile. The main structure of the proposal contains 21 metadata classes. These classes are grouped into three packages - General and Complementary, on contextual and structural information, and Availability, on the transition from storage to delivery format. The proposed metadata-set is compared with the Canadian Geospatial Data Infrastructure (CGDI) metadata, which is an implementation of the North American Profile of ISO 19115. The comparison analyzes the two metadata sets against three simulated scenarios about discovering needed 3D geospatial datasets. Considering specific metadata about 3D geospatial models, the proposed metadata-set has six additional classes on geometric dimension, level of detail, geometric modeling, topology, and appearance information. In addition, classes on data acquisition, preparation, and modeling, and on physical availability, have been specialized for 3D geospatial models.

  6. Design of Community Resource Inventories as a Component of Scalable Earth Science Infrastructure: Experience of the Earthcube CINERGI Project

    NASA Astrophysics Data System (ADS)

    Zaslavsky, I.; Richard, S. M.; Valentine, D. W., Jr.; Grethe, J. S.; Hsu, L.; Malik, T.; Bermudez, L. E.; Gupta, A.; Lehnert, K. A.; Whitenack, T.; Ozyurt, I. B.; Condit, C.; Calderon, R.; Musil, L.

    2014-12-01

    EarthCube is envisioned as a cyberinfrastructure that fosters new, transformational geoscience by enabling sharing, understanding, and scientifically sound and efficient re-use of formerly unconnected data resources, software, models, repositories, and computational power. Its purpose is to enable the science enterprise and workforce development via an extensible and adaptable collaboration and resource integration framework. A key component of this vision is the development of comprehensive inventories supporting resource discovery and re-use across geoscience domains. The goal of the EarthCube CINERGI (Community Inventory of EarthCube Resources for Geoscience Interoperability) project is to create a methodology and assemble a large inventory of high-quality information resources with standard metadata descriptions and traceable provenance. The inventory is compiled from metadata catalogs maintained by geoscience data facilities, as well as from user contributions. The latter mechanism relies on community resource viewers: online applications that support update and curation of metadata records. Once harvested into CINERGI, metadata records from domain catalogs and community resource viewers are loaded into a staging database implemented in MongoDB, and validated for compliance with the ISO 19139 metadata schema. Several types of metadata defects detected by the validation engine are automatically corrected with the help of several information extractors, or flagged for manual curation. The metadata harvesting, validation, and processing components generate provenance statements using W3C PROV notation, which are stored in a Neo4j database. The curated metadata, along with the provenance information, are re-published and accessed programmatically and via a CINERGI online application. This presentation focuses on the role of resource inventories in a scalable and adaptable information infrastructure, and on the CINERGI metadata pipeline and its implementation challenges. Key project components are described at the project's website (http://workspace.earthcube.org/cinergi), which also provides access to the initial resource inventory, the inventory metadata model, metadata entry forms, and a collection of the community resource viewers.
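
    The staging-and-validation step described above can be sketched in a few lines. A hedged outline, assuming a local MongoDB instance and a local copy of the ISO 19139 (gmd) schemas; the collection and field names are illustrative, not CINERGI's actual ones:

        from lxml import etree           # pip install lxml
        from pymongo import MongoClient  # pip install pymongo

        # Local copy of the ISO 19139 schema set (an assumption).
        ISO_SCHEMA = etree.XMLSchema(etree.parse("gmd/gmd.xsd"))

        def stage_and_validate(records, mongo_uri="mongodb://localhost:27017"):
            """Stage harvested ISO records and flag schema violations."""
            staging = MongoClient(mongo_uri).cinergi.staging
            for xml_bytes in records:
                ok = ISO_SCHEMA.validate(etree.fromstring(xml_bytes))
                staging.insert_one({
                    "xml": xml_bytes.decode("utf-8"),
                    "valid_iso19139": ok,
                    "errors": [] if ok else [str(e) for e in ISO_SCHEMA.error_log],
                })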

  7. MeRy-B: a web knowledgebase for the storage, visualization, analysis and annotation of plant NMR metabolomic profiles

    PubMed Central

    2011-01-01

    Background: Improvements in the techniques for metabolomics analyses and growing interest in metabolomic approaches are resulting in the generation of increasing numbers of metabolomic profiles. Platforms are required for profile management, as a function of experimental design, and for metabolite identification, to facilitate the mining of the corresponding data. Various databases have been created, including organism-specific knowledgebases and analytical technique-specific spectral databases. However, there is currently no platform meeting the requirements for both profile management and metabolite identification for nuclear magnetic resonance (NMR) experiments. Description: MeRy-B, the first platform for plant 1H-NMR metabolomic profiles, is designed (i) to provide a knowledgebase of curated plant profiles and metabolites obtained by NMR, together with the corresponding experimental and analytical metadata, (ii) for queries and visualization of the data, (iii) to discriminate between profiles with spectrum visualization tools and statistical analysis, (iv) to facilitate compound identification. It contains lists of plant metabolites and unknown compounds, with information about experimental conditions, the factors studied and metabolite concentrations for several plant species, compiled from more than one thousand annotated NMR profiles for various organs or tissues. Conclusion: MeRy-B manages all the data generated by NMR-based plant metabolomics experiments, from description of the biological source to identification of the metabolites and determinations of their concentrations. It is the first database allowing the display and overlay of NMR metabolomic profiles selected through queries on data or metadata. MeRy-B is available from http://www.cbib.u-bordeaux2.fr/MERYB/index.php. PMID:21668943

  8. Documentation Resources on the ESIP Wiki

    NASA Technical Reports Server (NTRS)

    Habermann, Ted; Kozimor, John; Gordon, Sean

    2017-01-01

    The ESIP community includes data providers and users that communicate with one another through datasets and the metadata that describe them. Improving this communication depends on consistent, high-quality metadata. The ESIP Documentation Cluster and the wiki play an important central role in facilitating this communication. We will describe and demonstrate sections of the wiki that provide information about metadata concept definitions, metadata recommendations, metadata dialects, and guidance pages. We will also describe and demonstrate the ISO Explorer, a tool that the community is developing to help metadata creators.

  9. Harvesting NASA's Common Metadata Repository (CMR)

    NASA Technical Reports Server (NTRS)

    Shum, Dana; Durbin, Chris; Norton, James; Mitchell, Andrew

    2017-01-01

    As part of NASA's Earth Observing System Data and Information System (EOSDIS), the Common Metadata Repository (CMR) stores metadata for over 30,000 datasets from both NASA and international providers along with over 300M granules. This metadata enables sub-second discovery and facilitates data access. While the CMR offers a robust temporal, spatial and keyword search functionality to the general public and international community, it is sometimes more desirable for international partners to harvest the CMR metadata and merge the CMR metadata into a partner's existing metadata repository. This poster will focus on best practices to follow when harvesting CMR metadata to ensure that any changes made to the CMR can also be updated in a partner's own repository. Additionally, since each partner has distinct metadata formats they are able to consume, the best practices will also include guidance on retrieving the metadata in the desired metadata format using CMR's Unified Metadata Model translation software.
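
    The incremental pattern recommended here - pick up only records changed since the last run, in the format the partner repository consumes - can be expressed against CMR's public search API. A hedged sketch; the parameter names reflect the public CMR search interface, but treat the details as assumptions:

        import json
        import urllib.request
        from urllib.parse import urlencode

        CMR = "https://cmr.earthdata.nasa.gov/search/collections.umm_json"

        def harvest_since(timestamp: str, provider: str) -> list:
            """Fetch one page of collections updated since `timestamp` (ISO 8601)."""
            params = urlencode({"updated_since": timestamp,
                                "provider": provider,
                                "page_size": 2000})
            with urllib.request.urlopen(f"{CMR}?{params}") as resp:
                return json.load(resp)["items"]

        # Usage: items = harvest_since("2017-01-01T00:00:00Z", "ORNL_DAAC")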

  10. Harvesting NASA's Common Metadata Repository

    NASA Astrophysics Data System (ADS)

    Shum, D.; Mitchell, A. E.; Durbin, C.; Norton, J.

    2017-12-01

    As part of NASA's Earth Observing System Data and Information System (EOSDIS), the Common Metadata Repository (CMR) stores metadata for over 30,000 datasets from both NASA and international providers along with over 300M granules. This metadata enables sub-second discovery and facilitates data access. While the CMR offers a robust temporal, spatial and keyword search functionality to the general public and international community, it is sometimes more desirable for international partners to harvest the CMR metadata and merge the CMR metadata into a partner's existing metadata repository. This poster will focus on best practices to follow when harvesting CMR metadata to ensure that any changes made to the CMR can also be updated in a partner's own repository. Additionally, since each partner has distinct metadata formats they are able to consume, the best practices will also include guidance on retrieving the metadata in the desired metadata format using CMR's Unified Metadata Model translation software.

  11. The PDS4 Metadata Management System

    NASA Astrophysics Data System (ADS)

    Raugh, A. C.; Hughes, J. S.

    2018-04-01

    We present the key features of the Planetary Data System (PDS) PDS4 Information Model as an extendable metadata management system for planetary metadata related to data structure, analysis/interpretation, and provenance.

  12. The New Online Metadata Editor for Generating Structured Metadata

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Devarakonda, Ranjeet; Shrestha, Biva; Palanisamy, Giri

    Nobody is better suited to describe data than the scientist who created it. This description of data is called metadata. In general terms, metadata represents the who, what, when, where, why and how of the dataset [1]. eXtensible Markup Language (XML) is the preferred output format for metadata, as it makes the metadata portable and, more importantly, suitable for system discoverability. The newly developed ORNL Metadata Editor (OME) is a Web-based tool that allows users to create and maintain XML files containing key information, or metadata, about their research. Metadata include information about the specific projects, parameters, time periods, and locations associated with the data. Such information helps put the research findings in context. In addition, the metadata produced using OME will allow other researchers to find these data via metadata clearinghouses like Mercury [2][4]. OME is part of ORNL's Mercury software fleet [2][3]. It was jointly developed to support projects funded by the United States Geological Survey (USGS), U.S. Department of Energy (DOE), National Aeronautics and Space Administration (NASA) and National Oceanic and Atmospheric Administration (NOAA). OME's architecture provides a customizable interface to support project-specific requirements. Using this new architecture, the ORNL team developed OME instances for USGS's Core Science Analytics, Synthesis, and Libraries (CSAS&L), DOE's Next Generation Ecosystem Experiments (NGEE) and Atmospheric Radiation Measurement (ARM) Program, and the international Surface Ocean Carbon Dioxide ATlas (SOCAT). Researchers simply use the ORNL Metadata Editor to enter relevant metadata into a Web-based form. From the information on the form, the Metadata Editor can create an XML file on the server where the editor is installed or on the user's personal computer. Researchers can also use the ORNL Metadata Editor to modify existing XML metadata files. As an example, an NGEE Arctic scientist uses OME to register their datasets with the NGEE data archive, which allows the archive to publish these datasets via a data search portal (http://ngee.ornl.gov/data). The highly descriptive metadata created using OME allow the archive to offer advanced data search options using keyword, geospatial, temporal and ontology filters. Similarly, the ARM OME allows scientists or principal investigators (PIs) to submit their data products to the ARM data archive. How would OME help big data centers like the Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC)? The ORNL DAAC is one of NASA's Earth Observing System Data and Information System (EOSDIS) data centers managed by the Earth Science Data and Information System (ESDIS) Project. The ORNL DAAC archives data produced by NASA's Terrestrial Ecology Program. The DAAC provides data and information relevant to biogeochemical dynamics, ecological data, and environmental processes, critical for understanding the dynamics relating to the biological, geological, and chemical components of the Earth's environment. Typically, the data produced, archived and analyzed are at a scale of multiple petabytes, which makes discoverability of the data very challenging. Without proper metadata associated with the data, it is difficult to find the data you are looking for, and equally difficult to use and understand the data. OME will allow data centers like NGEE and the ORNL DAAC to produce meaningful, high-quality, standards-based, descriptive information about their data products, in turn helping with data discoverability and interoperability. Useful links: USGS OME: http://mercury.ornl.gov/OME/ NGEE OME: http://ngee-arctic.ornl.gov/ngeemetadata/ ARM OME: http://archive2.ornl.gov/armome/ Contact: Ranjeet Devarakonda (devarakondar@ornl.gov) References: [1] Federal Geographic Data Committee. Content standard for digital geospatial metadata. Federal Geographic Data Committee, 1998. [2] Devarakonda, Ranjeet, et al. "Mercury: reusable metadata management, data discovery and access system." Earth Science Informatics 3.1-2 (2010): 87-94. [3] Wilson, B. E., Palanisamy, G., Devarakonda, R., Rhyne, B. T., Lindsley, C., & Green, J. (2010). Mercury Toolset for Spatiotemporal Metadata. [4] Pouchard, L. C., Branstetter, M. L., Cook, R. B., Devarakonda, R., Green, J., Palanisamy, G., ... & Noy, N. F. (2013). A Linked Science investigation: enhancing climate change data discovery with semantic technologies. Earth Science Informatics, 6(3), 175-185.
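
    OME's core operation - turning form fields into a standards-based XML record - can be sketched in a few lines. The element names below are a simplified FGDC-style subset chosen for illustration, not OME's actual output schema:

        import xml.etree.ElementTree as ET

        def form_to_xml(fields: dict) -> bytes:
            """Serialize Web-form input as a minimal metadata record."""
            root = ET.Element("metadata")
            idinfo = ET.SubElement(root, "idinfo")
            for name in ("title", "originator", "abstract", "timeperiod", "location"):
                ET.SubElement(idinfo, name).text = fields.get(name, "")
            return ET.tostring(root, encoding="utf-8", xml_declaration=True)

        record = form_to_xml({"title": "Soil moisture, NGEE Arctic site A",
                              "originator": "Example investigator"})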

  13. An Approach to Information Management for AIR7000 with Metadata and Ontologies

    DTIC Science & Technology

    2009-10-01

    metadata. We then propose an approach based on Semantic Technologies, including the Resource Description Framework (RDF) and Upper Ontologies, for the...mandating specific metadata schemas can result in interoperability problems. For example, many standards within the ADO mandate the use of XML for metadata...such problems, we propose an architecture in which different metadata schemes can interoperate. By using RDF (Resource Description Framework) as a

  14. 20180318 - Using the US EPA's CompTox Chemistry Dashboard for structure identification and non-targeted analyses (ACS Spring)

    EPA Science Inventory

    The CompTox Chemistry Dashboard provides access to data for ~760,000 chemicals. High-quality curated data and rich metadata facilitate mass spectrometry analysis. "MS-Ready" processed data enables structure identification.

  15. Data Publication Process for CMIP5 Data and the Role of PIDs within Federated Earth System Science Projects

    NASA Astrophysics Data System (ADS)

    Stockhause, M.; Höck, H.; Toussaint, F.; Weigel, T.; Lautenschlager, M.

    2012-12-01

    We present the publication process for the CMIP5 (Coupled Model Intercomparison Project Phase 5) data, with special emphasis on the current role of identifiers and the potential future role of PIDs in such distributed technical infrastructures. The DataCite data publication with DOI assignment finalizes the three-level quality control procedure for CMIP5 data (Stockhause et al., 2012). WDCC utilizes the assistant system Atarrabi to support the publication process. Atarrabi is a web-based workflow system for metadata reviews by data creators and Publication Agents (PAs). Within the quality checks for level 3, all available information in the different infrastructure components is cross-checked for consistency by the DataCite PA. This information includes: metadata on data, metadata in the long-term archive of the Publication Agency, quality information, and external metadata on model and simulation (CIM). For these consistency checks, metadata related to the data publication have to be identified. The Data Reference Syntax (DRS) convention functions as a global identifier for data. Since the DRS structures the data hierarchically, it can be used to identify data collections like DataCite publication units, i.e. all data belonging to a CMIP5 simulation. Every technical component of the infrastructure uses the DRS or maps to it, but there is no central repository storing DRS ids, so they occasionally have to be mapped. Additional local identifiers are used within the different technical infrastructure components. Identification of related pieces of information in their repositories is cumbersome and tricky for the PA. How could PIDs improve the situation? To establish a reliable distributed data and metadata infrastructure, PIDs for all objects are needed, as well as relations between them. An ideal data publication scenario for federated community projects within the Earth System Sciences, e.g. CMIP, would be: 1. Data creators at the modeling centers define their simulation, related metadata, and software, which are assigned PIDs. 2. During ESGF data publication the data entities are assigned PIDs with references to the PIDs from step 1. Since we deal with different hierarchical levels, the definition of collections on these levels is advantageous. A possible implementation concept using Handles is described by Weigel et al. (2012). 3. Quality results are assigned PID(s) and a reference to the data. A quality PID is added as a reference to the data collection PID. 4. The PA accesses the PID on the data collection to get the data and all related information for cross-checking. The presented example of the technical infrastructure for the CMIP5 data distribution shows the importance of PIDs, especially as the data are distributed over multiple repositories world-wide and additional separate pieces of data-related information are independently collected from the data. References: Stockhause, M., Höck, H., Toussaint, F., Lautenschlager, M. (2012): 'Quality assessment concept of the World Data Center for Climate and its application to CMIP5 data', Geosci. Model Dev. Discuss., 5, 781-802, doi:10.5194/gmdd-5-781-2012. Weigel, T., et al. (2012): 'Structural Elements in a Persistent Identifier Infrastructure and Resulting Benefits for the Earth Science Community', submitted to AGU 2012 Session IN009.
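
    Because the DRS is a fixed facet hierarchy, identifiers can be built and parsed mechanically. A minimal sketch with an illustrative subset of facets (the authoritative facet list and order are defined by the CMIP5 DRS specification, not by this code):

        DRS_FACETS = ("activity", "product", "institute", "model",
                      "experiment", "frequency", "realm", "ensemble")

        def make_drs_id(**facets) -> str:
            """Join facet values into a dot-separated DRS-style identifier."""
            return ".".join(facets[f] for f in DRS_FACETS)

        def parse_drs_id(drs_id: str) -> dict:
            """Recover the facet dictionary from a DRS-style identifier."""
            return dict(zip(DRS_FACETS, drs_id.split(".")))

        drs = make_drs_id(activity="cmip5", product="output1", institute="MPI-M",
                          model="MPI-ESM-LR", experiment="historical",
                          frequency="mon", realm="atmos", ensemble="r1i1p1")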

  16. Evolution in Metadata Quality: Common Metadata Repository's Role in NASA Curation Efforts

    NASA Technical Reports Server (NTRS)

    Gilman, Jason; Shum, Dana; Baynes, Katie

    2016-01-01

    Metadata Quality is one of the chief drivers of discovery and use of NASA EOSDIS (Earth Observing System Data and Information System) data. Issues with metadata such as lack of completeness, inconsistency, and use of legacy terms directly hinder data use. As the central metadata repository for NASA Earth Science data, the Common Metadata Repository (CMR) has a responsibility to its users to ensure the quality of CMR search results. This poster covers how we use humanizers, a technique for dealing with the symptoms of metadata issues, as well as our plans for future metadata validation enhancements. The CMR currently indexes 35K collections and 300M granules.
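
    A humanizer, as used here, is essentially a table of rewrite rules applied at index time so that legacy terms in provider metadata do not degrade search results. A minimal sketch; the rule contents are invented for illustration and are not the CMR's actual rules:

        HUMANIZERS = {
            # legacy value -> preferred value (illustrative examples only)
            "AM-1": "Terra",
            "EOS PM-1": "Aqua",
        }

        def humanize(value: str) -> str:
            """Replace a legacy metadata value with its preferred form, if known."""
            return HUMANIZERS.get(value, value)

        assert humanize("AM-1") == "Terra"
        assert humanize("Landsat-8") == "Landsat-8"  # unknown values pass through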

  17. Metadata to Describe Genomic Information.

    PubMed

    Delgado, Jaime; Naro, Daniel; Llorente, Silvia; Gelpí, Josep Lluís; Royo, Romina

    2018-01-01

    Interoperable metadata is key for the management of genomic information. We propose a flexible approach, which we are contributing to the ISO/IEC standardization of a new format for efficient and secure compressed storage and transmission of genomic information.

  18. Metadata for WIS and WIGOS: GAW Profile of ISO19115 and Draft WIGOS Core Metadata Standard

    NASA Astrophysics Data System (ADS)

    Klausen, Jörg; Howe, Brian

    2014-05-01

    The World Meteorological Organization (WMO) Integrated Global Observing System (WIGOS) is a key WMO priority to underpin all WMO Programs and new initiatives such as the Global Framework for Climate Services (GFCS). The development of the WIGOS Operational Information Resource (WIR) is central to the WIGOS Framework Implementation Plan (WIGOS-IP). The WIR shall provide information on WIGOS and its observing components, as well as requirements of WMO application areas. An important aspect is the description of the observational capabilities by way of structured metadata. The Global Atmosphere Watch is the WMO program addressing the chemical composition and selected physical properties of the atmosphere. Observational data are collected and archived by GAW World Data Centres (WDCs) and related data centres. The Task Team on GAW WDCs (ET-WDC) has developed a profile of the ISO 19115 metadata standard that is compliant with the WMO Information System (WIS) specification for the WMO Core Metadata Profile v1.3. This profile is intended to harmonize certain aspects of the documentation of observations as well as the interoperability of the WDCs. The Inter-Commission Group on WIGOS (ICG-WIGOS) has established the Task Team on WIGOS Metadata (TT-WMD), with representation from all WMO Technical Commissions and the objective of defining the WIGOS Core Metadata. The result of this effort is a draft semantic standard comprising a set of metadata classes that are considered to be of critical importance for the interpretation of observations relevant to WIGOS. The purpose of the presentation is to acquaint the audience with the standard and to solicit informal feedback from experts in the various disciplines of meteorology and climatology. This feedback will help ET-WDC and TT-WMD to refine the GAW metadata profile and the draft WIGOS metadata standard, thereby increasing their utility and acceptance.

  19. Taxonomic annotation of public fungal ITS sequences from the built environment – a report from an April 10–11, 2017 workshop (Aberdeen, UK)

    PubMed Central

    Nilsson, R. Henrik; Taylor, Andy F. S.; Adams, Rachel I.; Baschien, Christiane; Bengtsson-Palme, Johan; Cangren, Patrik; Coleine, Claudia; Daniel, Heide-Marie; Glassman, Sydney I.; Hirooka, Yuuri; Irinyi, Laszlo; Iršėnaitė, Reda; Martin-Sanchez, Pedro M.; Meyer, Wieland; Oh, Seung-Yoon; Sampaio, Jose Paulo; Seifert, Keith A.; Sklenář, Frantisek; Stubbe, Dirk; Suh, Sung-Oui; Summerbell, Richard; Svantesson, Sten; Unterseher, Martin; Visagie, Cobus M.; Weiss, Michael; Woudenberg, Joyce H. C.; Wurzbacher, Christian; Van den Wyngaert, Silke; Yilmaz, Neriman; Yurkov, Andrey; Kõljalg, Urmas; Abarenkov, Kessy

    2018-01-01

    Recent DNA-based studies have shown that the built environment is surprisingly rich in fungi. These indoor fungi – whether transient visitors or more persistent residents – may hold clues to the rising levels of human allergies and other medical and building-related health problems observed globally. The taxonomic identity of these fungi is crucial in such pursuits. Molecular identification of the built mycobiome is no trivial undertaking, however, given the large number of unidentified, misidentified, and technically compromised fungal sequences in public sequence databases. In addition, the sequence metadata required to make informed taxonomic decisions – such as country and host/substrate of collection – are often lacking even from reference and ex-type sequences. Here we report on a taxonomic annotation workshop (April 10–11, 2017) organized at the James Hutton Institute/University of Aberdeen (UK) to facilitate reproducible studies of the built mycobiome. The 32 participants went through public fungal ITS barcode sequences related to the built mycobiome for taxonomic and nomenclatural correctness, technical quality, and metadata availability. A total of 19,508 changes – including 4,783 name changes, 14,121 metadata annotations, and the removal of 99 technically compromised sequences – were implemented in the UNITE database for molecular identification of fungi (https://unite.ut.ee/) and shared with a range of other databases and downstream resources. Among the genera that saw the largest number of changes were Penicillium, Talaromyces, Cladosporium, Acremonium, and Alternaria, all of them of significant importance in both culture-based and culture-independent surveys of the built environment. PMID:29559822

  20. Metadata-Driven SOA-Based Application for Facilitation of Real-Time Data Warehousing

    NASA Astrophysics Data System (ADS)

    Pintar, Damir; Vranić, Mihaela; Skočir, Zoran

    Service-oriented architecture (SOA) has already been widely recognized as an effective paradigm for achieving integration of diverse information systems. SOA-based applications can cross boundaries of platforms, operating systems and proprietary data standards, commonly through the usage of Web Services technology. On the other hand, metadata is also commonly referred to as a potential integration tool, given that standardized metadata objects can provide useful information about the specifics of unknown information systems with which one is interested in communicating, using an approach commonly called "model-based integration". This paper presents the results of research regarding possible synergy between those two integration facilitators. This is accomplished with a vertical example of a metadata-driven SOA-based business process that provides ETL (Extraction, Transformation and Loading) and metadata services to a data warehousing system in need of real-time ETL support.

  1. Life+ EnvEurope DEIMS - improving access to long-term ecosystem monitoring data in Europe

    NASA Astrophysics Data System (ADS)

    Kliment, Tomas; Peterseil, Johannes; Oggioni, Alessandro; Pugnetti, Alessandra; Blankman, David

    2013-04-01

    Long-term ecological research (LTER) studies aim at detecting environmental changes and analysing their drivers. In this respect LTER Europe provides a network of about 450 sites and platforms. However, data on various types of ecosystems at a broad geographical scale are still not easily available. Managing data resulting from long-term observations is therefore one of the important tasks, not only for an LTER site itself but also at the network level. Exchanging and sharing the information within a wider community is a crucial objective in the upcoming years. Due to the fragmented nature of long-term ecological research and monitoring (LTER) in Europe - and also on the global scale - information management has to face several challenges: distributed data sources, heterogeneous data models, heterogeneous data management solutions, and the complex domain of ecosystem monitoring with regard to the resulting data. The Life+ EnvEurope project (2010-2013) provides a case study for a workflow using data from the distributed network of LTER-Europe sites. In order to enhance discovery, evaluation and access to data, the EnvEurope Drupal Ecological Information Management System (DEIMS) has been developed. This is based on the first official release of the Drupal metadata editor developed by US LTER. EnvEurope DEIMS consists of three main components: 1) Metadata editor: a web-based client interface to manage metadata for three information resource types - datasets, persons and research sites. A metadata model describing datasets based on the Ecological Metadata Language (EML) was developed within the initial phase of the project, and a crosswalk to the INSPIRE metadata model was implemented to align with ongoing European activities. Person and research site metadata models defined within LTER Europe were adapted for the project needs. The three metadata models are interconnected within the system in order to provide an easy way for users to navigate among related resources. 2) Discovery client: provides several search profiles for datasets, persons, research sites and external resources commonly used in the domain (e.g. the Catalogue of Life), based on several search patterns ranging from simple full-text search and glossary browsing to categorized faceted search. 3) Geo-Viewer: a map client that portrays boundaries and centroids of the research sites as Web Map Service (WMS) layers. Each layer provides a link to both the Metadata editor and the Discovery client in order to create or discover metadata describing the data collected within the individual research site. Sharing of the dataset metadata with DEIMS is ensured in two ways: XML export of individual metadata records according to the EML schema for inclusion in the international DataONE network, and periodic harvesting of metadata into a GeoNetwork catalogue, thus providing a catalogue service for the web (CSW) that can be invoked by remote clients. The final version of DEIMS will be a pilot implementation for the information system of LTER-Europe, which should establish a common information management framework within the European ecosystem research domain and provide valuable environmental information to other European information infrastructures such as SEIS, Copernicus and INSPIRE.
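
    Once DEIMS metadata are harvested into a GeoNetwork catalogue, any CSW client can query them. A minimal sketch using the OWSLib library, with a placeholder endpoint URL standing in for the project's actual catalogue:

        from owslib.csw import CatalogueServiceWeb  # pip install owslib
        from owslib.fes import PropertyIsLike

        # Placeholder endpoint; substitute the project's GeoNetwork CSW URL.
        csw = CatalogueServiceWeb("https://example.org/geonetwork/srv/eng/csw")
        query = PropertyIsLike("csw:AnyText", "%LTER%")
        csw.getrecords2(constraints=[query], maxrecords=10)
        for rec_id, rec in csw.records.items():
            print(rec_id, rec.title)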

  2. Interactive Visualization Systems and Data Integration Methods for Supporting Discovery in Collections of Scientific Information

    DTIC Science & Technology

    2011-05-01

    iTunes illustrate the difference between the centralized approach of digital library systems and the distributed approach of container file formats...metadata in a container file format. Apple’s iTunes uses a centralized metadata approach and allows users to maintain song metadata in a single...one iTunes library to another the metadata must be copied separately or reentered in the new library. This demonstrates the utility of storing metadata

  3. Metadata Wizard: an easy-to-use tool for creating FGDC-CSDGM metadata for geospatial datasets in ESRI ArcGIS Desktop

    USGS Publications Warehouse

    Ignizio, Drew A.; O'Donnell, Michael S.; Talbert, Colin B.

    2014-01-01

    Creating compliant metadata for scientific data products is mandated for all federal Geographic Information Systems professionals and is a best practice for members of the geospatial data community. However, the complexity of the Federal Geographic Data Committee's Content Standards for Digital Geospatial Metadata, the limited availability of easy-to-use tools, and recent changes in the ESRI software environment continue to make metadata creation a challenge. Staff at the U.S. Geological Survey Fort Collins Science Center have developed a Python toolbox for ESRI ArcGIS Desktop to facilitate a semi-automated workflow to create and update metadata records in ESRI's 10.x software. The U.S. Geological Survey Metadata Wizard tool automatically populates several metadata elements: the spatial reference, spatial extent, geospatial presentation format, vector feature count or raster column/row count, native system/processing environment, and the metadata creation date. Once the software auto-populates these elements, users can easily add attribute definitions and other relevant information in a simple graphical user interface. The tool, which offers a simple design free of esoteric metadata language, has the potential to save many government and non-government organizations significant time and cost by facilitating the development of metadata compliant with the Federal Geographic Data Committee's Content Standards for Digital Geospatial Metadata for ESRI software users. A working version of the tool is now available for ESRI ArcGIS Desktop, versions 10.0, 10.1, and 10.2 (downloadable at http://www.sciencebase.gov/metadatawizard).
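
    A minimal, hedged sketch of the auto-population idea: derive the FGDC CSDGM spatial-domain (spdom) bounding coordinates and the metadata date, and write them into an XML tree. In the actual tool, arcpy supplies these values from the dataset; plain numbers stand in here so the sketch runs anywhere.

        # Hedged sketch of FGDC CSDGM element auto-population; the extent values
        # are a stand-in for what arcpy.Describe(...) would report.
        import xml.etree.ElementTree as ET
        from datetime import date

        extent = {"west": -105.7, "east": -105.1, "south": 40.2, "north": 40.6}

        metadata = ET.Element("metadata")
        spdom = ET.SubElement(ET.SubElement(metadata, "idinfo"), "spdom")
        bounding = ET.SubElement(spdom, "bounding")
        for tag, key in [("westbc", "west"), ("eastbc", "east"),
                         ("northbc", "north"), ("southbc", "south")]:
            ET.SubElement(bounding, tag).text = str(extent[key])

        metainfo = ET.SubElement(metadata, "metainfo")
        ET.SubElement(metainfo, "metd").text = date.today().strftime("%Y%m%d")  # metadata date

        print(ET.tostring(metadata, encoding="unicode"))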

  4. Improving Access to NASA Earth Science Data through Collaborative Metadata Curation

    NASA Astrophysics Data System (ADS)

    Sisco, A. W.; Bugbee, K.; Shum, D.; Baynes, K.; Dixon, V.; Ramachandran, R.

    2017-12-01

    The NASA-developed Common Metadata Repository (CMR) is a high-performance metadata system that currently catalogs over 375 million Earth science metadata records. It serves as the authoritative metadata management system of NASA's Earth Observing System Data and Information System (EOSDIS), enabling NASA Earth science data to be discovered and accessed by a worldwide user community. The size of the EOSDIS data archive is steadily increasing, and the ability to manage and query this archive depends on the input of high-quality metadata to the CMR. Metadata that does not provide adequate descriptive information diminishes the CMR's ability to effectively find and serve data to users. To address this issue, an innovative and collaborative review process is underway to systematically improve the completeness, consistency, and accuracy of metadata for approximately 7,000 data sets archived by NASA's twelve EOSDIS data centers, or Distributed Active Archive Centers (DAACs). The process involves automated and manual metadata assessment of both collection and granule records by a team of Earth science data specialists at NASA Marshall Space Flight Center. The team communicates results to DAAC personnel, who then make revisions and reingest improved metadata into the CMR. Implementation of this process relies on a network of interdisciplinary collaborators leveraging a variety of communication platforms and long-range planning strategies. Curating metadata at this scale and resolving metadata issues through community consensus improves the CMR's ability to serve current and future users and also introduces best practices for stewarding the next generation of Earth Observing System data. This presentation will detail the metadata curation process, its outcomes thus far, and the status of ongoing curation activities.

  5. Evolutions in Metadata Quality

    NASA Astrophysics Data System (ADS)

    Gilman, J.

    2016-12-01

    Metadata quality is one of the chief drivers of discovery and use of NASA EOSDIS (Earth Observing System Data and Information System) data. Issues with metadata, such as lack of completeness, inconsistency, and use of legacy terms, directly hinder data use. As the central metadata repository for NASA Earth Science data, the Common Metadata Repository (CMR) has a responsibility to its users to ensure the quality of CMR search results. This talk will cover how we encourage metadata authors to improve their metadata through integrated rubrics of metadata quality and outreach efforts. In addition, we'll demonstrate Humanizers, a technique for dealing with the symptoms of metadata issues. Humanizers allow CMR administrators to identify specific metadata issues that are then fixed at runtime when the data is indexed. An example Humanizer is the aliasing of the processing level "Level 1" to "1" to improve consistency across collections. The CMR currently indexes 35K collections and 300M granules.
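
    A minimal sketch of the Humanizer idea described above: declarative substitution rules applied to metadata fields at indexing time, leaving the stored records untouched. The rule set and field names here are illustrative, not the CMR's implementation.

        # Hedged sketch: alias rules applied when a record is indexed.
        HUMANIZERS = [
            {"field": "processing_level", "match": "Level 1", "replace_with": "1"},
            {"field": "processing_level", "match": "Level 2", "replace_with": "2"},
        ]

        def humanize(record):
            """Return an index-ready copy of a metadata record with aliases applied."""
            indexed = dict(record)
            for rule in HUMANIZERS:
                if indexed.get(rule["field"]) == rule["match"]:
                    indexed[rule["field"]] = rule["replace_with"]
            return indexed

        print(humanize({"short_name": "MOD021KM", "processing_level": "Level 1"}))
        # -> {'short_name': 'MOD021KM', 'processing_level': '1'}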

  6. Generating Researcher Networks with Identified Persons on a Semantic Service Platform

    NASA Astrophysics Data System (ADS)

    Jung, Hanmin; Lee, Mikyoung; Kim, Pyung; Lee, Seungwoo

    This paper describes a Semantic Web-based method to acquire researcher networks by means of an identification scheme, an ontology, and reasoning. Three steps are required to realize it: resolving co-references, finding experts, and generating researcher networks. We adopt OntoFrame as an underlying semantic service platform and apply reasoning to make direct relations between far-off classes in the ontology schema. 453,124 Elsevier journal articles with metadata and full-text documents in the information technology and biomedical domains have been loaded and served on the platform as a test set.

  7. Describing environmental public health data: implementing a descriptive metadata standard on the environmental public health tracking network.

    PubMed

    Patridge, Jeff; Namulanda, Gonza

    2008-01-01

    The Environmental Public Health Tracking (EPHT) Network provides an opportunity to bring together diverse environmental and health effects data by integrating local, state, and national databases of environmental hazards, environmental exposures, and health effects. To help users locate data on the EPHT Network, the network will utilize descriptive metadata that provide critical information as to the purpose, location, content, and source of these data. Since 2003, the Centers for Disease Control and Prevention's EPHT Metadata Subgroup has been working to initiate the creation and use of descriptive metadata. Efforts undertaken by the group include the adoption of a metadata standard, creation of an EPHT-specific metadata profile, development of an open-source metadata creation tool, and promotion of the creation of descriptive metadata by changing the perception of metadata in the public health culture.

  8. Applying a Data Stewardship Maturity Matrix to the NOAA Observing System Portfolio Integrated Assessment Process

    NASA Astrophysics Data System (ADS)

    Peng, G.; Austin, M.

    2017-12-01

    Identification and prioritization of targeted user community needs are not always considered until after data have been created and archived. Gaps in data curation and documentation in the data production and delivery phases limit the data's broad utility, particularly for decision makers. Expert understanding and knowledge of a particular dataset are often required as part of the data and metadata curation process to establish the credibility of the data and support informed decision-making. To enhance curation practices, content from NOAA's Observing System Integrated Assessment (NOSIA) Value Tree and NOAA's Data Catalog/Digital Object Identifier (DOI) projects (collection-level metadata) has been integrated with Data/Stewardship Maturity Matrices (data and stewardship quality information) focused on assessment of user community needs. The result is a set of user-focused, evidence-based decision-making tools created by NOAA's National Environmental Satellite, Data, and Information Service (NESDIS) through identification and assessment of data content gaps related to scientific knowledge and application to key areas of societal benefit. Enabling user-need feedback from the beginning of data creation through archiving allows users to determine the quality and value of data that are fit for purpose. Data gap assessment and prioritization are presented in a user-friendly way using the data stewardship maturity matrices as a measurement of data management quality. These decision-maker tools encourage data producers and data providers/stewards to consider users' needs prior to data creation and dissemination, resulting in user-driven data requirements and increasing return on investment. A use case linking the need for NOAA observations to societal benefit will be used to demonstrate the value of these tools.

  9. Making Interoperability Easier with NASA's Metadata Management Tool (MMT)

    NASA Technical Reports Server (NTRS)

    Shum, Dana; Reese, Mark; Pilone, Dan; Baynes, Katie

    2016-01-01

    While the ISO-19115 collection-level metadata format meets many users' needs for interoperable metadata, it can be cumbersome to create correctly. Through the MMT's simple UI experience, metadata curators can create and edit collections that are compliant with ISO-19115 without full knowledge of the NASA Best Practices implementation of the ISO-19115 format. Users are guided through the metadata creation process by a forms-based editor, complete with field information, validation hints and picklists. Once a record is completed, users can download the metadata in any of the supported formats with just two clicks.

  10. Content Metadata Standards for Marine Science: A Case Study

    USGS Publications Warehouse

    Riall, Rebecca L.; Marincioni, Fausto; Lightsom, Frances L.

    2004-01-01

    The U.S. Geological Survey developed a content metadata standard to meet the demands of organizing electronic resources in the marine sciences for a broad, heterogeneous audience. These metadata standards are used by the Marine Realms Information Bank project, a Web-based public distributed library of marine science from academic institutions and government agencies. The development and deployment of this metadata standard serve as a model, complete with lessons about mistakes, for the creation of similarly specialized metadata standards for digital libraries.

  11. Simplified Metadata Curation via the Metadata Management Tool

    NASA Astrophysics Data System (ADS)

    Shum, D.; Pilone, D.

    2015-12-01

    The Metadata Management Tool (MMT) is the newest capability developed as part of NASA Earth Observing System Data and Information System's (EOSDIS) efforts to simplify metadata creation and improve metadata quality. The MMT was developed via an agile methodology, taking into account input from GCMD's science coordinators and other end users. In its initial release, the MMT uses the Unified Metadata Model for Collections (UMM-C) to allow metadata providers to easily create and update collection records in the ISO-19115 format. Through a simplified UI experience, metadata curators can create and edit collections without full knowledge of the NASA Best Practices implementation of the ISO-19115 format, while still generating compliant metadata. More experienced users are also able to access raw metadata to build more complex records as needed. In future releases, the MMT will build upon recent work done in the community to assess metadata quality and compliance with a variety of standards through the application of metadata rubrics. The tool will provide users with clear guidance on how to easily change their metadata in order to improve quality and compliance. Through these features, the MMT allows data providers to create and maintain compliant, high-quality metadata in a short amount of time.

  12. HealthCyberMap: a semantic visual browser of medical Internet resources based on clinical codes and the human body metaphor.

    PubMed

    Kamel Boulos, Maged N; Roudsari, Abdul V; Carson, Ewart R

    2002-12-01

    HealthCyberMap (HCM; http://healthcybermap.semanticweb.org) is a web-based service for healthcare professionals and librarians, patients, and the general public that aims at mapping parts of the health information resources in cyberspace in novel ways to improve their retrieval and navigation. HCM adopts a clinical metadata framework built upon a clinical coding ontology for the semantic indexing, classification and browsing of Internet health information resources. A resource metadata base holds information about selected resources. HCM then uses GIS (Geographic Information Systems) spatialization methods to generate interactive navigational cybermaps from the metadata base. These visual cybermaps are based on familiar medical metaphors. HCM cybermaps can be considered semantically spatialized, ontology-based browsing views of the underlying resource metadata base. Using a clinical coding scheme as a metric for spatialization ('semantic distance') is unique to HCM and is well suited to the semantic categorization and navigation of Internet health information resources. Clinical codes ensure reliable and unambiguous topical indexing of these resources. HCM also introduces a useful form of cyberspatial analysis for the detection of topical coverage gaps in the resource metadata base using choropleth (shaded) maps of human body systems.
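
    A hedged sketch of one way a hierarchical clinical coding scheme could serve as a 'semantic distance' metric of the kind the abstract describes; the shared-prefix measure and the ICD-like codes are assumptions for illustration, not the paper's published metric.

        # Hedged sketch: distance between hierarchical codes as total length
        # minus twice the shared prefix; codes sharing long prefixes are close.
        def semantic_distance(code_a, code_b):
            shared = 0
            for a, b in zip(code_a, code_b):
                if a != b:
                    break
                shared += 1
            return (len(code_a) - shared) + (len(code_b) - shared)

        print(semantic_distance("C34.1", "C34.9"))  # same chapter -> small distance
        print(semantic_distance("C34.1", "J45.0"))  # different chapters -> large distance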

  13. Constructing a Cross-Domain Resource Inventory: Key Components and Results of the EarthCube CINERGI Project.

    NASA Astrophysics Data System (ADS)

    Zaslavsky, I.; Richard, S. M.; Malik, T.; Hsu, L.; Gupta, A.; Grethe, J. S.; Valentine, D. W., Jr.; Lehnert, K. A.; Bermudez, L. E.; Ozyurt, I. B.; Whitenack, T.; Schachne, A.; Giliarini, A.

    2015-12-01

    While many geoscience-related repositories and data discovery portals exist, finding information about available resources remains a pervasive problem, especially when searching across multiple domains and catalogs. Inconsistent and incomplete metadata descriptions, disparate access protocols, semantic differences across domains, and troves of unstructured or poorly structured information that is hard to discover and use are major hindrances to discovery, while metadata compilation and curation remain manual and time-consuming. We report on the methodology, main results and lessons learned from an ongoing effort to develop a geoscience-wide catalog of information resources, with consistent metadata descriptions, traceable provenance, and automated metadata enhancement. Developing such a catalog is the central goal of CINERGI (Community Inventory of EarthCube Resources for Geoscience Interoperability), an EarthCube building block project (earthcube.org/group/cinergi). The key novel technical contributions of the project include: a) development of a metadata enhancement pipeline and a set of document enhancers to automatically improve various aspects of metadata descriptions, including keyword assignment and definition of spatial extents; b) Community Resource Viewers: online applications for crowdsourcing community resource registry development, curation and search, and channeling metadata to the unified CINERGI inventory; c) metadata provenance, validation and annotation services; d) user interfaces for advanced resource discovery; and e) a geoscience-wide ontology and machine learning to support automated semantic tagging and faceted search across domains. We demonstrate these CINERGI components in three types of user scenarios: (1) improving existing metadata descriptions maintained by government and academic data facilities, (2) supporting the work of several EarthCube Research Coordination Network projects in assembling information resources for their domains, and (3) enhancing the inventory and the underlying ontology to address several complicated data discovery use cases in hydrology, geochemistry, sedimentology, and critical zone science. Support from the US National Science Foundation under award ICER-1343816 is gratefully acknowledged.
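
    A minimal sketch (not the CINERGI code) of the enhancement-pipeline idea: each enhancer takes a metadata record and returns an improved copy, so enhancers can be chained. The field names and the naive enhancer logic are illustrative assumptions.

        # Hedged sketch of a chainable metadata-enhancement pipeline.
        def keyword_enhancer(rec):
            out = dict(rec)
            # Naive keyword assignment from the title; the real pipeline uses
            # an ontology and machine learning for semantic tagging.
            out.setdefault("keywords",
                           [w for w in rec.get("title", "").split() if len(w) > 5])
            return out

        def spatial_extent_enhancer(rec):
            out = dict(rec)
            # Stand-in for a gazetteer lookup; hard-coded for illustration.
            if "spatial_extent" not in out and "Colorado" in rec.get("title", ""):
                out["spatial_extent"] = {"west": -109.05, "east": -102.04,
                                         "south": 36.99, "north": 41.0}
            return out

        def run_pipeline(rec, enhancers):
            for enhance in enhancers:
                rec = enhance(rec)
            return rec

        record = {"title": "Streamflow measurements in Colorado headwater catchments"}
        print(run_pipeline(record, [keyword_enhancer, spatial_extent_enhancer]))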

  14. 77 FR 22707 - Electronic Reporting Under the Toxic Substances Control Act

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-04-17

    ... completes metadata information, the web-based tool validates the submission by performing a basic error... uploading PDF attachments or other file types, such as XML, and completing metadata information would be...

  15. openPDS: protecting the privacy of metadata through SafeAnswers.

    PubMed

    de Montjoye, Yves-Alexandre; Shmueli, Erez; Wang, Samuel S; Pentland, Alex Sandy

    2014-01-01

    The rise of smartphones and web services made possible the large-scale collection of personal metadata. Information about individuals' locations, phone call logs, or web searches is collected and used intensively by organizations and big data researchers. Metadata has, however, yet to realize its full potential. Privacy and legal concerns, as well as the lack of technical solutions for personal metadata management, are preventing metadata from being shared and reconciled under the control of the individual. This lack of access and control is furthermore fueling growing concerns, as it prevents individuals from understanding and managing the risks associated with the collection and use of their data. Our contribution is two-fold: (1) we describe openPDS, a personal metadata management framework that allows individuals to collect, store, and give fine-grained access to their metadata to third parties. It has been implemented in two field studies; (2) we introduce and analyze SafeAnswers, a new and practical way of protecting the privacy of metadata at an individual level. SafeAnswers turns a hard anonymization problem into a more tractable security one. It allows services to ask questions whose answers are calculated against the metadata instead of trying to anonymize individuals' metadata. The dimensionality of the data shared with the services is reduced from high-dimensional metadata to low-dimensional answers that are less likely to be re-identifiable and to contain sensitive information. These answers can then be shared directly, individually or in aggregate. openPDS and SafeAnswers provide a new way of dynamically protecting personal metadata, thereby supporting the creation of smart data-driven services and data science research.
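
    A hedged sketch of the SafeAnswers idea: a vetted question is computed inside the personal data store, and only the low-dimensional answer leaves. The data layout and the example question are assumptions for illustration, not openPDS internals.

        # Hedged sketch: the service never sees the raw trace, only the answer.
        personal_metadata = {
            "locations": [("2014-03-01", "home"), ("2014-03-01", "office"),
                          ("2014-03-02", "gym"), ("2014-03-02", "office")],
            "call_log": [("2014-03-01", "+15550001"), ("2014-03-02", "+15550002")],
        }

        def safe_answer(question):
            """Run an approved question against the raw metadata; return only the answer."""
            if question == "distinct_places_last_30_days":
                return len({place for _, place in personal_metadata["locations"]})
            raise ValueError("question not in the approved set")

        print(safe_answer("distinct_places_last_30_days"))  # -> 3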

  16. openPDS: Protecting the Privacy of Metadata through SafeAnswers

    PubMed Central

    de Montjoye, Yves-Alexandre; Shmueli, Erez; Wang, Samuel S.; Pentland, Alex Sandy

    2014-01-01

    The rise of smartphones and web services made possible the large-scale collection of personal metadata. Information about individuals' locations, phone call logs, or web searches is collected and used intensively by organizations and big data researchers. Metadata has, however, yet to realize its full potential. Privacy and legal concerns, as well as the lack of technical solutions for personal metadata management, are preventing metadata from being shared and reconciled under the control of the individual. This lack of access and control is furthermore fueling growing concerns, as it prevents individuals from understanding and managing the risks associated with the collection and use of their data. Our contribution is two-fold: (1) we describe openPDS, a personal metadata management framework that allows individuals to collect, store, and give fine-grained access to their metadata to third parties. It has been implemented in two field studies; (2) we introduce and analyze SafeAnswers, a new and practical way of protecting the privacy of metadata at an individual level. SafeAnswers turns a hard anonymization problem into a more tractable security one. It allows services to ask questions whose answers are calculated against the metadata instead of trying to anonymize individuals' metadata. The dimensionality of the data shared with the services is reduced from high-dimensional metadata to low-dimensional answers that are less likely to be re-identifiable and to contain sensitive information. These answers can then be shared directly, individually or in aggregate. openPDS and SafeAnswers provide a new way of dynamically protecting personal metadata, thereby supporting the creation of smart data-driven services and data science research. PMID:25007320

  17. A Shared Infrastructure for Federated Search Across Distributed Scientific Metadata Catalogs

    NASA Astrophysics Data System (ADS)

    Reed, S. A.; Truslove, I.; Billingsley, B. W.; Grauch, A.; Harper, D.; Kovarik, J.; Lopez, L.; Liu, M.; Brandt, M.

    2013-12-01

    The vast amount of science metadata can be overwhelming and highly complex. Comprehensive analysis and sharing of metadata are difficult since institutions often publish to their own repositories. There are many disjoint standards used for publishing scientific data, making it difficult to discover and share information from different sources. Services that publish metadata catalogs often have different protocols, formats, and semantics. The research community is limited by the exclusivity of separate metadata catalogs, and thus it is desirable to have federated search interfaces capable of unified search queries across multiple sources. Aggregation of metadata catalogs also enables users to critique metadata more rigorously. With these motivations in mind, the National Snow and Ice Data Center (NSIDC) and the Advanced Cooperative Arctic Data and Information Service (ACADIS) implemented two search interfaces for the community. Both the NSIDC Search and the ACADIS Arctic Data Explorer (ADE) use a common infrastructure, which keeps maintenance costs low. The search clients are designed to make OpenSearch requests against Solr, an open-source search platform. Solr applies indexes to specific fields of the metadata, which in this instance optimizes queries containing keywords, spatial bounds and temporal ranges. NSIDC metadata is reused by both search interfaces, but the ADE also brokers additional sources. Users can quickly find relevant metadata with minimal effort, which ultimately lowers costs for research. This presentation will highlight the reuse of data and code between NSIDC and ACADIS, discuss challenges and milestones for each project, and identify the creation and use of open-source libraries.
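
    A hedged sketch of the kind of backend query such an infrastructure might issue: OpenSearch-style parameters translated into a Solr request with keyword, temporal, and spatial filters. The endpoint and field names are assumptions, not NSIDC's actual schema.

        # Hedged sketch: keyword + temporal + spatial filters against Solr.
        import requests

        SOLR_URL = "http://localhost:8983/solr/metadata/select"  # assumed core

        params = {
            "q": "sea ice extent",                              # full-text keywords
            "fq": [
                "temporal_end:[2010-01-01T00:00:00Z TO *]",     # temporal range filter
                "spatial_coverage:[60,-180 TO 90,180]",         # Arctic bounding box
            ],
            "rows": 25,
            "wt": "json",
        }
        docs = requests.get(SOLR_URL, params=params).json()["response"]["docs"]
        for doc in docs:
            print(doc.get("title"))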

  18. Transformation of HDF-EOS metadata from the ECS model to ISO 19115-based XML

    NASA Astrophysics Data System (ADS)

    Wei, Yaxing; Di, Liping; Zhao, Baohua; Liao, Guangxuan; Chen, Aijun

    2007-02-01

    Nowadays, geographic data, such as NASA's Earth Observation System (EOS) data, are playing an increasing role in many areas, including academic research, government decisions and even people's everyday lives. As the quantity of geographic data becomes increasingly large, a major problem is how to make full use of such data in a distributed, heterogeneous network environment. In order for a user to effectively discover and retrieve the specific information that is useful, the geographic metadata should be described and managed properly. Fortunately, the emergence of XML and Web Services technologies greatly promotes information distribution across the Internet. The research effort discussed in this paper presents a method and its implementation for transforming Hierarchical Data Format (HDF)-EOS metadata from the NASA ECS model to ISO 19115-based XML, which will be managed by the Open Geospatial Consortium (OGC) Catalogue Services - Web Profile (CSW). Using XML and international standards rather than domain-specific models to describe the metadata of those HDF-EOS data, and further using CSW to manage the metadata, allows metadata information to be searched and interchanged more widely and easily, thus promoting the sharing of HDF-EOS data.
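
    A hedged sketch of the transformation idea: map a field from an ECS-style metadata dictionary into an ISO 19139 (gmd/gco) XML tree. Only the title mapping is shown, and the sample ECS field is invented; the paper's full crosswalk covers many more elements.

        # Hedged sketch of an ECS-to-ISO 19139 field mapping.
        import xml.etree.ElementTree as ET

        GMD = "http://www.isotc211.org/2005/gmd"
        GCO = "http://www.isotc211.org/2005/gco"
        ET.register_namespace("gmd", GMD)
        ET.register_namespace("gco", GCO)

        ecs = {"LongName": "MODIS Level 1B Calibrated Radiances"}  # sample ECS field

        md = ET.Element(f"{{{GMD}}}MD_Metadata")
        # ISO 19139 path: identificationInfo > MD_DataIdentification > citation
        # > CI_Citation > title > gco:CharacterString
        node = md
        for tag in ("identificationInfo", "MD_DataIdentification", "citation",
                    "CI_Citation", "title"):
            node = ET.SubElement(node, f"{{{GMD}}}{tag}")
        ET.SubElement(node, f"{{{GCO}}}CharacterString").text = ecs["LongName"]

        print(ET.tostring(md, encoding="unicode"))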

  19. A Window to the World: Lessons Learned from NASA's Collaborative Metadata Curation Effort

    NASA Astrophysics Data System (ADS)

    Bugbee, K.; Dixon, V.; Baynes, K.; Shum, D.; le Roux, J.; Ramachandran, R.

    2017-12-01

    Well-written descriptive metadata adds value to data by making them easier to discover and increases their use by providing context and appropriateness of use. While many data centers acknowledge the importance of correct, consistent and complete metadata, allocating resources to curate existing metadata is often difficult. To lower resource costs, many data centers seek guidance on best practices for curating metadata but struggle to identify those recommendations. In order to assist data centers in curating metadata and to develop best practices for creating and maintaining metadata, NASA has formed a collaborative effort to improve the Earth Observing System Data and Information System (EOSDIS) metadata in the Common Metadata Repository (CMR). This effort has taken significant steps in building consensus around metadata curation best practices. However, it has also revealed gaps in EOSDIS enterprise policies and procedures within the core metadata curation task. This presentation will explore the mechanisms used for building consensus on metadata curation, the gaps identified in policies and procedures, the lessons learned from collaborating with both the data centers and metadata curation teams, and the proposed next steps for the future.

  20. Metadata: Standards for Retrieving WWW Documents (and Other Digitized and Non-Digitized Resources)

    NASA Astrophysics Data System (ADS)

    Rusch-Feja, Diann

    The use of metadata for indexing digitized and non-digitized resources for resource discovery in a networked environment is being increasingly implemented all over the world. Greater precision is achieved using metadata than by relying on universal search engines, and furthermore, metadata can be used as a filtering mechanism for search results. An overview of various metadata sets is given, followed by a more focused presentation of Dublin Core Metadata, including examples of sub-elements and qualifiers. The use of the Dublin Core Relation element in particular provides connections between the metadata of various related electronic resources, as well as the metadata for physical, non-digitized resources. This facilitates more comprehensive search results without losing precision and brings together different genres of information which would otherwise be searchable only in separate databases. Furthermore, the advantages of Dublin Core Metadata in comparison with library cataloging and the use of universal search engines are discussed briefly, followed by a listing of types of implementation of Dublin Core Metadata.
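
    A hedged illustration of a Dublin Core record whose Relation element links a digitized resource to its physical original, built here with Python's standard library. The element usage follows the Dublin Core element set; the resource described is invented.

        # Hedged sketch: a Dublin Core record with a Relation element.
        import xml.etree.ElementTree as ET

        DC = "http://purl.org/dc/elements/1.1/"
        ET.register_namespace("dc", DC)

        record = ET.Element("record")
        for name, value in [
            ("title", "Field notebook, Alpine survey 1921 (digitized)"),
            ("creator", "Example, A."),
            ("type", "Text"),
            ("format", "image/tiff"),
            # Relation ties the digital copy to the physical item's record.
            ("relation", "IsFormatOf: Archive shelfmark MS-1921-07"),
        ]:
            ET.SubElement(record, f"{{{DC}}}{name}").text = value

        print(ET.tostring(record, encoding="unicode"))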

  1. 75 FR 4689 - Electronic Tariff Filings

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-01-29

    ... collaborative process relies upon the use of metadata (or information) about the tariff filing, including such... code.\\5\\ Because the Commission is using the electronic metadata to establish statutory action dates... code, as well as accurately providing any other metadata. 6. Similarly, the Commission will be using...

  2. Metadata Sets for e-Government Resources: The Extended e-Government Metadata Schema (eGMS+)

    NASA Astrophysics Data System (ADS)

    Charalabidis, Yannis; Lampathaki, Fenareti; Askounis, Dimitris

    At the dawn of the Semantic Web era, metadata appear as a key enabler that assists management of the e-Government resources related to the provision of personalized, efficient and proactive services oriented towards citizens' real needs. Different authorities typically use different terms to describe their resources and publish them in various e-Government registries; these registries may enhance the access to and delivery of governmental knowledge, but they also need to communicate seamlessly at a national and pan-European level, so the need for a unified e-Government metadata standard emerges. This paper presents the creation of an ontology-based extended metadata set for e-Government resources that embraces services, documents, XML Schemas, code lists, public bodies and information systems. Such a metadata set formalizes the exchange of information between portals and registries and assists service transformation and simplification efforts, while it can be further taken into consideration when applying Web 2.0 techniques in e-Government.

  3. The Importance of Metadata in System Development and IKM

    DTIC Science & Technology

    2003-02-01

    Defence R&D Canada The Importance of Metadata in System Development and IKM Anthony W. Isenor Technical Memorandum DRDC Atlantic TM 2003-011...Metadata in System Development and IKM Anthony W. Isenor Defence R&D Canada - Atlantic Technical Memorandum DRDC Atlantic TM 2003-011 February... it is important for searches and providing relevant information to the client. A comparison of metadata standards was conducted with emphasis on

  4. Studies of Big Data metadata segmentation between relational and non-relational databases

    NASA Astrophysics Data System (ADS)

    Golosova, M. V.; Grigorieva, M. A.; Klimentov, A. A.; Ryabinkin, E. A.; Dimitrov, G.; Potekhin, M.

    2015-12-01

    In recent years the concepts of Big Data have become well established in IT. Systems managing large data volumes produce metadata that describe data and workflows. These metadata are used to obtain information about the current system state and for statistical and trend analysis of the processes these systems drive. Over time the amount of stored metadata can grow dramatically. In this article we present our studies demonstrating how metadata storage scalability and performance can be improved by using a hybrid RDBMS/NoSQL architecture.

  5. Cross-organizational workflow in radiology: an empirical study of the quality of shared metadata elements in Region Västra Götaland, Sweden.

    PubMed

    Lindsköld, Lars; Wintell, Mikael; Edgren, Lars; Aspelin, Peter; Lundberg, Nina

    2013-07-01

    Challenges related to cross-organizational access to accurate and timely information about a patient's condition have become a critical issue in healthcare, and interoperability of different local sources is necessary. The aim was to identify and present missing and semantically incorrect data elements of metadata in the radiology enterprise service that supports cross-organizational sharing of dynamic information about patients' visits in Region Västra Götaland, Sweden. Quantitative data elements of metadata were collected on the first Wednesday in March of each year from 2006 to 2011 from the 24 in-house radiology departments in Region Västra Götaland. These radiology departments were organized into four hospital groups and three stand-alone hospitals. The included data elements of metadata were the patient name, patient ID, institutional department name, referring physician's name, and examination description. The majority of missing data elements of metadata were related to the institutional department name for Hospital 2, falling from 87% in 2007 to 25% in 2011. All data elements of metadata except the patient ID contained semantic errors. For example, for the data element "patient name", only three names out of 3537 were semantically correct. This study shows that the semantics of metadata elements are poorly structured and inconsistently used. Although a cross-organizational solution may technically be fully functional, semantic errors may prevent it from serving as an information infrastructure for collaboration between all departments and hospitals in the region. For interoperability, it is important that the agreed semantic models are implemented in vendor systems using the information infrastructure.

  6. Organizing Scientific Data Sets: Studying Similarities and Differences in Metadata and Subject Term Creation

    ERIC Educational Resources Information Center

    White, Hollie C.

    2012-01-01

    Background: According to Salo (2010), the metadata entered into repositories are "disorganized" and the metadata schemes underlying repositories are "arcane". This creates a challenging repository environment with regard to personal information management (PIM) and knowledge organization systems (KOSs). This dissertation research is…

  7. Evolving Metadata in NASA Earth Science Data Systems

    NASA Astrophysics Data System (ADS)

    Mitchell, A.; Cechini, M. F.; Walter, J.

    2011-12-01

    NASA's Earth Observing System (EOS) is a coordinated series of satellites for long-term global observations. NASA's Earth Observing System Data and Information System (EOSDIS) is a petabyte-scale archive of environmental data that supports global climate change research by providing end-to-end services from EOS instrument data collection to science data processing to full access to EOS and other earth science data. On a daily basis, the EOSDIS ingests, processes, archives and distributes over 3 terabytes of data from NASA's Earth Science missions, representing over 3500 data products spanning a range of science disciplines. EOSDIS is currently comprised of 12 discipline-specific data centers that are collocated with centers of science discipline expertise. Metadata are used in all aspects of NASA's Earth Science data lifecycle, from the initial measurement gathering to the accessing of data products. Missions use metadata in their science data products when describing information such as the instrument/sensor, operational plan, and geographic region. Acting as the curators of the data products, data centers employ metadata for preservation, access and manipulation of data. EOSDIS provides a centralized metadata repository called the Earth Observing System (EOS) ClearingHouse (ECHO) for data discovery and access via a service-oriented architecture (SOA) between data centers and science data users. ECHO receives inventory metadata from data centers, who generate metadata files that comply with the ECHO Metadata Model. NASA's Earth Science Data and Information System (ESDIS) Project established a Tiger Team to study and make recommendations regarding the adoption of the international metadata standard ISO 19115 in EOSDIS. The result was a technical report recommending an evolution of NASA data systems towards a consistent application of ISO 19115 and related standards, including the creation of a NASA-specific convention for core ISO 19115 elements. As part of NASA's effort to continually evolve its data systems, ECHO has enhanced the method by which it receives inventory metadata from the data centers to allow for multiple metadata formats, including ISO 19115. ECHO's metadata model will also be mapped to the NASA-specific convention for ingesting science metadata into the ECHO system. As NASA's new Earth Science missions and data centers migrate to the ISO 19115 standards, EOSDIS is developing metadata management resources to assist in reading, writing and parsing ISO 19115-compliant metadata. To foster interoperability with other agencies and international partners, NASA is working to ensure that a common ISO 19115 convention is developed, enhancing data sharing capabilities and other data analysis initiatives. NASA is also investigating the use of ISO 19115 standards to encode data quality, lineage and provenance with stored values. A common metadata standard across NASA's Earth Science data systems promotes interoperability, enhances data utilization and removes levels of uncertainty found in data products.

  8. CMO: Cruise Metadata Organizer for JAMSTEC Research Cruises

    NASA Astrophysics Data System (ADS)

    Fukuda, K.; Saito, H.; Hanafusa, Y.; Vanroosebeke, A.; Kitayama, T.

    2011-12-01

    JAMSTEC's Data Research Center for Marine-Earth Sciences manages and distributes a wide variety of observational data and samples obtained from JAMSTEC research vessels and deep-sea submersibles. Metadata are essential to identify how data and samples were obtained. In JAMSTEC, cruise metadata include cruise information, such as cruise ID, name of vessel and research theme, and diving information, such as dive number, name of submersible and position of diving point. They are submitted by the chief scientists of research cruises in Microsoft Excel spreadsheet format and registered into a data management database to confirm receipt of observational data files, cruise summaries, and cruise reports. The cruise metadata are also published via the "JAMSTEC Data Site for Research Cruises" within two months after the end of a cruise. Furthermore, these metadata are distributed with observational data, images and samples via several data and sample distribution websites after a publication moratorium period. However, there are two operational issues in the metadata publishing process. One is duplicated effort and asynchronous metadata across multiple distribution websites, due to manual metadata entry into individual websites by administrators. The other is differing data types and representations of metadata on each website. To solve these problems, we have developed a cruise metadata organizer (CMO) which allows cruise metadata to be propagated from the data management database to several distribution websites. CMO is comprised of three components: an Extensible Markup Language (XML) database, Enterprise Application Integration (EAI) software, and a web-based interface. The XML database is used because of its flexibility in accommodating changes to metadata. Daily differential uptake of metadata from the data management database to the XML database is processed automatically via the EAI software. Some metadata are entered into the XML database using the web-based interface by a metadata editor in CMO as needed. Then daily differential uptake of metadata from the XML database to the databases of several distribution websites is processed automatically using a converter defined by the EAI software. Currently, CMO is available for three distribution websites: "Deep Sea Floor Rock Sample Database GANSEKI", "Marine Biological Sample Database", and "JAMSTEC E-library of Deep-sea Images". CMO is planned to provide the "JAMSTEC Data Site for Research Cruises" with metadata in the future.
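
    A hedged sketch of the daily differential uptake described above: select only the records changed since the last run and convert each for its target website. The storage layout, field names and converter are illustrative assumptions, not CMO's implementation.

        # Hedged sketch: differential sync of cruise metadata to a target site.
        from datetime import datetime

        xml_db = [
            {"cruise_id": "YK11-01", "updated": datetime(2011, 9, 2), "vessel": "Yokosuka"},
            {"cruise_id": "MR11-03", "updated": datetime(2011, 9, 6), "vessel": "Mirai"},
        ]

        def ganseki_converter(rec):
            # Reshape a cruise record for the rock-sample database.
            return {"id": rec["cruise_id"], "ship": rec["vessel"]}

        def differential_uptake(records, converters, since):
            changed = [r for r in records if r["updated"] > since]
            return {site: [conv(r) for r in changed] for site, conv in converters.items()}

        print(differential_uptake(xml_db, {"GANSEKI": ganseki_converter},
                                  since=datetime(2011, 9, 5)))
        # -> {'GANSEKI': [{'id': 'MR11-03', 'ship': 'Mirai'}]}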

  9. Digital Initiatives and Metadata Use in Thailand

    ERIC Educational Resources Information Center

    SuKantarat, Wichada

    2008-01-01

    Purpose: This paper aims to provide information about various digital initiatives in libraries in Thailand and especially use of Dublin Core metadata in cataloguing digitized objects in academic and government digital databases. Design/methodology/approach: The author began researching metadata use in Thailand in 2003 and 2004 while on sabbatical…

  10. The Role of Metadata Standards in EOSDIS Search and Retrieval Applications

    NASA Technical Reports Server (NTRS)

    Pfister, Robin

    1999-01-01

    Metadata standards play a critical role in data search and retrieval systems. Metadata tie software to data so the data can be processed, stored, searched, retrieved and distributed; without metadata these actions are not possible. The process of populating metadata to describe science data is an important service to the end-user community, so that a user who is unfamiliar with the data can easily find and learn about a particular dataset before an order decision is made. Once a good set of standards is in place, the accuracy with which data search can be performed depends on the degree to which metadata standards are adhered to during product definition. NASA's Earth Observing System Data and Information System (EOSDIS) provides examples of how metadata standards are used in data search and retrieval.

  11. A model for enhancing Internet medical document retrieval with "medical core metadata".

    PubMed

    Malet, G; Munoz, F; Appleyard, R; Hersh, W

    1999-01-01

    Finding documents on the World Wide Web relevant to a specific medical information need can be difficult. The goal of this work is to define a set of document content description tags, or metadata encodings, that can be used to promote disciplined search access to Internet medical documents. The authors based their approach on a proposed metadata standard, the Dublin Core Metadata Element Set, which has recently been submitted to the Internet Engineering Task Force. Their model also incorporates the National Library of Medicine's Medical Subject Headings (MeSH) vocabulary and MEDLINE-type content descriptions. The model defines a medical core metadata set that can be used to describe the metadata for a wide variety of Internet documents. The authors propose that their medical core metadata set be used to assign metadata to medical documents to facilitate document retrieval by Internet search engines.

  12. WGISS-45 International Directory Network (IDN) Report

    NASA Technical Reports Server (NTRS)

    Morahan, Michael

    2018-01-01

    The objective of this presentation is to provide IDN (International Directory Network) updates on features and activities to the Committee on Earth Observation Satellites (CEOS) Working Group on Information Systems and Services (WGISS) and the provider community. The following topics will be discussed during the presentation: Transition of Providers' DIF-9 (Directory Interchange Format-9) to DIF-10 Metadata Records in the Common Metadata Repository (CMR); GCMD (Global Change Master Directory) Keyword Update; DIF-10 and UMM-C (Unified Metadata Model-Collections) Schema Changes; Metadata Validation of Provider Metadata; docBUILDER for Submitting IDN Metadata to the CMR (i.e. Registration); and Mapping the WGClimate Essential Climate Variable (ECV) Inventory to IDN Records.

  13. Concept for Future Data Services at the Long-Term Archive of WDCC combining DOIs with common PIDs

    NASA Astrophysics Data System (ADS)

    Stockhause, Martina; Weigel, Tobias; Toussaint, Frank; Höck, Heinke; Thiemann, Hannes; Lautenschlager, Michael

    2013-04-01

    The World Data Center for Climate (WDCC), hosted at the German Climate Computing Center (DKRZ), maintains a long-term archive (LTA) of climate model data as well as observational data. WDCC distinguishes between two types of LTA data. Structured data: the output of an instrument or of a climate model run consists of numerous, highly structured individual datasets in a uniform format. Part of these data is also published on an ESGF (Earth System Grid Federation) data node. Detailed metadata are available, allowing for fine-grained, user-defined data access. Unstructured data: LTA data of finished scientific projects are in general unstructured and consist of datasets of different formats, sizes, and contents. For these data, compact metadata are available as content information. The structured data are suitable for WDCC's DataCite DOI process, the project data only in exceptional cases. The DOI process includes a thorough quality control of both technical and scientific aspects by the publication agent and the data creator. DOIs are assigned to data collections suitable for citation in scientific publications, such as a simulation run. The data collection is defined in agreement with the data creator. At the moment there is no way to identify and cite individual datasets within such a DOI data collection, analogous to the citation of chapters in a book. Also missing is a compact citation convention for a user-specified collection of data. WDCC therefore complements its existing LTA/DOI concept with Persistent Identifier (PID) assignment to datasets using Handles. In addition to data identification for internal and external use, the PID concept allows relations among PIDs to be defined. Such structural information is stored as key-value pairs directly in the handles. Thus, relations provide basic provenance or lineage information, even if part of the data, such as intermediate results, is lost. WDCC intends to use additional PIDs on metadata entities with a relation to the data PID(s). These add background information on the data creation process (e.g. descriptions of the experiment, model, model set-up, and platform for the model run) to the data. These pieces of additional information significantly increase the re-usability of the archived model data. Other valuable additional information for scientific collaboration could be added by the same mechanism, such as quality information and annotations. Apart from relations among data and metadata entities, PIDs on collections are advantageous for model data: collections allow for persistent references to single datasets or to subsets of data assigned a DOI, and data objects and additional information objects can be consistently connected via relations (provenance, creation, quality information for data),
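
    A hedged sketch of relations stored as key-value pairs in handle records, as the concept above describes. A plain dict stands in for the Handle System registry, and the PIDs and relation keys are invented for illustration.

        # Hedged sketch: handle records as key-value pairs, incl. relations.
        handle_registry = {}

        def register_pid(pid, **pairs):
            handle_registry[pid] = dict(pairs)

        register_pid("hdl:21.14100/dataset-042",
                     URL="http://example.org/data/dataset-042",
                     IS_PART_OF="doi:10.1594/WDCC/EXAMPLE_RUN")    # dataset -> DOI collection
        register_pid("hdl:21.14100/experiment-meta-7",
                     DESCRIBES="hdl:21.14100/dataset-042")          # metadata PID -> data PID

        # Even if intermediate data are lost, the pairs preserve basic lineage:
        print(handle_registry["hdl:21.14100/dataset-042"]["IS_PART_OF"])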

  14. A novel framework for assessing metadata quality in epidemiological and public health research settings

    PubMed Central

    McMahon, Christiana; Denaxas, Spiros

    2016-01-01

    Metadata are critical in epidemiological and public health research. However, a lack of biomedical metadata quality frameworks and limited awareness of the implications of poor-quality metadata render data analyses problematic. In this study, we created and evaluated a novel framework to assess the metadata quality of epidemiological and public health research datasets. We performed a literature review and surveyed stakeholders to enhance our understanding of biomedical metadata quality assessment. The review identified 11 studies and nine quality dimensions, none of which were specifically aimed at biomedical metadata. Ninety-six individuals completed the survey; of those who submitted data, most assessed metadata quality only sometimes, and eight did not at all. Our framework has four sections: a) general information; b) tools and technologies; c) usability; and d) management and curation. We evaluated the framework using three test cases and sought expert feedback. The framework can assess biomedical metadata quality systematically and robustly. PMID:27570670

  15. A novel framework for assessing metadata quality in epidemiological and public health research settings.

    PubMed

    McMahon, Christiana; Denaxas, Spiros

    2016-01-01

    Metadata are critical in epidemiological and public health research. However, a lack of biomedical metadata quality frameworks and limited awareness of the implications of poor-quality metadata render data analyses problematic. In this study, we created and evaluated a novel framework to assess the metadata quality of epidemiological and public health research datasets. We performed a literature review and surveyed stakeholders to enhance our understanding of biomedical metadata quality assessment. The review identified 11 studies and nine quality dimensions, none of which were specifically aimed at biomedical metadata. Ninety-six individuals completed the survey; of those who submitted data, most assessed metadata quality only sometimes, and eight did not at all. Our framework has four sections: a) general information; b) tools and technologies; c) usability; and d) management and curation. We evaluated the framework using three test cases and sought expert feedback. The framework can assess biomedical metadata quality systematically and robustly.

  16. Introducing the PRIDE Archive RESTful web services.

    PubMed

    Reisinger, Florian; del-Toro, Noemi; Ternent, Tobias; Hermjakob, Henning; Vizcaíno, Juan Antonio

    2015-07-01

    The PRIDE (PRoteomics IDEntifications) database is one of the world-leading public repositories of mass spectrometry (MS)-based proteomics data, and it is a founding member of the ProteomeXchange Consortium of proteomics resources. In the original PRIDE database system, users could access data programmatically through the web services provided by the PRIDE BioMart interface. New REST (REpresentational State Transfer) web services have been developed to serve the most popular functionality provided by BioMart (now discontinued due to data scalability issues) and to address the data access requirements of the newly developed PRIDE Archive. Using the API (Application Programming Interface), it is now possible to programmatically query for and retrieve peptide and protein identifications, project and assay metadata, and the originally submitted files. Searching and filtering are also possible by metadata information, such as sample details (e.g. species and tissues), instrumentation (mass spectrometer), keywords and other provided annotations. The PRIDE Archive web services were first made available in April 2014. The API has already been adopted by a few applications and standalone tools, such as PeptideShaker, PRIDE Inspector, the Unipept web application and the Python-based BioServices package. This application is free and open to all users with no login requirement and can be accessed at http://www.ebi.ac.uk/pride/ws/archive/.
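
    A hedged example of calling such a REST service with Python's requests library. The base URL comes from the abstract; the specific endpoint path and query parameters are assumptions and should be checked against the service documentation.

        # Hedged sketch: query the PRIDE Archive REST API for project metadata.
        import requests

        BASE = "http://www.ebi.ac.uk/pride/ws/archive"

        resp = requests.get(f"{BASE}/project/list",        # assumed endpoint path
                            params={"query": "human", "show": 5})
        resp.raise_for_status()
        for project in resp.json().get("list", []):
            print(project.get("accession"), "-", project.get("title"))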

  17. USGIN ISO metadata profile

    NASA Astrophysics Data System (ADS)

    Richard, S. M.

    2011-12-01

    The USGIN project has drafted and is using a specification for the use of ISO 19115/19119/19139 metadata, recommendations for simple metadata content, and a proposal for a URI scheme to identify resources using resolvable http URIs (see http://lab.usgin.org/usgin-profiles). The principal target use case is a catalog in which resources can be registered and described by data providers for discovery by users. We are currently using the ESRI Geoportal (open source), with configuration files for the USGIN profile. The metadata offered by the catalog must provide sufficient content to guide search engines to locate requested resources; to describe the resource content, provenance, and quality so users can determine whether the resource will serve the intended usage; and finally to enable human users and software clients to obtain or access the resource. In order to achieve an operational federated catalog system, provisions in the ISO specification must be restricted and usage clarified to reduce the heterogeneity of 'standard' metadata and service implementations, such that a single client can search against different catalogs and the metadata returned by catalogs can be parsed reliably to locate required information. Usage of the complex ISO 19139 XML schema allows for a great deal of structured metadata content, but heterogeneity in approaches to content encoding has hampered development of sophisticated client software that can take advantage of the rich metadata; the lack of such clients in turn reduces the motivation for metadata producers to produce content-rich metadata. If the only significant use of the detailed, structured metadata is to format it into text for people to read, then the detailed information could be put in free-text elements and be just as useful. For complex metadata encoding and content to be useful, there must be clear and unambiguous conventions on the encoding that are utilized by the community that wishes to take advantage of advanced metadata content. The use cases for the detailed content must be well understood, and the degree of metadata complexity should be determined by the requirements of those use cases. The ISO standard provides sufficient flexibility that relatively simple metadata records can be created that will serve for text-indexed search/discovery, resource evaluation by a user reading text content from the metadata, and access to the resource via http, ftp, or well-known service protocols (e.g. THREDDS; OGC WMS, WFS, WCS).

  18. The health care and life sciences community profile for dataset descriptions

    PubMed Central

    Alexiev, Vladimir; Ansell, Peter; Bader, Gary; Baran, Joachim; Bolleman, Jerven T.; Callahan, Alison; Cruz-Toledo, José; Gaudet, Pascale; Gombocz, Erich A.; Gonzalez-Beltran, Alejandra N.; Groth, Paul; Haendel, Melissa; Ito, Maori; Jupp, Simon; Juty, Nick; Katayama, Toshiaki; Kobayashi, Norio; Krishnaswami, Kalpana; Laibe, Camille; Le Novère, Nicolas; Lin, Simon; Malone, James; Miller, Michael; Mungall, Christopher J.; Rietveld, Laurens; Wimalaratne, Sarala M.; Yamaguchi, Atsuko

    2016-01-01

    Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine readable descriptions of versioned datasets. PMID:27602295

  19. Sensor metadata blueprints and computer-aided editing for disciplined SensorML

    NASA Astrophysics Data System (ADS)

    Tagliolato, Paolo; Oggioni, Alessandro; Fugazza, Cristiano; Pepe, Monica; Carrara, Paola

    2016-04-01

    The need for continuous, accurate, and comprehensive environmental knowledge has led to an increase in sensor observation systems and networks. The Sensor Web Enablement (SWE) initiative has been promoted by the Open Geospatial Consortium (OGC) to foster interoperability among sensor systems. The provision of metadata according to the prescribed SensorML schema is a key component for achieving this; nevertheless, the availability of correct and exhaustive metadata cannot be taken for granted. On the one hand, it is awkward for users to provide sensor metadata because of the lack of user-oriented, dedicated tools. On the other, the specification of invariant information for a given sensor category or model (e.g., observed properties and units of measurement, manufacturer information, etc.) can be labor- and time-consuming. Moreover, the provision of these details is error-prone and subjective, i.e., it may differ greatly across distinct descriptions of the same system. We provide a user-friendly, template-driven metadata authoring tool composed of a backend web service and an HTML5/JavaScript client. This results in a form-based user interface that conceals the high complexity of the underlying format. The tool also allows for plugging in external data sources providing authoritative definitions for the aforementioned invariant information. Leveraging these functionalities, we compiled a set of SensorML profiles, that is, sensor metadata blueprints allowing end users to focus only on the metadata items that are related to their specific deployment. The natural extension of this scenario is the involvement of end users and sensor manufacturers in the crowd-sourced evolution of this collection of prototypes. We describe the components and workflow of our framework for computer-aided management of sensor metadata.
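
    A hedged sketch of the blueprint idea: a profile holds the invariant fields for a sensor model, and the user supplies only deployment-specific items; merging the two yields the full record. The field names are illustrative, not the SensorML schema.

        # Hedged sketch: merge a sensor-model blueprint with deployment details.
        TEMPERATURE_LOGGER_BLUEPRINT = {
            "manufacturer": "ExampleCorp",          # invariant for this model
            "observed_property": "water_temperature",
            "unit_of_measure": "Cel",
        }

        def build_sensor_record(blueprint, deployment):
            """Deployment-specific values override or extend the blueprint."""
            record = dict(blueprint)
            record.update(deployment)
            return record

        record = build_sensor_record(
            TEMPERATURE_LOGGER_BLUEPRINT,
            {"sensor_id": "lake-maggiore-T01", "position": (45.95, 8.63), "depth_m": 2.5},
        )
        print(record)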

  20. 77 FR 33739 - Announcement of Requirements and Registration for “Health Data Platform Metadata Challenge”

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-06-07

    ... Information Technology. SUMMARY: As part of the HHS Open Government Plan, the HealthData.gov Platform (HDP) is... application of existing voluntary consensus standards for metadata common to all open government data, and... vocabulary recommendations for Linked Data publishers, defining cross domain semantic metadata of open...

  1. To Teach or Not to Teach: The Ethics of Metadata

    ERIC Educational Resources Information Center

    Barnes, Cynthia; Cavaliere, Frank

    2009-01-01

    Metadata is information about computer-generated documents that is often inadvertently transmitted to others. The problems associated with metadata have become more acute over time as word processing and other popular programs have become more receptive to the concept of collaboration. As more people become involved in the preparation of…

  2. Creating FGDC and NBII metadata with Metavist 2005.

    Treesearch

    David J. Rugg

    2004-01-01

    This report documents a computer program for creating metadata compliant with the Federal Geographic Data Committee (FGDC) 1998 metadata standard or the National Biological Information Infrastructure (NBII) 1999 Biological Data Profile for the FGDC standard. The software runs under the Microsoft Windows 2000 and XP operating systems, and requires the presence of...

  3. Evaluating and Evolving Metadata in Multiple Dialects

    NASA Astrophysics Data System (ADS)

    Kozimor, J.; Habermann, T.; Powers, L. A.; Gordon, S.

    2016-12-01

    Despite many long-term homogenization efforts, communities continue to develop focused metadata standards along with related recommendations and (typically) XML representations (aka dialects) for sharing metadata content. Different representations easily become obstacles to sharing information because each representation generally requires a set of tools and skills that are designed, built, and maintained specifically for that representation. In contrast, community recommendations are generally described, at least initially, at a more conceptual level and are more easily shared. For example, most communities agree that dataset titles should be included in metadata records although they write the titles in different ways. This situation has led to the development of metadata repositories that can ingest and output metadata in multiple dialects. As an operational example, the NASA Common Metadata Repository (CMR) includes three different metadata dialects (DIF, ECHO, and ISO 19115-2). These systems raise a new question for metadata providers: if I have a choice of metadata dialects, which should I use and how do I make that decision? We have developed a collection of metadata evaluation tools that can be used to evaluate metadata records in many dialects for completeness with respect to recommendations from many organizations and communities. We have applied these tools to over 8000 collection and granule metadata records in four different dialects. This large collection of identical content in multiple dialects enables us to address questions about metadata and dialect evolution and to answer those questions quantitatively. We will describe those tools and results from evaluating the NASA CMR metadata collection.
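
    The dialect-aware evaluation idea can be pictured with a small sketch: the same conceptual element ("title" here) lives at different locations in different XML dialects, so the check is driven by a per-dialect lookup table. The namespaces and element paths below are illustrative simplifications, not the project's actual rule set.

```python
# Sketch of dialect-aware completeness checking over XML metadata records.
import xml.etree.ElementTree as ET

TITLE_PATH = {  # illustrative; real recommendations cover many elements
    "DIF": ".//{http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/}Entry_Title",
    "ISO19115-2": ".//{http://www.isotc211.org/2005/gmd}title",
}

def title_completeness(records, dialect):
    """Fraction of XML records (strings) with a non-empty title element."""
    path = TITLE_PATH[dialect]
    hits = 0
    for xml_text in records:
        el = ET.fromstring(xml_text).find(path)
        if el is not None and "".join(el.itertext()).strip():
            hits += 1
    return hits / len(records) if records else 0.0

sample = ['<DIF xmlns="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/">'
          '<Entry_Title>Sea surface temperature</Entry_Title></DIF>']
print(title_completeness(sample, "DIF"))  # -> 1.0
```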

  4. An Overview of Tools for Creating, Validating and Using PDS Metadata

    NASA Astrophysics Data System (ADS)

    King, T. A.; Hardman, S. H.; Padams, J.; Mafi, J. N.; Cecconi, B.

    2017-12-01

    NASA's Planetary Data System (PDS) has defined information models for creating metadata to describe bundles, collections and products for all the assets acquired by planetary science projects. Version 3 of the PDS Information Model (commonly known as "PDS3") is widely used and describes most of the existing planetary archive. Recently PDS has released version 4 of the Information Model (commonly known as "PDS4"), which is designed to improve consistency, efficiency and discoverability of information. To aid in creating, validating and using PDS4 metadata, the PDS and a few associated groups have developed a variety of tools. In addition, some commercial tools, both free and paid, can be used to create and work with PDS4 metadata. We present an overview of these tools, describe those tools currently under development and provide guidance as to which tools may be most useful for missions, instrument teams and the individual researcher.

  5. Global land information system (GLIS) access to worldwide Landsat data

    USGS Publications Warehouse

    Smith, Timothy B.; Goodale, Katherine L.

    1993-01-01

    The Landsat Technical Working Group (LTWG) and the Landsat Ground Station Operations Working Group (LGSOWG) have encouraged Landsat receiving stations around the world to share information about their data holdings through the exchange of metadata records. Receiving stations forward their metadata records to the U.S. Geological Survey's EROS Data Center (EDC) on a quarterly basis. The EDC maintains the records for each station, coordinates changes to the database, and provides metadata to the stations as requested. The result is a comprehensive international database listing most of the world's Landsat data acquisitions. This exchange of information began in the early 1980s with the inclusion in the EDC database of scenes acquired by a receiving station in Italy. Through the years other stations have agreed to participate; currently ten of the seventeen stations actively share their metadata records. Coverage maps have been generated to depict the status of the database. The worldwide Landsat database is also available through the Global Land Information System (GLIS).

  6. The STP (Solar-Terrestrial Physics) Semantic Web based on the RSS1.0 and the RDF

    NASA Astrophysics Data System (ADS)

    Kubo, T.; Murata, K. T.; Kimura, E.; Ishikura, S.; Shinohara, I.; Kasaba, Y.; Watari, S.; Matsuoka, D.

    2006-12-01

    In Solar-Terrestrial Physics (STP), it has been pointed out that the circulation and utilization of observation data among researchers is insufficient. To achieve interdisciplinary research, these circulation and utilization problems must be overcome. Against this background, the authors' group has developed a worldwide database that manages metadata of satellite and ground-based observation data files. Until now, retrieving metadata from the observation data and registering it in the database has been carried out by hand. Our goal is to establish the STP Semantic Web. The Semantic Web provides a common framework that allows a variety of data to be shared and reused across applications, enterprises, and communities. We also expect that secondary information related to observations, such as event information and associated news, will be shared over the networks. The most fundamental issue for establishing it is who generates, manages and provides metadata in the Semantic Web. We developed an automatic metadata collection system for the observation data using RSS (RDF Site Summary) 1.0. RSS 1.0 is an XML-based markup language built on the RDF (Resource Description Framework) and designed for syndicating news and the contents of news-like sites. RSS 1.0 is used to describe STP metadata such as data file names, file server addresses and observation dates. To describe STP metadata beyond the RSS 1.0 vocabulary, we defined original vocabularies for STP resources using RDF Schema. The RDF describes technical terms of the STP along with the Dublin Core Metadata Element Set, the standard for cross-domain information resource descriptions. Researchers' information on the STP is described with FOAF, an RDF/XML vocabulary for creating machine-readable metadata about people. Using RSS 1.0 as a metadata distribution method, the workflow from retrieving metadata to registering it in the database is automated. This technique has been applied to several database systems, such as the DARTS database system and the NICT Space Weather Report Service. DARTS is a science database managed by ISAS/JAXA in Japan. We succeeded in automatically generating and collecting metadata for CDF (Common Data Format) data, such as Reimei satellite data, provided by DARTS. We also created an RDF service for space weather reports and real-time global MHD simulation 3D data provided by NICT. Our Semantic Web system works as follows: the RSS 1.0 documents generated on the data sites (ISAS and NICT) are automatically collected by a metadata collection agent. The RDF documents are registered, and the agent extracts metadata and stores it in Sesame, an open-source RDF database with support for RDF Schema inferencing and querying. The RDF database provides advanced retrieval that takes properties and relations into account. Finally, the STP Semantic Web provides automated processing and high-level search not only over observation data but also over space weather news, physical events, technical terms and researcher information related to the STP.
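
    Because RSS 1.0 is RDF/XML, the harvesting step described above can be sketched with a generic RDF toolkit: load the feed into a graph and read off per-item metadata. This minimal Python/rdflib illustration assumes a hypothetical feed URL and omits the project-specific STP vocabularies.

```python
# Sketch of the harvesting step: an RSS 1.0 feed parses directly as RDF.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

RSS = Namespace("http://purl.org/rss/1.0/")
DC = Namespace("http://purl.org/dc/elements/1.1/")

g = Graph()
g.parse("http://example.org/stp-data/feed.rdf", format="xml")  # hypothetical feed

# Each rss:item describes one observation data file.
for item in g.subjects(RDF.type, RSS.item):
    print(g.value(item, RSS.title), g.value(item, DC.date))
```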

  7. Making Interoperability Easier with the NASA Metadata Management Tool

    NASA Astrophysics Data System (ADS)

    Shum, D.; Reese, M.; Pilone, D.; Mitchell, A. E.

    2016-12-01

    ISO 19115 has enabled interoperability amongst tools, yet many users find it hard to build ISO metadata for their collections because the standard can be large and overly flexible for their needs. The Metadata Management Tool (MMT), part of NASA's Earth Observing System Data and Information System (EOSDIS), offers users a modern, easy-to-use, browser-based tool to develop ISO-compliant metadata. Through a simplified UI experience, metadata curators can create and edit collections without any understanding of the complex ISO 19115 format, while still generating compliant metadata. The MMT is also able to assess the completeness of collection-level metadata by evaluating it against a variety of metadata standards. The tool provides users with clear guidance as to how to change their metadata in order to improve its quality and compliance. It is based on NASA's Unified Metadata Model for Collections (UMM-C), a simpler metadata model that maps cleanly to ISO 19115. This allows metadata authors and curators to meet ISO compliance requirements faster and more accurately. The MMT and UMM-C have been developed in an agile fashion, with recurring end-user tests and reviews to continually refine the tool, the model and the ISO mappings. This process is allowing for continual improvement and evolution to meet the community's needs.

  8. SnoVault and encodeD: A novel object-based storage system and applications to ENCODE metadata.

    PubMed

    Hitz, Benjamin C; Rowe, Laurence D; Podduturi, Nikhil R; Glick, David I; Baymuradov, Ulugbek K; Malladi, Venkat S; Chan, Esther T; Davidson, Jean M; Gabdank, Idan; Narayana, Aditi K; Onate, Kathrina C; Hilton, Jason; Ho, Marcus C; Lee, Brian T; Miyasato, Stuart R; Dreszer, Timothy R; Sloan, Cricket A; Strattan, J Seth; Tanaka, Forrest Y; Hong, Eurie L; Cherry, J Michael

    2017-01-01

    The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a comprehensive catalog of functional elements, initiated shortly after the completion of the Human Genome Project. The current database exceeds 6500 experiments across more than 450 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the H. sapiens and M. musculus genomes. All ENCODE experimental data, metadata, and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage, unified processing, and distribution to community resources and the scientific community. As the volume of data increases, the identification and organization of experimental details becomes increasingly intricate and demands careful curation. The ENCODE DCC has created a general-purpose software system, known as SnoVault, that supports metadata and file submission, a database used for metadata storage, web pages for displaying the metadata and a robust API for querying the metadata. The software is fully open source; code and installation instructions can be found at http://github.com/ENCODE-DCC/snovault/ (for the generic database) and http://github.com/ENCODE-DCC/encoded/ (for storing genomic data in the manner of ENCODE). The core database engine, SnoVault (which is completely independent of ENCODE, genomic data, or bioinformatic data), has been released as a separate Python package.

  9. SnoVault and encodeD: A novel object-based storage system and applications to ENCODE metadata

    PubMed Central

    Podduturi, Nikhil R.; Glick, David I.; Baymuradov, Ulugbek K.; Malladi, Venkat S.; Chan, Esther T.; Davidson, Jean M.; Gabdank, Idan; Narayana, Aditi K.; Onate, Kathrina C.; Hilton, Jason; Ho, Marcus C.; Lee, Brian T.; Miyasato, Stuart R.; Dreszer, Timothy R.; Sloan, Cricket A.; Strattan, J. Seth; Tanaka, Forrest Y.; Hong, Eurie L.; Cherry, J. Michael

    2017-01-01

    The Encyclopedia of DNA elements (ENCODE) project is an ongoing collaborative effort to create a comprehensive catalog of functional elements, initiated shortly after the completion of the Human Genome Project. The current database exceeds 6500 experiments across more than 450 cell lines and tissues using a wide array of experimental techniques to study the chromatin structure, regulatory and transcriptional landscape of the H. sapiens and M. musculus genomes. All ENCODE experimental data, metadata, and associated computational analyses are submitted to the ENCODE Data Coordination Center (DCC) for validation, tracking, storage, unified processing, and distribution to community resources and the scientific community. As the volume of data increases, the identification and organization of experimental details becomes increasingly intricate and demands careful curation. The ENCODE DCC has created a general-purpose software system, known as SnoVault, that supports metadata and file submission, a database used for metadata storage, web pages for displaying the metadata and a robust API for querying the metadata. The software is fully open source; code and installation instructions can be found at http://github.com/ENCODE-DCC/snovault/ (for the generic database) and http://github.com/ENCODE-DCC/encoded/ (for storing genomic data in the manner of ENCODE). The core database engine, SnoVault (which is completely independent of ENCODE, genomic data, or bioinformatic data), has been released as a separate Python package. PMID:28403240

  10. Normalized Metadata Generation for Human Retrieval Using Multiple Video Surveillance Cameras.

    PubMed

    Jung, Jaehoon; Yoon, Inhye; Lee, Seungwon; Paik, Joonki

    2016-06-24

    Since it is impossible for surveillance personnel to keep monitoring videos from a multiple camera-based surveillance system, an efficient technique is needed to help recognize important situations by retrieving the metadata of an object-of-interest. In a multiple camera-based surveillance system, an object detected in a camera has a different shape in another camera, which is a critical issue of wide-range, real-time surveillance systems. In order to address the problem, this paper presents an object retrieval method by extracting the normalized metadata of an object-of-interest from multiple, heterogeneous cameras. The proposed metadata generation algorithm consists of three steps: (i) generation of a three-dimensional (3D) human model; (ii) human object-based automatic scene calibration; and (iii) metadata generation. More specifically, an appropriately-generated 3D human model provides the foot-to-head direction information that is used as the input of the automatic calibration of each camera. The normalized object information is used to retrieve an object-of-interest in a wide-range, multiple-camera surveillance system in the form of metadata. Experimental results show that the 3D human model matches the ground truth, and automatic calibration-based normalization of metadata enables a successful retrieval and tracking of a human object in the multiple-camera video surveillance system.

  11. A novel metadata management model to capture consent for record linkage in longitudinal research studies.

    PubMed

    McMahon, Christiana; Denaxas, Spiros

    2017-11-06

    Informed consent is an important feature of longitudinal research studies, as it enables the linking of baseline participant information with administrative data. The lack of standardized models to capture consent elements can lead to substantial challenges; a structured approach to capturing consent-related metadata can address these. Our objectives were to: a) explore the state of the art for recording consent; b) identify key elements of consent required for record linkage; and c) create and evaluate a novel metadata management model to capture consent-related metadata. The main methodological components of our work were: a) a systematic literature review and qualitative analysis of consent forms; and b) the development and evaluation of a novel metadata model. We qualitatively analyzed 61 manuscripts and 30 consent forms, extracting data elements related to obtaining consent for linkage. We then created a novel metadata management model for consent and evaluated it by comparison with existing standards and by iteratively applying it to case studies. The developed model can facilitate the standardized recording of consent for linkage in longitudinal research studies and enable the linkage of external participant data. Furthermore, it can provide a structured way of recording consent-related metadata and facilitate the harmonization and streamlining of processes.

  12. Normalized Metadata Generation for Human Retrieval Using Multiple Video Surveillance Cameras

    PubMed Central

    Jung, Jaehoon; Yoon, Inhye; Lee, Seungwon; Paik, Joonki

    2016-01-01

    Since it is impossible for surveillance personnel to keep monitoring videos from a multiple camera-based surveillance system, an efficient technique is needed to help recognize important situations by retrieving the metadata of an object-of-interest. In a multiple camera-based surveillance system, an object detected in a camera has a different shape in another camera, which is a critical issue of wide-range, real-time surveillance systems. In order to address the problem, this paper presents an object retrieval method by extracting the normalized metadata of an object-of-interest from multiple, heterogeneous cameras. The proposed metadata generation algorithm consists of three steps: (i) generation of a three-dimensional (3D) human model; (ii) human object-based automatic scene calibration; and (iii) metadata generation. More specifically, an appropriately-generated 3D human model provides the foot-to-head direction information that is used as the input of the automatic calibration of each camera. The normalized object information is used to retrieve an object-of-interest in a wide-range, multiple-camera surveillance system in the form of metadata. Experimental results show that the 3D human model matches the ground truth, and automatic calibration-based normalization of metadata enables a successful retrieval and tracking of a human object in the multiple-camera video surveillance system. PMID:27347961

  13. A Collection Scheme for Tracing Information of Pig Safety Production

    NASA Astrophysics Data System (ADS)

    Luo, Qingyao; Xiong, Benhai; Yang, Liang

    This study takes one main production pattern of smallholder pig farming in Tianjin as its prototype, analyzes in depth the characteristics of information for tracing inputs (vaccines, feeds, veterinary drugs) and supervision tests in pig farming, proposes an inputs metadata scheme, criteria for integrating input events, and interface norms for data transmission, and develops a complete system for 2D ear-tag identification and traceability information collection for pig safety production based on a mobile PDA. The system implements functions including the setting and invalidation of 2D ear tags, the collection of tracing inputs and supervision records on the mobile PDA, and the integration of tracing events (epidemic, feed, drug and supervision events) on the traceability data center (server). The PDA information collection system has been applied in a demonstration in Tianjin; collection is simple, convenient and feasible, and it can meet the requirements of a traceability information system for pig safety production.

  14. An Examination of the Adoption of Preservation Metadata in Cultural Heritage Institutions: An Exploratory Study Using Diffusion of Innovations Theory

    ERIC Educational Resources Information Center

    Alemneh, Daniel Gelaw

    2009-01-01

    Digital preservation is a significant challenge for cultural heritage institutions and other repositories of digital information resources. Recognizing the critical role of metadata in any successful digital preservation strategy, the Preservation Metadata Implementation Strategies (PREMIS) has been extremely influential on providing a "core" set…

  15. Achieving interoperability for metadata registries using comparative object modeling.

    PubMed

    Park, Yu Rang; Kim, Ju Han

    2010-01-01

    Achieving data interoperability between organizations relies upon agreed meaning and representation (metadata) of data. For managing and registering metadata, many organizations have built metadata registries (MDRs) in various domains based on the international standard for MDR frameworks, ISO/IEC 11179. Following this trend, two public MDRs have been created in the biomedical domain: the United States Health Information Knowledgebase (USHIK) and the cancer Data Standards Registry and Repository (caDSR), from the U.S. Department of Health & Human Services and the National Cancer Institute (NCI), respectively. Most MDRs are implemented with indiscriminate extensions that satisfy organization-specific needs and work around semantic and structural limitations of ISO/IEC 11179. As a result, it is difficult to achieve interoperability among multiple MDRs. In this paper, we propose an integrated metadata object model for achieving interoperability among multiple MDRs. To evaluate this model, we developed an XML Schema Definition (XSD)-based metadata exchange format, and we created an XSD-based metadata exporter supporting both the integrated metadata object model and organization-specific MDR formats.
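
    As a rough illustration of what such an exporter produces, the sketch below serializes one ISO/IEC 11179-style data element to exchange XML with Python's standard library. The element and attribute names are hypothetical, not the XSD actually defined in the paper.

```python
# Sketch of an exchange-format exporter for one data element; the XML
# structure below is an invented simplification for illustration only.
import xml.etree.ElementTree as ET

def export_data_element(de):
    """Serialize one ISO/IEC 11179-style data element to exchange XML."""
    root = ET.Element("DataElement", id=de["id"])
    ET.SubElement(root, "Name").text = de["name"]
    ET.SubElement(root, "Definition").text = de["definition"]
    vd = ET.SubElement(root, "ValueDomain", datatype=de["datatype"])
    for pv in de.get("permissible_values", []):
        ET.SubElement(vd, "PermissibleValue").text = pv
    return ET.tostring(root, encoding="unicode")

print(export_data_element({
    "id": "DE-0001", "name": "Tumor Grade",
    "definition": "Histologic grade of the tumor.",
    "datatype": "string", "permissible_values": ["G1", "G2", "G3"],
}))
```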

  16. A Model for Enhancing Internet Medical Document Retrieval with “Medical Core Metadata”

    PubMed Central

    Malet, Gary; Munoz, Felix; Appleyard, Richard; Hersh, William

    1999-01-01

    Objective: Finding documents on the World Wide Web relevant to a specific medical information need can be difficult. The goal of this work is to define a set of document content description tags, or metadata encodings, that can be used to promote disciplined search access to Internet medical documents. Design: The authors based their approach on a proposed metadata standard, the Dublin Core Metadata Element Set, which has recently been submitted to the Internet Engineering Task Force. Their model also incorporates the National Library of Medicine's Medical Subject Headings (MeSH) vocabulary and Medline-type content descriptions. Results: The model defines a medical core metadata set that can be used to describe the metadata for a wide variety of Internet documents. Conclusions: The authors propose that their medical core metadata set be used to assign metadata to medical documents to facilitate document retrieval by Internet search engines. PMID:10094069

  17. NCPP's Use of Standard Metadata to Promote Open and Transparent Climate Modeling

    NASA Astrophysics Data System (ADS)

    Treshansky, A.; Barsugli, J. J.; Guentchev, G.; Rood, R. B.; DeLuca, C.

    2012-12-01

    The National Climate Predictions and Projections (NCPP) Platform is developing comprehensive regional and local information about the evolving climate to inform decision making and adaptation planning. This includes creating, and providing tools to create, metadata about the models and processes used to produce its derived data products. NCPP is using the Common Information Model (CIM), an ontology developed by a broad set of international partners in climate research, as its metadata language. This use of a standard ensures interoperability within the climate community as well as permitting access to the ecosystem of tools and services emerging alongside the CIM. The CIM itself is divided into a general-purpose (UML & XML) schema, which structures metadata documents, and a project- or community-specific (XML) Controlled Vocabulary (CV), which constrains the content of metadata documents. NCPP has already modified the CIM schema to accommodate downscaling models, simulations, and experiments, and is currently developing a CV for use by the downscaling community. Incorporating downscaling into the CIM will bring several benefits: easy access to the existing CIM documents describing the CMIP5 models and simulations that are being downscaled; access to software tools that have been developed to search, manipulate, and visualize CIM metadata; and coordination with national and international efforts such as ES-DOC that are working to make climate model descriptions and datasets interoperable. Providing detailed metadata descriptions that include the full provenance of derived data products will contribute to making that data (and the models and processes which generated it) more open and transparent to the user community.

  18. Metazen – metadata capture for metagenomes

    PubMed Central

    2014-01-01

    Background As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. Unfortunately, these tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Results Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Conclusions Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility. PMID:25780508

  19. Metazen - metadata capture for metagenomes.

    PubMed

    Bischof, Jared; Harrison, Travis; Paczian, Tobias; Glass, Elizabeth; Wilke, Andreas; Meyer, Folker

    2014-01-01

    As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. Unfortunately, these tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility.

  20. Tracking Actual Usage: The Attention Metadata Approach

    ERIC Educational Resources Information Center

    Wolpers, Martin; Najjar, Jehad; Verbert, Katrien; Duval, Erik

    2007-01-01

    The information overload in learning and teaching scenarios is a main hindering factor for efficient and effective learning. New methods are needed to help teachers and students in dealing with the vast amount of available information and learning material. Our approach aims to utilize contextualized attention metadata to capture behavioural…

  1. Fast processing of digital imaging and communications in medicine (DICOM) metadata using multiseries DICOM format.

    PubMed

    Ismail, Mahmoud; Philbin, James

    2015-04-01

    The digital imaging and communications in medicine (DICOM) information model combines pixel data and its metadata in a single object. There are user scenarios that need only metadata manipulation, such as deidentification and study migration. Most picture archiving and communication systems use a database to store and update the metadata rather than updating the raw DICOM files themselves. The multiseries DICOM (MSD) format separates metadata from pixel data and eliminates duplicate attributes. This work promotes storing DICOM studies in MSD format to reduce metadata processing time. A set of experiments is performed that updates the metadata of a set of DICOM studies for deidentification and migration. The studies are stored in both the traditional single-frame DICOM (SFD) format and the MSD format. The results show that it is faster to update studies' metadata in MSD format than in SFD format because the bulk data is separated in MSD and is not retrieved from the storage system. In addition, it is space-efficient to store the deidentified studies in MSD format as they share the same bulk data object with the original study. In summary, separation of metadata from pixel data using the MSD format provides fast metadata access and speeds up applications that process only the metadata.
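
    The metadata-only access pattern that motivates MSD can be sketched with the pydicom library on ordinary single-file DICOM. This is an assumption for illustration; the MSD format itself is not implemented here, and the file paths are hypothetical.

```python
# Sketch of metadata-only deidentification: read the header without the
# bulk pixel data, edit identifiers, and write the header back out.
import pydicom

# stop_before_pixels leaves the bulk pixel data on disk, untouched.
ds = pydicom.dcmread("study/series1/img001.dcm", stop_before_pixels=True)

ds.PatientName = "ANONYMOUS"   # blank a few direct identifiers
ds.PatientID = ""

# Writes a metadata-only copy; a real pipeline would reattach or
# reference the shared bulk data object, as MSD does.
ds.save_as("deid/img001.dcm")
```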

  2. Fast processing of digital imaging and communications in medicine (DICOM) metadata using multiseries DICOM format

    PubMed Central

    Ismail, Mahmoud; Philbin, James

    2015-01-01

    The digital imaging and communications in medicine (DICOM) information model combines pixel data and its metadata in a single object. There are user scenarios that need only metadata manipulation, such as deidentification and study migration. Most picture archiving and communication systems use a database to store and update the metadata rather than updating the raw DICOM files themselves. The multiseries DICOM (MSD) format separates metadata from pixel data and eliminates duplicate attributes. This work promotes storing DICOM studies in MSD format to reduce metadata processing time. A set of experiments is performed that updates the metadata of a set of DICOM studies for deidentification and migration. The studies are stored in both the traditional single-frame DICOM (SFD) format and the MSD format. The results show that it is faster to update studies' metadata in MSD format than in SFD format because the bulk data is separated in MSD and is not retrieved from the storage system. In addition, it is space-efficient to store the deidentified studies in MSD format as they share the same bulk data object with the original study. In summary, separation of metadata from pixel data using the MSD format provides fast metadata access and speeds up applications that process only the metadata. PMID:26158117

  3. Improving the accessibility and re-use of environmental models through provision of model metadata - a scoping study

    NASA Astrophysics Data System (ADS)

    Riddick, Andrew; Hughes, Andrew; Harpham, Quillon; Royse, Katherine; Singh, Anubha

    2014-05-01

    There has been increasing interest, from both academic and commercial organisations, over recent years in developing hydrologic and other environmental models in response to some of the major challenges facing the environment, for example environmental change and its effects and ensuring water resource security. This has resulted in a significant investment in modelling by many organisations, both in terms of financial resources and intellectual capital. To capitalise on the effort of producing models, it is necessary for the models to be both discoverable and appropriately described. If this is not done, the effort of producing the models will be wasted. However, whilst there are some recognised metadata standards relating to datasets, these may not completely address the needs of modellers, for example regarding input data. There also appears to be a lack of metadata schemes configured to encourage the discovery and re-use of the models themselves. The lack of an established standard for model metadata is considered to be a factor inhibiting the more widespread use of environmental models, particularly the use of linked model compositions which fuse together hydrologic models with models from other environmental disciplines. This poster presents the results of a Natural Environment Research Council (NERC) funded scoping study to understand the requirements of modellers and other end users for metadata about data and models. A user consultation exercise using an online questionnaire was undertaken to capture the views of a wide spectrum of stakeholders on how they currently manage metadata for modelling. This provided strong confirmation of our original supposition that there is a lack of systems and facilities to capture metadata about models. A number of specific gaps in current provision for data and model metadata were also identified, including the need for a standard means to record detailed information about the modelling environment and the model code used, to assist the selection of models for linked compositions. Existing best practice, including the use of current metadata standards (e.g. ISO 19110, ISO 19115 and ISO 19119) and the metadata components of WaterML, was also evaluated. In addition to commonly used metadata attributes (e.g. spatial reference information) there was significant interest in recording a variety of additional metadata attributes. These included more detailed information about temporal data, and estimates of data accuracy and uncertainty within metadata. This poster describes the key results of the study, including a number of gaps in the provision of metadata for modelling, and outlines how these might be addressed. Overall the scoping study has highlighted significant interest in addressing this issue within the environmental modelling community. There is therefore an impetus for ongoing research, and we are seeking to take this forward through collaboration with other interested organisations. Progress towards an internationally recognised model metadata standard is suggested.

  4. Pragmatic Metadata Management for Integration into Multiple Spatial Data Infrastructure Systems and Platforms

    NASA Astrophysics Data System (ADS)

    Benedict, K. K.; Scott, S.

    2013-12-01

    While there has been a convergence towards a limited number of standards for representing knowledge (metadata) about geospatial (and other) data objects and collections, there exists a variety of community conventions around the specific use of those standards and within specific data discovery and access systems. This combination of limited (but multiple) standards and conventions creates a challenge for system developers who aspire to participate in multiple data infrastructures, each of which may use a different combination of standards and conventions. While Extensible Markup Language (XML) is a shared standard for encoding most metadata, traditional direct XML transformations (XSLT) from one standard to another often result in an imperfect transfer of information due to incomplete mapping from one standard's content model to another. This paper presents work at the University of New Mexico's Earth Data Analysis Center (EDAC) in which a unified data and metadata management system has been developed in support of the storage, discovery and access of heterogeneous data products. This system, the Geographic Storage, Transformation and Retrieval Engine (GSTORE) platform, has adopted a polyglot database model in which a combination of relational and document-based databases is used to store both data and metadata, with some metadata stored in a custom XML schema designed as a superset of the requirements for multiple target metadata standards: ISO 19115-2/19139/19110/19119, FGDC CSDGM (both with and without remote sensing extensions) and Dublin Core. Metadata stored within this schema is complemented by additional service, format and publisher information that is dynamically "injected" into produced metadata documents when they are requested from the system. While mapping from the underlying common metadata schema is relatively straightforward, generating valid metadata within each target standard is necessary but not sufficient for integration into multiple data infrastructures, as has been demonstrated through EDAC's testing and deployment of metadata into multiple external systems: Data.gov, the GEOSS Registry, the DataONE network, the DSpace-based institutional repository at UNM and semantic mediation systems developed as part of the NASA ACCESS ELSeWEB project. Each of these systems requires valid metadata as a first step, but to make the most effective use of the delivered metadata each also has a set of conventions that are specific to the system. This presentation will provide an overview of the underlying metadata management model and the processes and web services that have been developed to automatically generate metadata in a variety of standard formats, and will highlight some of the specific modifications made to the output metadata content to support the different conventions used by the multiple metadata integration endpoints.
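
    The request-time "injection" idea can be pictured with a small sketch: a stored core record is merged with service, format and publisher information just before a standard-specific document is rendered. All names and the renderer below are hypothetical stand-ins, not GSTORE's actual implementation.

```python
# Sketch of request-time injection of dynamic content into stored metadata.
PUBLISHER = {"publisher": "Earth Data Analysis Center"}

def render_metadata(core_record, services, target, renderers):
    doc = dict(core_record)              # core fields from the metadata store
    doc.update(PUBLISHER)                # injected publisher information
    doc["distribution"] = [              # injected per-service endpoints
        {"protocol": s["protocol"], "url": s["url"]} for s in services
    ]
    return renderers[target](doc)

renderers = {"dublin-core": lambda d: d}  # stand-in for a real serializer
print(render_metadata(
    {"title": "Landsat mosaic of New Mexico"},
    [{"protocol": "WMS", "url": "http://example.org/wms"}],
    "dublin-core",
    renderers,
))
```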

  5. ASDC Collaborations and Processes to Ensure Quality Metadata and Consistent Data Availability

    NASA Astrophysics Data System (ADS)

    Trapasso, T. J.

    2017-12-01

    With the introduction of new tools, faster computing, and less expensive storage, increased volumes of data are expected to be managed with existing or fewer resources. Metadata management is becoming a heightened challenge: as data volume grows, more metadata records must be curated for each product. To address metadata availability and completeness, NASA ESDIS has taken significant strides with the creation of the Unified Metadata Model (UMM) and the Common Metadata Repository (CMR). The UMM helps address hurdles posed by the increasing number of metadata dialects, and the CMR provides a primary repository for metadata so that required metadata fields can be served through a growing number of tools and services. However, metadata quality remains an issue, as shortcomings in metadata are not always apparent to the end user. In response to these challenges, the NASA Atmospheric Science Data Center (ASDC) created the Collaboratory for quAlity Metadata Preservation (CAMP) and defined the Product Lifecycle Process (PLP) to work congruently with it. CAMP is unique in that it provides science team members a UI to directly supply metadata that is complete, compliant, and accurate for their data products, replacing the back-and-forth communication that often results in misinterpreted metadata. Upon review by ASDC staff, metadata is submitted to CMR for broader distribution through Earthdata. Further, approval of science team metadata in CAMP automatically triggers the ASDC PLP workflow to ensure appropriate services are applied throughout the product lifecycle. This presentation will review the design elements of CAMP and PLP as well as demonstrate interfaces to each. It will show the benefits that CAMP and PLP provide to the ASDC that could potentially benefit additional NASA Earth Science Data and Information System (ESDIS) Distributed Active Archive Centers (DAACs).

  6. Metadata and annotations for multi-scale electrophysiological data.

    PubMed

    Bower, Mark R; Stead, Matt; Brinkmann, Benjamin H; Dufendach, Kevin; Worrell, Gregory A

    2009-01-01

    The increasing use of high-frequency (kHz), long-duration (days) intracranial monitoring from multiple electrodes during pre-surgical evaluation for epilepsy produces large amounts of data that are challenging to store and maintain. Descriptive metadata and clinical annotations of these large data sets also pose challenges to simple, often manual, methods of data analysis. The problems of reliable communication of metadata and annotations between programs, the maintenance of the meanings within that information over long time periods, and the flexibility to re-sort data for analysis place differing demands on data structures and algorithms. Solutions to these individual problem domains (communication, storage and analysis) can be configured to provide easy translation and clarity across the domains. The Multi-scale Annotation Format (MAF) provides an integrated metadata and annotation environment that maximizes code reuse, minimizes error probability and encourages future changes by reducing the tendency to over-fit information technology solutions to current problems. An example of a graphical utility for generating and evaluating metadata and annotations for "big data" files is presented.

  7. GeneLab Analysis Working Group Kick-Off Meeting

    NASA Technical Reports Server (NTRS)

    Costes, Sylvain V.

    2018-01-01

    Goals to achieve for the GeneLab AWG: GL vision; review of the GeneLab AWG charter; timeline and milestones for 2018; logistics (monthly meetings, workshop, internship, ASGSR); introduction of team leads and the goals of each group; introduction of all members; Q&A; three-tier client strategy to democratize data; physiological changes, pathway enrichment, differential expression, normalization, processing metadata, reproducibility; data federation/integration with heterogeneous external bioinformatics databases. The GLDS currently serves over 100 omics investigations to the biomedical community via open access. In order to expand the scope of metadata record searches via the GLDS, we designed a metadata warehouse that collects and updates metadata records from external systems housing similar data. To demonstrate the capabilities of federated search and retrieval of these data, we imported metadata records from three open-access data systems into the GLDS metadata warehouse: NCBI's Gene Expression Omnibus (GEO), EBI's PRoteomics IDEntifications (PRIDE) repository, and the Metagenomics Analysis Server (MG-RAST). Each of these systems defines metadata for omics data sets differently. One solution to bridge such differences is to employ a common object model (COM) to which each system's representation of metadata can be mapped. Warehoused metadata records are then transformed at ETL time to this single, common representation. Queries generated via the GLDS are executed against the warehouse, and matching records are shown in the COM representation (Fig. 1). While this approach is relatively straightforward to implement, the volume of data in the omics domain presents challenges in dealing with latency and currency of records. Furthermore, there has so far been no coordinated federated search for and retrieval of these kinds of data across other open-access systems that would let users conduct biological meta-investigations using data from a variety of sources. Such meta-investigations are key to corroborating findings from many kinds of assays and translating them into systems biology knowledge and, eventually, therapeutics.
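
    The COM mapping step can be pictured with a small sketch: each source repository's record is transformed at ETL time onto shared fields so that one warehouse query spans all sources. The per-source field names below are hypothetical, not GEO's or PRIDE's actual metadata keys.

```python
# Sketch of ETL into a common object model (COM) for a metadata warehouse.
FIELD_MAPS = {  # hypothetical source-specific field names
    "GEO":   {"title": "series_title",  "organism": "organism_ch1"},
    "PRIDE": {"title": "project_title", "organism": "species"},
}

def to_com(source, raw):
    """Transform one source-specific record into the common representation."""
    fmap = FIELD_MAPS[source]
    return {"source": source,
            "title": raw.get(fmap["title"]),
            "organism": raw.get(fmap["organism"])}

warehouse = [
    to_com("GEO", {"series_title": "Spaceflight mouse liver RNA-seq",
                   "organism_ch1": "Mus musculus"}),
    to_com("PRIDE", {"project_title": "ISS microbial proteome",
                     "species": "Escherichia coli"}),
]
# A single COM-level query now spans both repositories.
print([r for r in warehouse if r["organism"] == "Mus musculus"])
```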

  8. The XML Metadata Editor of GFZ Data Services

    NASA Astrophysics Data System (ADS)

    Ulbricht, Damian; Elger, Kirsten; Tesei, Telemaco; Trippanera, Daniele

    2017-04-01

    Following the FAIR data principles, research data should be Findable, Accessible, Interoperable and Reusable. Publishing data under these principles requires assigning persistent identifiers to the data and generating rich machine-actionable metadata. To increase interoperability, metadata should use shared vocabularies and crosslink the newly published (meta)data and related material. However, structured metadata formats tend to be complex and are not intended to be generated by individual scientists. Software solutions are needed that support scientists in providing metadata describing their data. To facilitate the data publication activities of 'GFZ Data Services', we programmed an XML metadata editor that assists scientists in creating metadata in different schemata popular in the earth sciences (ISO 19115, DIF, DataCite), while being at the same time usable by and understandable for scientists. Emphasis is placed on removing barriers: in particular, the editor is publicly available on the internet without registration [1], and scientists are not asked to provide information that can be generated automatically (e.g. the URL of a specific licence or the contact information of the metadata distributor). Metadata are stored in browser cookies and a copy can be saved to the local hard disk. To improve usability, form fields are translated into the scientists' language; e.g. 'creators' of the DataCite schema are called 'authors'. To assist in filling in the form, we make use of drop-down menus for small vocabulary lists and offer a search facility for large thesauri. Explanations of form fields and definitions of vocabulary terms are provided in pop-up windows, and full documentation is available for download via the help menu. In addition, multiple geospatial references can be entered via an interactive mapping tool, which helps to minimize problems with the different conventions for providing latitudes and longitudes. Currently, we are extending the metadata editor so it can be reused to generate the data discovery and contextual metadata developed by the 'Multi-scale Laboratories' Thematic Core Service of the European Plate Observing System (EPOS-IP). The editor will be used to build a common repository for a large variety of geological and geophysical datasets produced by multidisciplinary laboratories throughout Europe, thus contributing a significant step toward the integration and accessibility of earth science data. This presentation will introduce the metadata editor and show the adjustments made for EPOS-IP. [1] http://dataservices.gfz-potsdam.de/panmetaworks/metaedit

  9. GRIIDC: A Data Repository for Gulf of Mexico Science

    NASA Astrophysics Data System (ADS)

    Ellis, S.; Gibeaut, J. C.

    2017-12-01

    The Gulf of Mexico Research Initiative Information & Data Cooperative (GRIIDC) system is a data management solution appropriate for any researcher sharing Gulf of Mexico and oil spill science data. Our mission is to ensure a data and information legacy that promotes continual scientific discovery and public awareness of the Gulf of Mexico ecosystem. GRIIDC developed an open-source software solution to manage data from the Gulf of Mexico Research Initiative (GoMRI). The GoMRI program has over 2500 researchers from diverse fields of study with a variety of attitudes, experiences, and capacities for data sharing. The success of this solution is apparent through new partnerships to share data generated by RESTORE Act Centers of Excellence Programs, the National Academies of Science, and others. The GRIIDC data management system integrates dataset management planning, metadata creation, persistent identification, and data discoverability into an easy-to-use web application. No specialized software or program installations are required to support dataset submission or discovery. Furthermore, no data transformations are needed to submit data to GRIIDC; common file formats such as Excel, CSV, and text are all acceptable for submission. To ensure data are properly documented using the GRIIDC implementation of the ISO 19115-2 metadata standard, researchers submit detailed descriptive information through a series of interactive forms; no knowledge of metadata or XML formats is required. Once a dataset is documented and submitted, the GRIIDC team reviews the dataset package. This review ensures that files can be opened and contain data, and that the data are completely and accurately described; it does not include quality assurance or control of data points, as GRIIDC expects scientists to perform these steps during the course of their work. Once approved, data are made public and searchable through the GRIIDC data discovery portal and the DataONE network.

  10. MaNIDA: Integration of marine expedition information, data and publications: Data Portal of German Marine Research

    NASA Astrophysics Data System (ADS)

    Koppe, Roland; Scientific MaNIDA-Team

    2013-04-01

    The Marine Network for Integrated Data Access (MaNIDA) aims to build a sustainable e-infrastructure to support discovery and re-use of marine data from distinct data providers in Germany (see related abstracts in session ESSI 1.2). In order to give users integrated access to and retrieval of expedition or cruise metadata, data, services and publications, as well as the relationships among the various objects, we are developing (web) applications based on state-of-the-art technologies: the Data Portal of German Marine Research. Since the distributed content providers in the German network have distinct objectives and mandates for storing digital objects (e.g. long-term data preservation, near-real-time data, publication repositories), we have to cope with metadata that are heterogeneous in syntax and semantics, data types and formats, as well as access solutions. We have defined a set of core metadata elements which are common to our content providers and therefore useful for discovery and for building relationships among objects. Existing catalogues of various types of vocabularies are used to ensure the mapping to community-wide terms. We distinguish between expedition metadata and continuously harvestable metadata objects from distinct data providers.
    - Existing expedition metadata from distinct sources is integrated and validated in order to create an expedition metadata catalogue which is used as the authoritative source for expedition-related content. The web application allows browsing by e.g. research vessel and date, exploring expeditions and research gaps by tracklines, and viewing expedition details (begin/end, ports, platforms, chief scientists, events, etc.). Expedition-related objects from harvesting are also dynamically associated with expedition information and presented to the user. We will additionally provide web services exposing detailed expedition information.
    - Other harvestable content is separated into four categories: archived data and data products, near-real-time data, publications and reports. Reports are a special case of publication, describing cruise planning, cruise reports or popular reports on expeditions, and are orthogonal to e.g. peer-reviewed articles. Each object's metadata contains at least: identifier(s), e.g. doi/hdl; title; author(s); date; expedition(s); platform(s), e.g. research vessel Polarstern. Furthermore, project(s), parameter(s), device(s) and e.g. geographic coverage are of interest. An international gazetteer resolves geographic coverage to region names, which are annotated to the object metadata.
    Information is homogeneously presented to the user, independent of the underlying format, but adaptable to specific disciplines, e.g. bathymetry. Data access and dissemination information is also available to the user as data download links or web services (e.g. WFS, WMS). Based on relationship metadata we dynamically build graphs of objects to support the user in finding potentially relevant associated objects. Technically, metadata is based on ISO / OGC standards or provider specifications. Metadata is harvested via OAI-PMH or OGC CSW and indexed with Apache Lucene. This enables powerful full-text search, geographic and temporal search, as well as faceting. In this presentation we will illustrate the architecture and the current implementation of our integrated approach.
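
    As a rough illustration of the harvesting layer, the sketch below pulls Dublin Core records from an OAI-PMH endpoint and reduces them to a shared core element set before indexing. The endpoint URL is hypothetical, and the Sickle client library is an assumption standing in for whatever harvester MaNIDA actually uses.

```python
# Sketch of OAI-PMH harvesting reduced to a shared core element set.
from sickle import Sickle

sickle = Sickle("http://example.org/oai")   # hypothetical provider endpoint
for record in sickle.ListRecords(metadataPrefix="oai_dc"):
    md = record.metadata                    # Dublin Core field -> list of values
    core = {
        "identifier": md.get("identifier", [None])[0],
        "title": md.get("title", [None])[0],
        "creators": md.get("creator", []),
        "date": md.get("date", [None])[0],
    }
    print(core)  # in the real system this would go to the Lucene index
```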

  11. The RBV metadata catalog

    NASA Astrophysics Data System (ADS)

    Andre, Francois; Fleury, Laurence; Gaillardet, Jerome; Nord, Guillaume

    2015-04-01

    RBV (Réseau des Bassins Versants) is a French initiative to consolidate the national efforts made by more than 15 elementary observatories funded by various research institutions (CNRS, INRA, IRD, IRSTEA, Universities) that study river and drainage basins. The RBV Metadata Catalogue aims at giving a unified view of the work produced by every observatory, both to members of the RBV network and to any external person interested in this domain of research. Another goal is to share this information with other existing metadata portals. Metadata management is heterogeneous among observatories, ranging from absence to mature harvestable catalogues. Here, we explain the strategy used to design a state-of-the-art catalogue in this situation. Its main features are as follows:
    - Multiple input methods: metadata records in the catalogue can be entered with the graphical user interface, harvested from an existing catalogue, or imported from an information system through simplified web services.
    - Hierarchical levels: metadata records may describe an observatory, one of its experimental sites, or a single dataset produced by one instrument.
    - Multilingualism: metadata can easily be entered in several configurable languages.
    - Compliance with standards: the back office of the catalogue is based on a CSW metadata server (Geosource), which ensures ISO 19115 compatibility and the ability to be harvested (globally or partially). Ongoing tasks focus on the use of SKOS thesauri and SensorML descriptions of the sensors.
    - Ergonomics: the user interface is built with the GWT framework to offer a rich client application with fully Ajax-based navigation.
    - Source code sharing: the work has led to the development of reusable components which can be used to quickly create new metadata forms in other GWT applications.
    You can visit the catalogue (http://portailrbv.sedoo.fr/) or contact us by email at rbv@sedoo.fr.

  12. 76 FR 71998 - Agency Information Collection Activities: Submitted for Office of Management and Budget (OMB...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2011-11-21

    ... development of standardized metadata in hundreds of organizations, and funded numerous implementations of OGC... of emphasis include: Metadata documentation, clearinghouse establishment, framework development...

  13. Syndicating Rich Bibliographic Metadata Using MODS and RSS

    ERIC Educational Resources Information Center

    Ashton, Andrew

    2008-01-01

    Many libraries use RSS to syndicate information about their collections to users. A survey of 65 academic libraries revealed their most common use for RSS is to disseminate information about library holdings, such as lists of new acquisitions. Even though typical RSS feeds are ill suited to the task of carrying rich bibliographic metadata, great…

  14. Policy enabled information sharing system

    DOEpatents

    Jorgensen, Craig R.; Nelson, Brian D.; Ratheal, Steve W.

    2014-09-02

    A technique for dynamically sharing information includes executing a sharing policy indicating when to share a data object responsive to the occurrence of an event. The data object is created by formatting a data file to be shared with a receiving entity. The data object includes a file data portion and a sharing metadata portion. The data object is encrypted and then automatically transmitted to the receiving entity upon occurrence of the event. The sharing metadata portion includes metadata characterizing the data file and referenced in connection with the sharing policy to determine when to automatically transmit the data object to the receiving entity.
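
    A minimal sketch of the mechanism the abstract describes, under heavy simplification: an event triggers a policy check against the sharing metadata portion, and a passing check encrypts and transmits the data object. The policy predicate, the Fernet key handling and the transport callback are illustrative stand-ins, not the patented implementation.

```python
# Sketch of policy-enabled sharing: event -> policy check -> encrypt -> send.
import base64
import json
from cryptography.fernet import Fernet

KEY = Fernet.generate_key()  # stand-in for real key management

def make_data_object(file_bytes, sharing_metadata):
    """Bundle a file data portion with its sharing metadata portion."""
    return {"data": base64.b64encode(file_bytes).decode(),
            "metadata": sharing_metadata}

def on_event(event, data_object, policy, send):
    """Share the object only if the policy says this event warrants it."""
    if policy(event, data_object["metadata"]):
        payload = Fernet(KEY).encrypt(json.dumps(data_object).encode())
        send(payload)   # automatic transmission to the receiving entity

policy = lambda event, md: event == "sensor-alarm" and md.get("releasable")
obj = make_data_object(b"readings...", {"classification": "low",
                                        "releasable": True})
on_event("sensor-alarm", obj, policy,
         send=lambda p: print("sent", len(p), "bytes"))
```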

  15. Towards a semantics-based approach in the development of geographic portals

    NASA Astrophysics Data System (ADS)

    Athanasis, Nikolaos; Kalabokidis, Kostas; Vaitis, Michail; Soulakellis, Nikolaos

    2009-02-01

    As the demand for geospatial data increases, the lack of efficient ways to find suitable information becomes critical. In this paper, a new methodology for knowledge discovery in geographic portals is presented. Based on the Semantic Web, our approach exploits the Resource Description Framework (RDF) in order to describe the geoportal's information with ontology-based metadata. When users traverse from page to page in the portal, they take advantage of the metadata infrastructure to navigate easily through data of interest. New metadata descriptions are published in the geoportal according to the RDF schemas.

  16. Metadata, Identifiers, and Physical Samples

    NASA Astrophysics Data System (ADS)

    Arctur, D. K.; Lenhardt, W. C.; Hills, D. J.; Jenkyns, R.; Stroker, K. J.; Todd, N. S.; Dassie, E. P.; Bowring, J. F.

    2016-12-01

    Physical samples are integral to much of the research conducted by geoscientists. The samples used in this research are often obtained at significant cost and represent an important investment for future research. However, making information about samples - whether considered data or metadata - available for researchers to enable discovery is difficult: a number of key elements related to samples are difficult to characterize in common ways, such as classification, location, sample type, sampling method, repository information, subsample distribution, and instrumentation, because these differ from one domain to the next. Unifying these elements or developing metadata crosswalks is needed. The iSamples (Internet of Samples) NSF-funded Research Coordination Network (RCN) is investigating ways to develop these types of interoperability and crosswalks. Within the iSamples RCN, one of its working groups, WG1, has focused on the metadata related to physical samples. This includes identifying existing metadata standards and systems, and how they might interoperate with the International Geo Sample Number (IGSN) schema (schema.igsn.org) in order to help inform leading practices for metadata. For example, we are examining lifecycle metadata beyond the IGSN 'birth certificate'. As a first step, this working group is developing a list of relevant standards and comparing their various attributes. In addition, the working group is looking toward technical solutions to facilitate developing a linked set of registries to build the web of samples. Finally, the group is also developing a comparison of sample identifiers and locators. This paper will provide an overview and comparison of the standards identified thus far, as well as an update on the technical solutions examined for integration. We will discuss how various sample identifiers might work in complementary fashion with the IGSN to more completely describe samples, facilitate retrieval of contextual information, and access research work on related samples. Finally, we welcome suggestions and community input to move physical sample unique identifiers forward.

  17. An Ontology-Enabled Natural Language Processing Pipeline for Provenance Metadata Extraction from Biomedical Text (Short Paper).

    PubMed

    Valdez, Joshua; Rueschman, Michael; Kim, Matthew; Redline, Susan; Sahoo, Satya S

    2016-10-01

    Extraction of structured information from biomedical literature is a complex and challenging problem due to the complexity of the biomedical domain and the lack of appropriate natural language processing (NLP) techniques. High quality domain ontologies model both data and metadata information at a fine level of granularity, which can be effectively used to accurately extract structured information from biomedical text. Extraction of provenance metadata, which describes the history or source of information, from published articles is an important task to support scientific reproducibility. Reproducibility of results reported by previous research studies is a foundational component of scientific advancement. This is highlighted by the recent initiative by the US National Institutes of Health called "Principles of Rigor and Reproducibility". In this paper, we describe an effective approach to extract provenance metadata from published biomedical research literature using an ontology-enabled NLP platform, as part of the Provenance for Clinical and Healthcare Research (ProvCaRe) project. The ProvCaRe-NLP tool extends the clinical Text Analysis and Knowledge Extraction System (cTAKES) platform using both provenance and biomedical domain ontologies. We demonstrate the effectiveness of the ProvCaRe-NLP tool using a corpus of 20 peer-reviewed publications. The results of our evaluation demonstrate that the ProvCaRe-NLP tool has significantly higher recall in extracting provenance metadata compared to existing NLP pipelines such as MetaMap.

  18. Metazen – metadata capture for metagenomes

    DOE PAGES

    Bischof, Jared; Harrison, Travis; Paczian, Tobias; ...

    2014-12-08

    Background: As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. These tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Results: Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Conclusion: Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility.

  19. Metazen – metadata capture for metagenomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bischof, Jared; Harrison, Travis; Paczian, Tobias

    Background: As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. These tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Results: Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Conclusion: Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility.

  20. Creating preservation metadata from XML-metadata profiles

    NASA Astrophysics Data System (ADS)

    Ulbricht, Damian; Bertelmann, Roland; Gebauer, Petra; Hasler, Tim; Klump, Jens; Kirchner, Ingo; Peters-Kottig, Wolfgang; Mettig, Nora; Rusch, Beate

    2014-05-01

    Registration of dataset DOIs at DataCite makes research data citable and comes with the obligation to keep data accessible in the future. In addition, many universities and research institutions measure data that is unique and not repeatable, like the data produced by an observational network, and they want to keep these data for future generations. In consequence, such data should be ingested into preservation systems that automatically care for file format changes. Open source preservation software developed along the definitions of the ISO OAIS reference model is available, but during ingest of data and metadata there are still problems to be solved. File format validation is difficult: format validators are not only remarkably slow, but due to the variety in file formats, different validators return conflicting identification profiles for identical data, and these conflicts are hard to resolve. Preservation systems also have a deficit in the support of custom metadata. Furthermore, data producers are sometimes not aware that quality metadata is a key issue for the re-use of data. In the project EWIG, a university institute and a research institute work together with Zuse Institute Berlin, which acts as an infrastructure facility, to generate exemplary workflows for research data into OAIS-compliant archives, with emphasis on the geosciences. The Institute for Meteorology provides time series data from an urban monitoring network, whereas GFZ Potsdam delivers file-based data from research projects. To identify problems in existing preservation workflows, the technical work is complemented by interviews with data practitioners. Policies for handling data and metadata are developed. Furthermore, university teaching material is created to raise future scientists' awareness of research data management. As a testbed for ingest workflows the digital preservation system Archivematica [1] is used. During the ingest process, metadata is generated that is compliant with the Metadata Encoding and Transmission Standard (METS). To find datasets in future portals and to make use of this data in one's own scientific work, proper selection of discovery metadata and application metadata is very important. Some XML metadata profiles are not suitable for preservation because version changes are very fast and make it nearly impossible to automate the migration. For other XML metadata profiles, schema definitions are changed after publication of the profile, or the schema definitions become inaccessible, which might cause problems during validation of the metadata inside the preservation system [2]. Some metadata profiles are not used widely enough and might not even exist in the future. Eventually, discovery and application metadata have to be embedded into the mdWrap subtree of the METS XML. [1] http://www.archivematica.org [2] http://dx.doi.org/10.2218/ijdc.v7i1.215

  1. Creating Access Points to Instrument-Based Atmospheric Data: Perspectives from the ARM Metadata Manager

    NASA Astrophysics Data System (ADS)

    Troyan, D.

    2016-12-01

    The Atmospheric Radiation Measurement (ARM) program has been collecting data from instruments in diverse climate regions for nearly twenty-five years. These data are made available to all interested parties at no cost via specially designed tools found on the ARM website (www.arm.gov). Metadata are created and applied to the various datastreams to facilitate information retrieval using the ARM website, the ARM Data Discovery Tool, and data quality reporting tools. Over the last year, the Metadata Manager, a relatively new position within the ARM program, created two documents that summarize the state of ARM metadata processes: ARM Metadata Workflow and ARM Metadata Standards. These documents serve as guides to the creation and management of ARM metadata. With many of ARM's data functions spread around the Department of Energy national laboratory complex, and with many of the original architects of the metadata structure no longer working for ARM, these documents have become increasingly important for resolving issues ranging from data flow bottlenecks and inaccurate metadata to improving data discovery and organizing web pages. This presentation will provide some examples from the workflow and standards documents. The examples will illustrate the complexity of the ARM metadata processes and the efficiency with which the metadata team works towards achieving the goal of providing access to data collected under the auspices of the ARM program.

  2. Defining Linkages between the GSC and NSF's LTER Program: How the Ecological Metadata Language (EML) Relates to GCDML and Other Outcomes

    Treesearch

    Inigo San Gil; Wade Sheldon; Tom Schmidt; Mark Servilla; Raul Aguilar; Corinna Gries; Tanya Gray; Dawn Field; James Cole; Jerry Yun Pan; Giri Palanisamy; Donald Henshaw; Margaret O'Brien; Linda Kinkel; Kathrine McMahon; Renzo Kottmann; Linda Amaral-Zettler; John Hobbie; Philip Goldstein; Robert P. Guralnick; James Brunt; William K. Michener

    2008-01-01

    The Genomic Standards Consortium (GSC) invited a representative of the Long-Term Ecological Research (LTER) to its fifth workshop to present the Ecological Metadata Language (EML) metadata standard and its relationship to the Minimum Information about a Genome/Metagenome Sequence (MIGS/MIMS) and its implementation, the Genomic Contextual Data Markup Language (GCDML)....

  3. Building a high level sample processing and quality assessment model for biogeochemical measurements: a case study from the ocean acidification community

    NASA Astrophysics Data System (ADS)

    Thomas, R.; Connell, D.; Spears, T.; Leadbetter, A.; Burger, E. F.

    2016-12-01

    The scientific literature heavily features small-scale studies whose results are extrapolated to regional or global importance. There are ongoing initiatives (e.g. OA-ICC, GOA-ON, GEOTRACES, EMODNet Chemistry) aiming to assemble regional- to global-scale datasets that are available for trend or meta-analyses. Assessing the quality and comparability of these data requires information about the processing chain from "sampling to spreadsheet". This provenance information needs to be captured and readily available to assess data fitness for purpose. The NOAA Ocean Acidification metadata template was designed in consultation with domain experts for this reason; the core carbonate chemistry variables have 23-37 metadata fields each, and for scientists generating these datasets there could appear to be an ever-increasing amount of metadata expected to accompany a dataset. While this provenance metadata should be considered essential by those generating or using the data, for those discovering data there is a sliding scale between what is considered discovery metadata (title, abstract, contacts, etc.) and usage metadata (methodology, environmental setup, lineage, etc.), the split depending on the intended use of the data. As part of the OA-ICC's activities, the metadata fields from the NOAA template relevant to the sample processing chain and QA criteria have been factored to develop profiles for, and extensions to, the OM-JSON encoding supported by the PROV ontology. While this work started focused on metadata specific to carbonate chemistry variables, the factorization could be applied within the O&M model across other disciplines such as trace metals or contaminants. In a linked data world with a suitable high-level model for sample processing and QA available, tools and support can be provided to link reproducible units of metadata (e.g. the standard protocol for a variable as adopted by a community) and simplify the provision of metadata and subsequent discovery.

  4. Building a semantic web-based metadata repository for facilitating detailed clinical modeling in cancer genome studies.

    PubMed

    Sharma, Deepak K; Solbrig, Harold R; Tao, Cui; Weng, Chunhua; Chute, Christopher G; Jiang, Guoqian

    2017-06-05

    Detailed Clinical Models (DCMs) have been regarded as the basis for retaining computable meaning when data are exchanged between heterogeneous computer systems. To better support clinical cancer data capturing and reporting, there is an emerging need to develop informatics solutions for standards-based clinical models in cancer study domains. The objective of this study is to develop and evaluate a cancer genome study metadata management system that serves as a key infrastructure in supporting clinical information modeling in cancer genome study domains. We leveraged a Semantic Web-based metadata repository enhanced with both the ISO 11179 metadata standard and the Clinical Information Modeling Initiative (CIMI) Reference Model. We used the common data elements (CDEs) defined in The Cancer Genome Atlas (TCGA) data dictionary, and extracted the metadata of the CDEs using the NCI Cancer Data Standards Repository (caDSR) CDE dataset rendered in the Resource Description Framework (RDF). The ITEM/ITEM_GROUP pattern defined in the latest CIMI Reference Model is used to represent reusable model elements (mini-Archetypes). We produced a metadata repository with 38 clinical cancer genome study domains, comprising a rich collection of mini-Archetype pattern instances. We performed a case study of the domain "clinical pharmaceutical" in the TCGA data dictionary and demonstrated that the enriched data elements in the metadata repository are very useful in support of building detailed clinical models. Our informatics approach leveraging Semantic Web technologies provides an effective way to build a CIMI-compliant metadata repository that would facilitate detailed clinical modeling to support use cases beyond TCGA in clinical cancer study domains.

  5. Integrating Semantic Information in Metadata Descriptions for a Geoscience-wide Resource Inventory.

    NASA Astrophysics Data System (ADS)

    Zaslavsky, I.; Richard, S. M.; Gupta, A.; Valentine, D.; Whitenack, T.; Ozyurt, I. B.; Grethe, J. S.; Schachne, A.

    2016-12-01

    Integrating semantic information into legacy metadata catalogs is a challenging issue and so far has mostly been done on a limited scale. We present the experience of CINERGI (Community Inventory of EarthCube Resources for Geoscience Interoperability), an NSF EarthCube Building Block project, in creating a large cross-disciplinary catalog of geoscience information resources to enable cross-domain discovery. The project developed a pipeline for automatically augmenting resource metadata, in particular generating keywords that describe metadata documents harvested from multiple geoscience information repositories or contributed by geoscientists through various channels including surveys and domain resource inventories. The pipeline examines available metadata descriptions using the text parsing, vocabulary management, semantic annotation, and graph navigation services of GeoSciGraph. GeoSciGraph, in turn, relies on a large cross-domain ontology of geoscience terms, which bridges several independently developed ontologies or taxonomies including SWEET, ENVO, YAGO, GeoSciML, GCMD, SWO, and CHEBI. The ontology content enables automatic extraction of keywords reflecting science domains, equipment used, geospatial features, measured properties, methods, processes, etc. We specifically focus on issues of cross-domain geoscience ontology creation, resolving several types of semantic conflicts among component ontologies or vocabularies, and constructing and managing facets for improved data discovery and navigation. The ontology and keyword generation rules are iteratively improved as pipeline results are presented to data managers for selective manual curation via the CINERGI Annotator user interface. We present lessons learned from applying the CINERGI metadata augmentation pipeline to a number of federal agency and academic data registries, in the context of several use cases that require data discovery and integration across multiple earth science data catalogs of varying quality and completeness. The inventory is accessible at http://cinergi.sdsc.edu, and the CINERGI project web page is http://earthcube.org/group/cinergi.

  6. User's Guide and Metadata to Coastal Biodiversity Risk Analysis Tool (CBRAT): Framework for the Systemization of Life History and Biogeographic Information

    EPA Science Inventory

    ABSTRACT: User's Guide & Metadata to Coastal Biodiversity Risk Analysis Tool (CBRAT): Framework for the Systemization of Life History and Biogeographic Information (EPA/601/B-15/001, 2015, 123 pages). Henry Lee II, U.S. EPA, Western Ecology Division; Katharine Marko, U.S. EPA,...

  7. EOS ODL Metadata On-line Viewer

    NASA Astrophysics Data System (ADS)

    Yang, J.; Rabi, M.; Bane, B.; Ullman, R.

    2002-12-01

    We have recently developed and deployed an EOS ODL metadata on-line viewer. The EOS ODL metadata viewer is a web server that takes: 1) an EOS metadata file in Object Description Language (ODL), and 2) parameters, such as which metadata to view and what style of display to use, and returns an HTML or XML document displaying the requested metadata in the requested style. This tool was developed to address widespread complaints from the science community that the EOS Data and Information System (EOSDIS) metadata files in ODL are difficult to read, by allowing users to upload and view an ODL metadata file in different styles using a web browser. Users can choose to view all of the metadata or only part of it, such as Collection metadata, Granule metadata, or Unsupported metadata. Choices of display styles include 1) Web: a mouseable display with tabs and turn-down menus, 2) Outline: formatted and colored text, suitable for printing, 3) Generic: simple indented text, a direct representation of the underlying ODL metadata, and 4) None: no stylesheet is applied and the XML generated by the converter is returned directly. Not all display styles are implemented for all the metadata choices. For example, the Web style is only implemented for Collection and Granule metadata groups with known attribute fields, but not for Unsupported, Other, and All metadata. The overall strategy of the ODL viewer is to transform an ODL metadata file into viewable HTML in two steps. The first step is to convert the ODL metadata file to XML using a Java-based parser/translator called ODL2XML. The second step is to transform the XML to HTML using stylesheets. Both operations are done on the server side, which allows a lot of flexibility in the final result and is very portable across platforms. Perl CGI behind the Apache web server is used to run the Java-based ODL2XML and then pass the results through an XSLT processor. The EOS ODL viewer can be accessed from either a PC or a Mac using Internet Explorer 5.0+ or Netscape 4.7+.

  8. ISO 19115 Experiences in NASA's Earth Observing System (EOS) ClearingHOuse (ECHO)

    NASA Astrophysics Data System (ADS)

    Cechini, M. F.; Mitchell, A.

    2011-12-01

    Metadata is an important entity in the process of cataloging, discovering, and describing earth science data. As science research and the gathered data increase in complexity, so do the complexity and importance of descriptive metadata. To meet these growing needs, the required metadata models utilize richer and more mature metadata attributes. Categorizing, standardizing, and promulgating these metadata models to a politically, geographically, and scientifically diverse community is a difficult process. An integral component of metadata management within NASA's Earth Observing System Data and Information System (EOSDIS) is the Earth Observing System (EOS) ClearingHOuse (ECHO). ECHO is the core metadata repository for the EOSDIS data centers, providing a centralized mechanism for metadata and data discovery and retrieval. ECHO has undertaken an internal restructuring to meet the changing needs of scientists, the consistent advancement in technology, and the advent of new standards such as ISO 19115. These improvements were based on the following tenets for data discovery and retrieval: + There exists a set of 'core' metadata fields recommended for data discovery. + There exists a set of users who will require the entire metadata record for advanced analysis. + There exists a set of users who will require a 'core' set of metadata fields for discovery only. + There will never be a cessation of new formats or a total retirement of all old formats. + Users should be presented metadata in a consistent format of their choosing. In order to address the previously listed items, ECHO's new metadata processing paradigm utilizes the following approach: + Identify a cross-format set of 'core' metadata fields necessary for discovery. + Implement format-specific indexers to extract the 'core' metadata fields into an optimized query capability. + Archive the original metadata in its entirety for presentation to users requiring the full record. + Provide on-demand translation of 'core' metadata to any supported result format. Lessons learned by the ECHO team while implementing its new metadata approach to support usage of the ISO 19115 standard will be presented. These lessons highlight some discovered strengths and weaknesses in the ISO 19115 standard as it is introduced to an existing metadata processing system.

  9. 36 CFR 1220.18 - What definitions apply to the regulations in Subchapter B?

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... under the Federal Records Act. The term includes both record content and associated metadata that the... dissemination of information in accordance with defined procedures, whether automated or manual. Metadata...

  10. 36 CFR 1220.18 - What definitions apply to the regulations in Subchapter B?

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... under the Federal Records Act. The term includes both record content and associated metadata that the... dissemination of information in accordance with defined procedures, whether automated or manual. Metadata...

  11. 36 CFR § 1220.18 - What definitions apply to the regulations in Subchapter B?

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... content and associated metadata that the agency determines is required to meet agency business needs..., whether automated or manual. Metadata consists of preserved contextual information describing the history...

  12. DichroMatch at the protein circular dichroism data bank (DM@PCDDB): A web-based tool for identifying protein nearest neighbors using circular dichroism spectroscopy.

    PubMed

    Whitmore, Lee; Mavridis, Lazaros; Wallace, B A; Janes, Robert W

    2018-01-01

    Circular dichroism spectroscopy is a well-used but simple method in structural biology for providing information on the secondary structure and folds of proteins. DichroMatch (DM@PCDDB) is an online tool newly available in the Protein Circular Dichroism Data Bank (PCDDB), which takes advantage of the wealth of spectral data and metadata deposited therein to enable identification of the spectral nearest neighbors of a query protein based on four different methods of spectral matching. DM@PCDDB can potentially provide novel information about structural relationships between proteins and can be used in comparison studies of protein homologs and orthologs. © 2017 The Authors Protein Science published by Wiley Periodicals, Inc. on behalf of The Protein Society.

  13. Assisted editing of SensorML with EDI. A bottom-up scenario towards the definition of sensor profiles.

    NASA Astrophysics Data System (ADS)

    Oggioni, Alessandro; Tagliolato, Paolo; Fugazza, Cristiano; Bastianini, Mauro; Pavesi, Fabio; Pepe, Monica; Menegon, Stefano; Basoni, Anna; Carrara, Paola

    2015-04-01

    Sensor observation systems for environmental data have become increasingly important in recent years. The EGU's Informatics in Oceanography and Ocean Science track has stressed the importance of management tools and solutions for marine infrastructures. We think that full interoperability among sensor systems is still an open issue and that the solution involves providing appropriate metadata. Several open source applications implement the SWE specification and, in particular, the Sensor Observation Service (SOS) standard. These applications allow for the exchange of data and metadata in XML format between computer systems. However, there is a lack of metadata editing tools supporting end users in this activity. Generally speaking, it is hard for users to provide sensor metadata in the SensorML format without dedicated tools. In particular, such a tool should ease metadata editing by providing, for standard sensors, all the invariant information to be included in sensor metadata, thus allowing the user to concentrate on the metadata items that are related to the specific deployment. RITMARE, the Italian flagship project on marine research, envisages a subproject, SP7, for the set-up of the project's spatial data infrastructure. SP7 developed EDI, a general-purpose, template-driven metadata editor that is composed of a backend web service and an HTML5/JavaScript client. EDI can be customized to manage the creation of generic metadata encoded as XML. Once tailored to a specific metadata format, EDI presents users with a web form with advanced auto-completion and validation capabilities. In the case of sensor metadata (SensorML versions 1.0.1 and 2.0), the EDI client is instructed to send an "insert sensor" request to an SOS endpoint in order to save the metadata in an SOS server. In the first phase of project RITMARE, EDI has been used to simplify the creation of SensorML metadata from scratch by the researchers and data managers involved. An interesting by-product of this ongoing work is a growing archive of predefined sensor descriptions. This information is being collected in order to further ease metadata creation in the next phase of the project. Users will be able to choose among a number of sensor and sensor platform prototypes: these will be specific instances on which it will be possible to define, in a bottom-up approach, "sensor profiles". We report on the outcome of this activity.

  14. Evaluating and Evolving Metadata in Multiple Dialects

    NASA Technical Reports Server (NTRS)

    Kozimore, John; Habermann, Ted; Gordon, Sean; Powers, Lindsay

    2016-01-01

    Despite many long-term homogenization efforts, communities continue to develop focused metadata standards along with related recommendations and (typically) XML representations (aka dialects) for sharing metadata content. Different representations easily become obstacles to sharing information because each representation generally requires a set of tools and skills that are designed, built, and maintained specifically for that representation. In contrast, community recommendations are generally described, at least initially, at a more conceptual level and are more easily shared. For example, most communities agree that dataset titles should be included in metadata records although they write the titles in different ways.

  15. Metadata to Support Data Warehouse Evolution

    NASA Astrophysics Data System (ADS)

    Solodovnikova, Darja

    The focus of this chapter is the metadata necessary to support data warehouse evolution. We present a data warehouse framework that is able to track the evolution process and adapt data warehouse schemata and data extraction, transformation, and loading (ETL) processes. We discuss a significant part of the framework: the metadata repository, which stores information about the data warehouse, its logical and physical schemata, and their versions. We propose a physical implementation of a multiversion data warehouse in a relational DBMS. For each modification of a data warehouse schema, we outline the changes that need to be made to the repository metadata and to the database.

  16. MetaRNA-Seq: An Interactive Tool to Browse and Annotate Metadata from RNA-Seq Studies.

    PubMed

    Kumar, Pankaj; Halama, Anna; Hayat, Shahina; Billing, Anja M; Gupta, Manish; Yousri, Noha A; Smith, Gregory M; Suhre, Karsten

    2015-01-01

    The number of RNA-Seq studies has grown in recent years. The design of RNA-Seq studies varies from very simple (e.g., two-condition case-control) to very complicated (e.g., time series involving multiple samples at each time point with separate drug treatments). Most of these publicly available RNA-Seq studies are deposited in NCBI databases, but their metadata are scattered throughout four different databases: Sequence Read Archive (SRA), BioSample, BioProject, and Gene Expression Omnibus (GEO). Although the NCBI web interface is able to provide all of the metadata information, it often requires significant effort to retrieve study- or project-level information by traversing multiple hyperlinks from page to page. Moreover, project- and study-level metadata lack manual or automatic curation by categories, such as disease type, time series, case-control, or replicate type, which are vital to comprehending any RNA-Seq study. Here we describe "MetaRNA-Seq," a new tool for interactively browsing, searching, and annotating RNA-Seq metadata with the capability of semiautomatic curation at the study level.

  17. EXIF Custom: Automatic image metadata extraction for Scratchpads and Drupal.

    PubMed

    Baker, Ed

    2013-01-01

    Many institutions and individuals use embedded metadata to aid in the management of their image collections. Many desktop image management solutions such as Adobe Bridge and online tools such as Flickr also make use of embedded metadata to describe, categorise and license images. Until now Scratchpads (a data management system and virtual research environment for biodiversity) have not made use of these metadata, and users have had to manually re-enter this information if they wanted to display it on their Scratchpad site. The Drupal module described here allows users to map metadata embedded in their images to the associated fields in the Scratchpads image form using one or more customised mappings. The module works seamlessly with the bulk image uploader used on Scratchpads, and it is therefore possible to upload hundreds of images easily, with automatic metadata (EXIF, XMP and IPTC) extraction and mapping.

  18. EXIF Custom: Automatic image metadata extraction for Scratchpads and Drupal

    PubMed Central

    2013-01-01

    Abstract Many institutions and individuals use embedded metadata to aid in the management of their image collections. Many desktop image management solutions such as Adobe Bridge and online tools such as Flickr also make use of embedded metadata to describe, categorise and license images. Until now Scratchpads (a data management system and virtual research environment for biodiversity) have not made use of these metadata, and users have had to manually re-enter this information if they wanted to display it on their Scratchpad site. The Drupal module described here allows users to map metadata embedded in their images to the associated fields in the Scratchpads image form using one or more customised mappings. The module works seamlessly with the bulk image uploader used on Scratchpads, and it is therefore possible to upload hundreds of images easily, with automatic metadata (EXIF, XMP and IPTC) extraction and mapping. PMID:24723768

  19. The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata.

    PubMed

    Liolios, Konstantinos; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Kyrpides, Nikos C

    2008-01-01

    The Genomes On Line Database (GOLD) is a comprehensive resource that provides information on genome and metagenome projects worldwide. Complete and ongoing projects and their associated metadata can be accessed in GOLD through pre-computed lists and a search page. As of September 2007, GOLD contains information on more than 2900 sequencing projects, of which 639 have been completed and their sequence data deposited in the public databases. GOLD continues to expand with the goal of providing metadata information related to the projects and the organisms/environments towards the 'Minimum Information about a Genome Sequence' (MIGS) guideline. GOLD is available at http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece, at http://gold.imbb.forth.gr/

  20. The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata

    PubMed Central

    Liolios, Konstantinos; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Kyrpides, Nikos C.

    2008-01-01

    The Genomes On Line Database (GOLD) is a comprehensive resource that provides information on genome and metagenome projects worldwide. Complete and ongoing projects and their associated metadata can be accessed in GOLD through pre-computed lists and a search page. As of September 2007, GOLD contains information on more than 2900 sequencing projects, of which 639 have been completed and their sequence data deposited in the public databases. GOLD continues to expand with the goal of providing metadata information related to the projects and the organisms/environments towards the 'Minimum Information about a Genome Sequence' (MIGS) guideline. GOLD is available at http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece, at http://gold.imbb.forth.gr/ PMID:17981842

  1. Quality Metadata Management for Geospatial Scientific Workflows: from Retrieving to Assessing with Online Tools

    NASA Astrophysics Data System (ADS)

    Leibovici, D. G.; Pourabdollah, A.; Jackson, M.

    2011-12-01

    Experts and decision-makers use or develop models to monitor global and local changes of the environment. Their activities require the combination of data and processing services in a flow of operations and spatial data computations: a geospatial scientific workflow. The seamless ability to generate, re-use and modify a geospatial scientific workflow is an important requirement, but the quality of outcomes is equally important [1]. Metadata information attached to the data and processes, and particularly their quality, is essential to assess the reliability of the scientific model that a workflow represents [2]. Management tools that deal with qualitative and quantitative metadata measures of the quality associated with a workflow are therefore required by modellers. To ensure interoperability, ISO and OGC standards [3] are to be adopted, allowing one, for example, to define metadata profiles and to retrieve them via web service interfaces. However, these standards need a few extensions for workflows, particularly in the context of geoprocess metadata. We propose to fill this gap (i) through the provision of a metadata profile for the quality of processes, and (ii) through a framework, based on XPDL [4], to manage the quality information. Web Processing Services are used to implement a range of metadata analyses on the workflow in order to evaluate and present quality information at different levels of the workflow. This generates the quality metadata, which is stored in the XPDL file. The focus is (a) on visual representations of the quality, summarizing the quality information retrieved either from the standardized metadata profiles of the components or from non-standard quality information, e.g., Web 2.0 information, and (b) on the estimated qualities of the outputs, derived from meta-propagation of uncertainties (a principle that we have introduced [5]). An a priori validation of the future decision-making supported by the workflow's outputs is then provided using the meta-propagated qualities, obtained without running the workflow [6], together with a visualization on the workflow graph itself pointing out where better data or better processes would improve the workflow. [1] Leibovici, DG, Hobona, G, Stock, K, Jackson, M (2009) Qualifying geospatial workflow models for adaptive controlled validity and accuracy. In: IEEE 17th GeoInformatics, 1-5 [2] Leibovici, DG, Pourabdollah, A (2010) Workflow Uncertainty using a Metamodel Framework and Metadata for Data and Processes. OGC TC/PC Meetings, September 2010, Toulouse, France [3] OGC (2011) www.opengeospatial.org [4] XPDL (2008) Workflow Process Definition Interface - XML Process Definition Language. Workflow Management Coalition, Document WfMC-TC-1025, 2008 [5] Leibovici, DG, Pourabdollah, A, Jackson, M (2011) Meta-propagation of Uncertainties for Scientific Workflow Management in Interoperable Spatial Data Infrastructures. In: Proceedings of the European Geosciences Union (EGU2011), April 2011, Austria [6] Pourabdollah, A, Leibovici, DG, Jackson, M (2011) MetaPunT: an Open Source tool for Meta-Propagation of uncerTainties in Geospatial Processing. In: Proceedings of OSGIS2011, June 2011, Nottingham, UK

  2. Why can't I manage my digital images like MP3s? The evolution and intent of multimedia metadata

    NASA Astrophysics Data System (ADS)

    Goodrum, Abby; Howison, James

    2005-01-01

    This paper considers the deceptively simple question: Why can't digital images be managed in the simple and effective manner in which digital music files are managed? We make the case that the answer lies in different treatments of metadata in different domains with different goals. A central difference between the two formats stems from the fact that digital music metadata lookup services are collaborative and automate the movement from a digital file to the appropriate metadata, while image metadata services do not. To understand why this difference exists we examine the divergent evolution of metadata standards for digital music and digital images and observe that the processes differ in interesting ways according to their intent. Specifically, music metadata was developed primarily for personal file management and community resource sharing, while the focus of image metadata has largely been on information retrieval. We argue that lessons from MP3 metadata can assist individuals facing their growing personal image management challenges. Our focus therefore is not on metadata for cultural heritage institutions or the publishing industry; it is limited to the personal libraries growing on our hard drives. This bottom-up approach to file management, combined with p2p distribution, radically altered the music landscape. Might such an approach have a similar impact on image publishing? This paper outlines plans for improving the personal management of digital images (doing image metadata and file management the MP3 way) and considers the likelihood of success.

  3. Why can't I manage my digital images like MP3s? The evolution and intent of multimedia metadata

    NASA Astrophysics Data System (ADS)

    Goodrum, Abby; Howison, James

    2004-12-01

    This paper considers the deceptively simple question: Why can't digital images be managed in the simple and effective manner in which digital music files are managed? We make the case that the answer lies in different treatments of metadata in different domains with different goals. A central difference between the two formats stems from the fact that digital music metadata lookup services are collaborative and automate the movement from a digital file to the appropriate metadata, while image metadata services do not. To understand why this difference exists we examine the divergent evolution of metadata standards for digital music and digital images and observe that the processes differ in interesting ways according to their intent. Specifically, music metadata was developed primarily for personal file management and community resource sharing, while the focus of image metadata has largely been on information retrieval. We argue that lessons from MP3 metadata can assist individuals facing their growing personal image management challenges. Our focus therefore is not on metadata for cultural heritage institutions or the publishing industry; it is limited to the personal libraries growing on our hard drives. This bottom-up approach to file management, combined with p2p distribution, radically altered the music landscape. Might such an approach have a similar impact on image publishing? This paper outlines plans for improving the personal management of digital images (doing image metadata and file management the MP3 way) and considers the likelihood of success.

  4. 77 FR 12871 - Agency Information Collection Activities: Comment Request for National Geological and Geophysical...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-03-02

    ...; (2) Build the National Catalog by providing site-specific metadata for items in inventoried... geophysical data; (b) Digital infrastructure; (c) Metadata for items in data collections; and (d) Special data...

  5. Structure and inference in annotated networks

    PubMed Central

    Newman, M. E. J.; Clauset, Aaron

    2016-01-01

    For many networks of scientific interest we know both the connections of the network and information about the network nodes, such as the age or gender of individuals in a social network. Here we demonstrate how this 'metadata' can be used to improve our understanding of network structure. We focus in particular on the problem of community detection in networks and develop a mathematically principled approach that combines a network and its metadata to detect communities more accurately than can be done with either alone. Crucially, the method does not assume that the metadata are correlated with the communities we are trying to find. Instead, the method learns whether a correlation exists and correctly uses or ignores the metadata depending on whether they contain useful information. We demonstrate our method on synthetic networks with known structure and on real-world networks, large and small, drawn from social, biological and technological domains. PMID:27306566

  6. Structure and inference in annotated networks

    NASA Astrophysics Data System (ADS)

    Newman, M. E. J.; Clauset, Aaron

    2016-06-01

    For many networks of scientific interest we know both the connections of the network and information about the network nodes, such as the age or gender of individuals in a social network. Here we demonstrate how this 'metadata' can be used to improve our understanding of network structure. We focus in particular on the problem of community detection in networks and develop a mathematically principled approach that combines a network and its metadata to detect communities more accurately than can be done with either alone. Crucially, the method does not assume that the metadata are correlated with the communities we are trying to find. Instead, the method learns whether a correlation exists and correctly uses or ignores the metadata depending on whether they contain useful information. We demonstrate our method on synthetic networks with known structure and on real-world networks, large and small, drawn from social, biological and technological domains.

  7. A Metadata Action Language

    NASA Technical Reports Server (NTRS)

    Golden, Keith; Clancy, Dan (Technical Monitor)

    2001-01-01

    The data management problem comprises data processing and data tracking. Data processing is the creation of new data based on existing data sources. Data tracking consists of storing metadata descriptions of available data. This paper addresses the data management problem by casting it as an AI planning problem. Actions are data-processing commands, plans are dataflow programs, and goals are metadata descriptions of desired data products. Data manipulation is simply plan generation and execution, and a key component of data tracking is inferring the effects of an observed plan. We introduce a new action language for data management domains, called ADILM. We discuss the connection between data processing and information integration and show how a language for the latter must be modified to support the former. The paper also discusses information gathering within a data-processing framework, and shows how ADILM metadata expressions are a generalization of Local Completeness.

  8. Mitogenome metadata: current trends and proposed standards.

    PubMed

    Strohm, Jeff H T; Gwiazdowski, Rodger A; Hanner, Robert

    2016-09-01

    Mitogenome metadata are descriptive terms about the sequence and its specimen description that allow both to be digitally discoverable and interoperable. Here, we review a sampling of mitogenome metadata published in the journal Mitochondrial DNA between 2005 and 2014. Specifically, we have focused on a subset of metadata fields that are available for GenBank records and specified by the Genomic Standards Consortium (GSC) and other biodiversity metadata standards, and we assessed their presence across three main categories: collection, biological and taxonomic information. To do this we reviewed 146 mitogenome manuscripts, and their associated GenBank records, and scored them for 13 metadata fields. We also explored the potential for mitogenome misidentification using their sequence diversity and taxonomic metadata on the Barcode of Life Data Systems (BOLD). For this, we focused on all Lepidoptera and Perciformes mitogenomes included in the review, along with additional mitogenome sequence data mined from GenBank. Overall, we found that none of the 146 mitogenome projects provided all the metadata we looked for, and only 17 projects provided at least one category of metadata across the three main categories. Comparisons using mtDNA sequences from BOLD suggest that some mitogenomes may be misidentified. Lastly, we appreciate the research potential of mitogenomes announced through this journal, and we conclude with a suggestion of 13 metadata fields, available on GenBank, that, if provided in a mitogenome's GenBank record, would increase their research value.

  9. Unified Science Information Model for SoilSCAPE using the Mercury Metadata Search System

    NASA Astrophysics Data System (ADS)

    Devarakonda, Ranjeet; Lu, Kefa; Palanisamy, Giri; Cook, Robert; Santhana Vannan, Suresh; Moghaddam, Mahta; Clewley, Dan; Silva, Agnelo; Akbar, Ruzbeh

    2013-12-01

    SoilSCAPE (Soil moisture Sensing Controller And oPtimal Estimator) introduces a new concept for a smart wireless sensor web technology for optimal measurements of surface-to-depth profiles of soil moisture using in-situ sensors. The objective is to enable a guided and adaptive sampling strategy for the in-situ sensor network to meet the measurement validation objectives of spaceborne soil moisture sensors such as the Soil Moisture Active Passive (SMAP) mission. This work is being carried out at the University of Michigan, the Massachusetts Institute of Technology, the University of Southern California, and Oak Ridge National Laboratory. At Oak Ridge National Laboratory we are using the Mercury metadata search system [1] to build a unified information system for the SoilSCAPE project. This unified portal primarily comprises three key pieces: distributed search/discovery; data collections and integration; and data dissemination. Mercury, federally funded software for metadata harvesting, indexing, and searching, is used for this module. Soil moisture data sources identified as part of this activity, such as SoilSCAPE and FLUXNET (in-situ sensors), AirMOSS (airborne retrieval), and SMAP (spaceborne retrieval), are being indexed and maintained by Mercury. Mercury will be the central repository of data sources for cal/val for soil moisture studies and will provide a mechanism to identify additional data sources. Relevant metadata from existing inventories such as ORNL DAAC, USGS Clearinghouse, ARM, NASA ECHO, and GCMD will be brought into this soil moisture data search/discovery module. The SoilSCAPE [2] metadata records will also be published in broader metadata repositories such as GCMD and data.gov. Mercury can be configured to provide a single portal to soil moisture information contained in disparate data management systems located anywhere on the Internet. Mercury is able to extract metadata systematically from HTML pages or XML files using a variety of methods including OAI-PMH [3]. The Mercury search interface then allows users to perform simple, fielded, spatial and temporal searches across a central harmonized index of metadata. Mercury supports various metadata standards including FGDC, ISO 19115, DIF, Dublin Core, Darwin Core, and EML. This poster describes in detail how Mercury implements the unified science information model for soil moisture data. References: [1] Devarakonda R., et al. Mercury: reusable metadata management, data discovery and access system. Earth Science Informatics (2010), 3(1): 87-94. [2] Devarakonda R., et al. Daymet: Single Pixel Data Extraction Tool. http://daymet.ornl.gov/singlepixel.html (2012). Last accessed 10-01-2013. [3] Devarakonda R., et al. Data sharing and retrieval using OAI-PMH. Earth Science Informatics (2011), 4(1): 1-5.

  10. A document centric metadata registration tool constructing earth environmental data infrastructure

    NASA Astrophysics Data System (ADS)

    Ichino, M.; Kinutani, H.; Ono, M.; Shimizu, T.; Yoshikawa, M.; Masuda, K.; Fukuda, K.; Kawamoto, H.

    2009-12-01

    DIAS (Data Integration and Analysis System) is one of the GEOSS activities in Japan. It is also a leading part of the GEOSS task of the same name defined in the GEOSS Ten-Year Implementation Plan. The main mission of DIAS is to construct a data infrastructure that can effectively integrate earth environmental data such as observation data, numerical model outputs, and socio-economic data provided from the fields of climate, water cycle, ecosystem, ocean, biodiversity and agriculture. Some of DIAS's data products are available at http://www.jamstec.go.jp/e/medid/dias. Most earth environmental data commonly have spatial and temporal attributes such as the covering geographic scope or the creation date. Metadata standards covering these common attributes are published by the geographic information technical committee (TC211) of the International Organization for Standardization (ISO) as the specifications ISO 19115:2003 and ISO 19139:2007. Accordingly, DIAS metadata are developed based on the ISO/TC 211 metadata standards. From the viewpoint of data users, metadata are useful not only for data retrieval and analysis but also for interoperability and information sharing among experts, beginners and nonprofessionals. On the other hand, from the viewpoint of data providers, two problems were pointed out after discussions. One is that data providers prefer to minimize the extra tasks and time spent creating metadata. The other is that data providers want to manage and publish documents that explain their data sets more comprehensively. To solve these problems, we have been developing a document-centric metadata registration tool. The features of our tool are that the generated documents are available instantly and that there is no extra cost for data providers to generate metadata. Because the tool is developed as a web application, data providers need nothing more than a web browser. The interface of the tool provides the section titles of the documents, and by filling out the content of each section, the documents for the data sets are automatically published in PDF and HTML format. Furthermore, a metadata XML file compliant with ISO 19115 and ISO 19139 is created at the same moment. The generated metadata are managed in the metadata database of the DIAS project, and will be used in various ISO 19139-compliant metadata management tools, such as GeoNetwork.

  11. Improvements to the Ontology-based Metadata Portal for Unified Semantics (OlyMPUS)

    NASA Astrophysics Data System (ADS)

    Linsinbigler, M. A.; Gleason, J. L.; Huffer, E.

    2016-12-01

    The Ontology-based Metadata Portal for Unified Semantics (OlyMPUS), funded by the NASA Earth Science Technology Office Advanced Information Systems Technology program, is an end-to-end system designed to support Earth science data consumers and data providers, enabling the latter to register data sets and provision them with the semantically rich metadata that drives the Ontology-Driven Interactive Search Environment for Earth Sciences (ODISEES). OlyMPUS complements the ODISEES data discovery system with an intelligent tool that enables data producers to auto-generate semantically enhanced metadata and upload it to the metadata repository that drives ODISEES. Like ODISEES, the OlyMPUS metadata provisioning tool leverages robust semantics, a NoSQL database and query engine, and an automated reasoning engine that performs first- and second-order deductive inferencing, and it uses a controlled vocabulary to support data interoperability and automated analytics. The ODISEES data discovery portal leverages this metadata to provide a seamless data discovery and access experience for data consumers who are interested in comparing and contrasting the multiple Earth science data products available across NASA data centers. OlyMPUS will support scientists with services and tools for performing complex analyses and identifying correlations and non-obvious relationships across all types of Earth system phenomena, using the full spectrum of NASA Earth science data available. By providing an intelligent discovery portal that supplies users (both human users and machines) with detailed information about data products, their contents and their structure, ODISEES will reduce the level of effort required to identify and prepare large volumes of data for analysis. This poster will explain how OlyMPUS leverages deductive reasoning and other technologies to create an integrated environment for generating and exploiting semantically rich metadata.

  12. Chapter 35: Describing Data and Data Collections in the VO

    NASA Astrophysics Data System (ADS)

    Kent, B. R.; Hanisch, R. J.; Williams, R. D.

    The list of numbers 19.22, 17.23, 18.11, 16.98, and 15.11 is of little intrinsic interest without information about the context in which the numbers appear. For instance, are these daily closing stock prices for your favorite investment, or are they hourly photometric measurements of an increasingly bright quasar? The information needed to define this context is called metadata. Metadata are data about data. Astronomers are familiar with metadata through the headers of FITS files and the names and units associated with columns in a table or database. In the VO, metadata describe the contents of tables, images, and spectra, as well as aggregate collections of data (archives, surveys) and computational services. Moreover, VO metadata are constructed according to rules that avoid ambiguity and make it clear whether, in the example above, the stock prices are in dollars or euros, or the photometry is Johnson V or Sloan g. Organization of data is important in any scientific discipline. Equally crucial are the descriptions of those data: the organization publishing the data, the creator or the person making the data available, the instruments used, the units assigned to measurements, the calibration status, and the data quality assessment. The Virtual Observatory metadata scheme applies not only to datasets but also to resources, including data archive facilities, searchable web forms, and online analysis and display tools. Since the scientific output flowing from large datasets depends greatly on how well the data are described, it is important for users to understand the basics of the metadata scheme in order to locate the data that they want and use it correctly. Metadata are the key to data discovery and to data and service interoperability in the Virtual Observatory.
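
    The FITS headers mentioned above are the everyday astronomical example of metadata, and reading them takes only a few lines; a sketch with astropy follows. The file name is a placeholder, and which keywords are present depends on the data provider.

        from astropy.io import fits

        with fits.open("observation.fits") as hdul:    # hypothetical file
            header = hdul[0].header
            # Header cards are KEYWORD = value / comment triples: the units,
            # instrument and calibration context live here, not in the pixels.
            print(header.get("TELESCOP"), header.get("INSTRUME"))
            print(header.get("BUNIT"))                 # physical unit of the data
            if "DATE-OBS" in header:
                print(header["DATE-OBS"])              # when the data were taken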

  13. Towards a semantic medical Web: HealthCyberMap's tool for building an RDF metadata base of health information resources based on the Qualified Dublin Core Metadata Set.

    PubMed

    Boulos, Maged N; Roudsari, Abdul V; Carson, Ewart R

    2002-07-01

    HealthCyberMap (http://healthcybermap.semanticweb.org/) aims at mapping Internet health information resources in novel ways for enhanced retrieval and navigation. This is achieved by collecting appropriate resource metadata in an unambiguous form that preserves semantics. We modelled a qualified Dublin Core (DC) metadata set ontology, with extra elements for resource quality and geographical provenance, in Protégé-2000. A metadata collection form helps acquire resource instance data within Protégé. The DC subject field is populated with UMLS terms directly imported from the UMLS Knowledge Source Server using the UMLS tab, a Protégé-2000 plug-in. The project is saved in RDFS/RDF. The ontology and associated form serve as a free tool for building and maintaining an RDF medical resource metadata base. The UMLS tab enables browsing and searching for concepts that best describe a resource, and importing them into DC subject fields. The resultant metadata base can be used with a search and inference engine, and can have textual and/or visual navigation interface(s) applied to it, to ultimately build a medical Semantic Web portal. Different ways of exploiting Protégé-2000 RDF output are discussed. By making the context and semantics of resources, not merely their raw text and formatting, amenable to computer 'understanding,' we can build a Semantic Web that is more useful to humans than the current Web. This requires proper use of metadata and ontologies. Clinical codes can reliably describe the subjects of medical resources, establish the semantic relationships (as defined by the underlying coding scheme) between related resources, and automate their topical categorisation.
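
    A minimal sketch of the kind of qualified DC description the record describes, expressed with rdflib rather than Protégé: one resource, a UMLS concept identifier in the subject field, and extension terms for quality and provenance. The hcm: namespace, the resource URL and the property names are invented placeholders, not HealthCyberMap's actual schema.

        from rdflib import Graph, Literal, Namespace, URIRef
        from rdflib.namespace import DCTERMS

        HCM = Namespace("http://example.org/hcm#")               # hypothetical extension
        resource = URIRef("http://example.org/diabetes-guide")   # hypothetical resource

        g = Graph()
        g.add((resource, DCTERMS.title, Literal("Patient guide to type 2 diabetes")))
        g.add((resource, DCTERMS.subject, Literal("C0011860")))  # a UMLS concept ID (CUI)
        g.add((resource, HCM.qualityRating, Literal(4)))         # invented quality element
        g.add((resource, HCM.provenanceCountry, Literal("UK")))  # invented provenance element
        print(g.serialize(format="turtle"))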

  14. Raising orphans from a metadata morass: A researcher's guide to re-use of public 'omics data.

    PubMed

    Bhandary, Priyanka; Seetharam, Arun S; Arendsee, Zebulun W; Hur, Manhoi; Wurtele, Eve Syrkin

    2018-02-01

    More than 15 petabases of raw RNA-Seq data are now accessible through public repositories. Acquisition of other 'omics data types is expanding, though most lack a centralized archival repository. Data reuse provides a tremendous opportunity to extract new knowledge from existing experiments, and enables robust, multi-'omics analyses that merge metadata (information about experimental design, biological samples, protocols) and data from multiple experiments. We illustrate how predictive research can be accelerated by meta-analysis with a study of orphan (species-specific) genes. Computational predictions are critical for inferring orphan function because their coding sequences provide very few clues. The metadata in public databases are often confusing; a test case with Zea mays mRNA-Seq data reveals a high proportion of missing, misleading or incomplete metadata. This metadata morass significantly diminishes the insight that can be extracted from these data. We provide tips for data submitters and users, including specific recommendations to improve metadata quality through greater use of controlled vocabularies and through metadata reviews. Finally, we advocate for a unified, straightforward metadata submission and retrieval system. Copyright © 2017 Elsevier B.V. All rights reserved.
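
    The metadata review the authors recommend can be pictured as a simple completeness-and-vocabulary check over sample records; a toy sketch follows. The required fields and the controlled tissue vocabulary are invented for the example.

        REQUIRED = ("organism", "tissue", "growth_stage")
        TISSUE_VOCAB = {"leaf", "root", "seed", "tassel"}   # hypothetical controlled terms

        def review(sample):
            """Flag missing required fields and uncontrolled free-text values."""
            problems = [f"missing: {f}" for f in REQUIRED if not sample.get(f)]
            tissue = sample.get("tissue")
            if tissue and tissue.lower() not in TISSUE_VOCAB:
                problems.append(f"uncontrolled tissue term: {tissue!r}")
            return problems

        print(review({"organism": "Zea mays", "tissue": "young leaves"}))
        # ['missing: growth_stage', "uncontrolled tissue term: 'young leaves'"]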

  15. A rapid prototyping/artificial intelligence approach to space station-era information management and access

    NASA Technical Reports Server (NTRS)

    Carnahan, Richard S., Jr.; Corey, Stephen M.; Snow, John B.

    1989-01-01

    Applications of rapid prototyping and Artificial Intelligence techniques to problems associated with Space Station-era information management systems are described. In particular, the work is centered on issues related to: (1) intelligent man-machine interfaces applied to scientific data user support, and (2) the requirement that intelligent information management systems (IIMS) be able to efficiently process metadata updates concerning the types of data handled. The advanced IIMS represents functional capabilities driven almost entirely by the needs of potential users. The volume of scientific data projected to be generated in the Space Station era is likely to be significantly greater than the volume currently processed and analyzed. Information about scientific data must be presented clearly, concisely, and with support features that allow users at all levels of expertise efficient and cost-effective data access. Additionally, mechanisms for allowing more efficient IIMS metadata update processes must be addressed. The work reported covers the following IIMS design aspects: IIMS data and metadata modeling, including the automatic updating of IIMS-contained metadata; IIMS user-system interface considerations, including significant problems associated with remote access, user profiles, and on-line tutorial capabilities; and development of an IIMS query and browse facility, including the capability to deal with spatial information. A working prototype has been developed and is being enhanced.

  16. The Road to Independently Understandable Information

    NASA Astrophysics Data System (ADS)

    Habermann, T.; Robinson, E.

    2017-12-01

    The turn of the 21st century was a pivotal time in the Earth and Space Science information ecosystem. The Content Standard for Digital Geospatial Metadata (CSDGM) had existed for nearly a decade and ambitious new standards were just emerging. The U.S. Federal Geographic Data Committee (FGDC) had extended many of the concepts from CSDGM into the international community with ISO 19115:2003, and the Consultative Committee for Space Data Systems (CCSDS) had migrated its Open Archival Information System (OAIS) Reference Model into an international standard (ISO 14721:2003). The OAIS model outlined the roles and responsibilities of archives, the principal role being to preserve information and make it available to users, a "designated community", as a service to the data producer. It was mandatory for the archive to ensure that information is "independently understandable" to the designated community and to maintain that understanding through ongoing partnerships between archives and designated communities. Standards can play a role in supporting these partnerships as designated communities expand across disciplinary and geographic boundaries. The ISO metadata standards include many capabilities that can make critical contributions to this goal, including connections to resources outside of the metadata record (e.g. documentation) and mechanisms for ongoing incorporation of user feedback into the metadata stream. We will demonstrate these capabilities with examples of how they can increase understanding.

  17. Digital data in support of studies and assessments of coal and petroleum resources in the Appalachian basin: Chapter I.1 in Coal and petroleum resources in the Appalachian basin: distribution, geologic framework, and geochemical character

    USGS Publications Warehouse

    Trippi, Michael H.; Kinney, Scott A.; Gunther, Gregory; Ryder, Robert T.; Ruppert, Leslie F.; Ruppert, Leslie F.; Ryder, Robert T.

    2014-01-01

    Metadata for these datasets are available in HTML and XML formats. Metadata files contain information about the sources of data used to create the dataset, the creation process steps, the data quality, the geographic coordinate system and horizontal datum used for the dataset, the values of attributes used in the dataset table, information about the publication and the publishing organization, and other information that may be useful to the reader. All links in the metadata were valid at the time of compilation. Some of these links may no longer be valid. No attempt has been made to determine the new online location (if one exists) for the data.

  18. Automatic publishing ISO 19115 metadata with PanMetaDocs using SensorML information

    NASA Astrophysics Data System (ADS)

    Stender, Vivien; Ulbricht, Damian; Schroeder, Matthias; Klump, Jens

    2014-05-01

    Terrestrial Environmental Observatories (TERENO) is an interdisciplinary, long-term research project spanning an Earth observation network across Germany. It includes four test sites within Germany, from the North German lowlands to the Bavarian Alps, and is operated by six research centers of the Helmholtz Association. The contribution of the participating research centers is organized as regional observatories. A challenge for TERENO and its observatories is to integrate all aspects of data management, data workflows, data modeling and visualization into the design of a monitoring infrastructure. TERENO Northeast is one of the sub-observatories of TERENO and is operated by the German Research Centre for Geosciences (GFZ) in Potsdam. This observatory investigates geoecological processes in the northeastern lowlands of Germany by collecting large amounts of environmentally relevant data. The success of long-term projects like TERENO depends on well-organized data management, on data exchange between the partners involved, and on the availability of the captured data. Data discovery and dissemination are facilitated not only through the data portals of the regional TERENO observatories but also through a common spatial data infrastructure, TEODOOR (TEreno Online Data repOsitORry). TEODOOR bundles the data provided by the different web services of the individual observatories and provides tools for data discovery, visualization and data access. The TERENO Northeast data infrastructure integrates data from more than 200 instruments and makes data available through standard web services. Geographic sensor information and services are described using the ISO 19115 metadata schema. TEODOOR accesses the OGC Sensor Web Enablement (SWE) interfaces offered by the regional observatories. In addition to the SWE interface, TERENO Northeast also publishes data through DataCite. The necessary ISO 19115-compliant metadata are created in an automated process that extracts information from the SWE SensorML documents. The resulting metadata file is stored in the GFZ Potsdam data infrastructure. The publishing workflow for file-based research datasets at GFZ Potsdam is based on the eSciDoc infrastructure, using PanMetaDocs (PMD) as the graphical user interface. PMD is a collaborative, metadata-based data and information exchange platform [1]. Besides SWE, metadata are also syndicated by PMD through an OAI-PMH interface. In addition, metadata from other observatories, projects or sensors in TERENO can be accessed through the TERENO Northeast data portal. [1] http://meetingorganizer.copernicus.org/EGU2012/EGU2012-7058-2.pdf
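
    The extraction step described above, pulling descriptive fields out of SensorML so they can populate an ISO 19115/19139 record, might look roughly like the following sketch. The sml/gml namespaces are the standard OGC ones, but the layout of the toy document, the station name and the two extracted fields are assumptions about how a station description could be organized, not the TERENO workflow's actual code.

        import xml.etree.ElementTree as ET

        NS = {"sml": "http://www.opengis.net/sensorML/1.0.1",
              "gml": "http://www.opengis.net/gml"}

        def extract(sensorml):
            """Pull a title and a start date out of a SensorML document."""
            root = ET.fromstring(sensorml)
            name = root.find(".//gml:name", NS)
            begin = root.find(".//gml:beginPosition", NS)
            return {"title": name.text if name is not None else "unnamed sensor",
                    "temporal_start": begin.text if begin is not None else None}

        doc = """<sml:SensorML xmlns:sml="http://www.opengis.net/sensorML/1.0.1"
                               xmlns:gml="http://www.opengis.net/gml">
                   <gml:name>Soil moisture station NE-01</gml:name>
                   <gml:beginPosition>2011-04-01T00:00:00Z</gml:beginPosition>
                 </sml:SensorML>"""
        print(extract(doc))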

  19. A Geospatial Semantic Enrichment and Query Service for Geotagged Photographs

    PubMed Central

    Ennis, Andrew; Nugent, Chris; Morrow, Philip; Chen, Liming; Ioannidis, George; Stan, Alexandru; Rachev, Preslav

    2015-01-01

    With the increasing abundance of technologies and smart devices equipped with a multitude of sensors for sensing the environment around them, information creation and consumption has become effortless. This is particularly the case for photographs, with vast amounts being created and shared every day. For example, at the time of this writing, Instagram users upload 70 million photographs a day. Nevertheless, it remains a challenge to discover the “right” information for the appropriate purpose. This paper describes an approach to creating semantic geospatial metadata for photographs, which can facilitate photograph search and discovery. To achieve this we have developed and implemented a semantic geospatial data model by which a photograph can be enriched with geospatial metadata extracted from several geospatial data sources, based on the raw low-level geo-metadata from a smartphone photograph. We present the details of our method and implementation for searching and querying the semantic geospatial metadata repository, enabling a user or third-party system to find the information they are looking for. PMID:26205265
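
    The "raw low-level geo-metadata" of a smartphone photograph is its EXIF GPS block, which recent versions of Pillow can read directly; a sketch follows. The file name is a placeholder, and real code should expect photographs with no GPS block at all.

        from PIL import Image
        from PIL.ExifTags import GPSTAGS

        def gps_info(path):
            """Return the EXIF GPS tags of an image as a readable dict."""
            exif = Image.open(path).getexif()
            gps_ifd = exif.get_ifd(0x8825)  # 0x8825 is the GPSInfo IFD pointer
            return {GPSTAGS.get(tag, tag): value for tag, value in gps_ifd.items()}

        info = gps_info("photo.jpg")        # hypothetical file
        print(info.get("GPSLatitude"), info.get("GPSLatitudeRef"))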

  20. A metadata schema for data objects in clinical research.

    PubMed

    Canham, Steve; Ohmann, Christian

    2016-11-24

    A large number of stakeholders have accepted the need for greater transparency in clinical research and, in the context of various initiatives and systems, have developed a diverse and expanding number of repositories for storing the data and documents created by clinical studies (collectively known as data objects). To make the best use of such resources, we assert that it is also necessary for stakeholders to agree on and deploy a simple, consistent metadata scheme. The relevant data objects and their likely storage are described, and the requirements for metadata to support data sharing in clinical research are identified. Issues concerning persistent identifiers, for both studies and data objects, are explored. A scheme is proposed that is based on the DataCite standard, with extensions to cover the needs of clinical researchers, specifically to provide (a) study identification data, including links to clinical trial registries; (b) data object characteristics and identifiers; and (c) data covering location, ownership and access to the data object. The components of the metadata scheme are described. The metadata schema is proposed as a natural extension of a widely agreed standard to fill a gap not tackled by other standards related to clinical research (e.g., Clinical Data Interchange Standards Consortium, Biomedical Research Integrated Domain Group). The proposal could be integrated with, but is not dependent on, other moves to better structure data in clinical research.
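
    A rough sketch of what such a record might look like, a DataCite-style core plus the three proposed groups of clinical extensions, is given below as a Python dict. The field names outside the DataCite core are invented placeholders, not the published schema.

        import json

        record = {
            # DataCite-like core
            "identifier": {"identifier": "10.1234/example-doi", "identifierType": "DOI"},
            "title": "Statistical analysis plan, XYZ hypertension trial",
            "resourceType": "Study document",
            # (a) study identification, with a registry link
            "study": {"title": "XYZ hypertension trial",
                      "registry_id": "ISRCTN00000000"},          # placeholder ID
            # (b) data object characteristics and identifiers
            "object": {"kind": "statistical analysis plan", "format": "PDF"},
            # (c) location, ownership and access
            "access": {"location": "https://repo.example.org/xyz/sap.pdf",
                       "owner": "XYZ Trial Group",
                       "access_type": "on request"},
        }
        print(json.dumps(record, indent=2))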

  1. CHARMe Commentary metadata for Climate Science: collecting, linking and sharing user feedback on climate datasets

    NASA Astrophysics Data System (ADS)

    Blower, Jon; Lawrence, Bryan; Kershaw, Philip; Nagni, Maurizio

    2014-05-01

    The research process can be thought of as an iterative activity, initiated on the basis of prior domain knowledge as well as a number of external inputs, and producing a range of outputs including datasets, studies and peer-reviewed publications. These outputs may describe the problem under study, the methodology used, the results obtained, etc. In any new publication, the author may cite or comment on other papers or datasets in order to support their research hypothesis. However, as their work progresses, the researcher may draw from many other latent channels of information. These could include, for example, a private conversation following a lecture or during a social dinner, or an opinion expressed concerning some significant event such as an earthquake or a satellite failure. In addition, public sources of grey literature, such as informal papers (e.g. arXiv deposits), reports and studies, are also important. The climate science community is no exception to this pattern; the CHARMe project, funded under the European FP7 framework, is developing an online system for collecting and sharing user feedback on climate datasets, to help users judge how suitable such climate data are for an intended application. The user feedback could be comments about assessments, citations, or provenance of the dataset, or other information such as descriptions of uncertainty or data quality. We define this as a distinct category of metadata called Commentary, or C-metadata. We link C-metadata to target climate datasets using a Linked Data approach via the Open Annotation data model. In the context of Linked Data, C-metadata plays the role of a resource which, depending on its nature, may be accessed as simple text or as more structured content. The project is implementing a range of software tools to create, search or visualize C-metadata, including a JavaScript plugin enabling this functionality to be integrated in situ with data provider portals. Since commentary metadata may originate from a range of sources, moderation of this information will become a crucial issue. If the project is successful, expert human moderation (analogous to peer review) will become impracticable as annotation numbers increase, and some combination of algorithmic and crowd-sourced evaluation of commentary metadata will be necessary. To that end, future work will need to extend the tools under development to enable access control and checking of inputs, in order to deal with scale.
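
    A piece of C-metadata in the Open Annotation model is essentially a body (the comment) linked to a target (the dataset); the following JSON-LD-style sketch, built as a Python dict, shows the shape. The URIs and the exact property choices are placeholders; the CHARMe implementation may differ.

        import json

        annotation = {
            "@context": "http://www.w3.org/ns/oa.jsonld",
            "@type": "oa:Annotation",
            "oa:motivatedBy": "oa:commenting",
            "oa:hasBody": {
                "@type": "cnt:ContentAsText",
                "cnt:chars": "SST product shows a cold bias polewards of 60N "
                             "before 1985; see the linked assessment report."
            },
            # placeholder dataset URI standing in for a real catalogue entry
            "oa:hasTarget": "http://catalogue.example.org/dataset/sst-v2",
        }
        print(json.dumps(annotation, indent=2))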

  2. Evaluating non-relational storage technology for HEP metadata and meta-data catalog

    NASA Astrophysics Data System (ADS)

    Grigorieva, M. A.; Golosova, M. V.; Gubin, M. Y.; Klimentov, A. A.; Osipova, V. V.; Ryabinkin, E. A.

    2016-10-01

    Large-scale scientific experiments produce vast volumes of data. These data are stored, processed and analyzed in a distributed computing environment. The life cycle of an experiment is managed by specialized software such as Distributed Data Management and Workload Management Systems. In order to be interpreted and mined, experimental data must be accompanied by auxiliary metadata, which are recorded at each data processing step. Metadata describe the scientific data and represent scientific objects or results of scientific experiments, allowing them to be shared by various applications, recorded in databases or published via the Web. Processing and analysis of the constantly growing volume of auxiliary metadata is a challenging task, no simpler than the management and processing of the experimental data itself. Furthermore, metadata sources are often loosely coupled and may present end users with inconsistencies in combined information queries. To aggregate and synthesize a range of primary metadata sources, and to enhance them with flexible, schema-less additions of aggregated data, we are developing the Data Knowledge Base architecture, which serves as the intelligence behind GUIs and APIs.

  3. The Genomes OnLine Database (GOLD) v.4: status of genomic and metagenomic projects and their associated metadata

    PubMed Central

    Pagani, Ioanna; Liolios, Konstantinos; Jansson, Jakob; Chen, I-Min A.; Smirnova, Tatyana; Nosrat, Bahador; Markowitz, Victor M.; Kyrpides, Nikos C.

    2012-01-01

    The Genomes OnLine Database (GOLD, http://www.genomesonline.org/) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2011, GOLD, now on version 4.0, contains information for 11,472 sequencing projects, of which 2907 have been completed and their sequence data has been deposited in a public repository. Out of these complete projects, 1918 are finished and 989 are permanent drafts. Moreover, GOLD contains information for 340 metagenome studies associated with 1927 metagenome samples. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about any (x) Sequence specification and beyond. PMID:22135293

  4. The Genomes OnLine Database (GOLD) v.4: status of genomic and metagenomic projects and their associated metadata.

    PubMed

    Pagani, Ioanna; Liolios, Konstantinos; Jansson, Jakob; Chen, I-Min A; Smirnova, Tatyana; Nosrat, Bahador; Markowitz, Victor M; Kyrpides, Nikos C

    2012-01-01

    The Genomes OnLine Database (GOLD, http://www.genomesonline.org/) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2011, GOLD, now on version 4.0, contains information for 11,472 sequencing projects, of which 2907 have been completed and their sequence data has been deposited in a public repository. Out of these complete projects, 1918 are finished and 989 are permanent drafts. Moreover, GOLD contains information for 340 metagenome studies associated with 1927 metagenome samples. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about any (x) Sequence specification and beyond.

  5. Applications of the LBA-ECO Metadata Warehouse

    NASA Astrophysics Data System (ADS)

    Wilcox, L.; Morrell, A.; Griffith, P. C.

    2006-05-01

    The LBA-ECO Project Office has developed a system to harvest and warehouse metadata resulting from the Large-Scale Biosphere Atmosphere Experiment in Amazonia. The harvested metadata is used to create dynamically generated reports, available at www.lbaeco.org, which facilitate access to LBA-ECO datasets. The reports are generated for specific controlled vocabulary terms (such as an investigation team or a geospatial region), and are cross-linked with one another via these terms. This approach creates a rich contextual framework enabling researchers to find datasets relevant to their research. It maximizes data discovery by association and provides a greater understanding of the scientific and social context of each dataset. For example, our website provides a profile (e.g. participants, abstract(s), study sites, and publications) for each LBA-ECO investigation. Linked from each profile is a list of associated registered dataset titles, each of which link to a dataset profile that describes the metadata in a user-friendly way. The dataset profiles are generated from the harvested metadata, and are cross-linked with associated reports via controlled vocabulary terms such as geospatial region. The region name appears on the dataset profile as a hyperlinked term. When researchers click on this link, they find a list of reports relevant to that region, including a list of dataset titles associated with that region. Each dataset title in this list is hyperlinked to its corresponding dataset profile. Moreover, each dataset profile contains hyperlinks to each associated data file at its home data repository and to publications that have used the dataset. We also use the harvested metadata in administrative applications to assist quality assurance efforts. These include processes to check for broken hyperlinks to data files, automated emails that inform our administrators when critical metadata fields are updated, dynamically generated reports of metadata records that link to datasets with questionable file formats, and dynamically generated region/site coordinate quality assurance reports. These applications are as important as those that facilitate access to information because they help ensure a high standard of quality for the information. This presentation will discuss reports currently in use, provide a technical overview of the system, and discuss plans to extend this system to harvest metadata resulting from the North American Carbon Program by drawing on datasets in many different formats, residing in many thematic data centers and also distributed among hundreds of investigators.
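
    One of the quality assurance processes mentioned above, checking for broken hyperlinks to data files, is easy to picture; a minimal sketch follows. The URL, timeout and error handling are illustrative, and a production checker would also throttle its requests.

        import requests

        def broken_links(urls, timeout=10):
            """Return (url, reason) pairs for links that no longer resolve."""
            bad = []
            for url in urls:
                try:
                    r = requests.head(url, allow_redirects=True, timeout=timeout)
                    if r.status_code >= 400:
                        bad.append((url, r.status_code))
                except requests.RequestException as exc:
                    bad.append((url, str(exc)))
            return bad

        print(broken_links(["https://example.org/data/file1.csv"]))  # placeholder URL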

  6. National Geothermal Data System State Contributions by Data Type (Appendix A1-b)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Love, Diane

    A multipage spreadsheet listing an inventory of data submissions to the State Contributions to the National Geothermal Data System project, broken down by service, by state, and by metadata compilation, metadata, and map count, including a summary of the information.

  7. Proposed minimum reporting standards for chemical analysis Chemical Analysis Working Group (CAWG) Metabolomics Standards Initiative (MSI)

    PubMed Central

    Amberg, Alexander; Barrett, Dave; Beale, Michael H.; Beger, Richard; Daykin, Clare A.; Fan, Teresa W.-M.; Fiehn, Oliver; Goodacre, Royston; Griffin, Julian L.; Hankemeier, Thomas; Hardy, Nigel; Harnly, James; Higashi, Richard; Kopka, Joachim; Lane, Andrew N.; Lindon, John C.; Marriott, Philip; Nicholls, Andrew W.; Reily, Michael D.; Thaden, John J.; Viant, Mark R.

    2013-01-01

    There is a general consensus supporting the need for standardized reporting of metadata, the information describing large-scale metabolomics and other functional genomics data sets. Reporting of standard metadata provides a biological and empirical context for the data, facilitates experimental replication, and enables the re-interrogation and comparison of data by others. Accordingly, the Metabolomics Standards Initiative, of which the Chemical Analysis Working Group (CAWG) is a part, is building a general consensus concerning minimum reporting standards for metabolomics experiments. This article proposes the minimum reporting standards related to the chemical analysis aspects of metabolomics experiments, including sample preparation, experimental analysis, quality control, metabolite identification, and data pre-processing. These minimum standards currently focus mostly on mass spectrometry and nuclear magnetic resonance spectroscopy, owing to the popularity of these techniques in metabolomics. However, additional input concerning other techniques is welcomed and can be provided via the CAWG on-line discussion forum at http://msi-workgroups.sourceforge.net/ or by e-mail to Msi-workgroups-feedback@lists.sourceforge.net. Further community input related to this document can also be provided via this electronic forum. PMID:24039616

  8. Antimicrobial Resistance Prediction in PATRIC and RAST.

    PubMed

    Davis, James J; Boisvert, Sébastien; Brettin, Thomas; Kenyon, Ronald W; Mao, Chunhong; Olson, Robert; Overbeek, Ross; Santerre, John; Shukla, Maulik; Wattam, Alice R; Will, Rebecca; Xia, Fangfang; Stevens, Rick

    2016-06-14

    The emergence and spread of antimicrobial resistance (AMR) mechanisms in bacterial pathogens, coupled with the dwindling number of effective antibiotics, has created a global health crisis. Being able to identify the genetic mechanisms of AMR and predict the resistance phenotypes of bacterial pathogens prior to culturing could inform clinical decision-making and improve reaction time. At PATRIC (http://patricbrc.org/), we have been collecting bacterial genomes with AMR metadata for several years. In order to advance phenotype prediction and the identification of genomic regions relating to AMR, we have updated the PATRIC FTP server to enable access to genomes that are binned by their AMR phenotypes, as well as to metadata including minimum inhibitory concentrations. Using this infrastructure, we custom-built AdaBoost (adaptive boosting) machine learning classifiers for identifying carbapenem resistance in Acinetobacter baumannii, methicillin resistance in Staphylococcus aureus, and beta-lactam and co-trimoxazole resistance in Streptococcus pneumoniae, with accuracies ranging from 88% to 99%. We did the same for isoniazid, kanamycin, ofloxacin, rifampicin, and streptomycin resistance in Mycobacterium tuberculosis, achieving accuracies ranging from 71% to 88%. This set of classifiers has been used to provide an initial framework for species-specific AMR phenotype and genomic feature prediction in the RAST and PATRIC annotation services.
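
    The classifiers described are AdaBoost models trained on genomic features against phenotype labels drawn from the AMR metadata. The following scikit-learn sketch shows the shape of such a pipeline; the feature matrix is random stand-in data (in place of, say, k-mer counts), so it illustrates the technique rather than reproducing the paper's models.

        import numpy as np
        from sklearn.ensemble import AdaBoostClassifier
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        X = rng.integers(0, 50, size=(200, 1000))  # 200 genomes x 1000 k-mer counts
        y = rng.integers(0, 2, size=200)           # 0 = susceptible, 1 = resistant

        clf = AdaBoostClassifier(n_estimators=100)
        # On random data this hovers near 0.5; real genomic features with
        # lab-determined phenotypes are what make the published accuracies possible.
        print(cross_val_score(clf, X, y, cv=5).mean())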

  9. Seeking the Path to Metadata Nirvana

    NASA Astrophysics Data System (ADS)

    Graybeal, J.

    2008-12-01

    Scientists have always found reusing other scientists' data challenging. Computers did not fundamentally change the problem, but they enabled more and larger instances of it. In fact, by removing human mediation and time delays from the data sharing process, computers emphasize the contextual information that must be exchanged in order to exchange and reuse data. This requirement for contextual information has two faces: "interoperability" when talking about systems, and "the metadata problem" when talking about data. As much as any single organization, the Marine Metadata Interoperability (MMI) project has been tagged with the mission "Solve the metadata problem." Of course, if that goal is achieved, then sustained, interoperable data systems for interdisciplinary observing networks can be easily built -- pesky metadata differences, like which protocol to use for data exchange, or what the data actually measure, will be a thing of the past. Alas, as you might imagine, there will always be complexities and incompatibilities that are not addressed, and data systems that are not interoperable, even within a science discipline. So should we throw up our hands and surrender to the inevitable? Not at all. Rather, we try to minimize metadata problems as much as we can, and in this we steadily progress, despite natural forces that pull in the other direction. Computer systems let us work with more complexity, build community knowledge and collaborations, and preserve and publish our progress and (dis-)agreements. Funding organizations, science communities, and technologists see the importance of interoperable systems and metadata, and direct resources toward them. With these new approaches and resources, projects like IPY and MMI can simultaneously define, display, and promote effective strategies for sustainable, interoperable data systems. This presentation will outline the role metadata plays in durable interoperable data systems, for better or worse. It will describe times when "just choosing a standard" can work, and when it probably won't. And it will point out signs that suggest a metadata storm is coming to your community project, and how you might avoid it. From these lessons we will seek a path to producing interoperable, interdisciplinary, metadata-enlightened environment observing systems.

  10. The Global Streamflow Indices and Metadata archive (G-SIM): A compilation of global streamflow time series indices and meta-data

    NASA Astrophysics Data System (ADS)

    Do, Hong; Gudmundsson, Lukas; Leonard, Michael; Westra, Seth; Seneviratne, Sonia

    2017-04-01

    In-situ observations of daily streamflow with global coverage are a crucial asset for understanding large-scale freshwater resources, which are an essential component of the Earth system and a prerequisite for societal development. Here we present the Global Streamflow Indices and Metadata archive (G-SIM), a collection of indices derived from more than 20,000 daily streamflow time series across the globe. These indices are designed to support global assessments of change in wet and dry extremes, and have been compiled from 12 free-to-access online databases (seven national databases and five international collections). The G-SIM archive also includes significant metadata to support detailed understanding of streamflow dynamics, including drainage area shapefiles and many essential catchment properties such as land cover type and soil and topographic characteristics. The project's automated data handling and quality control procedures make G-SIM a reproducible, extendible archive that can be utilised for many purposes in large-scale hydrology. Some potential applications include the identification of observational trends in hydrological extremes, the assessment of climate change impacts on streamflow regimes, and the validation of global hydrological models.
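
    The indices G-SIM compiles are summary statistics over daily discharge series; two representative ones, annual maxima for wet extremes and a low-flow percentile for dry extremes, can be computed with pandas as sketched below. The series is synthetic, so the numbers mean nothing; real input would be one gauge's daily discharge record.

        import numpy as np
        import pandas as pd

        days = pd.date_range("1980-01-01", "1989-12-31", freq="D")
        rng = np.random.default_rng(1)
        q = pd.Series(rng.gamma(shape=2.0, scale=5.0, size=len(days)), index=days)

        annual_max = q.groupby(q.index.year).max()  # one wet-extreme index per year
        q5 = q.quantile(0.05)                       # low-flow threshold over the record
        print(annual_max.head())
        print(f"Q5 = {q5:.2f}")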

  11. Omics Metadata Management Software (OMMS).

    PubMed

    Perez-Arriaga, Martha O; Wilson, Susan; Williams, Kelly P; Schoeniger, Joseph; Waymire, Russel L; Powell, Amy Jo

    2015-01-01

    Next-generation sequencing projects have underappreciated information management tasks requiring detailed attention to specimen curation, nucleic acid sample preparation and sequence production methods required for downstream data processing, comparison, interpretation, sharing and reuse. The few existing metadata management tools for genome-based studies provide weak curatorial frameworks for experimentalists to store and manage idiosyncratic, project-specific information, typically offering no automation supporting unified naming and numbering conventions for sequencing production environments that routinely deal with hundreds, if not thousands, of samples at a time. Moreover, existing tools are not readily interfaced with bioinformatics executables (e.g., BLAST, Bowtie2, custom pipelines). Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and to perform analyses and information management tasks via an intuitive web-based interface. Several use cases with short-read sequence datasets are provided to validate installation and integrated function, and suggest possible methodological road maps for prospective users. The provided examples highlight possible OMMS workflows for metadata curation, multistep analyses, and results management and downloading. The OMMS can be implemented as a stand-alone package for individual laboratories, or can be configured for web-based deployment supporting geographically dispersed projects. The OMMS was developed using an open-source software base, is flexible, extensible, and easily installed and executed. The OMMS can be obtained at http://omms.sandia.gov.

  12. Omics Metadata Management Software (OMMS)

    PubMed Central

    Perez-Arriaga, Martha O; Wilson, Susan; Williams, Kelly P; Schoeniger, Joseph; Waymire, Russel L; Powell, Amy Jo

    2015-01-01

    Next-generation sequencing projects have underappreciated information management tasks requiring detailed attention to specimen curation, nucleic acid sample preparation and sequence production methods required for downstream data processing, comparison, interpretation, sharing and reuse. The few existing metadata management tools for genome-based studies provide weak curatorial frameworks for experimentalists to store and manage idiosyncratic, project-specific information, typically offering no automation supporting unified naming and numbering conventions for sequencing production environments that routinely deal with hundreds, if not thousands, of samples at a time. Moreover, existing tools are not readily interfaced with bioinformatics executables (e.g., BLAST, Bowtie2, custom pipelines). Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and to perform analyses and information management tasks via an intuitive web-based interface. Several use cases with short-read sequence datasets are provided to validate installation and integrated function, and suggest possible methodological road maps for prospective users. The provided examples highlight possible OMMS workflows for metadata curation, multistep analyses, and results management and downloading. The OMMS can be implemented as a stand-alone package for individual laboratories, or can be configured for web-based deployment supporting geographically dispersed projects. The OMMS was developed using an open-source software base, is flexible, extensible, and easily installed and executed. Availability: The OMMS can be obtained at http://omms.sandia.gov. PMID:26124554

  13. Managing Data, Provenance and Chaos through Standardization and Automation at the Georgia Coastal Ecosystems LTER Site

    NASA Astrophysics Data System (ADS)

    Sheldon, W.

    2013-12-01

    Managing data for a large, multidisciplinary research program such as a Long Term Ecological Research (LTER) site is a significant challenge, but it also presents unique opportunities for data stewardship. LTER research is conducted within multiple organizational frameworks (i.e. a specific LTER site as well as the broader LTER network) and addresses both specific goals defined in an NSF proposal and broader goals of the network; therefore, every LTER data set can be linked to rich contextual information to guide interpretation and comparison. The challenge is how to link the data to this wealth of contextual metadata. At the Georgia Coastal Ecosystems LTER we developed an integrated information management system (GCE-IMS) to manage, archive and distribute data, metadata and other research products, as well as to manage project logistics, administration and governance (figure 1). This system allows us to store all project information in one place and provide dynamic links through web applications and services, ensuring content is always up to date on the web as well as in data set metadata. The database model supports tracking changes over time in personnel roles, projects and governance decisions, allowing these databases to serve as canonical sources of project history. Storing project information in a central database has also allowed us to standardize both the formatting and content of critical project information, including personnel names, roles, keywords, place names, attribute names, units, and instrumentation, providing consistency and improving data and metadata comparability. Lookup services for these standard terms also simplify data entry in web and database interfaces. We have also coupled the GCE-IMS to our MATLAB- and Python-based data processing tools (i.e. through database connections) to automate metadata generation and the packaging of tabular and GIS data products for distribution. Data processing history is automatically tracked throughout the data lifecycle, from initial import through quality control, revision and integration by our data processing system (GCE Data Toolbox for MATLAB), and is included in the metadata for versioned data products. This high level of automation and system integration has proven very effective in managing the chaos and scalability of our information management program.

  14. Development of an open metadata schema for prospective clinical research (openPCR) in China.

    PubMed

    Xu, W; Guan, Z; Sun, J; Wang, Z; Geng, Y

    2014-01-01

    In China, deployment of electronic data capture (EDC) and clinical data management systems (CDMS) for clinical research (CR) is at a very early stage, and about 90% of clinical studies collect and submit clinical data manually. This work aims to build an open metadata schema for Prospective Clinical Research (openPCR) in China based on openEHR archetypes, in order to help Chinese researchers easily create specific data entry templates for registration, study design and clinical data collection. The Singapore Framework for Dublin Core Application Profiles (DCAP) is used to develop openPCR, following four steps: defining the core functional requirements and deducing the core metadata items; developing archetype models; defining metadata terms and creating archetype records; and finally developing the implementation syntax. The core functional requirements are divided into three categories: requirements for research registration, requirements for trial design, and requirements for case report forms (CRF). 74 metadata items are identified and their Chinese authority names are created. The minimum metadata set of openPCR includes 3 documents, 6 sections, 26 top-level data groups, 32 lower data groups and 74 data elements. The top-level container in openPCR is composed of public document, internal document and clinical document archetypes. A hierarchical structure of openPCR is established according to the Data Structure of Electronic Health Record Architecture and Data Standard of China (Chinese EHR Standard). Metadata attributes are grouped into six parts: identification, definition, representation, relation, usage guides, and administration. OpenPCR is an open metadata schema based on research registration standards, standards of the Clinical Data Interchange Standards Consortium (CDISC) and Chinese healthcare-related standards, and is to be made publicly available throughout China. It anticipates future integration of EHR and CR by adopting the data structure and data terms of the Chinese EHR Standard. Archetypes in openPCR are modular models and can be separated, recombined, and reused. The authors recommend that the method used to develop openPCR be considered by other countries when designing metadata schemas for clinical research. As next steps, openPCR should be used in a number of CR projects to test its applicability and continuously improve its coverage. In addition, a metadata schema for research protocols could be developed to structure and standardize protocols, and syntactic interoperability of openPCR with other related standards could be considered.

  15. Mercury Toolset for Spatiotemporal Metadata

    NASA Technical Reports Server (NTRS)

    Wilson, Bruce E.; Palanisamy, Giri; Devarakonda, Ranjeet; Rhyne, B. Timothy; Lindsley, Chris; Green, James

    2010-01-01

    Mercury (http://mercury.ornl.gov) is a set of tools for federated harvesting, searching, and retrieving metadata, particularly spatiotemporal metadata. Version 3.0 of the Mercury toolset provides orders of magnitude improvements in search speed, support for additional metadata formats, integration with Google Maps for spatial queries, facetted type search, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. It provides a single portal to very quickly search for data and information contained in disparate data management systems, each of which may use different metadata formats. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow the users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury periodically (typically daily) harvests metadata sources through a collection of interfaces and re-indexes these metadata to provide extremely rapid search capabilities, even over collections with tens of millions of metadata records. A number of both graphical and application interfaces have been constructed within Mercury, to enable both human users and other computer programs to perform queries. Mercury was also designed to support multiple different projects, so that the particular fields that can be queried and used with search filters are easy to configure for each different project.
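
    The harvest-then-index pattern described above can be reduced to a toy sketch: gather records from several sources, then build one centralized index that answers fielded queries quickly. The records below are invented, and Mercury itself uses a full search engine rather than a Python dict, so this shows only the shape of the idea.

        from collections import defaultdict

        harvested = [
            {"id": "A1", "source": "daac-1", "title": "Soil moisture, site 7"},
            {"id": "B9", "source": "daac-2", "title": "Soil temperature profiles"},
        ]

        index = defaultdict(set)                 # term -> record ids
        for rec in harvested:
            for term in rec["title"].lower().replace(",", " ").split():
                index[term].add(rec["id"])

        print(sorted(index["soil"]))             # ['A1', 'B9']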

  16. Mercury Toolset for Spatiotemporal Metadata

    NASA Astrophysics Data System (ADS)

    Devarakonda, Ranjeet; Palanisamy, Giri; Green, James; Wilson, Bruce; Rhyne, B. Timothy; Lindsley, Chris

    2010-06-01

    Mercury (http://mercury.ornl.gov) is a set of tools for federated harvesting, searching, and retrieving metadata, particularly spatiotemporal metadata. Version 3.0 of the Mercury toolset provides orders of magnitude improvements in search speed, support for additional metadata formats, integration with Google Maps for spatial queries, facetted type search, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. It provides a single portal to very quickly search for data and information contained in disparate data management systems, each of which may use different metadata formats. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow the users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury periodically (typically daily) harvests metadata sources through a collection of interfaces and re-indexes these metadata to provide extremely rapid search capabilities, even over collections with tens of millions of metadata records. A number of both graphical and application interfaces have been constructed within Mercury, to enable both human users and other computer programs to perform queries. Mercury was also designed to support multiple different projects, so that the particular fields that can be queried and used with search filters are easy to configure for each different project.

  17. Critical Assessment of Small Molecule Identification 2016: automated methods.

    PubMed

    Schymanski, Emma L; Ruttkies, Christoph; Krauss, Martin; Brouard, Céline; Kind, Tobias; Dührkop, Kai; Allen, Felicity; Vaniya, Arpana; Verdegem, Dries; Böcker, Sebastian; Rousu, Juho; Shen, Huibin; Tsugawa, Hiroshi; Sajed, Tanvir; Fiehn, Oliver; Ghesquière, Bart; Neumann, Steffen

    2017-03-27

    The fourth round of the Critical Assessment of Small Molecule Identification (CASMI) contest (www.casmi-contest.org) was held in 2016, with two new categories for automated methods. This article covers the 208 challenges in Categories 2 and 3, without and with metadata, from the organization, participation, results and post-contest evaluation of CASMI 2016 through to perspectives for future contests and for small molecule annotation/identification. The Input Output Kernel Regression (CSI:IOKR) machine learning approach performed best in "Category 2: Best Automatic Structural Identification - In Silico Fragmentation Only", won by Team Brouard with 41% challenge wins. The winner of "Category 3: Best Automatic Structural Identification - Full Information" was Team Kind (MS-FINDER), with 76% challenge wins. The best methods were able to achieve over 30% Top 1 ranks in Category 2, with all methods ranking the correct candidate in the Top 10 in around 50% of challenges. This success rate rose to 70% Top 1 ranks in Category 3, with candidates in the Top 10 in over 80% of the challenges. The machine learning and chemistry-based approaches are shown to perform in complementary ways. The improvement in (semi-)automated fragmentation methods for small molecule identification has been substantial. The high rates of correct candidates in the Top 1 and Top 10, despite large candidate numbers, open up great possibilities for high-throughput annotation of untargeted analyses of "known unknowns". As more high-quality training data become available, the improvements in machine learning methods will likely continue, but the alternative approaches still provide valuable complementary information. Improved integration of experimental context will further improve identification success for "real life" annotations. The true "unknown unknowns" remain to be evaluated in future CASMI contests.

  18. Multi-facetted Metadata - Describing datasets with different metadata schemas at the same time

    NASA Astrophysics Data System (ADS)

    Ulbricht, Damian; Klump, Jens; Bertelmann, Roland

    2013-04-01

    Inspired by the wish to re-use research data, much work has been done to bring the data systems of the earth sciences together. Discovery metadata are disseminated to data portals to allow the building of customized indexes of catalogued dataset items. Data that were once acquired in the context of a scientific project are open for reappraisal and can now be used by scientists who were not part of the original research team. To make data re-use easier, measurement methods and measurement parameters must be documented in an application metadata schema and described in a written publication. Linking datasets to publications, as DataCite [1] does, again requires a specific metadata schema, and every new use context of the measured data may require yet another metadata schema, sharing only a subset of information with the metadata already present. To cope with the problem of metadata schema diversity in our common data repository at GFZ Potsdam, we established a solution to store file-based research data and describe it with an arbitrary number of metadata schemas. The core component of the data repository is an eSciDoc infrastructure that provides versioned container objects, called eSciDoc "items" [2]. The eSciDoc content model allows assigning files to items and adding any number of metadata records to these items. The eSciDoc items can be submitted, revised, and finally published, which makes the data and metadata available through the internet worldwide. GFZ Potsdam uses eSciDoc to support its scientific publishing workflow, including mechanisms for data review in peer review processes, by providing temporary web links for external reviewers who do not have credentials to access the data. Based on the eSciDoc API, panMetaDocs [3] provides a web portal for data management in research projects. PanMetaDocs, which is based on panMetaWorks [4], is a PHP-based web application that allows data to be described with any XML-based schema. It uses the eSciDoc infrastructure's REST interface to store versioned dataset files and metadata in an XML format. The software is able to administer more than one eSciDoc metadata record per item and thus allows the description of a dataset according to its context. The metadata fields can be filled with static or dynamic content, reducing the number of fields that require manual entries to a minimum and, at the same time, making use of contextual information available in a project setting. Access rights can be adjusted to set the visibility of datasets to the required degree of openness. Metadata from separate instances of panMetaDocs can be syndicated to portals through RSS and OAI-PMH interfaces. The application architecture presented here allows file-based datasets to be stored and described with any number of metadata schemas, depending on the intended use case. Data and metadata are stored in the same entity (the eSciDoc item) and are managed by a software tool through the eSciDoc REST interface, in this case panMetaDocs. Other software may re-use the produced items and modify the appropriate metadata records by accessing the web API of the eSciDoc data infrastructure. For presentation of the datasets in a web browser we are not bound to panMetaDocs; presentation is done by stylesheet transformation of the eSciDoc item. [1] http://www.datacite.org [2] http://www.escidoc.org , eSciDoc, FIZ Karlsruhe, Germany [3] http://panmetadocs.sf.net , panMetaDocs, GFZ Potsdam, Germany [4] http://metaworks.pangaea.de , panMetaWorks, Dr. R. Huber, MARUM, Univ. Bremen, Germany
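
    The central pattern here, one stored item carrying several metadata records in different schemas for the same dataset, can be sketched as follows. The REST paths and payloads are placeholders standing in for the eSciDoc interface, not its actual API.

        import requests

        item = "https://escidoc.example.org/ir/item/escidoc:1234"  # hypothetical item URL
        dataset_metadata = {
            "iso19139": "<gmd:MD_Metadata>...</gmd:MD_Metadata>",  # placeholder payloads
            "datacite": "<resource>...</resource>",
        }
        for schema, xml in dataset_metadata.items():
            # One metadata record per schema, all attached to the same item.
            requests.put(f"{item}/md-records/{schema}", data=xml,
                         headers={"Content-Type": "application/xml"})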

  19. From the inside-out: Retrospectives on a metadata improvement process to advance the discoverability of NASA's earth science data

    NASA Astrophysics Data System (ADS)

    Hernández, B. E.; Bugbee, K.; le Roux, J.; Beaty, T.; Hansen, M.; Staton, P.; Sisco, A. W.

    2017-12-01

    Earth observation (EO) data collected as part of NASA's Earth Observing System Data and Information System (EOSDIS) are now searchable via the Common Metadata Repository (CMR). The Analysis and Review of CMR (ARC) Team at Marshall Space Flight Center has been tasked with reviewing all NASA metadata records in the CMR (approximately 7,000 records). Each collection-level record and its constituent granule-level metadata are reviewed both for completeness and for compliance with the CMR's set of metadata standards, as specified in the Unified Metadata Model (UMM). NASA's Distributed Active Archive Centers (DAACs) have been harmonizing priority metadata records within the context of the inter-agency federal Big Earth Data Initiative (BEDI), which seeks to improve the discoverability, accessibility, and usability of EO data. Thus, the first phase of this project constitutes reviewing BEDI metadata records, while the second phase will constitute reviewing the remaining non-BEDI records in the CMR. This presentation will discuss the ARC team's findings in terms of the overall quality of BEDI records across all DAACs as well as compliance with UMM standards. For instance, only a fifth of the collection-level metadata fields needed correction, compared to a quarter of the granule-level fields. It should be noted that the degree to which DAACs' metadata did not comply with the UMM standards may reflect multiple factors, such as recent changes in the UMM standards and the use of different metadata formats (e.g. DIF 10, ECHO 10, ISO 19115-1) across the DAACs. Insights, constructive criticism, and lessons learned from this metadata review process will be contributed by both ORNL and SEDAC. Further inquiry along these lines may yield insights that improve the metadata curation process moving forward. In terms of the broader implications for metadata compliance with the UMM standards, this research has shown that a large proportion of the prioritized collections have already been made compliant, although the process of improving metadata quality is ongoing and iterative. Further research is also warranted into whether the gains in metadata quality are also driving gains in data use.

  20. Making Information Visible, Accessible, and Understandable: Meta-Data and Registries

    DTIC Science & Technology

    2007-07-01

    the data created, the length of play time, album name, and the genre. Without resource metadata, portable digital music players would not be so... notion of a catalog card in a library. An example of metadata is the description of a music file specifying the creator, the artist that performed the song... describe structure and formatting, which are critical to interoperability and the management of databases. Going back to the portable music player example...

  1. Pathogen metadata platform: software for accessing and analyzing pathogen strain information.

    PubMed

    Chang, Wenling E; Peterson, Matthew W; Garay, Christopher D; Korves, Tonia

    2016-09-15

    Pathogen metadata includes information about where and when a pathogen was collected and the type of environment it came from. Along with genomic nucleotide sequence data, this metadata is growing rapidly and becoming a valuable resource not only for research but also for biosurveillance and public health. However, current freely available tools for analyzing these data are geared towards bioinformaticians and/or do not provide the summaries and visualizations needed to readily interpret results. We designed a platform to easily access and summarize data about pathogen samples. The software includes a PostgreSQL database that captures metadata useful for disease outbreak investigations, and scripts for downloading and parsing data from NCBI BioSample and BioProject into the database. The software provides a user interface to query metadata and obtain standardized results in an exportable, tab-delimited format. To visually summarize results, the user interface provides a 2D histogram for user-selected metadata types and mapping of geolocated entries. The software is built on the LabKey data platform, an open-source data management platform, which enables developers to add functionalities. We demonstrate the use of the software in querying for a pathogen serovar and for genome sequence identifiers. This software enables users to create a local database for pathogen metadata, populate it with data from NCBI, easily query the data, and obtain visual summaries. Some of the components, such as the database, are modular and can be incorporated into other data platforms. The source code is freely available for download at https://github.com/wchangmitre/bioattribution.
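
    A query for a pathogen serovar, one of the two demonstrations mentioned, could be expressed against the local PostgreSQL database roughly as below. The table and column names are invented placeholders, not the platform's actual schema.

        import psycopg2

        conn = psycopg2.connect("dbname=pathogen_metadata")  # hypothetical DSN
        with conn, conn.cursor() as cur:
            cur.execute(
                """SELECT accession, collection_date, geo_loc_name
                   FROM biosample                 -- placeholder table
                   WHERE serovar = %s
                   ORDER BY collection_date""",
                ("Enteritidis",),
            )
            for accession, date, location in cur.fetchall():
                print(accession, date, location)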

  2. Defining linkages between the GSC and NSF's LTER program: How the Ecological Metadata Language (EML) relates to GCDML and other outcomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Inigo, Gil San; Servilla, Mark; Brunt, James

    2008-06-01

    The Genomic Standards Consortium (GSC) invited a representative of the Long-Term Ecological Research (LTER) program to its fifth workshop to present the Ecological Metadata Language (EML) metadata standard and its relationship to the Minimum Information about a Genome/Metagenome Sequence (MIGS/MIMS) and its implementation, the Genomic Contextual Data Markup Language (GCDML). The LTER has been one of the top National Science Foundation (NSF) programs in biology since 1980, representing diverse ecosystems and creating long-term, interdisciplinary research, synthesis of information, and theory. The adoption of EML as the LTER network standard has been key to building network synthesis architectures based on high-quality standardized metadata. EML is the NSF-recognized metadata standard for LTER, and EML is one of the criteria used to review the LTER program's progress. At the workshop, a potential crosswalk between GCDML and EML was explored. Also, collaboration between the LTER and GSC developers was proposed to join efforts toward a common metadata cataloging designer's tool. The community adoption success of a metadata standard depends, among other factors, on the tools and training developed to use the standard. LTER's experience in embracing EML may help the GSC achieve similar success. A possible collaboration between LTER and GSC to provide training opportunities for GCDML and the associated tools is being explored. Finally, LTER is investigating EML enhancements to better accommodate genomics data, possibly integrating the GCDML schema into EML. All these action items have been accepted by the LTER contingent, and further collaboration between the GSC and LTER is expected.

  3. Defining linkages between the GSC and NSF's LTER program: how the Ecological Metadata Language (EML) relates to GCDML and other outcomes.

    PubMed

    Gil, Inigo San; Sheldon, Wade; Schmidt, Tom; Servilla, Mark; Aguilar, Raul; Gries, Corinna; Gray, Tanya; Field, Dawn; Cole, James; Pan, Jerry Yun; Palanisamy, Giri; Henshaw, Donald; O'Brien, Margaret; Kinkel, Linda; McMahon, Katherine; Kottmann, Renzo; Amaral-Zettler, Linda; Hobbie, John; Goldstein, Philip; Guralnick, Robert P; Brunt, James; Michener, William K

    2008-06-01

    The Genomic Standards Consortium (GSC) invited a representative of the Long-Term Ecological Research (LTER) program to its fifth workshop to present the Ecological Metadata Language (EML) metadata standard and its relationship to the Minimum Information about a Genome/Metagenome Sequence (MIGS/MIMS) and its implementation, the Genomic Contextual Data Markup Language (GCDML). The LTER has been one of the top National Science Foundation (NSF) programs in biology since 1980, representing diverse ecosystems and creating long-term, interdisciplinary research, synthesis of information, and theory. The adoption of EML as the LTER network standard has been key to building network synthesis architectures based on high-quality standardized metadata. EML is the NSF-recognized metadata standard for LTER, and EML is one of the criteria used to review the LTER program's progress. At the workshop, a potential crosswalk between GCDML and EML was explored. Also, collaboration between the LTER and GSC developers was proposed to join efforts toward a common metadata cataloging designer's tool. The community adoption success of a metadata standard depends, among other factors, on the tools and training developed to use the standard. LTER's experience in embracing EML may help the GSC achieve similar success. A possible collaboration between LTER and GSC to provide training opportunities for GCDML and the associated tools is being explored. Finally, LTER is investigating EML enhancements to better accommodate genomics data, possibly integrating the GCDML schema into EML. All these action items have been accepted by the LTER contingent, and further collaboration between the GSC and LTER is expected.

  4. Inheritance rules for Hierarchical Metadata Based on ISO 19115

    NASA Astrophysics Data System (ADS)

    Zabala, A.; Masó, J.; Pons, X.

    2012-04-01

    ISO 19115 has mainly been used to describe metadata for datasets and services. Furthermore, the ISO 19115 standard (as well as the new draft ISO 19115-1) includes a conceptual model that allows metadata to be described at different levels of granularity, structured in hierarchical levels, both in aggregated resources (most notably series and datasets) and in more disaggregated resources such as types of entities (feature type), types of attributes (attribute type), entities (feature instances) and attributes (attribute instances). In theory, a complete metadata structure can be applied at every hierarchical level, from the whole series down to an individual feature attribute, but storing all metadata at all levels is completely impractical. An inheritance mechanism is needed to store each metadata and quality element at the optimum hierarchical level and to allow easy and efficient documentation of metadata, both in an Earth observation scenario such as multiband imagery from a multi-satellite mission, and in a complex vector topographic map that includes several feature types separated in layers (e.g. administrative limits, contour lines, building polygons, road lines, etc.). Moreover, because maps are traditionally split into tiles for handling at detailed scales, or because of satellite acquisition characteristics, each of the previous thematic layers (e.g. 1:5000 roads for a country) or bands (e.g. the Landsat-5 TM cover of the Earth) is divided into several parts (sheets or scenes, respectively). According to the hierarchy in ISO 19115, the definition of general metadata can be supplemented by spatially specific metadata that, when required, either inherits from or overrides the general case (G.1.3). Annex H of the standard states that only metadata exceptions are defined at lower levels, so it is not necessary to generate the full registry of metadata for each level, but rather to link particular values to the general values they inherit. Conceptually, the metadata registry is complete for each hierarchical level, but at the implementation level most metadata elements are not stored at both levels, only at the more generic one. This communication defines a metadata system that covers four levels, describes which metadata have to support series-layer inheritance and in which way, and explains how hierarchical levels are defined and stored. Metadata elements are classified according to the type of inheritance between products, series, tiles and datasets. The communication explains this classification and exemplifies it using core metadata elements. It also presents a metadata viewer and editing tool that uses the described model to propagate metadata elements and show the user a complete set of metadata for each level in a transparent way. This tool is integrated in the MiraMon GIS software.
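
    A minimal sketch of the inheritance rule described above: each level stores only its exceptions, and a lookup walks up the hierarchy until a value is found. Class and element names are illustrative.

      # Sketch: exception-only storage with lookup that inherits upward.
      class MetadataNode:
          def __init__(self, level, parent=None, **elements):
              self.level = level          # e.g. "series", "dataset", "tile"
              self.parent = parent        # more generic node to inherit from
              self.elements = elements    # only the exceptions stored here

          def get(self, element):
              node = self
              while node is not None:
                  if element in node.elements:
                      return node.elements[element]   # local value or override
                  node = node.parent                  # fall back to general case
              raise KeyError(element)

      series = MetadataNode("series", abstract="Landsat-5 TM coverage",
                            spatial_resolution="30 m")
      scene = MetadataNode("dataset", parent=series,
                           bounding_box=(0.9, 41.2, 3.3, 42.9))

      print(scene.get("spatial_resolution"))  # inherited from the series: "30 m"
      print(scene.get("bounding_box"))        # stored at the dataset level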

  5. Reliable and Persistent Identification of Linked Data Elements

    NASA Astrophysics Data System (ADS)

    Wood, David

    Linked Data techniques rely upon common terminology in a manner similar to a relational database's reliance on a schema. Linked Data terminology anchors metadata descriptions and facilitates navigation of information. Common vocabularies ease the human, social tasks of understanding datasets sufficiently to construct queries and help to relate otherwise disparate datasets. Vocabulary terms must, when using the Resource Description Framework, be grounded in URIs. A current best practice on the World Wide Web is to serve vocabulary terms as Uniform Resource Locators (URLs) and present both human-readable and machine-readable representations to the public. Linked Data terminology published to the World Wide Web may be used by others without reference or notification to the publishing party. That presents a problem: vocabulary publishers take on an implicit responsibility to maintain and publish their terms via the URLs originally assigned, regardless of the inconvenience such a responsibility may cause. Over the course of years, people change jobs, publishing organizations change Internet domain names, computers change IP addresses, systems administrators publish old material in new ways. Clearly, a mechanism is required to manage Web-based vocabularies over the long term. This chapter places Linked Data vocabularies in context with the wider concepts of metadata in general and specifically metadata on the Web. Persistent identifier mechanisms are reviewed, with a particular emphasis on Persistent URLs, or PURLs. PURLs and PURL services are discussed in the context of Linked Data. Finally, historic weaknesses of PURLs are resolved by the introduction of a federation of PURL services to address needs specific to Linked Data.
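
    The value of a persistent identifier shows in how a consumer resolves it: the stable URL is dereferenced through an HTTP redirect to wherever the term currently lives. A small sketch, using a real Dublin Core PURL as the example:

      # Sketch: follow a PURL's redirect without downloading the target.
      import requests

      def resolve(purl):
          resp = requests.head(purl, allow_redirects=False)
          if resp.status_code in (301, 302, 303, 307, 308):
              return resp.headers["Location"]   # current home of the term
          return purl                           # already the final location

      print(resolve("http://purl.org/dc/terms/title"))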

  6. GEO Label Web Services for Dynamic and Effective Communication of Geospatial Metadata Quality

    NASA Astrophysics Data System (ADS)

    Lush, Victoria; Nüst, Daniel; Bastin, Lucy; Masó, Joan; Lumsden, Jo

    2014-05-01

    We present demonstrations of the GEO label Web services and their integration into a prototype extension of the GEOSS portal (http://scgeoviqua.sapienzaconsulting.com/web/guest/geo_home), the GMU portal (http://gis.csiss.gmu.edu/GADMFS/) and a GeoNetwork catalog application (http://uncertdata.aston.ac.uk:8080/geonetwork/srv/eng/main.home). The GEO label is designed to communicate, and facilitate interrogation of, geospatial quality information with a view to supporting efficient and effective dataset selection on the basis of quality, trustworthiness and fitness for use. The GEO label which we propose was developed and evaluated according to a user-centred design (UCD) approach in order to maximise the likelihood of user acceptance once deployed. The resulting label is dynamically generated from producer metadata in ISO or FGDC format, and incorporates user feedback on dataset usage, ratings and discovered issues, in order to supply a highly informative summary of metadata completeness and quality. The label was easily incorporated into a community portal as part of the GEO Architecture Implementation Programme (AIP-6) and has been successfully integrated into a prototype extension of the GEOSS portal, as well as the popular metadata catalog and editor, GeoNetwork. The design of the GEO label was based on four user studies conducted to: (1) elicit initial user requirements; (2) investigate initial user views on the concept of a GEO label and its potential role; (3) evaluate prototype label visualizations; and (4) evaluate and validate physical GEO label prototypes. The results of these studies indicated that users and producers support the concept of a label with a drill-down interrogation facility, combining eight geospatial data informational aspects, namely: producer profile, producer comments, lineage information, standards compliance, quality information, user feedback, expert reviews, and citations information. These are delivered as eight facets of a wheel-like label, which are coloured according to metadata availability and are clickable to allow a user to engage with the original metadata and explore specific aspects in more detail. To support this graphical representation and allow for wider deployment architectures we have implemented two Web services, a PHP and a Java implementation, that generate GEO label representations by combining producer metadata (from standard catalogues or other published locations) with structured user feedback. Both services accept encoded URLs of publicly available metadata documents or metadata XML files as HTTP POST and GET requests and apply XPath and XSLT mappings to transform producer and feedback XML documents into clickable SVG GEO label representations. The label and services are underpinned by two XML-based quality models. The first is a producer model that extends ISO 19115 and 19157 to allow fuller citation of reference data, presentation of pixel- and dataset-level statistical quality information, and encoding of 'traceability' information on the lineage of an actual quality assessment. The second is a user quality model (realised as a feedback server and client) which allows reporting and query of ratings, usage reports, citations, comments and other domain knowledge. Both services are Open Source and are available on GitHub at https://github.com/lushv/geolabel-service and https://github.com/52North/GEO-label-java. The functionality of these services can be tested using our GEO label generation demos, available online at http://www.geolabel.net/demo.html and http://geoviqua.dev.52north.org/glbservice/index.jsf.
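
    A hypothetical sketch of calling such a label service: hand it the URL of a publicly available metadata document and save the returned SVG. The endpoint and parameter names below are placeholders, not the actual API; the GitHub repositories above document the real interface.

      # Placeholder sketch of a GEO label request; endpoint and parameter
      # names are invented for illustration.
      import requests

      SERVICE = "https://example.org/glbservice/api/v1/svg"   # placeholder URL

      def fetch_label(metadata_url, feedback_url=None, out="label.svg"):
          params = {"metadata": metadata_url}          # assumed parameter name
          if feedback_url:
              params["feedback"] = feedback_url        # assumed parameter name
          resp = requests.get(SERVICE, params=params)
          resp.raise_for_status()
          with open(out, "wb") as f:
              f.write(resp.content)   # clickable SVG with the eight facets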

  7. The Value of Data and Metadata Standardization for Interoperability in Giovanni Or: Why Your Product's Metadata Causes Us Headaches!

    NASA Technical Reports Server (NTRS)

    Smit, Christine; Hegde, Mahabaleshwara; Strub, Richard; Bryant, Keith; Li, Angela; Petrenko, Maksym

    2017-01-01

    Giovanni is a data exploration and visualization tool at the NASA Goddard Earth Sciences Data Information Services Center (GES DISC). It has been around in one form or another for more than 15 years. Giovanni calculates simple statistics and produces 22 different visualizations for more than 1600 geophysical parameters from more than 90 satellite and model products. Giovanni relies on external data format standards to ensure interoperability, including the NetCDF CF Metadata Conventions. Unfortunately, these standards were insufficient to make Giovanni's internal data representation truly simple to use. Finding and working with dimensions can be convoluted with the CF Conventions. Furthermore, the CF Conventions are silent on machine-friendly descriptive metadata such as the parameter's source product and product version. In order to simplify analyzing disparate earth science data parameters in a unified way, we developed Giovanni's internal standard. First, the format standardizes parameter dimensions and variables so they can be easily found. Second, the format adds all the machine-friendly metadata Giovanni needs to present our parameters to users in a consistent and clear manner. At a glance, users can grasp all the pertinent information about parameters both during parameter selection and after visualization.
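
    A sketch of the two moves the abstract describes: normalising dimension and variable names so they can be found programmatically, and attaching machine-friendly provenance attributes that CF omits. The alias table and attribute names are illustrative, not Giovanni's actual internal convention.

      # Sketch: standardise dimension/variable names and add provenance
      # attributes to a NetCDF file. Names here are illustrative only.
      import netCDF4

      def standardize(path, product, version):
          ds = netCDF4.Dataset(path, "a")
          renames = {"latitude": "lat", "longitude": "lon"}
          for old, new in renames.items():
              if old in ds.dimensions:
                  ds.renameDimension(old, new)       # findable by name alone
                  if old in ds.variables:
                      ds.renameVariable(old, new)    # keep coordinate in step
          # Descriptive metadata the CF Conventions are silent on:
          for var in ds.variables.values():
              var.setncattr("source_product", product)
              var.setncattr("product_version", version)
          ds.close()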

  8. Environmental Information Management For Data Discovery and Access System

    NASA Astrophysics Data System (ADS)

    Giriprakash, P.

    2011-01-01

    Mercury is a federated metadata harvesting, search and retrieval tool based on both open source software and software developed at Oak Ridge National Laboratory. It was originally developed for NASA, and the Mercury development consortium now includes funding from NASA, USGS, and DOE. A major new version of Mercury was developed during 2007 and released in early 2008. This new version provides orders of magnitude improvements in search speed, support for additional metadata formats, integration with Google Maps for spatial queries, support for RSS delivery of search results, and ready customization to meet the needs of the multiple projects which use Mercury. For the end users, Mercury provides a single portal to very quickly search for data and information contained in disparate data management systems. It collects metadata and key data from contributing project servers distributed around the world and builds a centralized index. The Mercury search interfaces then allow the users to perform simple, fielded, spatial and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data.

  9. A statistical metadata model for clinical trials' data management.

    PubMed

    Vardaki, Maria; Papageorgiou, Haralambos; Pentaris, Fragkiskos

    2009-08-01

    We introduce a statistical, process-oriented metadata model to describe the process of medical research data collection, management, results analysis and dissemination. Our approach explicitly provides a structure for the pieces of information used in Clinical Study Data Management Systems, enabling a more active role for any associated metadata. Using the object-oriented paradigm, we describe the classes of our model that participate during the design of a clinical trial and the subsequent collection and management of the relevant data. The advantage of our approach is that we focus on presenting the structural inter-relation of these classes during dataset manipulation, by proposing transformations that model the simultaneous processing of both data and metadata. Our solution reduces the possibility of human error and allows for the tracking of all changes made during the dataset lifecycle. The explicit modeling of processing steps improves data quality and assists in the problem of handling data collected in different clinical trials. The case study illustrates the applicability of the proposed framework, demonstrating conceptually the simultaneous handling of datasets collected during two randomized clinical studies. Finally, we provide the main considerations for implementing the proposed framework in a modern metadata-enabled information system.
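
    A conceptual sketch of the model's central idea: transformations operate on data and metadata together, so the metadata (including an audit trail) cannot drift out of step with the data. Class and field names are illustrative, not the paper's actual classes.

      # Sketch: a transformation that processes data and metadata in one step.
      from dataclasses import dataclass, field

      @dataclass
      class ClinicalDataset:
          values: list                 # e.g. systolic blood pressure readings
          unit: str                    # statistical metadata about the column
          history: list = field(default_factory=list)   # audit trail

      def convert_mmhg_to_kpa(ds: ClinicalDataset) -> ClinicalDataset:
          """Update the values, the unit metadata and the history together."""
          if ds.unit != "mmHg":
              raise ValueError(f"expected mmHg, got {ds.unit}")
          return ClinicalDataset(
              values=[round(v * 0.133322, 2) for v in ds.values],
              unit="kPa",
              history=ds.history + ["converted mmHg -> kPa"],
          )

      bp = ClinicalDataset(values=[120, 135, 110], unit="mmHg")
      print(convert_mmhg_to_kpa(bp))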

  10. A System for Automated Extraction of Metadata from Scanned Documents using Layout Recognition and String Pattern Search Models.

    PubMed

    Misra, Dharitri; Chen, Siyuan; Thoma, George R

    2009-01-01

    One of the most expensive aspects of archiving digital documents is the manual acquisition of context-sensitive metadata useful for the subsequent discovery of, and access to, the archived items. For certain types of textual documents, such as journal articles, pamphlets, official government records, etc., where the metadata is contained within the body of the documents, a cost effective method is to identify and extract the metadata in an automated way, applying machine learning and string pattern search techniques. At the U.S. National Library of Medicine (NLM) we have developed an automated metadata extraction (AME) system that employs layout classification and recognition models with a metadata pattern search model for a text corpus with structured or semi-structured information. A combination of Support Vector Machine and Hidden Markov Model is used to create the layout recognition models from a training set of the corpus, following which a rule-based metadata search model is used to extract the embedded metadata by analyzing the string patterns within and surrounding each field in the recognized layouts. In this paper, we describe the design of our AME system, with focus on the metadata search model. We present the extraction results for a historic collection from the Food and Drug Administration, and outline how the system may be adapted for similar collections. Finally, we discuss some ongoing enhancements to our AME system.
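
    A sketch of the second stage: once a layout class has been recognised, rule-based string patterns pull metadata fields out of the OCR text. The layout class and patterns below are invented examples, not NLM's actual rules.

      # Sketch: rule-based metadata field extraction after layout recognition.
      import re

      # One rule set per recognised layout class (patterns are illustrative).
      RULES = {
          "fda_notice": {
              "docket_no": re.compile(r"Docket\s+No\.\s*([A-Z0-9-]+)"),
              "date": re.compile(r"Dated:\s*(\w+\s+\d{1,2},\s*\d{4})"),
          }
      }

      def extract_metadata(layout_class, ocr_text):
          fields = {}
          for name, pattern in RULES.get(layout_class, {}).items():
              m = pattern.search(ocr_text)
              if m:
                  fields[name] = m.group(1)
          return fields

      text = "FDA Notice. Docket No. 99F-1234 ... Dated: July 7, 1999."
      print(extract_metadata("fda_notice", text))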

  11. The Metadata Cloud: The Last Piece of a Distributed Data System Model

    NASA Astrophysics Data System (ADS)

    King, T. A.; Cecconi, B.; Hughes, J. S.; Walker, R. J.; Roberts, D.; Thieman, J. R.; Joy, S. P.; Mafi, J. N.; Gangloff, M.

    2012-12-01

    Distributed data systems have existed ever since systems were networked together. Over the years the model for distributed data systems has evolved from basic file transfer to client-server, to multi-tiered, to grid, and finally to cloud-based systems. Initially metadata was tightly coupled to the data, either by embedding the metadata in the same file containing the data or by co-locating the metadata in commonly named files. As the sources of data have multiplied, data volumes have increased, and services have specialized to improve efficiency, a cloud system model has emerged. In a cloud system, computing and storage are provided as services, with accessibility emphasized over physical location. Computation and data clouds are common implementations. Effectively using the data and computation capabilities requires metadata. When metadata is stored separately from the data, a metadata cloud is formed. With a metadata cloud, information and knowledge about data resources can migrate efficiently from system to system, enabling services and allowing the data to remain efficiently stored until used. This is especially important with "Big Data", where movement of the data is limited by bandwidth. We examine how the metadata cloud completes a general distributed data system model, how standards play a role, and relate this to the existing types of cloud computing. We also look at the major science data systems in existence and compare each to the generalized cloud system model.

  12. Separation of metadata and pixel data to speed DICOM tag morphing.

    PubMed

    Ismail, Mahmoud; Philbin, James

    2013-01-01

    The DICOM information model combines pixel data and metadata in a single DICOM object. It is not possible to access the metadata separately from the pixel data, yet there are use cases where only the metadata is needed, and the current DICOM object format increases the running time of those use cases. Tag morphing is one such use case. Tag morphing includes the deletion, insertion or manipulation of one or more of the metadata attributes. It is typically used for order reconciliation on study acquisition, or to localize the issuer of patient ID (IPID) and the patient ID attributes when data from one domain is transferred to a different domain. In this work, we propose using Multi-Series DICOM (MSD) objects, which separate metadata from pixel data and remove duplicate attributes, to reduce the time required for tag morphing. The time required to update a set of study attributes in each format is compared. The results show that the MSD format significantly reduces the time required for tag morphing.
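
    The cost the paper targets can be imitated with pydicom (not the authors' MSD implementation): even a metadata-only read must parse the combined object, and a tag morph must rewrite the pixel data alongside the changed attributes. A small timing sketch using a sample file shipped with pydicom:

      # Sketch: contrast a full read with a metadata-only read, then morph a tag.
      import time
      import pydicom
      from pydicom.data import get_testdata_file

      path = get_testdata_file("CT_small.dcm")   # sample file from pydicom

      t0 = time.perf_counter()
      full = pydicom.dcmread(path)                           # metadata + pixels
      t1 = time.perf_counter()
      meta = pydicom.dcmread(path, stop_before_pixels=True)  # metadata only
      t2 = time.perf_counter()

      print(f"full object read:   {t1 - t0:.4f}s")
      print(f"metadata-only read: {t2 - t1:.4f}s")

      # In the standard combined format, a tag morph still rewrites the
      # pixel data alongside the changed attribute:
      full.PatientID = "ANON-0001"
      full.save_as("morphed.dcm")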

  13. Collaborative Sharing of Multidimensional Space-time Data Using HydroShare

    NASA Astrophysics Data System (ADS)

    Gan, T.; Tarboton, D. G.; Horsburgh, J. S.; Dash, P. K.; Idaszak, R.; Yi, H.; Blanton, B.

    2015-12-01

    HydroShare is a collaborative environment being developed for sharing hydrological data and models. It includes the capability to upload data in many formats as resources that can be shared. The HydroShare data model for resources uses a specific format for the representation of each type of data and specifies metadata common to all resource types as well as metadata unique to specific resource types. The Network Common Data Form (NetCDF) was chosen as the format for multidimensional space-time data in HydroShare. NetCDF is widely used in hydrological and other geoscience modeling because it contains self-describing metadata and supports the creation of array-oriented datasets that may include three spatial dimensions, a time dimension and other user-defined dimensions. For example, NetCDF may be used to represent precipitation or surface air temperature fields that have two dimensions in space and one dimension in time. This presentation will illustrate how NetCDF files are used in HydroShare. When a NetCDF file is loaded into HydroShare, header information is extracted using the "ncdump" utility. Python functions, developed for the Django web framework on which HydroShare is based, extract science metadata present in the NetCDF file, saving the user from having to enter it. Where the file follows the Climate and Forecast (CF) convention and Attribute Convention for Dataset Discovery (ACDD) standards, metadata is thus automatically populated. Users also have the ability to add metadata to the resource that may not have been present in the original NetCDF file. HydroShare's metadata editing functionality then writes this science metadata back into the NetCDF file to maintain consistency between the science metadata in HydroShare and the metadata in the NetCDF file. This further helps researchers easily add metadata information following the CF and ACDD conventions. Additional data inspection and subsetting functions were developed, taking advantage of Python and command-line libraries for working with NetCDF files. We describe the design and implementation of these features and illustrate how NetCDF files from a modeling application may be curated in HydroShare and thus enhance reproducibility of the associated research. We also discuss future development planned for multidimensional space-time data in HydroShare.
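
    A sketch of the automatic metadata population step: read the CF/ACDD global attributes and map them onto science-metadata fields. The mapping and function names are illustrative of the behaviour, not HydroShare's actual code.

      # Sketch: harvest ACDD/CF global attributes into science-metadata fields.
      import netCDF4

      ATTRIBUTE_MAP = {          # ACDD attribute -> generic metadata field
          "title": "title",
          "summary": "abstract",
          "keywords": "subjects",
          "creator_name": "creator",
          "date_created": "created",
      }

      def extract_science_metadata(nc_path):
          ds = netCDF4.Dataset(nc_path)
          meta = {field: ds.getncattr(attr)
                  for attr, field in ATTRIBUTE_MAP.items()
                  if attr in ds.ncattrs()}
          ds.close()
          return meta   # prefills the resource form; users may add the rest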

  14. DICOM Standard Conformance in Veterinary Medicine in Germany: a Survey of Imaging Studies in Referral Cases.

    PubMed

    Brühschwein, Andreas; Klever, Julius; Wilkinson, Tom; Meyer-Lindenberg, Andrea

    2018-02-01

    In 2016, the recommendations of the DICOM Standards Committee for the use of veterinary identification DICOM tags had their 10th anniversary. The goal of our study was to survey veterinary DICOM standard conformance in Germany regarding the specific identification tags veterinarians should use in veterinary diagnostic imaging. We hypothesized that most veterinarians in Germany do not follow the guidelines of the DICOM Standards Committee. We analyzed the metadata of 488 imaging studies of referral cases from 115 different veterinary institutions in Germany by computer-aided DICOM header readout. We found that 25 (5.1%) of the imaging studies fully complied with the "veterinary DICOM standard" in this survey. The results confirmed our hypothesis that the recommendations of the DICOM Standards Committee for the consistent and advantageous use of veterinary identification tags have found minimal acceptance amongst German veterinarians. DICOM does not only enable connectivity between machines; it also improves communication between veterinarians by sharing correct and valuable metadata for better patient care. Therefore, we recommend that lecturers, universities, societies, authorities, vendors, and other stakeholders should increase their effort to improve the spread of the veterinary DICOM standard in the veterinary world.
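
    A minimal sketch of such a computer-aided header readout, checking each study for veterinary identification attributes. The tags listed are real DICOM veterinary identification tags, but the subset is illustrative; the survey's full checklist is longer.

      # Sketch: check studies for veterinary identification attributes.
      import pydicom

      VETERINARY_TAGS = [
          "PatientSpeciesDescription",   # (0010,2201)
          "PatientBreedDescription",     # (0010,2292)
          "ResponsiblePerson",           # (0010,2297)
      ]

      def study_complies(dicom_path):
          ds = pydicom.dcmread(dicom_path, stop_before_pixels=True)
          return all(getattr(ds, tag, "") not in ("", None)
                     for tag in VETERINARY_TAGS)

      # e.g.: rate = sum(map(study_complies, study_paths)) / len(study_paths)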

  15. Metadata and network API aspects of a framework for storing and retrieving civil infrastructure monitoring data

    NASA Astrophysics Data System (ADS)

    Wong, John-Michael; Stojadinovic, Bozidar

    2005-05-01

    A framework has been defined for storing and retrieving civil infrastructure monitoring data over a network. The framework consists of two primary components: metadata and network communications. The metadata component provides the descriptions and data definitions necessary for cataloging and searching monitoring data. The communications component provides Java classes for remotely accessing the data. Packages of Enterprise JavaBeans and data handling utility classes are written to use the underlying metadata information to build real-time monitoring applications. The utility of the framework was evaluated using wireless accelerometers on a shaking table earthquake simulation test of a reinforced concrete bridge column. The NEESgrid data and metadata repository services were used as a backend storage implementation. A web interface was created to demonstrate the utility of the data model and provides an example health monitoring application.

  16. PhysDoc: A Distributed Network of Physics Institutions: Collecting, Indexing, and Searching High Quality Documents by Using Harvest; The Dublin Core Metadata Initiative: Mission, Current Activities, and Future Directions; Information Services for Higher Education: A New Competitive Space; Intellectual Property Conservancies.

    ERIC Educational Resources Information Center

    Severiens, Thomas; Hohlfeld, Michael; Zimmermann, Kerstin; Hilf, Eberhard R.; von Ossietzky, Carl; Weibel, Stuart L.; Koch, Traugott; Hughes, Carol Ann; Bearman, David

    2000-01-01

    Includes four articles that discuss a variety of topics, including PhysDoc, a distributed network of physics institution documents, which harvests information from the local Web servers of professional physics institutions; the Dublin Core metadata initiative; information services for higher education in a competitive environment; and…

  17. The EPOS e-Infrastructure

    NASA Astrophysics Data System (ADS)

    Jeffery, Keith; Bailo, Daniele

    2014-05-01

    The European Plate Observing System (EPOS) is integrating geoscientific information concerning earth movements in Europe. We are approaching the end of the PP (Preparatory Project) phase and in October 2014 expect to continue with the full project within ESFRI (European Strategic Framework for Research Infrastructures). The key aspects of EPOS concern providing services to allow homogeneous access by end-users over heterogeneous data, software, facilities, equipment and services. The e-infrastructure of EPOS is the heart of the project, since it integrates the work on organisational, legal, economic and scientific aspects. Following the creation of an inventory of relevant organisations, persons, facilities, equipment, services, datasets and software (RIDE), the scale of integration required became apparent. The EPOS e-infrastructure architecture has been developed systematically, based on recorded primary (user) requirements and secondary (interoperation with other systems) requirements, through Strawman, Woodman and Ironman phases, with the specification (and the confirmatory prototypes developed alongside it) becoming more precise and progressively moving from paper to implemented system. The EPOS architecture is based on global core services (Integrated Core Services - ICS) which access thematic nodes (domain-specific European-wide collections, called Thematic Core Services - TCS), national nodes and specific institutional nodes. The key aspect is the metadata catalog. In one dimension this is described in 3 levels: (1) discovery metadata, using well-known and commonly used standards such as DC (Dublin Core), to enable users (via an intelligent user interface) to search for objects within the EPOS environment relevant to their needs; (2) contextual metadata, providing the context of the object described in the catalog to enable a user or the system to determine the relevance of the discovered object(s) to their requirement - the context includes projects, funding, organisations involved, persons involved, related publications, facilities, equipment and others, and utilises the CERIF (Common European Research Information Format) standard (see www.eurocris.org); (3) detailed metadata, which is specific to a domain or to a particular object and includes the schema describing the object to processing software. The other dimension of the metadata concerns the objects described. These are classified into users, services (including software), data and resources (computing, data storage, instruments and scientific equipment). An alternative architecture has been considered: brokering. This technique has been used especially in North American geoscience projects to interoperate datasets. The technique involves writing software to interconvert between any two node datasets; given n nodes, this implies writing n*(n-1) convertors. EPOS Working Group 7 (e-infrastructures and virtual community), which deals with the design and implementation of a prototype of the EPOS services, chose an approach which endows the system with great flexibility and sustainability. It is called the metadata catalogue approach. With the use of the catalogue the EPOS system can: 1. interoperate with software, services, users, organisations, facilities, equipment etc. as well as datasets; 2. avoid writing n*(n-1) software convertors and instead, using the information contained in the catalogue, generate as far as possible only n convertors.
This is a huge saving - especially in maintenance as the datasets (or other node resources) evolve. We are working on (semi-)automation of convertor generation by metadata mapping - this is leading-edge computer science research; 3. make extensive use of contextual metadata which enables a user or a machine to: (i) improve discovery of resources at nodes; (ii) improve precision and recall in search; (iii) drive the systems for identification, authentication, authorisation, security and privacy, recording the relevant attributes of the node resources and of the user; (iv) manage provenance and long-term digital preservation. The linkage between the Integrated Services, which provide the integration of data and services, and the diverse Thematic Core Service nodes is provided by means of a compatibility layer, which includes the aforementioned metadata catalogue. This layer provides 'connectors' to make local data, software and services available through the EPOS Integrated Services layer. In conclusion, we believe the EPOS e-infrastructure architecture is fit for purpose, including long-term sustainability and pan-European access to data and services.
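
    The convertor arithmetic above is easy to make concrete. The sketch below contrasts pairwise brokering with hub-and-spoke conversion through a canonical model (CERIF, in EPOS's case); the function names are illustrative.

      # Sketch: pairwise brokering vs a canonical-catalogue hub.
      def pairwise_convertors(n):
          return n * (n - 1)       # one convertor per ordered node pair

      def catalogue_convertors(n):
          return n                 # one canonical mapping per node

      for n in (5, 20, 50):
          print(n, pairwise_convertors(n), catalogue_convertors(n))
      # 20 nodes: 380 pairwise convertors vs 20 catalogue mappings

      # Hub-and-spoke conversion routes every record through the canonical form:
      def convert(record, source_to_cerif, cerif_to_target):
          return cerif_to_target(source_to_cerif(record))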

  18. IGSN e.V.: Registration and Identification Services for Physical Samples in the Digital Universe

    NASA Astrophysics Data System (ADS)

    Lehnert, K. A.; Klump, J.; Arko, R. A.; Bristol, S.; Buczkowski, B.; Chan, C.; Chan, S.; Conze, R.; Cox, S. J.; Habermann, T.; Hangsterfer, A.; Hsu, L.; Milan, A.; Miller, S. P.; Noren, A. J.; Richard, S. M.; Valentine, D. W.; Whitenack, T.; Wyborn, L. A.; Zaslavsky, I.

    2011-12-01

    The International Geo Sample Number (IGSN) is a unique identifier for samples and specimens collected from our natural environment. It was developed by the System for Earth Sample Registration SESAR to overcome the problem of ambiguous naming of samples that has limited the ability to share, link, and integrate data for samples across Geoscience data systems. Over the past 5 years, SESAR has made substantial progress in implementing the IGSN for sample and data management, working with Geoscience researchers, Geoinformatics specialists, and sample curators to establish metadata requirements, registration procedures, and best practices for the use of the IGSN. The IGSN is now recognized as the primary solution for sample identification and registration, and supported by a growing user community of investigators, repositories, science programs, and data systems. In order to advance broad disciplinary and international implementation of the IGSN, SESAR organized a meeting of international leaders in Geoscience informatics in 2011 to develop a consensus strategy for the long-term operations of the registry with approaches for sustainable operation, organizational structure, governance, and funding. The group endorsed an internationally unified approach for registration and discovery of physical specimens in the Geosciences, and refined the existing SESAR architecture to become a modular and scalable approach, separating the IGSN Registry from a central Sample Metadata Clearinghouse (SESAR), and introducing 'Local Registration Agents' that provide registration services to specific disciplinary or organizational communities, with tools for metadata submission and management, and metadata archiving. Development and implementation of the new IGSN architecture is underway with funding provided by the US NSF Office of International Science and Engineering. A formal governance structure is being established for the IGSN model, consisting of (a) an international not-for-profit organization, the IGSN e.V. (e.V. = 'Eingetragener Verein', legal status for a registered voluntary association in Germany), that defines the IGSN scope and syntax and maintains the IGSN Handle system, and (b) a Science Advisory Board that guides policies, technology, and best practices of the SESAR Sample Metadata Clearinghouse and Local Registration Agents. The IGSN e.V. is being incorporated in Germany at the GFZ Potsdam, a founding event is planned for the AGU Fall Meeting.

  19. ALE: automated label extraction from GEO metadata.

    PubMed

    Giles, Cory B; Brown, Chase A; Ripperger, Michael; Dennis, Zane; Roopnarinesingh, Xiavan; Porter, Hunter; Perz, Aleksandra; Wren, Jonathan D

    2017-12-28

    NCBI's Gene Expression Omnibus (GEO) is a rich community resource containing millions of gene expression experiments from human, mouse, rat, and other model organisms. However, information about each experiment (metadata) is in the format of an open-ended, non-standardized textual description provided by the depositor. Thus, classification of experiments for meta-analysis by factors such as gender, age of the sample donor, and tissue of origin is not feasible without assigning labels to the experiments. Automated approaches are preferable for this, primarily because of the size and volume of the data to be processed, but also because they ensure standardization and consistency. While some of these labels can be extracted directly from the textual metadata, many of the data available do not contain explicit text informing the researcher about the age and gender of the subjects within the study. To bridge this gap, machine-learning methods can be trained to use the gene expression patterns associated with the text-derived labels to refine label-prediction confidence. Our analysis shows only 26% of metadata text contains information about gender and 21% about age. In order to ameliorate the lack of available labels for these data sets, we first extract labels from the textual metadata for each GEO RNA dataset and evaluate the performance against a gold standard of manually curated labels. We then use machine-learning methods to predict labels based upon gene expression of the samples and compare this to the text-based method. Here we present an automated method to extract labels for age, gender, and tissue from textual metadata and GEO data using both a heuristic approach as well as machine learning. We show the two methods together improve the accuracy of label assignment to GEO samples.
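
    A sketch of the heuristic half of the method: regular expressions pulling age and gender out of free-text descriptions. The patterns are simplified examples, not the published rule set.

      # Sketch: regex-based age/gender label extraction from free text.
      import re

      AGE = re.compile(r"age[:\s]+(\d+(?:\.\d+)?)\s*(?:years?|yrs?)?", re.I)
      GENDER = re.compile(r"\b(male|female|m|f)\b", re.I)

      def extract_labels(description):
          labels = {}
          if m := AGE.search(description):
              labels["age"] = float(m.group(1))
          if m := GENDER.search(description):
              g = m.group(1).lower()
              labels["gender"] = "male" if g in ("male", "m") else "female"
          return labels

      print(extract_labels("whole blood, female donor, age: 34 years"))
      # {'age': 34.0, 'gender': 'female'}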

  20. NASA's Earth Observing Data and Information System - Supporting Interoperability through a Scalable Architecture (Invited)

    NASA Astrophysics Data System (ADS)

    Mitchell, A. E.; Lowe, D. R.; Murphy, K. J.; Ramapriyan, H. K.

    2011-12-01

    Initiated in 1990, NASA's Earth Observing System Data and Information System (EOSDIS) is currently a petabyte-scale archive of data designed to receive, process, distribute and archive several terabytes of science data per day from NASA's Earth science missions. Comprised of 12 discipline specific data centers collocated with centers of science discipline expertise, EOSDIS manages over 6800 data products from many science disciplines and sources. NASA supports global climate change research by providing scalable open application layers to the EOSDIS distributed information framework. This allows many other value-added services to access NASA's vast Earth Science Collection and allows EOSDIS to interoperate with data archives from other domestic and international organizations. EOSDIS is committed to NASA's Data Policy of full and open sharing of Earth science data. As metadata is used in all aspects of NASA's Earth science data lifecycle, EOSDIS provides a spatial and temporal metadata registry and order broker called the EOS Clearing House (ECHO) that allows efficient search and access of cross domain data and services through the Reverb Client and Application Programmer Interfaces (APIs). Another core metadata component of EOSDIS is NASA's Global Change Master Directory (GCMD) which represents more than 25,000 Earth science data set and service descriptions from all over the world, covering subject areas within the Earth and environmental sciences. With inputs from the ECHO, GCMD and Soil Moisture Active Passive (SMAP) mission metadata models, EOSDIS is developing a NASA ISO 19115 Best Practices Convention. Adoption of an international metadata standard enables a far greater level of interoperability among national and international data products. NASA recently concluded a 'Metadata Harmony Study' of EOSDIS metadata capabilities/processes of ECHO and NASA's Global Change Master Directory (GCMD), to evaluate opportunities for improved data access and use, reduce efforts by data providers and improve metadata integrity. The result was a recommendation for EOSDIS to develop a 'Common Metadata Repository (CMR)' to manage the evolution of NASA Earth Science metadata in a unified and consistent way by providing a central storage and access capability that streamlines current workflows while increasing overall data quality and anticipating future capabilities. For applications users interested in monitoring and analyzing a wide variety of natural and man-made phenomena, EOSDIS provides access to near real-time products from the MODIS, OMI, AIRS, and MLS instruments in less than 3 hours from observation. To enable interactive exploration of NASA's Earth imagery, EOSDIS is developing a set of standard services to deliver global, full-resolution satellite imagery in a highly responsive manner. EOSDIS is also playing a lead role in the development of the CEOS WGISS Integrated Catalog (CWIC), which provides search and access to holdings of participating international data providers. EOSDIS provides a platform to expose and share information on NASA Earth science tools and data via Earthdata.nasa.gov while offering a coherent and interoperable system for the NASA Earth Science Data System (ESDS) Program.

  1. NASA's Earth Observing Data and Information System - Supporting Interoperability through a Scalable Architecture (Invited)

    NASA Astrophysics Data System (ADS)

    Mitchell, A. E.; Lowe, D. R.; Murphy, K. J.; Ramapriyan, H. K.

    2013-12-01

    Initiated in 1990, NASA's Earth Observing System Data and Information System (EOSDIS) is currently a petabyte-scale archive of data designed to receive, process, distribute and archive several terabytes of science data per day from NASA's Earth science missions. Comprised of 12 discipline specific data centers collocated with centers of science discipline expertise, EOSDIS manages over 6800 data products from many science disciplines and sources. NASA supports global climate change research by providing scalable open application layers to the EOSDIS distributed information framework. This allows many other value-added services to access NASA's vast Earth Science Collection and allows EOSDIS to interoperate with data archives from other domestic and international organizations. EOSDIS is committed to NASA's Data Policy of full and open sharing of Earth science data. As metadata is used in all aspects of NASA's Earth science data lifecycle, EOSDIS provides a spatial and temporal metadata registry and order broker called the EOS Clearing House (ECHO) that allows efficient search and access of cross domain data and services through the Reverb Client and Application Programmer Interfaces (APIs). Another core metadata component of EOSDIS is NASA's Global Change Master Directory (GCMD) which represents more than 25,000 Earth science data set and service descriptions from all over the world, covering subject areas within the Earth and environmental sciences. With inputs from the ECHO, GCMD and Soil Moisture Active Passive (SMAP) mission metadata models, EOSDIS is developing a NASA ISO 19115 Best Practices Convention. Adoption of an international metadata standard enables a far greater level of interoperability among national and international data products. NASA recently concluded a 'Metadata Harmony Study' of EOSDIS metadata capabilities/processes of ECHO and NASA's Global Change Master Directory (GCMD), to evaluate opportunities for improved data access and use, reduce efforts by data providers and improve metadata integrity. The result was a recommendation for EOSDIS to develop a 'Common Metadata Repository (CMR)' to manage the evolution of NASA Earth Science metadata in a unified and consistent way by providing a central storage and access capability that streamlines current workflows while increasing overall data quality and anticipating future capabilities. For applications users interested in monitoring and analyzing a wide variety of natural and man-made phenomena, EOSDIS provides access to near real-time products from the MODIS, OMI, AIRS, and MLS instruments in less than 3 hours from observation. To enable interactive exploration of NASA's Earth imagery, EOSDIS is developing a set of standard services to deliver global, full-resolution satellite imagery in a highly responsive manner. EOSDIS is also playing a lead role in the development of the CEOS WGISS Integrated Catalog (CWIC), which provides search and access to holdings of participating international data providers. EOSDIS provides a platform to expose and share information on NASA Earth science tools and data via Earthdata.nasa.gov while offering a coherent and interoperable system for the NASA Earth Science Data System (ESDS) Program.

  2. Long-term Science Data Curation Using a Digital Object Model and Open-Source Frameworks

    NASA Astrophysics Data System (ADS)

    Pan, J.; Lenhardt, W.; Wilson, B. E.; Palanisamy, G.; Cook, R. B.

    2010-12-01

    Scientific digital content, including Earth Science observations and model output, has become more heterogeneous in format and more distributed across the Internet. In addition, data and metadata are becoming necessarily linked internally and externally on the Web. As a result, such content has become more difficult for providers to manage and preserve and for users to locate, understand, and consume. Specifically, it is increasingly difficult to deliver relevant metadata and data processing lineage information along with the actual content consistently. Readme files, data quality information, production provenance, and other descriptive metadata are often separated from the content at the storage level as well as in the data search and retrieval interfaces available to a user. Critical archival metadata, such as auditing trails and integrity checks, are often even more difficult for users to access, if they exist at all. We investigate the use of several open-source software frameworks to address these challenges. We use the Fedora Commons framework and its digital object abstraction as the repository, the Drupal CMS as the user interface, and the Islandora module as the connector from Drupal to the Fedora repository. With the digital object model, metadata for data description and data provenance can be associated with data content in a formal manner, as can external references and other arbitrary auxiliary information. Changes to an object are formally audited, and digital contents are versioned and have checksums automatically computed. Further, relationships among objects are formally expressed with RDF triples. Data replication, recovery, and metadata export are supported with standard protocols, such as OAI-PMH. We provide a tentative comparative analysis of the chosen software stack against the Open Archival Information System (OAIS) reference model, along with our initial results with the existing terrestrial ecology data collections at NASA's ORNL Distributed Active Archive Center for Biogeochemical Dynamics (ORNL DAAC).
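
    A small sketch of the digital object idea using rdflib: inter-object relationships expressed as RDF triples next to checksummed content. The namespaces and relation names are illustrative, not Fedora's exact terms.

      # Sketch: formal relationships and integrity metadata as RDF triples.
      import hashlib
      from rdflib import Graph, Namespace, Literal

      REPO = Namespace("http://example.org/repo/")        # placeholder namespace
      REL = Namespace("http://example.org/relations#")    # placeholder namespace

      g = Graph()
      dataset = REPO["dataset/42"]
      collection = REPO["collection/terrestrial-ecology"]

      g.add((dataset, REL.isMemberOf, collection))        # formal relationship
      g.add((dataset, REL.hasChecksum,
             Literal(hashlib.sha256(b"...data bytes...").hexdigest())))

      print(g.serialize(format="turtle"))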

  3. Developing Cyberinfrastructure Tools and Services for Metadata Quality Evaluation

    NASA Astrophysics Data System (ADS)

    Mecum, B.; Gordon, S.; Habermann, T.; Jones, M. B.; Leinfelder, B.; Powers, L. A.; Slaughter, P.

    2016-12-01

    Metadata and data quality are at the core of reusable and reproducible science. While great progress has been made over the years, much of the metadata collected only addresses data discovery, covering concepts such as titles and keywords. Improving metadata beyond the discoverability plateau means documenting detailed concepts within the data such as sampling protocols, instrumentation used, and variables measured. Given that metadata commonly do not describe their data at this level, how might we improve the state of things? Giving scientists and data managers easy-to-use tools for evaluating metadata quality against community-driven recommendations is the key to producing high-quality metadata. To achieve this goal, we created a set of cyberinfrastructure tools and services that integrate with existing metadata and data curation workflows and that can be used to improve metadata and data quality across the sciences. These tools work across metadata dialects (e.g., ISO 19115, FGDC, EML) and can be used to assess aspects of quality beyond what is internal to the metadata, such as the congruence between the metadata and the data it describes. The system makes use of a user-friendly mechanism for expressing a suite of checks as code in popular data science programming languages such as Python and R. This reduces the burden on scientists and data managers to learn yet another language. We demonstrated these services and tools in three ways. First, we evaluated a large corpus of datasets in the DataONE federation of data repositories against a metadata recommendation modeled after existing recommendations such as the LTER best practices and the Attribute Convention for Dataset Discovery (ACDD). Second, we showed how this service can be used to display metadata and data quality information to data producers during the data submission and metadata creation process, and to data consumers through data catalog search and access tools. Third, we showed how the centrally deployed DataONE quality service can achieve major efficiency gains by allowing member repositories to customize and use recommendations that fit their specific needs without having to create de novo infrastructure at their sites.
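
    A minimal sketch of "checks as code": each check is a plain Python function over a parsed metadata record, loosely in the spirit of an ACDD-style recommendation. Check names and thresholds are illustrative, not the DataONE suite's actual checks.

      # Sketch: a tiny metadata quality check suite expressed as functions.
      def check_title(meta):
          return bool(meta.get("title")), "title is present"

      def check_abstract_length(meta):
          ok = len(meta.get("abstract", "")) >= 100
          return ok, "abstract has at least 100 characters"

      def check_variables_documented(meta):
          ok = all("units" in v for v in meta.get("variables", []))
          return ok, "every variable declares its units"

      CHECKS = [check_title, check_abstract_length, check_variables_documented]

      def run_suite(meta):
          for check in CHECKS:
              ok, description = check(meta)
              print(("PASS" if ok else "FAIL"), "-", description)

      run_suite({"title": "Soil moisture, site A",
                 "abstract": "Short.",
                 "variables": [{"name": "sm", "units": "m3 m-3"}]})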

  4. Implementation of a metadata architecture and knowledge collection to support semantic interoperability in an enterprise data warehouse.

    PubMed

    Dhaval, Rakesh; Borlawsky, Tara; Ostrander, Michael; Santangelo, Jennifer; Kamal, Jyoti; Payne, Philip R O

    2008-11-06

    In order to enhance interoperability between enterprise systems, and improve data validity and reliability throughout The Ohio State University Medical Center (OSUMC), we have initiated the development of an ontology-anchored metadata architecture and knowledge collection for our enterprise data warehouse. The metadata and corresponding semantic relationships stored in the OSUMC knowledge collection are intended to promote consistency and interoperability across the heterogeneous clinical, research, business and education information managed within the data warehouse.

  5. OAI and NASA's Scientific and Technical Information.

    ERIC Educational Resources Information Center

    Nelson, Michael L.; Rocker, JoAnne; Harrison, Terry L.

    2003-01-01

    Details NASA's (National Aeronautics and Space Administration) involvement in defining and testing the Open Archives Initiative (OAI) Protocol for Metadata Harvesting (OAI-PMH), and its experience with adapting existing NASA distributed-searching digital libraries (DLs) to use the OAI-PMH and metadata harvesting. Discusses some new digital…
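
    A minimal OAI-PMH harvester shows how little the protocol demands: issue ListRecords requests and follow resumption tokens until none remain. The base URL below is a placeholder for any OAI-PMH repository.

      # Sketch: minimal OAI-PMH ListRecords harvester with resumption tokens.
      import requests
      import xml.etree.ElementTree as ET

      OAI = "{http://www.openarchives.org/OAI/2.0/}"

      def harvest(base_url, metadata_prefix="oai_dc"):
          params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
          while True:
              root = ET.fromstring(requests.get(base_url, params=params).content)
              for record in root.iter(f"{OAI}record"):
                  yield record
              token = root.find(f".//{OAI}resumptionToken")
              if token is None or not (token.text or "").strip():
                  return
              # Subsequent requests carry only the verb and the token.
              params = {"verb": "ListRecords", "resumptionToken": token.text}

      # Usage: for rec in harvest("https://example.org/oai"): ...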

  6. PIMMS tools for capturing metadata about simulations

    NASA Astrophysics Data System (ADS)

    Pascoe, Charlotte; Devine, Gerard; Tourte, Gregory; Pascoe, Stephen; Lawrence, Bryan; Barjat, Hannah

    2013-04-01

    PIMMS (Portable Infrastructure for the Metafor Metadata System) provides a method for consistent and comprehensive documentation of modelling activities that enables the sharing of simulation data and model configuration information. The aim of PIMMS is to package the metadata infrastructure developed by Metafor for CMIP5 so that it can be used by climate modelling groups in UK universities. PIMMS tools capture information about simulations from the design of experiments to the implementation of experiments via simulations that run models. PIMMS uses the Metafor methodology, which consists of a Common Information Model (CIM), Controlled Vocabularies (CV) and software tools. PIMMS software tools provide for the creation and consumption of CIM content via a web services infrastructure and portal developed by the ES-DOC community. PIMMS metadata integrates with the ESGF data infrastructure via the mapping of vocabularies onto ESGF facets. There are three paradigms of PIMMS metadata collection: Model Intercomparison Projects (MIPs), where a standard set of questions is asked of all models which perform standard sets of experiments; disciplinary-level metadata collection, where a standard set of questions is asked of all models but experiments are specified by users; and bespoke metadata creation, where the users define questions about both models and experiments. Examples will be shown of how PIMMS has been configured to suit each of these three paradigms. In each case PIMMS allows users to provide additional metadata beyond that which is asked for in an initial deployment. The primary target for PIMMS is the UK climate modelling community, where it is common practice to reuse model configurations from other researchers. This culture of collaboration exists in part because climate models are very complex, with many variables that can be modified. Therefore it has become common practice to begin a series of experiments by using another climate model configuration as a starting point. Usually this other configuration is provided by a researcher in the same research group or by a previous collaborator with whom there is an existing scientific relationship. Some efforts have been made at the university department level to create documentation, but there is a wide diversity in the scope and purpose of this information. The consistent and comprehensive documentation enabled by PIMMS will enable the wider sharing of climate model data and configuration information. The PIMMS methodology assumes an initial effort to document standard model configurations. Once these descriptions have been created, users need only describe the specific way in which their model configuration differs from the standard. Thus the documentation burden on the user is specific to the experiment they are performing and fits easily into the workflow of doing their science. PIMMS metadata is independent of data and as such is ideally suited for documenting model development. PIMMS provides a framework for sharing information about failed model configurations for which data are not kept, the negative results that don't appear in the scientific literature. PIMMS is a UK project funded by JISC, The University of Reading, The University of Bristol and STFC.

  7. The Modeling and Simulation Catalog for Discovery, Knowledge and Reuse

    NASA Technical Reports Server (NTRS)

    Stone, George F. III; Greenberg, Brandi; Daehler-Wilking, Richard; Hunt, Steven

    2011-01-01

    The DoD M&S Steering Committee has noted that the current DoD and Service modeling and simulation resource repository (MSRR) services are not up to date, limiting their value to the using communities. However, M&S leaders and managers have also determined that the Department needs a functional M&S registry card catalog to facilitate M&S tool and data visibility to support M&S activities across the DoD. The M&S Catalog will discover and access M&S metadata maintained at nodes distributed across DoD networks in a centrally managed, decentralized process that employs metadata collection and management. The intent is to link information stores, precluding redundant location updating. The M&S Catalog uses standard metadata schemas based on the DoD's Net-Centric Data Strategy Community of Interest metadata specification. The Air Force, Navy and OSD (CAPE) have provided initial information to participating DoD nodes, and plans are being made to bring in hundreds of source providers.

  8. Constructing compact and effective graphs for recommender systems via node and edge aggregations

    DOE PAGES

    Lee, Sangkeun; Kahng, Minsuk; Lee, Sang-goo

    2014-12-10

    Exploiting graphs for recommender systems has great potential to flexibly incorporate heterogeneous information for producing better recommendation results. As our baseline approach, we first introduce a naive graph-based recommendation method, which operates on a heterogeneous log-metadata graph constructed from user log and content metadata databases. Although the naive graph-based recommendation method is simple, it allows us to take advantage of heterogeneous information and shows promising flexibility and recommendation accuracy. However, it often leads to extensive processing time due to the sheer size of the graphs constructed from entire user log and content metadata databases. In this paper, we propose node and edge aggregation approaches to constructing compact and effective graphs, called Factor-Item bipartite graphs, by aggregating nodes and edges of a log-metadata graph. Furthermore, experimental results using real-world datasets indicate that our approach can significantly reduce the size of graphs exploited for recommender systems without sacrificing recommendation quality.
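
    A sketch of the baseline approach: a heterogeneous graph linking users, items and metadata values, with items scored by counting two-hop paths from a user through shared metadata. The data and scoring rule are illustrative, not the paper's exact method.

      # Sketch: naive recommendation over a heterogeneous log-metadata graph.
      from collections import Counter, defaultdict

      edges = [                       # edges of the log-metadata graph
          ("user:ann", "item:song1"), ("user:ann", "item:song2"),
          ("item:song1", "genre:jazz"), ("item:song2", "genre:jazz"),
          ("item:song3", "genre:jazz"), ("item:song3", "artist:coltrane"),
      ]

      graph = defaultdict(set)
      for a, b in edges:
          graph[a].add(b)
          graph[b].add(a)             # undirected adjacency

      def recommend(user, k=3):
          seen = graph[user]
          scores = Counter()
          for item in seen:                       # items the user already used
              for meta in graph[item]:            # their metadata neighbours
                  for candidate in graph[meta]:   # other items sharing them
                      if candidate.startswith("item:") and candidate not in seen:
                          scores[candidate] += 1
          return scores.most_common(k)

      print(recommend("user:ann"))    # [('item:song3', 2)]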

  9. Sensor-based architecture for medical imaging workflow analysis.

    PubMed

    Silva, Luís A Bastião; Campos, Samuel; Costa, Carlos; Oliveira, José Luis

    2014-08-01

    The growing use of computer systems in medical institutions has been generating a tremendous quantity of data. While these data have a critical role in assisting physicians in the clinical practice, the information that can be extracted goes far beyond this utilization. This article proposes a platform capable of assembling multiple data sources within a medical imaging laboratory, through a network of intelligent sensors. The proposed integration framework follows a SOA hybrid architecture based on an information sensor network, capable of collecting information from several sources in medical imaging laboratories. Currently, the system supports three types of sensors: DICOM repository meta-data, network workflows and examination reports. Each sensor is responsible for converting unstructured information from data sources into a common format that will then be semantically indexed in the framework engine. The platform was deployed in the Cardiology department of a central hospital, allowing identification of processes' characteristics and users' behaviours that were unknown before the utilization of this solution.

  10. EPOS Data and Service Provision

    NASA Astrophysics Data System (ADS)

    Bailo, Daniele; Jeffery, Keith G.; Atakan, Kuvvet; Harrison, Matt

    2017-04-01

    EPOS is now in its implementation phase (IP) after a successful preparatory phase (PP). EPOS consists of essentially two components: one ICS (Integrated Core Services), representing the integrating ICT (Information and Communication Technology), and many TCS (Thematic Core Services), representing the scientific domains. The architecture developed, demonstrated and agreed within the project during the PP is now being developed utilising co-design with the TCS teams and agile, spiral methods within the ICS team. The 'heart' of EPOS is the metadata catalog. This provides for the ICS a digital representation of the TCS assets (services, data, software, equipment, expertise…), thus facilitating access, interoperation and (re-)use. A major part of the work has been interactions with the TCS. The original intention to harvest information from the TCS required (and still requires) discussions to understand fully the TCS organisational structures linked with rights, security and privacy; their (meta)data syntax (structure) and semantics (meaning); their workflows and methods of working; and the services offered. To complicate matters further, the TCS are each at varying stages of development, and the ICS design has to accommodate pre-existing, developing and expected future standards for metadata, data, software and processes. Through information documents, questionnaires and interviews/meetings, the EPOS ICS team has collected DDSS (Data, Data Products, Software and Services) information from the TCS. The ICS team developed a simplified metadata model for presentation to the TCS, and the ICS team will perform the mapping and conversion from this model to the internal detailed technical metadata model using CERIF (an EU recommendation to Member States maintained, developed and promoted by euroCRIS, www.eurocris.org). At the time of writing, the final modifications of the EPOS metadata model are being made, and the mappings to CERIF designed, prior to the main phase of (meta)data collection into the EPOS metadata catalog. In parallel, work proceeds on the user interface software, the APIs (Application Programming Interfaces) to the TCS services, the harvesting method and software, the AAAI (Authentication, Authorisation, Accounting Infrastructure) and the system manager. The next steps will involve interfaces to ICS-D (Distributed ICS, i.e. facilities and services for computing, data storage, detectors and instruments for data collection, etc.) to which requests, software and data will be deployed and from which data will be generated. Associated with this will be the development of the workflow system, which will assist the end-user in building a workflow to achieve the scientific objectives.

  11. The Metadata Coverage Index (MCI): A standardized metric for quantifying database metadata richness.

    PubMed

    Liolios, Konstantinos; Schriml, Lynn; Hirschman, Lynette; Pagani, Ioanna; Nosrat, Bahador; Sterk, Peter; White, Owen; Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Taylor, Chris; Kyrpides, Nikos C; Field, Dawn

    2012-07-30

    Variability in the extent of the descriptions of data ('metadata') held in public repositories forces users to assess the quality of records individually, which rapidly becomes impractical. The scoring of records on the richness of their description provides a simple, objective proxy measure for quality that enables filtering to support downstream analysis. Pivotally, such descriptions should spur on improvements. Here, we introduce such a measure - the 'Metadata Coverage Index' (MCI): the percentage of available fields actually filled in a record or description. MCI scores can be calculated across a database, for individual records or for their component parts (e.g., fields of interest). There are many potential uses for this simple metric: for example, to filter, rank or search for records; to assess the metadata availability of an ad hoc collection; to determine the frequency with which fields in a particular record type are filled, especially with respect to standards compliance; to assess the utility of specific tools and resources, and of data capture practice more generally; to prioritize records for further curation; to serve as performance metrics of funded projects; or to quantify the value added by curation. Here we demonstrate the utility of MCI scores using metadata from the Genomes Online Database (GOLD), including records compliant with the 'Minimum Information about a Genome Sequence' (MIGS) standard developed by the Genomic Standards Consortium. We discuss challenges and address further applications of MCI scores: to show improvements in annotation quality over time, to inform the work of standards bodies and repository providers on the usability and popularity of their products, and to assess and credit the work of curators. Such an index provides a step towards putting metadata capture practices and, in the future, standards compliance into a quantitative and objective framework.
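
    Since the MCI is defined as the percentage of available fields actually filled, a minimal computation is easy to sketch. The field list, record layout and the notion of "filled" below are illustrative assumptions; the paper defines the metric, not this code.

```python
# Sketch of the MCI as defined above: the percentage of available fields
# actually filled in a record, plus a database-level mean of record scores.

EMPTY_VALUES = (None, "", [])

def record_mci(record: dict, available_fields: list) -> float:
    """MCI for one record: filled fields as a percentage of available fields."""
    filled = sum(1 for f in available_fields
                 if record.get(f) not in EMPTY_VALUES)
    return 100.0 * filled / len(available_fields)

def database_mci(records: list, available_fields: list) -> float:
    """Database-level MCI: the mean of the record-level scores."""
    return sum(record_mci(r, available_fields) for r in records) / len(records)

fields = ["organism", "habitat", "collection_date", "sequencing_method"]
record = {"organism": "Escherichia coli", "habitat": "soil", "collection_date": ""}
print(f"MCI = {record_mci(record, fields):.1f}%")   # 2 of 4 fields filled -> 50.0%
```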

  12. Development of DKB ETL module in case of data conversion

    NASA Astrophysics Data System (ADS)

    Kaida, A. Y.; Golosova, M. V.; Grigorieva, M. A.; Gubin, M. Y.

    2018-05-01

    Modern scientific experiments involve the producing of huge volumes of data that requires new approaches in data processing and storage. These data themselves, as well as their processing and storage, are accompanied by a valuable amount of additional information, called metadata, distributed over multiple informational systems and repositories, and having a complicated, heterogeneous structure. Gathering these metadata for experiments in the field of high energy nuclear physics (HENP) is a complex issue, requiring the quest for solutions outside the box. One of the tasks is to integrate metadata from different repositories into some kind of a central storage. During the integration process, metadata taken from original source repositories go through several processing steps: metadata aggregation, transformation according to the current data model and loading it to the general storage in a standardized form. The R&D project of ATLAS experiment on LHC, Data Knowledge Base, is aimed to provide fast and easy access to significant information about LHC experiments for the scientific community. The data integration subsystem, being developed for the DKB project, can be represented as a number of particular pipelines, arranging data flow from data sources to the main DKB storage. The data transformation process, represented by a single pipeline, can be considered as a number of successive data transformation steps, where each step is implemented as an individual program module. This article outlines the specifics of program modules, used in the dataflow, and describes one of the modules developed and integrated into the data integration subsystem of DKB.

  13. A System for Automated Extraction of Metadata from Scanned Documents using Layout Recognition and String Pattern Search Models

    PubMed Central

    Misra, Dharitri; Chen, Siyuan; Thoma, George R.

    2010-01-01

    One of the most expensive aspects of archiving digital documents is the manual acquisition of context-sensitive metadata useful for the subsequent discovery of, and access to, the archived items. For certain types of textual documents, such as journal articles, pamphlets, official government records, etc., where the metadata is contained within the body of the documents, a cost-effective method is to identify and extract the metadata in an automated way, applying machine learning and string pattern search techniques. At the U.S. National Library of Medicine (NLM) we have developed an automated metadata extraction (AME) system that employs layout classification and recognition models with a metadata pattern search model for a text corpus with structured or semi-structured information. A combination of Support Vector Machine and Hidden Markov Model is used to create the layout recognition models from a training set of the corpus, following which a rule-based metadata search model is used to extract the embedded metadata by analyzing the string patterns within and surrounding each field in the recognized layouts. In this paper, we describe the design of our AME system, with focus on the metadata search model. We present the extraction results for a historic collection from the Food and Drug Administration, and outline how the system may be adapted for similar collections. Finally, we discuss some ongoing enhancements to our AME system. PMID:21179386
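
    The rule-based string pattern search stage can be illustrated with a small sketch. The patterns and field names below are hypothetical stand-ins, not NLM's actual AME rules, and real OCR text would require far more robust patterns.

```python
import re

# Illustrative sketch of a rule-based string pattern search stage: once a
# page's layout class is known, regular expressions anchored on surrounding
# text pick out metadata fields. Patterns and field names are hypothetical.

FIELD_PATTERNS = {
    "title":  re.compile(r"^(?:TITLE|Subject)[:\s]+(.{5,120})$", re.MULTILINE),
    "date":   re.compile(r"\b(\d{1,2}/\d{1,2}/\d{2,4})\b"),
    "doc_no": re.compile(r"\b(?:Document|Docket)\s+No\.?\s*([A-Z0-9-]+)\b"),
}

def extract_fields(page_text: str) -> dict:
    """Apply each field rule to OCR text; keep the first match per field."""
    results = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(page_text)
        if match:
            results[field] = match.group(1).strip()
    return results

sample = "TITLE: Notice of Hearing\nDocket No. FDA-1976-0123\nFiled 03/15/1976"
print(extract_fields(sample))
```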

  14. Integration of external metadata into the Earth System Grid Federation (ESGF)

    NASA Astrophysics Data System (ADS)

    Berger, Katharina; Levavasseur, Guillaume; Stockhause, Martina; Lautenschlager, Michael

    2015-04-01

    International projects with high-volume data usually disseminate their data in a federated data infrastructure, e.g. the Earth System Grid Federation (ESGF). The ESGF aims to make the geographically distributed data seamlessly discoverable and accessible. Additional data-related information is currently collected and stored in separate repositories by each data provider. This scattered but useful information is unavailable, or only partly available, to ESGF users. Examples of such additional information systems are ES-DOC/Metafor for model and simulation information, IPSL's versioning information, CHARMe for user annotations, DKRZ's quality information and data citation information. The ESGF Quality Control working team (esgf-qcwt) aims to integrate these valuable pieces of additional information into the ESGF in order to make them available to users and data archive managers by (i) integrating external information into the ESGF portal, (ii) integrating links to external information objects into the ESGF metadata index, e.g. by the use of PIDs (Persistent IDentifiers), and (iii) automating the collection of external information during the ESGF data publication process. For the sixth phase of CMIP (Coupled Model Intercomparison Project), the ESGF metadata index is to be enriched with additional information on data citation, file version, etc. This information will support users directly and can be automatically exploited by higher-level services (human and machine readability).

  15. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata.

    PubMed

    Liolios, Konstantinos; Chen, I-Min A; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Philip; Markowitz, Victor M; Kyrpides, Nikos C

    2010-01-01

    The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification. GOLD is available at: http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece, at: http://gold.imbb.forth.gr/

  16. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata

    PubMed Central

    Liolios, Konstantinos; Chen, I-Min A.; Mavromatis, Konstantinos; Tavernarakis, Nektarios; Hugenholtz, Philip; Markowitz, Victor M.; Kyrpides, Nikos C.

    2010-01-01

    The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification. GOLD is available at: http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece, at: http://gold.imbb.forth.gr/ PMID:19914934

  17. ODISEES: A New Paradigm in Data Access

    NASA Astrophysics Data System (ADS)

    Huffer, E.; Little, M. M.; Kusterer, J.

    2013-12-01

    As part of its ongoing efforts to improve access to data, the Atmospheric Science Data Center has developed a high-precision Earth Science domain ontology (the 'ES Ontology') implemented in a graph database ('the Semantic Metadata Repository') that is used to store detailed, semantically-enhanced, parameter-level metadata for ASDC data products. The ES Ontology provides the semantic infrastructure needed to drive the ASDC's Ontology-Driven Interactive Search Environment for Earth Science ('ODISEES'), a data discovery and access tool, and will support additional data services such as analytics and visualization. The ES ontology is designed on the premise that naming conventions alone are not adequate to provide the information needed by prospective data consumers to assess the suitability of a given dataset for their research requirements; nor are current metadata conventions adequate to support seamless machine-to-machine interactions between file servers and end-user applications. Data consumers need information not only about what two data elements have in common, but also about how they are different. End-user applications need consistent, detailed metadata to support real-time data interoperability. The ES ontology is a highly precise, bottom-up, queriable model of the Earth Science domain that focuses on critical details about the measurable phenomena, instrument techniques, data processing methods, and data file structures. Earth Science parameters are described in detail in the ES Ontology and mapped to the corresponding variables that occur in ASDC datasets. Variables are in turn mapped to well-annotated representations of the datasets that they occur in, the instrument(s) used to create them, the instrument platforms, the processing methods, etc., creating a linked-data structure that allows both human and machine users to access a wealth of information critical to understanding and manipulating the data. The mappings are recorded in the Semantic Metadata Repository as RDF-triples. An off-the-shelf Ontology Development Environment and a custom Metadata Conversion Tool comprise a human-machine/machine-machine hybrid tool that partially automates the creation of metadata as RDF-triples by interfacing with existing metadata repositories and providing a user interface that solicits input from a human user, when needed. RDF-triples are pushed to the Ontology Development Environment, where a reasoning engine executes a series of inference rules whose antecedent conditions can be satisfied by the initial set of RDF-triples, thereby generating the additional detailed metadata that is missing in existing repositories. A SPARQL Endpoint, a web-based query service and a Graphical User Interface allow prospective data consumers - even those with no familiarity with NASA data products - to search the metadata repository to find and order data products that meet their exact specifications. A web-based API will provide an interface for machine-to-machine transactions.
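
    The parameter-to-variable-to-dataset mappings stored as RDF triples can be queried along the lines of the following sketch, here using the rdflib Python library and an entirely hypothetical vocabulary (ex:) in place of the ES Ontology's actual terms.

```python
from rdflib import Graph

# Toy sketch of the linked-data idea described above: variables are mapped
# to parameters and datasets as RDF triples, then queried with SPARQL.
# The ex: vocabulary is made up for illustration.

TTL = """
@prefix ex: <http://example.org/es#> .
ex:var_AOD550 ex:measuresParameter ex:AerosolOpticalDepth ;
              ex:occursInDataset  ex:DatasetX .
ex:DatasetX   ex:producedByInstrument ex:InstrumentY .
"""

g = Graph()
g.parse(data=TTL, format="turtle")

# Find every dataset (and its instrument) containing a variable that
# measures aerosol optical depth.
QUERY = """
PREFIX ex: <http://example.org/es#>
SELECT ?dataset ?instrument WHERE {
    ?var ex:measuresParameter ex:AerosolOpticalDepth ;
         ex:occursInDataset ?dataset .
    ?dataset ex:producedByInstrument ?instrument .
}
"""

for dataset, instrument in g.query(QUERY):
    print(dataset, instrument)
```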

  18. Data Discovery of Big and Diverse Climate Change Datasets - Options, Practices and Challenges

    NASA Astrophysics Data System (ADS)

    Palanisamy, G.; Boden, T.; McCord, R. A.; Frame, M. T.

    2013-12-01

    Developing data search tools is a very common, but often confusing, task for most of the data intensive scientific projects. These search interfaces need to be continually improved to handle the ever increasing diversity and volume of data collections. There are many aspects which determine the type of search tool a project needs to provide to their user community. These include: number of datasets, amount and consistency of discovery metadata, ancillary information such as availability of quality information and provenance, and availability of similar datasets from other distributed sources. Environmental Data Science and Systems (EDSS) group within the Environmental Science Division at the Oak Ridge National Laboratory has a long history of successfully managing diverse and big observational datasets for various scientific programs via various data centers such as DOE's Atmospheric Radiation Measurement Program (ARM), DOE's Carbon Dioxide Information and Analysis Center (CDIAC), USGS's Core Science Analytics and Synthesis (CSAS) metadata Clearinghouse and NASA's Distributed Active Archive Center (ORNL DAAC). This talk will showcase some of the recent developments for improving the data discovery within these centers The DOE ARM program recently developed a data discovery tool which allows users to search and discover over 4000 observational datasets. These datasets are key to the research efforts related to global climate change. The ARM discovery tool features many new functions such as filtered and faceted search logic, multi-pass data selection, filtering data based on data quality, graphical views of data quality and availability, direct access to data quality reports, and data plots. The ARM Archive also provides discovery metadata to other broader metadata clearinghouses such as ESGF, IASOA, and GOS. In addition to the new interface, ARM is also currently working on providing DOI metadata records to publishers such as Thomson Reuters and Elsevier. The ARM program also provides a standards based online metadata editor (OME) for PIs to submit their data to the ARM Data Archive. USGS CSAS metadata Clearinghouse aggregates metadata records from several USGS projects and other partner organizations. The Clearinghouse allows users to search and discover over 100,000 biological and ecological datasets from a single web portal. The Clearinghouse also enabled some new data discovery functions such as enhanced geo-spatial searches based on land and ocean classifications, metadata completeness rankings, data linkage via digital object identifiers (DOIs), and semantically enhanced keyword searches. The Clearinghouse also currently working on enabling a dashboard which allows the data providers to look at various statistics such as number their records accessed via the Clearinghouse, most popular keywords, metadata quality report and DOI creation service. The Clearinghouse also publishes metadata records to broader portals such as NSF DataONE and Data.gov. The author will also present how these capabilities are currently reused by the recent and upcoming data centers such as DOE's NGEE-Arctic project. References: [1] Devarakonda, R., Palanisamy, G., Wilson, B. E., & Green, J. M. (2010). Mercury: reusable metadata management, data discovery and access system. Earth Science Informatics, 3(1-2), 87-94. [2]Devarakonda, R., Shrestha, B., Palanisamy, G., Hook, L., Killeffer, T., Krassovski, M., ... & Frame, M. (2014, October). OME: Tool for generating and managing metadata to handle BigData. 
In BigData Conference (pp. 8-10).

  19. A standard for measuring metadata quality in spectral libraries

    NASA Astrophysics Data System (ADS)

    Rasaiah, B.; Jones, S. D.; Bellman, C.

    2013-12-01

    A standard for measuring metadata quality in spectral libraries Barbara Rasaiah, Simon Jones, Chris Bellman RMIT University Melbourne, Australia barbara.rasaiah@rmit.edu.au, simon.jones@rmit.edu.au, chris.bellman@rmit.edu.au ABSTRACT There is an urgent need within the international remote sensing community to establish a metadata standard for field spectroscopy that ensures high quality, interoperable metadata sets that can be archived and shared efficiently within Earth observation data sharing systems. Metadata are an important component in the cataloguing and analysis of in situ spectroscopy datasets because of their central role in identifying and quantifying the quality and reliability of spectral data and the products derived from them. This paper presents approaches to measuring metadata completeness and quality in spectral libraries to determine reliability, interoperability, and re-useability of a dataset. Explored are quality parameters that meet the unique requirements of in situ spectroscopy datasets, across many campaigns. Examined are the challenges presented by ensuring that data creators, owners, and data users ensure a high level of data integrity throughout the lifecycle of a dataset. Issues such as field measurement methods, instrument calibration, and data representativeness are investigated. The proposed metadata standard incorporates expert recommendations that include metadata protocols critical to all campaigns, and those that are restricted to campaigns for specific target measurements. The implication of semantics and syntax for a robust and flexible metadata standard are also considered. Approaches towards an operational and logistically viable implementation of a quality standard are discussed. This paper also proposes a way forward for adapting and enhancing current geospatial metadata standards to the unique requirements of field spectroscopy metadata quality. [0430] BIOGEOSCIENCES / Computational methods and data processing [0480] BIOGEOSCIENCES / Remote sensing [1904] INFORMATICS / Community standards [1912] INFORMATICS / Data management, preservation, rescue [1926] INFORMATICS / Geospatial [1930] INFORMATICS / Data and information governance [1946] INFORMATICS / Metadata [1952] INFORMATICS / Modeling [1976] INFORMATICS / Software tools and services [9810] GENERAL OR MISCELLANEOUS / New fields

  20. [Radiological dose and metadata management].

    PubMed

    Walz, M; Kolodziej, M; Madsack, B

    2016-12-01

    This article describes the features of management systems currently available in Germany for extraction, registration and evaluation of metadata from radiological examinations, particularly in the digital imaging and communications in medicine (DICOM) environment. In addition, the probable relevant developments in this area concerning radiation protection legislation, terminology, standardization and information technology are presented.

  1. A high-precision rule-based extraction system for expanding geospatial metadata in GenBank records

    PubMed Central

    Weissenbacher, Davy; Rivera, Robert; Beard, Rachel; Firago, Mari; Wallstrom, Garrick; Scotch, Matthew; Gonzalez, Graciela

    2016-01-01

    Objective: The metadata reflecting the location of the infected host (LOIH) of virus sequences in GenBank often lacks specificity. This work seeks to enhance this metadata by extracting more specific geographic information from related full-text articles and mapping them to their latitude/longitudes using knowledge derived from external geographical databases. Materials and Methods: We developed a rule-based information extraction framework for linking GenBank records to the latitude/longitudes of the LOIH. Our system first extracts existing geospatial metadata from GenBank records and attempts to improve it by seeking additional, relevant geographic information from text and tables in related full-text PubMed Central articles. The final extracted locations of the records, based on data assimilated from these sources, are then disambiguated and mapped to their respective geo-coordinates. We evaluated our approach on a manually annotated dataset comprising 5728 GenBank records for the influenza A virus. Results: We found the precision, recall, and f-measure of our system for linking GenBank records to the latitude/longitudes of their LOIH to be 0.832, 0.967, and 0.894, respectively. Discussion: Our system had a high level of accuracy for linking GenBank records to the geo-coordinates of the LOIH. However, it can be further improved by expanding our database of geospatial data, incorporating spell correction, and enhancing the rules used for extraction. Conclusion: Our system performs reasonably well for linking GenBank records for the influenza A virus to the geo-coordinates of their LOIH based on record metadata and information extracted from related full-text articles. PMID:26911818

  2. A high-precision rule-based extraction system for expanding geospatial metadata in GenBank records.

    PubMed

    Tahsin, Tasnia; Weissenbacher, Davy; Rivera, Robert; Beard, Rachel; Firago, Mari; Wallstrom, Garrick; Scotch, Matthew; Gonzalez, Graciela

    2016-09-01

    The metadata reflecting the location of the infected host (LOIH) of virus sequences in GenBank often lacks specificity. This work seeks to enhance this metadata by extracting more specific geographic information from related full-text articles and mapping them to their latitude/longitudes using knowledge derived from external geographical databases. We developed a rule-based information extraction framework for linking GenBank records to the latitude/longitudes of the LOIH. Our system first extracts existing geospatial metadata from GenBank records and attempts to improve it by seeking additional, relevant geographic information from text and tables in related full-text PubMed Central articles. The final extracted locations of the records, based on data assimilated from these sources, are then disambiguated and mapped to their respective geo-coordinates. We evaluated our approach on a manually annotated dataset comprising 5728 GenBank records for the influenza A virus. We found the precision, recall, and f-measure of our system for linking GenBank records to the latitude/longitudes of their LOIH to be 0.832, 0.967, and 0.894, respectively. Our system had a high level of accuracy for linking GenBank records to the geo-coordinates of the LOIH. However, it can be further improved by expanding our database of geospatial data, incorporating spell correction, and enhancing the rules used for extraction. Our system performs reasonably well for linking GenBank records for the influenza A virus to the geo-coordinates of their LOIH based on record metadata and information extracted from related full-text articles. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
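
    The record-linking approach described in the two records above can be caricatured with a gazetteer lookup. Everything below (the gazetteer entries, the "most specific match wins" disambiguation rule) is a toy assumption; the published system uses a much richer rule set and external geographical databases.

```python
import re

# Schematic sketch (not the authors' system): match location mentions from
# record metadata or article text against a gazetteer and resolve them to
# coordinates. Gazetteer contents and the disambiguation rule are toys.

GAZETTEER = {
    "hong kong": (22.30, 114.17),
    "guangdong": (23.38, 113.42),
    "china":     (35.00, 105.00),
}
# Longer (more specific) place names are tried first.
PLACES = sorted(GAZETTEER, key=len, reverse=True)

def resolve_loih(text: str):
    """Return (place, lat, lon) for the most specific gazetteer hit."""
    lowered = text.lower()
    for place in PLACES:
        if re.search(r"\b" + re.escape(place) + r"\b", lowered):
            lat, lon = GAZETTEER[place]
            return place, lat, lon
    return None

print(resolve_loih("A/duck/Guangdong/2004 (H5N1), isolated in China"))
```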

  3. The Planetary Data System Information Model for Geometry Metadata

    NASA Astrophysics Data System (ADS)

    Guinness, E. A.; Gordon, M. K.

    2014-12-01

    The NASA Planetary Data System (PDS) has recently developed a new set of archiving standards based on a rigorously defined information model. An important part of the new PDS information model is the model for geometry metadata, which includes, for example, attributes of the lighting and viewing angles of observations, position and velocity vectors of a spacecraft relative to the Sun and the observing body at the time of observation, and the location and orientation of an observation on the target. The PDS geometry model is based on requirements gathered from the planetary research community, data producers, and software engineers who build search tools. A key requirement for the model is that it fully support the breadth of PDS archives, which include a wide range of data types from missions and instruments observing many types of solar system bodies, such as planets, ring systems, and smaller bodies (moons, comets, and asteroids). Thus, important design aspects of the geometry model are that it standardizes the definition of the geometry attributes and provides consistency of geometry metadata across planetary science disciplines. The model specification also includes parameters so that the context of values can be unambiguously interpreted. For example, the reference frame used for specifying geographic locations on a planetary body is explicitly included with the other geometry metadata parameters. The structure and content of the new PDS geometry model are designed to enable both science analysis and efficient development of search tools. The geometry model is implemented in XML, as is the main PDS information model, and uses XML Schema for validation. The initial version of the geometry model is focused on geometry for remote sensing observations conducted by flyby and orbiting spacecraft. Future releases of the PDS geometry model will be expanded to include metadata for landed and rover spacecraft.
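
    Because the geometry model is implemented in XML with XML Schema validation, a label-validation step can be sketched as follows. The schema and label file names are placeholders, and real PDS4 validation also applies Schematron rules beyond what this shows.

```python
from lxml import etree

# Minimal sketch of schema-based validation of an XML label. File names are
# hypothetical; a real PDS4 label would be validated against the official
# PDS4 XML Schema (plus Schematron rules, not shown here).

schema_doc = etree.parse("PDS4_geometry_schema.xsd")   # hypothetical path
schema = etree.XMLSchema(schema_doc)

label = etree.parse("observation_label.xml")           # hypothetical path
if schema.validate(label):
    print("Label conforms to the geometry schema.")
else:
    for error in schema.error_log:
        print(f"line {error.line}: {error.message}")
```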

  4. Omics Metadata Management Software v. 1 (OMMS)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and to perform bioinformatics analyses and information management tasks via a simple and intuitive web-based interface. Several use cases with short-read sequence datasets are provided to showcase the full functionality of the OMMS, from metadata curation tasks, to bioinformatics analyses and results management and downloading. The OMMS can be implemented as a stand-alone package for individual laboratories, or can be configured for web-based deployment supporting geographically dispersed research teams. Our software was developed with open-source bundles, is flexible, extensible and easily installed and run by operators with general system administration and scripting language literacy.

  5. Global food and fibre security threatened by current inefficiencies in fungal identification.

    PubMed

    Crous, Pedro W; Groenewald, Johannes Z; Slippers, Bernard; Wingfield, Michael J

    2016-12-05

    Fungal pathogens severely impact global food and fibre crop security. Fungal species that cause plant diseases have mostly been recognized based on their morphology. In general, morphological descriptions remain disconnected from crucially important knowledge such as mating types, host specificity, life cycle stages and population structures. The majority of current fungal species descriptions lack even the most basic genetic data that could address at least some of these issues. Such information is essential for accurate fungal identifications, to link critical metadata and to understand the real and potential impact of fungal pathogens on production and natural ecosystems. Because international trade in plant products and introduction of pathogens to new areas is likely to continue, the manner in which fungal pathogens are identified should urgently be reconsidered. The technologies that would provide appropriate information for biosecurity and quarantine already exist, yet the scientific community and the regulatory authorities are slow to embrace them. International agreements are urgently needed to enforce new guidelines for describing plant pathogenic fungi (including key DNA information), to ensure availability of relevant data and to modernize the phytosanitary systems that must deal with the risks relating to trade-associated plant pathogens. This article is part of the themed issue 'Tackling emerging fungal threats to animal health, food security and ecosystem resilience'. © 2016 The Author(s).

  6. Border Lakes land-cover classification

    Treesearch

    Marvin Bauer; Brian Loeffelholz; Doug Shinneman

    2009-01-01

    This document contains metadata and description of land-cover classification of approximately 5.1 million acres of land bordering Minnesota, U.S.A. and Ontario, Canada. The classification focused on the separation and identification of specific forest-cover types. Some separation of the nonforest classes also was performed. The classification was derived from multi-...

  7. Metadata requirements for results of diagnostic imaging procedures: a BIIF profile to support user applications

    NASA Astrophysics Data System (ADS)

    Brown, Nicholas J.; Lloyd, David S.; Reynolds, Melvin I.; Plummer, David L.

    2002-05-01

    A visible digital image is rendered from a set of digital image data. Medical digital image data can be stored in either: (a) a pre-rendered format, corresponding to a photographic print, or (b) an un-rendered format, corresponding to a photographic negative. The appropriate image data storage format and the associated header data (metadata) required by a user of the results of a diagnostic procedure recorded electronically depend on the task(s) to be performed. The DICOM standard provides a rich set of metadata that supports the needs of complex applications. Many end-user applications, such as simple report text viewing and display of a selected image, are not so demanding, and generic image formats such as JPEG are sometimes used. However, these lack some basic identification requirements. In this paper we make specific proposals for minimal extensions to generic image metadata, of value in various domains, which enable safe use in the case of two simple healthcare end-user scenarios: (a) viewing of text and a selected JPEG image activated by a hyperlink, and (b) viewing of one or more JPEG images together with superimposed text and graphics annotation using a file specified by a profile of the ISO/IEC Basic Image Interchange Format (BIIF).

  8. Context based configuration management system

    NASA Technical Reports Server (NTRS)

    Gurram, Mohana M. (Inventor); Maluf, David A. (Inventor); Mederos, Luis A. (Inventor); Gawdiak, Yuri O. (Inventor)

    2010-01-01

    A computer-based system for configuring and displaying information on changes in, and present status of, a collection of events associated with a project. Classes of icons for decision events, configurations and feedback mechanisms, and time lines (sequential and/or simultaneous) for related events are displayed. Metadata for each icon in each class is displayed by choosing and activating the corresponding icon. Access control (viewing, reading, writing, editing, deleting, etc.) is optionally imposed for metadata and other displayed information.

  9. Data Bookkeeping Service 3 - Providing Event Metadata in CMS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Giffels, Manuel; Guo, Y.; Riley, Daniel

    The Data Bookkeeping Service 3 provides a catalog of event metadata for Monte Carlo and recorded data of the Compact Muon Solenoid (CMS) experiment at the Large Hadron Collider (LHC) at CERN, Geneva. It comprises all necessary information for tracking datasets, their processing history and associations between runs, files and datasets, on a large scale of about 200,000 datasets and more than 40 million files, which adds up to around 700 GB of metadata. The DBS is an essential part of the CMS Data Management and Workload Management (DMWM) systems [1]; all kinds of data processing, such as Monte Carlo production and processing of recorded event data, as well as physics analysis done by the users, rely heavily on the information stored in DBS.

  10. GeoViQua: quality-aware geospatial data discovery and evaluation

    NASA Astrophysics Data System (ADS)

    Bigagli, L.; Papeschi, F.; Mazzetti, P.; Nativi, S.

    2012-04-01

    GeoViQua (QUAlity aware VIsualization for the Global Earth Observation System of Systems) is a recently started FP7 project aiming to complement the Global Earth Observation System of Systems (GEOSS) with rigorous data quality specifications and quality-aware capabilities, in order to improve reliability in scientific studies and policy decision-making. GeoViQua's main scientific and technical objective is to enhance the GEOSS Common Infrastructure (GCI), providing the user community with innovative quality-aware search and evaluation tools, which will be integrated in the GEO-Portal as well as made available to other end-user interfaces. To this end, GeoViQua will promote the extension of the current standard metadata for geographic information with accurate and expressive quality indicators, also contributing to the definition of a quality label (GEOLabel). GeoViQua's proposed solutions will be assessed in several pilot case studies covering the whole Earth Observation chain, from remote sensing acquisition to data processing, to applications in the main GEOSS Societal Benefit Areas. This work presents the preliminary results of GeoViQua Work Package 4 "Enhanced geo-search tools" (WP4), started in January 2012. Its major anticipated technical innovations are search and evaluation tools that communicate and exploit data quality information from the GCI. In particular, GeoViQua will investigate a graphical search interface featuring a coherent and meaningful aggregation of statistics and metadata summaries (e.g. in the form of tables and charts), thus enabling end users to leverage quality constraints for data discovery and evaluation. Preparatory work on WP4 requirements indicated that users need the "best" data for their purpose, implying a high degree of subjectivity in judgment. This suggests that the GeoViQua system should exploit a combination of provider-generated metadata (objective indicators such as summary statistics), system-generated metadata (contextual/tracking information such as provenance of data and metadata), and user-generated metadata (informal user comments, usage information, rating, etc.). Moreover, metadata should include sufficiently complete access information to allow rich data visualization and propagation. The following main enabling components are currently identified within WP4: - Quality-aware access services, e.g. a quality-aware extension of the OGC Sensor Observation Service (SOS-Q) specification, to support quality constraints for sensor data publishing and access; - Quality-aware discovery services, namely a quality-aware extension of the OGC Catalog Service for the Web (CSW-Q), to cope with quality-constrained search; - Quality-augmentation broker (GeoViQua Broker), to support the linking and combination of the existing GCI metadata with GeoViQua- and user-generated metadata required to support users in selecting the "best" data for their intended use. We are currently developing prototypes of the above quality-enabled geo-search components, which will be assessed in a sensor-based pilot case study in the coming months. In particular, the GeoViQua Broker will be integrated with the EuroGEOSS Broker to implement CSW-Q and federate (either via distribution or harvesting schemes) quality-aware data sources. GeoViQua will constitute a valuable test-bed for advancing the current best practices and standards in geospatial quality representation and exploitation.
The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under Grant Agreement n° 265178.

  11. A SensorML-based Metadata Model and Registry for Ocean Observatories: a Contribution from European Projects NeXOS and FixO3

    NASA Astrophysics Data System (ADS)

    Delory, E.; Jirka, S.

    2016-02-01

    Discovering sensors and observation data is important when enabling the exchange of oceanographic data between observatories and scientists that need the data sets for their work. To better support this discovery process, one task of the European project FixO3 (Fixed-point Open Ocean Observatories) is addressing the question of which elements are needed to develop a better registry for sensors. This has resulted in four items, which are addressed by the FixO3 project in cooperation with further European projects such as NeXOS (http://www.nexosproject.eu/). 1.) Metadata description format: to store and retrieve information about sensors and platforms, it is necessary to have a common approach to providing and encoding the metadata. For this purpose, the OGC Sensor Model Language (SensorML) 2.0 standard was selected. In particular, the opportunity to distinguish between sensor types and instances offers new chances for more efficient provision and maintenance of sensor metadata. 2.) Conversion of existing metadata into a SensorML 2.0 representation: in order to ensure sustainable re-use of already provided metadata content (e.g. from the ESONET-FixO3 yellow pages), it is important to provide a mechanism capable of transforming these already available metadata sets into the new SensorML 2.0 structure. 3.) Metadata editor: to create descriptions of sensors and platforms, users cannot be expected to manually edit XML-based description files. Thus, a visual interface is necessary to help during metadata creation. We will outline a prototype of this editor, building upon the development of the ESONET sensor registry interface. 4.) Sensor metadata store: a server is needed for storing and querying the created sensor descriptions. For this purpose different options exist, which will be discussed. In summary, we will present a set of different elements enabling sensor discovery, ranging from metadata formats, metadata conversion and editing to metadata storage. Furthermore, the current development status will be demonstrated.
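
    A minimal skeleton of a SensorML 2.0 instance description, generated programmatically, might look like the sketch below. The namespaces are those published for SensorML 2.0 and GML 3.2, but the element selection is illustrative and omits much of what a complete, valid instance document requires; the identifier value is made up.

```python
import xml.etree.ElementTree as ET

# Rough, illustrative skeleton of a SensorML 2.0 sensor *instance*
# description. Not a complete or valid document; identifier is hypothetical.

SML = "http://www.opengis.net/sensorml/2.0"
GML = "http://www.opengis.net/gml/3.2"
ET.register_namespace("sml", SML)
ET.register_namespace("gml", GML)

component = ET.Element(f"{{{SML}}}PhysicalComponent")
ident = ET.SubElement(component, f"{{{GML}}}identifier",
                      codeSpace="uid")                 # unique instance id
ident.text = "urn:example:sensor:ctd-0042"             # hypothetical
desc = ET.SubElement(component, f"{{{GML}}}description")
desc.text = "CTD instance deployed on a fixed-point ocean observatory."

print(ET.tostring(component, encoding="unicode"))
```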

  12. HDF-EOS Dump Tools

    NASA Astrophysics Data System (ADS)

    Prasad, U.; Rahabi, A.

    2001-05-01

    The following utilities, developed for dumping HDF-EOS format data, are of special use for Earth science data from NASA's Earth Observing System (EOS). This poster demonstrates their use and application. The first four tools take HDF-EOS data files as input. HDF-EOS Metadata Dumper - metadmp: the metadata dumper extracts metadata from EOS data granules. It operates by simply copying blocks of metadata from the file to the standard output; it does not process the metadata in any way. Since all metadata in EOS granules is encoded in the Object Description Language (ODL), the output of metadmp will be in the form of complete ODL statements. EOS data granules may contain up to three different sets of metadata (Core, Archive, and Structural Metadata). HDF-EOS Contents Dumper - heosls: the heosls utility displays the contents of HDF-EOS files. This utility provides detailed information on the POINT, SWATH, and GRID data sets in the files; for example, it will list the geolocation fields, data fields and objects. HDF-EOS ASCII Dumper - asciidmp: the ASCII dump utility extracts fields from EOS data granules into plain ASCII text. The output from asciidmp should be easily human readable. With minor editing, asciidmp's output can be made ingestible by any application with ASCII import capabilities. HDF-EOS Binary Dumper - bindmp: the binary dumper utility dumps HDF-EOS objects in binary format. This is useful for feeding its output into existing programs that do not understand HDF, for example custom software and COTS products. HDF-EOS User Friendly Metadata - UFM: the UFM utility is useful for viewing ECS metadata. UFM takes an EOSDIS ODL metadata file and produces an HTML report of the metadata for display using a web browser. HDF-EOS METCHECK - METCHECK: METCHECK can be invoked from either a Unix or DOS environment with a set of command line options that a user might use to direct the tool's inputs and output. METCHECK validates the inventory metadata in a (.met) file using the descriptor file (.desc) as the reference. The tool takes a (.desc) file and a (.met) ODL file as inputs, and generates a simple output file containing the results of the checking process.
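
    For HDF4-based HDF-EOS granules, the effect of a metadata dump like metadmp can be approximated in Python, assuming the common convention that ODL blocks are stored in global attributes named CoreMetadata.0, ArchiveMetadata.0 and StructMetadata.0 (long blocks may be split across suffixes .1, .2, and so on). The granule file name below is a placeholder.

```python
from pyhdf.SD import SD, SDC

# Sketch: copy the ODL metadata blocks of an HDF4-based HDF-EOS granule to
# standard output, assuming the usual attribute-naming convention.

def dump_odl(path: str) -> None:
    granule = SD(path, SDC.READ)
    attrs = granule.attributes()
    for name in sorted(attrs):
        if name.startswith(("CoreMetadata", "ArchiveMetadata", "StructMetadata")):
            print(f"--- {name} ---")
            print(attrs[name])   # raw ODL text, copied verbatim
    granule.end()

dump_odl("MOD021KM.A2024001.0000.hdf")   # hypothetical granule name
```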

  13. No Pixel Left Behind - Peeling Away NASA's Satellite Swaths

    NASA Astrophysics Data System (ADS)

    Cechini, M. F.; Boller, R. A.; Schmaltz, J. E.; Roberts, J. T.; Alarcon, C.; Huang, T.; McGann, M.; Murphy, K. J.

    2014-12-01

    Discovery and identification of Earth science products should not consume the majority of the effort in scientific research. Search aids based on text metadata go to great lengths to simplify this process. However, the process is still cumbersome and requires too much data download and analysis to down-select to valid products. The EOSDIS Global Imagery Browse Services (GIBS) is attempting to improve this process by providing "visual metadata" in the form of full-resolution visualizations representing geophysical parameters taken directly from the data. Through the use of accompanying interpretive information such as color legends and the natural visual processing of the human eye, researchers are able to search and filter through data products in a more natural and efficient way. The GIBS "visual metadata" products are generated as representations of Level 3 data or as temporal composites of the Level 2 granule- or swath-based data products projected across a geographic or polar region. Such an approach allows for low-latency tiled access to pre-generated imagery products. For many GIBS users, the resulting image suffices as a basic representation of the underlying data. However, composite imagery presents a fundamental problem: for areas of spatial overlap within the composite, only one observation is visually represented. This is especially problematic in the polar regions, where a significant portion of sensed data is "lost." In response to its user community, the GIBS team coordinated with its stakeholders to begin developing an approach to ensure that there is "no pixel left behind." In this presentation we will discuss the use cases and requirements guiding our efforts, considerations regarding standards compliance and interoperability, and near-term goals. We will also discuss opportunities to actively engage with the GIBS team on this topic to continually improve our services.

  14. System for Earth Sample Registration SESAR: Services for IGSN Registration and Sample Metadata Management

    NASA Astrophysics Data System (ADS)

    Chan, S.; Lehnert, K. A.; Coleman, R. J.

    2011-12-01

    SESAR, the System for Earth Sample Registration, is an online registry for physical samples collected for Earth and environmental studies. SESAR generates and administers the International Geo Sample Number (IGSN), a unique identifier for samples that is dramatically advancing interoperability amongst information systems for sample-based data. SESAR was developed to provide the complete range of registry services, including definition of IGSN syntax and metadata profiles, registration and validation of name spaces requested by users, tools for users to submit and manage sample metadata, validation of submitted metadata, generation and validation of the unique identifiers, archiving of sample metadata, and public or private access to the sample metadata catalog. With the development of SESAR v3, we placed particular emphasis on creating enhanced tools that make metadata submission easier and more efficient for users, and that provide superior functionality for users to manage the metadata of their samples in their private workspace, MySESAR. For example, SESAR v3 includes a module where users can generate custom spreadsheet templates to enter metadata for their samples, then upload these templates online for sample registration. Once the content of the template is uploaded, it is displayed online in an editable grid format. Validation rules are executed in real time on the grid data to ensure data integrity. Other new features of SESAR v3 include the capability to transfer ownership of samples to other SESAR users, the ability to upload and store images and other files in a sample metadata profile, and the tracking of changes to sample metadata profiles. In the next version of SESAR (v3.5), we will further improve the discovery, sharing, and registration of samples. For example, we are developing a more comprehensive suite of web services that will allow discovery and registration access to SESAR from external systems. Both batch and individual registrations will be possible through web services. Based on valuable feedback from the user community, we will introduce enhancements that add greater flexibility to the system to accommodate the vast diversity of metadata that users want to store. Users will be able to create custom metadata fields and use these for the samples they register. Users will also be able to group samples into 'collections' to make retrieval for research projects or publications easier. An improved interface design will allow for better workflow transition and navigation throughout the application. To keep up with the demands of a growing community, SESAR has also made process changes to ensure efficiency in system development. For example, we have implemented a release cycle to better track enhancements and fixes to the system, and an API library that facilitates reusability of code. Usage tracking, metrics and surveys capture information to guide the direction of future developments. A new set of administrative tools allows greater control of system management.
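
    The real-time grid validation described above can be sketched as a set of per-field rules applied to each uploaded row. The field names and rules below are hypothetical, not SESAR's actual schema.

```python
# Toy sketch of rule-based validation over an uploaded sample metadata grid,
# in the spirit described above. Field names and rules are assumptions.

RULES = {
    "sample_name": lambda v: bool(v and v.strip()),
    "latitude":    lambda v: v == "" or -90.0 <= float(v) <= 90.0,
    "longitude":   lambda v: v == "" or -180.0 <= float(v) <= 180.0,
}

def validate_row(row: dict) -> list:
    """Return a list of (field, message) problems for one grid row."""
    problems = []
    for field, rule in RULES.items():
        try:
            ok = rule(row.get(field, ""))
        except ValueError:          # e.g. non-numeric latitude
            ok = False
        if not ok:
            problems.append((field, f"invalid value: {row.get(field)!r}"))
    return problems

print(validate_row({"sample_name": "KH-01", "latitude": "95.2", "longitude": ""}))
```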

  15. Publicizing Your Web Resources for Maximum Exposure.

    ERIC Educational Resources Information Center

    Smith, Kerry J.

    2001-01-01

    Offers advice to librarians for marketing their Web sites on Internet search engines. Advises against relying solely on spiders and recommends adding metadata to the source code and delivering that information directly to the search engines. Gives an overview of metadata and typical coding for meta tags. Includes Web addresses for a number of…

  16. Emerging Network Storage Management Standards for Intelligent Data Storage Subsystems

    NASA Technical Reports Server (NTRS)

    Podio, Fernando; Vollrath, William; Williams, Joel; Kobler, Ben; Crouse, Don

    1998-01-01

    This paper discusses the need for intelligent storage devices and subsystems that can provide data integrity metadata; the content of the existing data integrity standard for optical disks; and the techniques and metadata, developed by the Association for Information and Image Management (AIIM) Optical Tape Committee, to verify stored data on optical tapes.

  17. Now That We've Found the "Hidden Web," What Can We Do with It?

    ERIC Educational Resources Information Center

    Cole, Timothy W.; Kaczmarek, Joanne; Marty, Paul F.; Prom, Christopher J.; Sandore, Beth; Shreeves, Sarah

    The Open Archives Initiative (OAI) Protocol for Metadata Harvesting (PMH) is designed to facilitate discovery of the "hidden web" of scholarly information, such as that contained in databases, finding aids, and XML documents. OAI-PMH supports standardized exchange of metadata describing items in disparate collections, such as those…

  18. Compatibility Between Metadata Standards: Import Pipeline of CDISC ODM to the Samply.MDR.

    PubMed

    Kock-Schoppenhauer, Ann-Kristin; Ulrich, Hannes; Wagen-Zink, Stefanie; Duhm-Harbeck, Petra; Ingenerf, Josef; Neuhaus, Philipp; Dugas, Martin; Bruland, Philipp

    2018-01-01

    The establishment of a digital healthcare system is a national and community task. The Federal Ministry of Education and Research in Germany is providing funding for consortia, consisting of university hospitals among others, participating in the "Medical Informatics Initiative". Exchange of medical data between research institutions necessitates a place where meta-information for these data is made accessible. Within these consortia, different metadata registry solutions were chosen. To promote interoperability between these solutions, we have examined whether the portal of Medical Data Models is suitable for managing and communicating metadata and relevant information across different data integration centres of the Medical Informatics Initiative and beyond. Apart from the MDM-portal, some ISO 11179-based systems such as Samply.MDR, as well as openEHR-based solutions, are going to be applied. In this paper, we have focused on the creation of a mapping model between the CDISC ODM standard and the Samply.MDR import format. In summary, it can be stated that the mapping model is feasible and promotes exchangeability between different metadata registry approaches.
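
    One step of such an import pipeline, mapping ODM ItemDef elements onto data-element records, can be sketched as follows. The ODM namespace is the published v1.3 namespace; the output dictionary layout is a simplified stand-in for the actual Samply.MDR import format.

```python
import xml.etree.ElementTree as ET

# Condensed sketch of an ODM-to-MDR mapping step in the spirit of the
# pipeline described above. The target record layout is a simplified
# stand-in, not the real Samply.MDR import format.

ODM_NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

ODM_SNIPPET = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
  <Study OID="S1"><MetaDataVersion OID="MDV1" Name="v1">
    <ItemDef OID="I.AGE" Name="Age" DataType="integer"/>
    <ItemDef OID="I.SEX" Name="Sex" DataType="text"/>
  </MetaDataVersion></Study>
</ODM>"""

def odm_items_to_mdr(odm_xml: str) -> list:
    """Map each ODM ItemDef onto a simplified data-element record."""
    root = ET.fromstring(odm_xml)
    elements = []
    for item in root.iterfind(".//odm:ItemDef", ODM_NS):
        elements.append({
            "designation": item.get("Name"),
            "datatype":    item.get("DataType"),
            "source_oid":  item.get("OID"),
        })
    return elements

print(odm_items_to_mdr(ODM_SNIPPET))
```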

  19. The importance of metadata to assess information content in digital reconstructions of neuronal morphology.

    PubMed

    Parekh, Ruchi; Armañanzas, Rubén; Ascoli, Giorgio A

    2015-04-01

    Digital reconstructions of axonal and dendritic arbors provide a powerful representation of neuronal morphology in formats amenable to quantitative analysis, computational modeling, and data mining. Reconstructed files, however, require adequate metadata to identify the appropriate animal species, developmental stage, brain region, and neuron type. Moreover, experimental details about tissue processing, neurite visualization and microscopic imaging are essential to assess the information content of digital morphologies. Typical morphological reconstructions only partially capture the underlying biological reality. Tracings are often limited to certain domains (e.g., dendrites and not axons), may be incomplete due to tissue sectioning, imperfect staining, and limited imaging resolution, or can disregard aspects irrelevant to their specific scientific focus (such as branch thickness or depth). Gauging these factors is critical in subsequent data reuse and comparison. NeuroMorpho.Org is a central repository of reconstructions from many laboratories and experimental conditions. Here, we introduce substantial additions to the existing metadata annotation, aimed at describing the completeness of the reconstructed neurons in NeuroMorpho.Org. These expanded metadata form a suitable basis for effective description of neuromorphological data.

  20. Content standards for medical image metadata

    NASA Astrophysics Data System (ADS)

    d'Ornellas, Marcos C.; da Rocha, Rafael P.

    2003-12-01

    Medical images are at the heart of healthcare diagnostic procedures. They have provided not only a noninvasive means to view anatomical cross-sections of internal organs, but also a means for physicians to evaluate the patient's diagnosis and monitor the effects of treatment. For a medical center, the emphasis may shift from image generation to post-processing and data management, since the medical staff may generate even more processed images and other data from the original image after various analyses and post-processing. A medical image data repository for health care information systems is becoming a critical need. This data repository would contain comprehensive patient records, including information such as clinical data, related diagnostic images, and post-processed images. Due to the large volume and complexity of the data, as well as the diversified user access requirements, the implementation of a medical image archive system will be a complex and challenging task. This paper discusses content standards for medical image metadata. In addition, it focuses on image metadata content evaluation and metadata quality management.

  1. Federal Data Repository Research: Recent Developments in Mercury Search System Architecture

    NASA Astrophysics Data System (ADS)

    Devarakonda, R.

    2015-12-01

    New data-intensive project initiatives need a new generation of data system architecture. This presentation will discuss recent developments in the Mercury System [1], including adoption, challenges, and future efforts to handle such data-intensive projects. Mercury is a combination of three main tools: (i) a data/metadata registration tool, the new Online Metadata Editor (OME), a web-based tool that helps document scientific data in well-structured, popular scientific metadata formats; (ii) a search and visualization tool, which provides a single portal to information contained in disparate data management systems and facilitates distributed metadata management, data discovery, and various visualization capabilities; and (iii) a data citation tool: in collaboration with the Department of Energy's Oak Ridge National Laboratory (ORNL) Mercury Consortium (funded by NASA, USGS and DOE), a Digital Object Identifier (DOI) service was established. Mercury is an open-source system, developed and managed at Oak Ridge National Laboratory and currently funded by three federal agencies: NASA, USGS and DOE. It provides access to millions of bio-geo-chemical and ecological data records; 30,000 scientists use it each month. Some recent data-intensive projects that use the Mercury tool: USGS Science Data Catalog (http://data.usgs.gov/), Next-Generation Ecosystem Experiments (http://ngee-arctic.ornl.gov/), Carbon Dioxide Information Analysis Center (http://cdiac.ornl.gov/), Oak Ridge National Laboratory - Distributed Active Archive Center (http://daac.ornl.gov), SoilSCAPE (http://mercury.ornl.gov/soilscape). References: [1] Devarakonda, Ranjeet, et al. "Mercury: reusable metadata management, data discovery and access system." Earth Science Informatics 3.1-2 (2010): 87-94.

  2. Inter-University Upper Atmosphere Global Observation Network (IUGONET) Metadata Database and Its Interoperability

    NASA Astrophysics Data System (ADS)

    Yatagai, A. I.; Iyemori, T.; Ritschel, B.; Koyama, Y.; Hori, T.; Abe, S.; Tanaka, Y.; Shinbori, A.; Umemura, N.; Sato, Y.; Yagi, M.; Ueno, S.; Hashiguchi, N. O.; Kaneda, N.; Belehaki, A.; Hapgood, M. A.

    2013-12-01

    The IUGONET is a Japanese program to build a metadata database for ground-based observations of the upper atmosphere [1]. The project began in 2009 with five Japanese institutions which archive data observed by radars, magnetometers, photometers, radio telescopes, helioscopes, and so on, at various altitudes from the Earth's surface to the Sun. Systems have been developed to allow searching of the above-described metadata, and we have been updating the system and adding new and updated metadata. The IUGONET development team adopted the SPASE metadata model [2] to describe the upper atmosphere data. This model is used as the common metadata format by the virtual observatories for solar-terrestrial physics. It includes metadata referring to each data file (called a 'Granule'), which enables a search for data files as well as data sets. Further details are described in [2] and [3]. Currently, three additional Japanese institutions are being incorporated into IUGONET. Furthermore, metadata of observations of the troposphere, taken at the observatories of the middle and upper atmosphere radar at Shigaraki and the meteor radar in Indonesia, have been incorporated. These additions will contribute to efficient interdisciplinary scientific research. At the beginning of 2013, the registration of the 'Observatory' and 'Instrument' metadata was completed, which makes it easy to get an overview of the metadata database. The number of registered metadata records as of the end of July totalled 8.8 million, including 793 observatories and 878 instruments. It is important to promote interoperability and/or metadata exchange between the database development groups. A memorandum of agreement, providing a framework for formal collaboration, has been signed with the European Near-Earth Space Data Infrastructure for e-Science (ESPAS) project, which has objectives similar to IUGONET's. Furthermore, observations by satellites and the International Space Station are being incorporated with a view to making and linking metadata databases. The development of effective data systems will contribute to the progress of scientific research on solar-terrestrial physics, climate and the geophysical environment. Any kind of cooperation, metadata input and feedback, especially for linkage of the databases, is welcomed. References 1. Hayashi, H. et al., Inter-university Upper Atmosphere Global Observation Network (IUGONET), Data Sci. J., 12, WDS179-184, 2013. 2. King, T. et al., SPASE 2.0: A standard data model for space physics. Earth Sci. Inform. 3, 67-73, 2010, doi:10.1007/s12145-010-0053-4. 3. Hori, T., et al., Development of IUGONET metadata format and metadata management system. J. Space Sci. Info. Jpn., 105-111, 2012. (in Japanese)

  3. The CMIP5 Model Documentation Questionnaire: Development of a Metadata Retrieval System for the METAFOR Common Information Model

    NASA Astrophysics Data System (ADS)

    Pascoe, Charlotte; Lawrence, Bryan; Moine, Marie-Pierre; Ford, Rupert; Devine, Gerry

    2010-05-01

    The EU METAFOR Project (http://metaforclimate.eu) has created a web-based model documentation questionnaire to collect metadata from the modelling groups that are running simulations in support of the Coupled Model Intercomparison Project - 5 (CMIP5). The CMIP5 model documentation questionnaire will retrieve information about the details of the models used, how the simulations were carried out, how the simulations conformed to the CMIP5 experiment requirements, and details of the hardware used to perform the simulations. The metadata collected by the CMIP5 questionnaire will allow CMIP5 data to be compared in a scientifically meaningful way. This paper describes the life-cycle of the CMIP5 questionnaire development, which starts with relatively unstructured input from domain specialists and ends with formal XML documents that comply with the METAFOR Common Information Model (CIM). Each development step is associated with a specific tool. (1) Mind maps are used to capture information requirements from domain experts and build a controlled vocabulary, (2) a Python parser processes the XML files generated by the mind maps, (3) Django (Python) is used to generate the dynamic structure and content of the web-based questionnaire from the processed XML and the METAFOR CIM, (4) Python parsers ensure that information entered into the CMIP5 questionnaire is output as CIM-compliant XML, (5) CIM-compliant output allows automatic information capture tools to harvest questionnaire content into databases such as the Earth System Grid (ESG) metadata catalogue. This paper will focus on how Django (Python) and XML input files are used to generate the structure and content of the CMIP5 questionnaire. It will also address how the choice of development tools listed above provided a framework that enabled working scientists (who would never ordinarily interact with UML and XML) to be part of the iterative development process and to ensure that the CMIP5 model documentation questionnaire reflects what scientists want to know about the models. Keywords: metadata, CMIP5, automatic information capture, tool development
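
    A minimal sketch of step (3), assuming Django: it builds a form class at runtime from a controlled vocabulary parsed out of XML. The XML layout and field names here are invented for illustration and are not the METAFOR questionnaire's actual schema.

      # Sketch: derive a Django form dynamically from controlled-vocabulary XML.
      import xml.etree.ElementTree as ET
      from django import forms

      def build_questionnaire_form(vocab_xml: str):
          """Create a Form class with one ChoiceField per vocabulary term list."""
          root = ET.fromstring(vocab_xml)
          fields = {}
          for param in root.iter("parameter"):          # hypothetical element names
              choices = [(v.text, v.text) for v in param.iter("value")]
              fields[param.get("name")] = forms.ChoiceField(
                  label=param.get("label", param.get("name")),
                  choices=choices,
                  required=False,
              )
          # type() builds the Form subclass at runtime, mirroring how structure
          # is derived from processed XML rather than hard-coded.
          return type("QuestionnaireForm", (forms.Form,), fields)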

  4. Antimicrobial resistance prediction in PATRIC and RAST

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Davis, James J.; Boisvert, Sebastien; Brettin, Thomas

    The emergence and spread of antimicrobial resistance (AMR) mechanisms in bacterial pathogens, coupled with the dwindling number of effective antibiotics, has created a global health crisis. Being able to identify the genetic mechanisms of AMR and predict the resistance phenotypes of bacterial pathogens prior to culturing could inform clinical decision-making and improve reaction time. At PATRIC (http://patricbrc.org/), we have been collecting bacterial genomes with AMR metadata for several years. In order to advance phenotype prediction and the identification of genomic regions relating to AMR, we have updated the PATRIC FTP server to enable access to genomes that are binned by their AMR phenotypes, as well as metadata including minimum inhibitory concentrations. Using this infrastructure, we custom built AdaBoost (adaptive boosting) machine learning classifiers for identifying carbapenem resistance in Acinetobacter baumannii, methicillin resistance in Staphylococcus aureus, and beta-lactam and co-trimoxazole resistance in Streptococcus pneumoniae with accuracies ranging from 88–99%. We also did this for isoniazid, kanamycin, ofloxacin, rifampicin, and streptomycin resistance in Mycobacterium tuberculosis, achieving accuracies ranging from 71–88%. Lastly, this set of classifiers has been used to provide an initial framework for species-specific AMR phenotype and genomic feature prediction in the RAST and PATRIC annotation services.
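
    The general workflow can be sketched with scikit-learn's AdaBoostClassifier on synthetic data: a matrix of genomic features (here, hypothetical k-mer counts) and phenotype labels binned from AMR metadata feed a cross-validated classifier. This illustrates the technique, not the PATRIC pipeline itself.

      # Sketch: AdaBoost on synthetic genome features vs. binned AMR phenotypes.
      import numpy as np
      from sklearn.ensemble import AdaBoostClassifier
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      X = rng.integers(0, 50, size=(200, 1000))   # 200 genomes x 1000 k-mer counts
      y = rng.integers(0, 2, size=200)            # 0 = susceptible, 1 = resistant

      clf = AdaBoostClassifier(n_estimators=100)
      scores = cross_val_score(clf, X, y, cv=5)   # accuracy per fold
      print(f"mean accuracy: {scores.mean():.2f}")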

  5. Antimicrobial resistance prediction in PATRIC and RAST

    DOE PAGES

    Davis, James J.; Boisvert, Sebastien; Brettin, Thomas; ...

    2016-06-14

    The emergence and spread of antimicrobial resistance (AMR) mechanisms in bacterial pathogens, coupled with the dwindling number of effective antibiotics, has created a global health crisis. Being able to identify the genetic mechanisms of AMR and predict the resistance phenotypes of bacterial pathogens prior to culturing could inform clinical decision-making and improve reaction time. At PATRIC (http://patricbrc.org/), we have been collecting bacterial genomes with AMR metadata for several years. In order to advance phenotype prediction and the identification of genomic regions relating to AMR, we have updated the PATRIC FTP server to enable access to genomes that are binned by their AMR phenotypes, as well as metadata including minimum inhibitory concentrations. Using this infrastructure, we custom built AdaBoost (adaptive boosting) machine learning classifiers for identifying carbapenem resistance in Acinetobacter baumannii, methicillin resistance in Staphylococcus aureus, and beta-lactam and co-trimoxazole resistance in Streptococcus pneumoniae with accuracies ranging from 88–99%. We also did this for isoniazid, kanamycin, ofloxacin, rifampicin, and streptomycin resistance in Mycobacterium tuberculosis, achieving accuracies ranging from 71–88%. Lastly, this set of classifiers has been used to provide an initial framework for species-specific AMR phenotype and genomic feature prediction in the RAST and PATRIC annotation services.

  6. PanMetaDocs - A tool for collecting and managing the long tail of "small science data"

    NASA Astrophysics Data System (ADS)

    Klump, J.; Ulbricht, D.

    2011-12-01

    In the early days of thinking about cyberinfrastructure the focus was on "big science data". Today, the challenge is no longer to store several terabytes of data, but to manage data objects in a way that facilitates their re-use. Key to re-use by a data consumer is proper documentation of the data. Data consumers need discovery metadata to find the data they need and descriptive metadata to be able to use the data they retrieved. Thus, data documentation faces the challenge of describing these objects extensively and completely while keeping the items easily accessible at a sustainable cost. However, data curation and documentation do not rank high in the everyday work of a scientist as a data producer. Data producers are often frustrated by being asked to provide metadata on their data over and over again, information that seemed very obvious from the context of their work. A further challenge to data archives is the wide variety of metadata schemata in use, which creates a number of maintenance and design challenges of its own. PanMetaDocs addresses these issues by allowing an uploaded file to be described by more than one metadata object. PanMetaDocs, which was developed from PanMetaWorks, is a PHP-based web application that allows data to be described with any XML-based metadata schema. Its user interface is browser based and was developed to collect metadata and data in collaborative scientific projects situated at one or more institutions. The metadata fields can be filled with static or dynamic content to reduce the number of fields that require manual entry to a minimum and to make use of contextual information in a project setting. PanMetaDocs reuses the business logic of PanMetaWorks, except for the authentication and data management functions, which are delegated to the eSciDoc framework. The eSciDoc repository framework is designed as a service-oriented architecture that can be controlled through a REST interface to create version-controlled items with metadata records in XML format. PanMetaDocs uses the eSciDoc item model to add multiple metadata records that describe uploaded files in different metadata schemata. While datasets are collected, described, shared with collaborating scientists, and finally published, data objects are transferred from a shared data curation domain into a persistent data curation domain. Through the RSS interface for recent datasets (inherited from PanMetaWorks), project members are informed about data uploaded by other project members. The implementation of the OAI-PMH interface can be used to syndicate data catalogs to research data portals, such as the panFMP data portal framework. Once data objects are uploaded to the eSciDoc infrastructure, it is possible to drop the software instance that was used for collecting the data, while the compiled data and metadata remain accessible for other authorized applications through the institution's eSciDoc middleware. This approach of "expendable data curation tools" allows for a significant reduction in software maintenance costs, as expensive data capture applications do not need to be maintained indefinitely to ensure long-term access to the stored data.

  7. Framework for Integrating Science Data Processing Algorithms Into Process Control Systems

    NASA Technical Reports Server (NTRS)

    Mattmann, Chris A.; Crichton, Daniel J.; Chang, Albert Y.; Foster, Brian M.; Freeborn, Dana J.; Woollard, David M.; Ramirez, Paul M.

    2011-01-01

    A software framework called PCS Task Wrapper is responsible for standardizing the setup, process initiation, execution, and file management tasks surrounding the execution of science data algorithms, which are referred to by NASA as Product Generation Executives (PGEs). PGEs codify a scientific algorithm, some step in the overall scientific process involved in a mission science workflow. The PCS Task Wrapper provides a stable operating environment to the underlying PGE during its execution lifecycle. If the PGE requires a file, or metadata regarding the file, the PCS Task Wrapper is responsible for delivering that information to the PGE in a manner that meets its requirements. If the PGE requires knowledge of upstream or downstream PGEs in a sequence of executions, that information is also made available. Finally, if information regarding disk space, or node information such as CPU availability, etc., is required, the PCS Task Wrapper provides this information to the underlying PGE. After this information is collected, the PGE is executed, and its output Product file and Metadata generation is managed via the PCS Task Wrapper framework. The innovation is responsible for marshalling output Products and Metadata back to a PCS File Management component for use in downstream data processing and pedigree. In support of this, the PCS Task Wrapper leverages the PCS Crawler Framework to ingest (during pipeline processing) the output Product files and Metadata produced by the PGE. The architectural components of the PCS Task Wrapper framework include PGE Task Instance, PGE Config File Builder, Config File Property Adder, Science PGE Config File Writer, and PCS Met file Writer. This innovative framework is really the unifying bridge between the execution of a step in the overall processing pipeline, and the available PCS component services as well as the information that they collectively manage.
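
    The wrapper pattern described above can be pictured with the simplified sketch below. This is an illustration, not the actual PCS code; the config keys, environment variable, and file suffixes are invented.

      # Sketch of a task-wrapper pattern: build config, run the PGE, crawl outputs.
      import json, os, pathlib, subprocess

      def run_pge(pge_cmd, config, work_dir):
          work = pathlib.Path(work_dir)
          work.mkdir(parents=True, exist_ok=True)
          # 1. Write the config file the PGE expects (file management setup).
          (work / "pge_config.json").write_text(json.dumps(config))
          # 2. Execute the PGE with a stable, explicit environment.
          env = dict(os.environ, PGE_CONFIG=str(work / "pge_config.json"))
          subprocess.run(pge_cmd, cwd=work, env=env, check=True)
          # 3. "Crawl" the work dir for products and metadata to pass downstream.
          return [p for p in work.iterdir() if p.suffix in (".dat", ".met")]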

  8. A metadata reporting framework (FRAMES) for synthesis of ecohydrological observations

    DOE PAGES

    Christianson, Danielle S.; Varadharajan, Charuleka; Christoffersen, Bradley; ...

    2017-06-20

    Metadata describe the ancillary information needed for data interpretation, comparison across heterogeneous datasets, and quality control and quality assessment (QA/QC). Metadata enable the synthesis of diverse ecohydrological and biogeochemical observations, an essential step in advancing a predictive understanding of earth systems. Environmental observations can be taken across a wide range of spatiotemporal scales in a variety of measurement settings and approaches, and saved in multiple formats. Thus, well-organized, consistent metadata are required to produce usable data products from diverse observations collected in disparate field sites. However, existing metadata reporting protocols do not support the complex data synthesis needs of interdisciplinary earth system research. We developed a metadata reporting framework (FRAMES) to enable predictive understanding of carbon cycling in tropical forests under global change. FRAMES adheres to best practices for data and metadata organization, enabling consistent data reporting and thus compatibility with a variety of standardized data protocols. We used an iterative scientist-centered design process to develop FRAMES. The resulting modular organization streamlines metadata reporting and can be expanded to incorporate additional data types. The flexible data reporting format incorporates existing field practices to maximize data-entry efficiency. With FRAMES's multi-scale measurement position hierarchy, data can be reported at observed spatial resolutions and then easily aggregated and linked across measurement types to support model-data integration. FRAMES is in early use by both data providers and users. In this article, we describe FRAMES, identify lessons learned, and discuss areas of future development.

  9. Handling Metadata in a Neurophysiology Laboratory

    PubMed Central

    Zehl, Lyuba; Jaillet, Florent; Stoewer, Adrian; Grewe, Jan; Sobolev, Andrey; Wachtler, Thomas; Brochier, Thomas G.; Riehle, Alexa; Denker, Michael; Grün, Sonja

    2016-01-01

    To date, non-reproducibility of neurophysiological research is a matter of intense discussion in the scientific community. A crucial component to enhance reproducibility is to comprehensively collect and store metadata, that is, all information about the experiment, the data, and the applied preprocessing steps on the data, such that they can be accessed and shared in a consistent and simple manner. However, the complexity of experiments, the highly specialized analysis workflows and a lack of knowledge on how to make use of supporting software tools often overburden researchers to perform such a detailed documentation. For this reason, the collected metadata are often incomplete, incomprehensible for outsiders or ambiguous. Based on our research experience in dealing with diverse datasets, we here provide conceptual and technical guidance to overcome the challenges associated with the collection, organization, and storage of metadata in a neurophysiology laboratory. Through the concrete example of managing the metadata of a complex experiment that yields multi-channel recordings from monkeys performing a behavioral motor task, we practically demonstrate the implementation of these approaches and solutions with the intention that they may be generalized to other projects. Moreover, we detail five use cases that demonstrate the resulting benefits of constructing a well-organized metadata collection when processing or analyzing the recorded data, in particular when these are shared between laboratories in a modern scientific collaboration. Finally, we suggest an adaptable workflow to accumulate, structure and store metadata from different sources using, by way of example, the odML metadata framework. PMID:27486397
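
    As a flavor of the suggested workflow, the sketch below builds a small hierarchical metadata document, assuming the python-odml package; the API shown follows the odML documentation and may differ between versions, and all section and property names are invented.

      # Sketch: hierarchically organized experiment metadata with python-odml.
      import odml

      doc = odml.Document(author="Lab A", version="1.0")

      subject = odml.Section(name="Subject", type="subject", parent=doc)
      odml.Property(name="Species", values=["Macaca mulatta"], parent=subject)

      recording = odml.Section(name="Recording", type="dataset", parent=doc)
      odml.Property(name="SamplingRate", values=[30000], unit="Hz",
                    parent=recording)

      odml.save(doc, "session01.odml.xml")  # share alongside the recorded data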

  10. Handling Metadata in a Neurophysiology Laboratory.

    PubMed

    Zehl, Lyuba; Jaillet, Florent; Stoewer, Adrian; Grewe, Jan; Sobolev, Andrey; Wachtler, Thomas; Brochier, Thomas G; Riehle, Alexa; Denker, Michael; Grün, Sonja

    2016-01-01

    To date, non-reproducibility of neurophysiological research is a matter of intense discussion in the scientific community. A crucial component to enhance reproducibility is to comprehensively collect and store metadata, that is, all information about the experiment, the data, and the applied preprocessing steps on the data, such that they can be accessed and shared in a consistent and simple manner. However, the complexity of experiments, the highly specialized analysis workflows and a lack of knowledge on how to make use of supporting software tools often overburden researchers to perform such a detailed documentation. For this reason, the collected metadata are often incomplete, incomprehensible for outsiders or ambiguous. Based on our research experience in dealing with diverse datasets, we here provide conceptual and technical guidance to overcome the challenges associated with the collection, organization, and storage of metadata in a neurophysiology laboratory. Through the concrete example of managing the metadata of a complex experiment that yields multi-channel recordings from monkeys performing a behavioral motor task, we practically demonstrate the implementation of these approaches and solutions with the intention that they may be generalized to other projects. Moreover, we detail five use cases that demonstrate the resulting benefits of constructing a well-organized metadata collection when processing or analyzing the recorded data, in particular when these are shared between laboratories in a modern scientific collaboration. Finally, we suggest an adaptable workflow to accumulate, structure and store metadata from different sources using, by way of example, the odML metadata framework.

  11. A metadata reporting framework (FRAMES) for synthesis of ecohydrological observations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Christianson, Danielle S.; Varadharajan, Charuleka; Christoffersen, Bradley

    Metadata describe the ancillary information needed for data interpretation, comparison across heterogeneous datasets, and quality control and quality assessment (QA/QC). Metadata enable the synthesis of diverse ecohydrological and biogeochemical observations, an essential step in advancing a predictive understanding of earth systems. Environmental observations can be taken across a wide range of spatiotemporal scales in a variety of measurement settings and approaches, and saved in multiple formats. Thus, well-organized, consistent metadata are required to produce usable data products from diverse observations collected in disparate field sites. However, existing metadata reporting protocols do not support the complex data synthesis needs of interdisciplinary earth system research. We developed a metadata reporting framework (FRAMES) to enable predictive understanding of carbon cycling in tropical forests under global change. FRAMES adheres to best practices for data and metadata organization, enabling consistent data reporting and thus compatibility with a variety of standardized data protocols. We used an iterative scientist-centered design process to develop FRAMES. The resulting modular organization streamlines metadata reporting and can be expanded to incorporate additional data types. The flexible data reporting format incorporates existing field practices to maximize data-entry efficiency. With FRAMES's multi-scale measurement position hierarchy, data can be reported at observed spatial resolutions and then easily aggregated and linked across measurement types to support model-data integration. FRAMES is in early use by both data providers and users. In this article, we describe FRAMES, identify lessons learned, and discuss areas of future development.

  12. The Genomic Observatories Metadatabase (GeOMe): A new repository for field and sampling event metadata associated with genetic samples.

    PubMed

    Deck, John; Gaither, Michelle R; Ewing, Rodney; Bird, Christopher E; Davies, Neil; Meyer, Christopher; Riginos, Cynthia; Toonen, Robert J; Crandall, Eric D

    2017-08-01

    The Genomic Observatories Metadatabase (GeOMe, http://www.geome-db.org/) is an open access repository for geographic and ecological metadata associated with biosamples and genetic data. Whereas public databases have served as vital repositories for nucleotide sequences, they do not accession all the metadata required for ecological or evolutionary analyses. GeOMe fills this need, providing a user-friendly, web-based interface for both data contributors and data recipients. The interface allows data contributors to create a customized yet standard-compliant spreadsheet that captures the temporal and geospatial context of each biosample. These metadata are then validated and permanently linked to archived genetic data stored in the National Center for Biotechnology Information's (NCBI's) Sequence Read Archive (SRA) via unique persistent identifiers. By linking ecologically and evolutionarily relevant metadata with publicly archived sequence data in a structured manner, GeOMe sets a gold standard for data management in biodiversity science.
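
    The validation step can be pictured as a row-level check of the temporal and geospatial context before metadata are linked to archived sequence data. The sketch below uses Darwin Core-style field names for illustration; GeOMe's actual rules are richer than this.

      # Sketch: validate one spreadsheet row of sample metadata (illustrative only).
      import datetime

      def validate_row(row):
          errors = []
          if not (-90 <= float(row["decimalLatitude"]) <= 90):
              errors.append("latitude out of range")
          if not (-180 <= float(row["decimalLongitude"]) <= 180):
              errors.append("longitude out of range")
          try:
              datetime.date.fromisoformat(row["eventDate"])
          except ValueError:
              errors.append("eventDate not ISO 8601")
          if not row.get("materialSampleID"):
              errors.append("missing persistent sample identifier")
          return errors

      print(validate_row({"decimalLatitude": "21.3", "decimalLongitude": "-157.8",
                          "eventDate": "2015-06-01", "materialSampleID": "MS-001"}))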

  13. Metadata management and semantics in microarray repositories.

    PubMed

    Kocabaş, F; Can, T; Baykal, N

    2011-12-01

    The number of microarray and other high-throughput experiments on primary repositories keeps increasing, as do the size and complexity of the results, in response to biomedical investigations. Initiatives have been started on standardization of content, object model, exchange format and ontology. However, there are backlogs and an inability to exchange data between microarray repositories, which indicate a great need for a standard format and data management. We have introduced a metadata framework that includes a metadata card and semantic nets that make experimental results visible, understandable and usable. These are encoded in syntax encoding schemes and represented in RDF (Resource Description Framework), can be integrated with other metadata cards and semantic nets, and can be exchanged, shared and queried. We demonstrated the performance and potential benefits through a case study on a selected microarray repository. We concluded that the backlogs can be reduced and that exchange of information and asking of knowledge discovery questions can become possible with the use of this metadata framework.
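
    A minimal sketch of encoding such a metadata card in RDF, assuming the rdflib package; the experiment fields and namespace are invented for illustration.

      # Sketch: a "metadata card" for one experiment as RDF triples.
      from rdflib import Graph, Literal, Namespace, URIRef
      from rdflib.namespace import DC, RDF

      EX = Namespace("http://example.org/microarray/")

      g = Graph()
      exp = URIRef(EX["experiment/GSE0001"])
      g.add((exp, RDF.type, EX.MicroarrayExperiment))
      g.add((exp, DC.title, Literal("Expression profiling of sample X")))
      g.add((exp, EX.organism, Literal("Homo sapiens")))
      g.add((exp, EX.platform, Literal("Affymetrix HG-U133A")))

      print(g.serialize(format="turtle"))  # exchangeable, queryable via SPARQL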

  14. HTTP-based Search and Ordering Using ECHO's REST-based and OpenSearch APIs

    NASA Astrophysics Data System (ADS)

    Baynes, K.; Newman, D. J.; Pilone, D.

    2012-12-01

    Metadata is an important entity in the process of cataloging, discovering, and describing Earth science data. NASA's Earth Observing System (EOS) ClearingHOuse (ECHO) acts as the core metadata repository for EOSDIS data centers, providing a centralized mechanism for metadata and data discovery and retrieval. By supporting both the ESIP Federated Search API and its own search and ordering interfaces, ECHO provides multiple capabilities that facilitate ease of discovery and access to its ever-increasing holdings. Users are able to search and export metadata in a variety of formats including ISO 19115, JSON, and ECHO10. This presentation aims to inform technically savvy clients interested in automating search and ordering of ECHO's metadata catalog. The audience will be introduced to practical and applicable examples of end-to-end workflows that demonstrate finding, sub-setting, and ordering data bound by keyword, temporal, and spatial constraints. Interaction with the ESIP OpenSearch Interface will be highlighted, as will ECHO's own REST-based API.
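
    A keyword-plus-constraints granule search might look like the sketch below. The endpoint URL and parameter names are placeholders; the actual query template should be taken from the service's OpenSearch description document.

      # Sketch: an OpenSearch-style granule query with keyword, temporal,
      # and spatial constraints (endpoint and parameter names hypothetical).
      import requests

      params = {
          "keyword": "sea surface temperature",
          "startTime": "2012-01-01T00:00:00Z",
          "endTime": "2012-12-31T23:59:59Z",
          "boundingBox": "-180,-90,180,90",   # west,south,east,north
          "numberOfResults": 10,
      }
      resp = requests.get("https://api.example.gov/opensearch/granules.atom",
                          params=params, timeout=30)
      resp.raise_for_status()
      print(resp.text[:500])  # Atom feed of matching granules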

  15. Assuring the Quality of Agricultural Learning Repositories: Issues for the Learning Object Metadata Creation Process of the CGIAR

    NASA Astrophysics Data System (ADS)

    Zschocke, Thomas; Beniest, Jan

    The Consultative Group on International Agricultural Research (CGIAR) has established a digital repository to share its teaching and learning resources along with descriptive educational information based on the IEEE Learning Object Metadata (LOM) standard. Quality metadata are a critical component of any digital repository: they not only enable users to find more easily the resources they require, but are also essential for the operation and interoperability of the repository itself. Studies show that repositories have difficulties in obtaining good-quality metadata from their contributors, especially when this process involves many different stakeholders, as is the case with the CGIAR as an international organization. To address this issue the CGIAR began investigating the Open ECBCheck as well as the ISO/IEC 19796-1 standard to establish quality protocols for its training. The paper highlights the implications and challenges posed by strengthening the metadata creation workflow for disseminating learning objects of the CGIAR.

  16. Metadata Exporter for Scientific Photography Management

    NASA Astrophysics Data System (ADS)

    Staudigel, D.; English, B.; Delaney, R.; Staudigel, H.; Koppers, A.; Hart, S.

    2005-12-01

    Photographs have become an increasingly important medium, especially with the advent of digital cameras. It has become inexpensive to take photographs and quickly post them on a website. However informative photos may be, they still need to be displayed in a convenient way and be cataloged in a manner that makes them easily locatable. Managing the great number of photographs that digital cameras allow, and creating a format for efficient dissemination of the information related to the photos, is a tedious task. Products such as Apple's iPhoto have greatly eased the task of managing photographs. However, they often have limitations: un-customizable metadata fields and poor metadata extraction tools limit their scientific usefulness. A solution to this persistent problem is a customizable metadata exporter. On the ALIA expedition, we successfully managed the thousands of digital photos we took. We did this with iPhoto and a version of the exporter that is now available to the public under the name "CustomHTMLExport" (http://www.versiontracker.com/dyn/moreinfo/macosx/27777), currently undergoing formal beta testing. This software allows the use of customized metadata fields (including description, time, date, GPS data, etc.), which are exported along with the photo. It can also produce webpages with this data straight from iPhoto, in a much more flexible way than is otherwise allowed. With this tool it becomes very easy to manage and distribute scientific photos.

  17. The Value of Data and Metadata Standardization for Interoperability in Giovanni

    NASA Astrophysics Data System (ADS)

    Smit, C.; Hegde, M.; Strub, R. F.; Bryant, K.; Li, A.; Petrenko, M.

    2017-12-01

    Giovanni (https://giovanni.gsfc.nasa.gov/giovanni/) is a data exploration and visualization tool at the NASA Goddard Earth Sciences Data Information Services Center (GES DISC). It has been around in one form or another for more than 15 years. Giovanni calculates simple statistics and produces 22 different visualizations for more than 1600 geophysical parameters from more than 90 satellite and model products. Giovanni relies on external data format standards to ensure interoperability, including the NetCDF CF Metadata Conventions. Unfortunately, these standards were insufficient to make Giovanni's internal data representation truly simple to use. Finding and working with dimensions can be convoluted with the CF Conventions. Furthermore, the CF Conventions are silent on machine-friendly descriptive metadata such as the parameter's source product and product version. In order to simplify analyzing disparate earth science data parameters in a unified way, we developed Giovanni's internal standard. First, the format standardizes parameter dimensions and variables so they can be easily found. Second, the format adds all the machine-friendly metadata Giovanni needs to present our parameters to users in a consistent and clear manner. At a glance, users can grasp all the pertinent information about parameters both during parameter selection and after visualization. This poster gives examples of how our metadata and data standards, both external and internal, have both simplified our code base and improved our users' experiences.
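
    The payoff of such conventions is that parameters and their dimensions can be located programmatically instead of by guessing variable names. A minimal sketch with the netCDF4 package, using hypothetical file and variable names:

      # Sketch: find a parameter by its CF standard_name attribute.
      from netCDF4 import Dataset

      ds = Dataset("parameter.nc")                       # hypothetical file
      for name, var in ds.variables.items():
          if getattr(var, "standard_name", None) == "air_temperature":
              # e.g. AIRX3STD_TempA ('time', 'lat', 'lon') K
              print(name, var.dimensions, var.units)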

  18. EARS : Repositioning data management near data acquisition.

    NASA Astrophysics Data System (ADS)

    Sinquin, Jean-Marc; Sorribas, Jordi; Diviacco, Paolo; Vandenberghe, Thomas; Munoz, Raquel; Garcia, Oscar

    2016-04-01

    The EU FP7 projects Eurofleets and Eurofleets2 are a Europe-wide alliance of marine research centers that aim to share their research vessels, to improve information sharing on planned, current and completed cruises and on details of ocean-going research vessels and specialized equipment, and to durably improve the cost-effectiveness of cruises. Within this context, logging information on how, when and where anything happens on board the vessel is crucial for data users at a later stage. This is an essential step in the process of data quality control, as it can assist in understanding anomalies and unexpected trends recorded in the acquired data sets. In this way the completeness of the metadata is improved, as it is recorded accurately at the origin of the measurement. The collection of this crucial information has been done in very different ways, using different procedures, formats and pieces of software, across the European research fleet. At the time the Eurofleets project started, every institution and country had adopted different strategies and approaches, which complicated the task of users who need to log general-purpose information and events on board whenever they access a different platform, losing the opportunity to produce this valuable metadata on board. Among the many goals of the Eurofleets project, a very important task is the development of an "event log software" called EARS (Eurofleets Automatic Reporting System) that enables scientists and operators to record what happens during a survey. EARS will allow users to fill, in a standardized way, the gap that currently exists in metadata description, which only very seldom links data with its history. Events generated automatically by acquisition instruments will also be handled, enhancing the granularity and precision of the event annotation. The adoption of a common procedure to log survey events and a common terminology to describe them is crucial to providing a friendly and successful on-board metadata creation procedure for the whole European fleet. The possibility of automatically reporting metadata and general-purpose data will simplify the work of scientists and data managers with regard to data transmission. Improved accuracy and completeness of metadata are expected when events are recorded at acquisition time. This will also enhance multiple uses of the data, as it allows verification of the different requirements existing in different disciplines.

  19. Metadata Repository for Improved Data Sharing and Reuse Based on HL7 FHIR.

    PubMed

    Ulrich, Hannes; Kock, Ann-Kristin; Duhm-Harbeck, Petra; Habermann, Jens K; Ingenerf, Josef

    2016-01-01

    Unreconciled data structures and formats are a common obstacle to the urgently required sharing and reuse of data within healthcare and medical research. Within the North German Tumor Bank of Colorectal Cancer, clinical and sample data, based on a harmonized data set, are collected and can be pooled by using a hospital-integrated Research Data Management System supporting biobank and study management. Adding further partners who are not using the core data set requires manual adaptations and mapping of data elements. To reduce this manual intervention and to focus on the reuse of heterogeneous healthcare instance data (value level) and data elements (metadata level), a metadata repository has been developed. The metadata repository is an ISO 11179-3 conformant server application built for annotating and mediating data elements. The implemented architecture includes the translation of metadata information about data elements into the FHIR standard, using the FHIR Data Element resource with the ISO 11179 Data Element Extensions. The FHIR-based processing allows exchange of data elements with clinical and research IT systems as well as with other metadata systems. With increasingly annotated and harmonized data elements, data quality and integration can be improved, successfully enabling data analytics and decision support.

  20. Identity and privacy. Unique in the shopping mall: on the reidentifiability of credit card metadata.

    PubMed

    de Montjoye, Yves-Alexandre; Radaelli, Laura; Singh, Vivek Kumar; Pentland, Alex Sandy

    2015-01-30

    Large-scale data sets of human behavior have the potential to fundamentally transform the way we fight diseases, design cities, or perform research. Metadata, however, contain sensitive information. Understanding the privacy of these data sets is key to their broad use and, ultimately, their impact. We study 3 months of credit card records for 1.1 million people and show that four spatiotemporal points are enough to uniquely reidentify 90% of individuals. We show that knowing the price of a transaction increases the risk of reidentification by 22%, on average. Finally, we show that even data sets that provide coarse information at any or all of the dimensions provide little anonymity and that women are more reidentifiable than men in credit card metadata. Copyright © 2015, American Association for the Advancement of Science.
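
    The underlying unicity measurement can be sketched on synthetic data: sample p points from a random user's trace and count how often those points match that user alone. The data below are synthetic; the paper's result came from real credit card records.

      # Toy unicity estimate: fraction of users unique given p known points.
      import random

      def unicity(traces, p, trials=1000):
          unique = 0
          for _ in range(trials):
              user, trace = random.choice(list(traces.items()))
              known = set(random.sample(sorted(trace), min(p, len(trace))))
              matches = [u for u, t in traces.items() if known <= t]
              unique += (matches == [user])
          return unique / trials

      # Each trace is a set of (shop, day) points per user.
      traces = {u: {(random.randrange(50), random.randrange(90))
                    for _ in range(20)} for u in range(500)}
      print(unicity(traces, p=4))  # high fraction uniquely reidentified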

  1. Next-Generation Search Engines for Information Retrieval

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Devarakonda, Ranjeet; Hook, Leslie A; Palanisamy, Giri

    In recent years, there have been significant advancements in the areas of scientific data management and retrieval techniques, particularly in terms of standards and protocols for archiving data and metadata. Scientific data are rich and spread across different places. In order to integrate these pieces together, a data archive and associated metadata should be generated. Data should be stored in a format that is retrievable and, more importantly, in a format that will continue to be accessible as technology changes, such as XML. While general-purpose search engines (such as Google or Bing) are useful for finding many things on the Internet, they are often of limited usefulness for locating Earth science data relevant (for example) to a specific spatiotemporal extent. By contrast, tools that search repositories of structured metadata can locate relevant datasets with fairly high precision, but the search is limited to that particular repository. Federated searches (such as Z39.50) have been used, but can be slow, and their comprehensiveness can be limited by downtime in any search partner. An alternative approach to improve comprehensiveness is for a repository to harvest metadata from other repositories, possibly with limits based on subject matter or access permissions. Searches through harvested metadata can be extremely responsive, and the search tool can be customized with semantic augmentation appropriate to the community of practice being served. One such system is Mercury, a metadata harvesting, data discovery, and access system built for researchers to search for, share, and obtain spatiotemporal data used across a range of climate and ecological sciences. Mercury is an open-source toolset; its backend is built on Java, and its search capability is supported by popular open-source search libraries such as SOLR and LUCENE. Mercury harvests the structured metadata and key data from several data-providing servers around the world and builds a centralized index. The harvested files are indexed consistently through the SOLR search API, so that Mercury can render search capabilities such as simple, fielded, spatial and temporal searches across a span of projects covering land, atmosphere, and ocean ecology. Mercury also provides data sharing capabilities using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). In this paper we discuss best practices for archiving data and metadata, new searching techniques, efficient ways of data retrieval, and information display.
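
    A fielded search over such a harvested index might look like the sketch below, which uses Solr's standard HTTP select API via requests. The core URL and field names are placeholders, not Mercury's actual schema.

      # Sketch: fielded + temporal query against a Solr index of harvested metadata.
      import requests

      params = {
          "q": 'title:"soil moisture"',
          "fq": ["project:NGEE",                              # hypothetical fields
                 "start_date:[2010-01-01T00:00:00Z TO *]"],
          "rows": 10,
          "wt": "json",
      }
      resp = requests.get("http://localhost:8983/solr/metadata/select",
                          params=params, timeout=30)
      for doc in resp.json()["response"]["docs"]:
          print(doc.get("id"), doc.get("title"))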

  2. Control vocabulary software designed for CMIP6

    NASA Astrophysics Data System (ADS)

    Nadeau, D.; Taylor, K. E.; Williams, D. N.; Ames, S.

    2016-12-01

    The Coupled Model Intercomparison Project Phase 6 (CMIP6) coordinates a number of intercomparison activities and includes many more experiments than its predecessor, CMIP5. In order to organize and facilitate use of the complex collection of expected CMIP6 model output, a standard set of descriptive information has been defined, which must be stored along with the data. This standard information enables automated machine interpretation of the contents of all model output files. The standard metadata are stored in compliance with the Climate and Forecast (CF) standard, which ensures that they can be interpreted and visualized by many standard software packages. Additional attributes (not standardized by CF) are required by CMIP6 to enhance identification of models and experiments, and to provide additional information critical for interpreting the model results. To ensure that CMIP6 data comply with the standards, a Python program called "PrePARE" (Pre-Publication Attribute Reviewer for the ESGF) has been developed to check the model output prior to its publication and release for analysis. If, for example, a required attribute is missing or incorrect (e.g., not included in the reference CMIP6 controlled vocabularies), then PrePARE will prevent publication. In some circumstances, missing attributes can be created or incorrect attributes can be replaced automatically by PrePARE, and the program will warn users about the changes that have been made. PrePARE provides a final check on model output, assuring baseline conformity across the output from all CMIP6 models, which will facilitate analysis by climate scientists. PrePARE is flexible and can easily be modified for use by similar projects that have a well-defined set of metadata and controlled vocabularies.
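
    The core of such a check can be sketched as follows; this illustrates controlled-vocabulary validation in general, not PrePARE's actual code, and the vocabulary entries shown are abbreviated examples.

      # Sketch: verify required global attributes against a controlled vocabulary.
      CONTROLLED_VOCAB = {
          "activity_id": {"CMIP", "ScenarioMIP"},   # abbreviated example CVs
          "frequency": {"mon", "day", "6hr"},
      }
      REQUIRED = set(CONTROLLED_VOCAB)

      def check_attributes(attrs):
          problems = []
          for key in REQUIRED - attrs.keys():
              problems.append(f"missing required attribute: {key}")
          for key, allowed in CONTROLLED_VOCAB.items():
              if key in attrs and attrs[key] not in allowed:
                  problems.append(f"{key}={attrs[key]!r} not in CV {sorted(allowed)}")
          return problems  # non-empty -> block publication

      print(check_attributes({"activity_id": "CMIP", "frequency": "weekly"}))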

  3. Ready to put metadata on the post-2015 development agenda? Linking data publications to responsible innovation and science diplomacy.

    PubMed

    Özdemir, Vural; Kolker, Eugene; Hotez, Peter J; Mohin, Sophie; Prainsack, Barbara; Wynne, Brian; Vayena, Effy; Coşkun, Yavuz; Dereli, Türkay; Huzair, Farah; Borda-Rodriguez, Alexander; Bragazzi, Nicola Luigi; Faris, Jack; Ramesar, Raj; Wonkam, Ambroise; Dandara, Collet; Nair, Bipin; Llerena, Adrián; Kılıç, Koray; Jain, Rekha; Reddy, Panga Jaipal; Gollapalli, Kishore; Srivastava, Sanjeeva; Kickbusch, Ilona

    2014-01-01

    Metadata refer to descriptions about data or, as some put it, "data about data." Metadata capture what happens on the backstage of science, on the trajectory from study conception, design, funding, implementation, and analysis to reporting. Definitions of metadata vary, but they can include the context information surrounding the practice of science, or data generated as one uses a technology, including transactional information about the user. As the pursuit of knowledge broadens in the 21st century from the traditional "science of whats" (data) to include the "science of hows" (metadata), we analyze the ways in which metadata serve as a catalyst for responsible and open innovation, and by extension, science diplomacy. In 2015, the United Nations Millennium Development Goals (MDGs) will formally come to an end. Therefore, we propose that metadata, as an ingredient of responsible innovation, can help achieve the Sustainable Development Goals (SDGs) on the post-2015 agenda. Such responsible innovation, as a collective learning process, has become a key component, for example, of the European Union's 80 billion Euro Horizon 2020 R&D Program from 2014-2020. Looking ahead, OMICS: A Journal of Integrative Biology is launching an initiative for a multi-omics metadata checklist that is flexible yet comprehensive, and will enable more complete utilization of single and multi-omics data sets through data harmonization and greater visibility and accessibility. The generation of metadata that shed light on how omics research is carried out, by whom and under what circumstances, will create an "intervention space" for integration of science with its socio-technical context. This will go a long way to addressing responsible innovation for a fairer and more transparent society. If we believe in science, then such reflexive qualities and commitments attained by availability of omics metadata are preconditions for a robust and socially attuned science, which can then remain broadly respected, independent, and responsibly innovative. "In Sierra Leone, we have not too much electricity. The lights will come on once in a week, and the rest of the month, dark[ness]. So I made my own battery to power light in people's houses." Kelvin Doe (Global Minimum, 2012), MIT Visiting Young Innovator, Cambridge, USA, and Sierra Leone. "An important function of the (Global) R&D Observatory will be to provide support and training to build capacity in the collection and analysis of R&D flows, and how to link them to the product pipeline." World Health Organization (2013), Draft Working Paper on a Global Health R&D Observatory.

  4. Automatic Extraction of Metadata from Scientific Publications for CRIS Systems

    ERIC Educational Resources Information Center

    Kovacevic, Aleksandar; Ivanovic, Dragan; Milosavljevic, Branko; Konjovic, Zora; Surla, Dusan

    2011-01-01

    Purpose: The aim of this paper is to develop a system for automatic extraction of metadata from scientific papers in PDF format for the information system for monitoring the scientific research activity of the University of Novi Sad (CRIS UNS). Design/methodology/approach: The system is based on machine learning and performs automatic extraction…

  5. A Meta-Relational Approach for the Definition and Management of Hybrid Learning Objects

    ERIC Educational Resources Information Center

    Navarro, Antonio; Fernandez-Pampillon, Ana Ma.; Fernandez-Chamizo, Carmen; Fernandez-Valmayor, Alfredo

    2013-01-01

    Electronic learning objects (LOs) are commonly conceived of as digital units of information used for teaching and learning. To facilitate their classification for pedagogical planning and retrieval purposes, LOs are complemented with metadata (e.g., the author). These metadata are usually restricted by a set of predetermined tags to which the…

  6. The long-term ecological research community metadata standardisation project: a progress report

    Treesearch

    Inigo San Gil; Karen Baker; John Campbell; Ellen G. Denny; Kristin Vanderbilt; Brian Riordan; Rebecca Koskela; Jason Downing; Sabine Grabner; Eda Melendez; Jonathan M. Walsh; Masib Kortz; James Conners; Lynn Yarmey; Nicole Kaplan; Emery R. Boose; Linda Powell; Corinna Gries; Robin Schroeder; Todd Ackerman; Ken Ramsey; Barbara Benson; Jonathan Chipman; James Laundre; Hap Garritt; Don Henshaw; Barrie Collins; Christopher Gardner; Sven Bohm; Margaret O' Brien; Jincheng Gao; Wade Sheldon; Stephanie Lyon; Dan Bahauddin; Mark Servilla; Duane Costa; James Brunt

    2009-01-01

    We describe the process by which the Long-Term Ecological Research (LTER) Network standardized their metadata through the adoption of the Ecological Metadata Language (EML). We describe the strategies developed to improve motivation and to complement the information technology resources available at the LTER sites. EML implementation is presented as a mapping process...

  7. Academic Libraries and the Semantic Web: What the Future May Hold for Research-Supporting Library Catalogues

    ERIC Educational Resources Information Center

    Campbell, D. Grant; Fast, Karl V.

    2004-01-01

    This paper examines how future metadata capabilities could enable academic libraries to exploit information on the emerging Semantic Web in their library catalogues. Whereas current metadata architectures treat the Web as a simple means of interchanging bibliographic data that have been created by libraries, this paper suggests that academic…

  8. Development of an oil spill information system combining remote sensing data and surveillance metadata

    NASA Astrophysics Data System (ADS)

    Tufte, Lars; Trieschmann, Olaf; Carreau, Philippe; Hunsaenger, Thomas; Clayton, Peter J. S.; Barjenbruch, Ulrich

    2004-02-01

    The detection of accidental or illegal marine oil discharges in the German territorial waters of the North Sea and Baltic Sea is of great importance for combating oil spills and protecting the marine ecosystem. The German Federal Ministry of Transport therefore set up an airborne surveillance system consisting of two Dornier DO 228-212 aircraft equipped with a Side-Looking Airborne Radar (SLAR), an IR/UV sensor, a Microwave Radiometer (MWR) for quantification, and a Laser Fluorosensor (LFS) for classification of oil spills. The flight parameters and the remote sensing data are stored in a database during the flight. The operator completes a Pollution Observation Log consisting of information about the detected oil spill (e.g., position, length, width) and several other details about the flight (e.g., name of navigator, name of observer). The objective was to develop an oil spill information system that integrates the described data and metadata and includes visualization and spatial analysis capabilities. The metadata are essential for further statistical analysis, in spatial and temporal domains, of oil spill occurrences and of the surveillance itself. The system should facilitate the communication and distribution of metadata between the administrative bodies and partners of the German oil spill surveillance system. A connection between a GIS and the database allows the powerful visualization and spatial analysis functionality of the GIS to be used in conjunction with the oil spill database.

  9. Asymmetric programming: a highly reliable metadata allocation strategy for MLC NAND flash memory-based sensor systems.

    PubMed

    Huang, Min; Liu, Zhaoqing; Qiao, Liyan

    2014-10-10

    While the NAND flash memory is widely used as the storage medium in modern sensor systems, the aggressive shrinking of process geometry and an increase in the number of bits stored in each memory cell will inevitably degrade the reliability of NAND flash memory. In particular, it's critical to enhance metadata reliability, which occupies only a small portion of the storage space, but maintains the critical information of the file system and the address translations of the storage system. Metadata damage will cause the system to crash or a large amount of data to be lost. This paper presents Asymmetric Programming, a highly reliable metadata allocation strategy for MLC NAND flash memory storage systems. Our technique exploits for the first time the property of the multi-page architecture of MLC NAND flash memory to improve the reliability of metadata. The basic idea is to keep metadata in most significant bit (MSB) pages which are more reliable than least significant bit (LSB) pages. Thus, we can achieve relatively low bit error rates for metadata. Based on this idea, we propose two strategies to optimize address mapping and garbage collection. We have implemented Asymmetric Programming on a real hardware platform. The experimental results show that Asymmetric Programming can achieve a reduction in the number of page errors of up to 99.05% with the baseline error correction scheme.
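
    The allocation idea can be pictured with a toy model: steer metadata writes to MSB pages and let ordinary data take whatever remains. The even/odd page layout below is an assumption made for illustration; real MSB/LSB pairings vary by flash part, and this is not the paper's implementation.

      # Toy model of metadata-first MSB page allocation in an MLC block.
      class Block:
          def __init__(self, pages_per_block=128):
              # Assume even page indices map to MSB pages, odd to LSB pages.
              self.free_msb = [i for i in range(pages_per_block) if i % 2 == 0]
              self.free_lsb = [i for i in range(pages_per_block) if i % 2 == 1]

          def allocate(self, is_metadata):
              if is_metadata and self.free_msb:
                  return self.free_msb.pop(0)          # reliable MSB page
              pool = self.free_lsb or self.free_msb    # data takes what's left
              return pool.pop(0)

      blk = Block()
      print(blk.allocate(is_metadata=True))   # 0 (MSB page)
      print(blk.allocate(is_metadata=False))  # 1 (LSB page)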

  10. Asymmetric Programming: A Highly Reliable Metadata Allocation Strategy for MLC NAND Flash Memory-Based Sensor Systems

    PubMed Central

    Huang, Min; Liu, Zhaoqing; Qiao, Liyan

    2014-01-01

    While the NAND flash memory is widely used as the storage medium in modern sensor systems, the aggressive shrinking of process geometry and an increase in the number of bits stored in each memory cell will inevitably degrade the reliability of NAND flash memory. In particular, it's critical to enhance metadata reliability, which occupies only a small portion of the storage space, but maintains the critical information of the file system and the address translations of the storage system. Metadata damage will cause the system to crash or a large amount of data to be lost. This paper presents Asymmetric Programming, a highly reliable metadata allocation strategy for MLC NAND flash memory storage systems. Our technique exploits for the first time the property of the multi-page architecture of MLC NAND flash memory to improve the reliability of metadata. The basic idea is to keep metadata in most significant bit (MSB) pages which are more reliable than least significant bit (LSB) pages. Thus, we can achieve relatively low bit error rates for metadata. Based on this idea, we propose two strategies to optimize address mapping and garbage collection. We have implemented Asymmetric Programming on a real hardware platform. The experimental results show that Asymmetric Programming can achieve a reduction in the number of page errors of up to 99.05% with the baseline error correction scheme. PMID:25310473

  11. Publishing NASA Metadata as Linked Open Data for Semantic Mashups

    NASA Astrophysics Data System (ADS)

    Wilson, Brian; Manipon, Gerald; Hua, Hook

    2014-05-01

    Data providers are now publishing more metadata in more interoperable forms, e.g. Atom or RSS 'casts', as Linked Open Data (LOD), or as ISO metadata records. A major effort on the part of NASA's Earth Science Data and Information System (ESDIS) project is the aggregation of metadata that enables greater data interoperability among scientific data sets regardless of source or application. Both the Earth Observing System (EOS) ClearingHOuse (ECHO) and the Global Change Master Directory (GCMD) repositories contain metadata records for NASA (and other) datasets and provided services. These records contain typical fields for each dataset (or software service) such as the source, creation date, cognizant institution, related access URLs, and domain and variable keywords to enable discovery. Under a NASA ACCESS grant, we demonstrated how to publish the ECHO and GCMD dataset and services metadata as LOD in the RDF format. Both sets of metadata are now queryable at SPARQL endpoints and available for integration into "semantic mashups" in the browser. It is straightforward to reformat sets of XML metadata, including ISO, into simple RDF and then later refine and improve the RDF predicates by reusing known namespaces such as Dublin Core, GeoRSS, etc. All scientific metadata should be part of the LOD world. In addition, we developed an "instant" drill-down and browse interface that provides faceted navigation so that the user can discover and explore the 25,000 datasets and 3,000 services. The available facets and the free-text search box appear in the left panel, and the instantly updated results for the dataset search appear in the right panel. The user can constrain the value of a metadata facet simply by clicking on a word (or phrase) in the "word cloud" of values for each facet. The display section for each dataset includes the important metadata fields, a full description of the dataset, potentially some related URLs, and a "search" button that points to an OpenSearch GUI that is pre-configured to search for granules within the dataset. We will present our experiences with converting NASA metadata into LOD, discuss the challenges, illustrate some of the enabled mashups, and demonstrate the latest version of the "instant browse" interface for navigating multiple metadata collections.
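
    Once published at a SPARQL endpoint, such metadata can be queried as in the sketch below, assuming the SPARQLWrapper package; the endpoint URL and predicate are placeholders rather than the actual ECHO/GCMD graphs.

      # Sketch: query dataset metadata published as Linked Open Data.
      from SPARQLWrapper import SPARQLWrapper, JSON

      sparql = SPARQLWrapper("https://example.gov/sparql")  # placeholder endpoint
      sparql.setQuery("""
          PREFIX dc: <http://purl.org/dc/elements/1.1/>
          SELECT ?dataset ?title WHERE {
              ?dataset dc:title ?title .
              FILTER regex(?title, "aerosol", "i")
          } LIMIT 10
      """)
      sparql.setReturnFormat(JSON)
      for row in sparql.query().convert()["results"]["bindings"]:
          print(row["dataset"]["value"], "-", row["title"]["value"])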

  12. High-performance metadata indexing and search in petascale data storage systems

    NASA Astrophysics Data System (ADS)

    Leung, A. W.; Shao, M.; Bisson, T.; Pasupathy, S.; Miller, E. L.

    2008-07-01

    Large-scale storage systems used for scientific applications can store petabytes of data and billions of files, making the organization and management of data in these systems a difficult, time-consuming task. The ability to search file metadata in a storage system can address this problem by allowing scientists to quickly navigate experiment data and code while allowing storage administrators to gather the information they need to properly manage the system. In this paper, we present Spyglass, a file metadata search system that exploits storage system properties to achieve the scalability that existing file metadata search tools lack. In doing so, Spyglass can achieve search performance up to several thousand times faster than existing database solutions. We show that Spyglass enables important functionality that can aid data management for scientists and storage administrators.
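
    One storage system property the paper exploits is namespace locality: files in the same subtree tend to be queried together. The toy sketch below partitions a metadata index by directory subtree so a scoped query touches only one partition; the data structures are simplified assumptions, not Spyglass internals.

        # Toy illustration: partition a file-metadata index by namespace subtree
        # so that directory-scoped queries scan only one partition.

        from collections import defaultdict

        class PartitionedIndex:
            def __init__(self, depth=2):
                self.depth = depth                     # partition by first N path parts
                self.partitions = defaultdict(list)    # subtree prefix -> records

            def _partition_key(self, path):
                return "/".join(path.strip("/").split("/")[: self.depth])

            def insert(self, path, metadata):
                self.partitions[self._partition_key(path)].append((path, metadata))

            def search(self, scope, predicate):
                """Scan only the partition under `scope`, not the whole index."""
                key = self._partition_key(scope)
                return [p for p, m in self.partitions.get(key, []) if predicate(m)]

        idx = PartitionedIndex()
        idx.insert("/proj/climate/run1/out.nc", {"owner": "alice", "size": 4096})
        idx.insert("/proj/genomics/a.bam", {"owner": "bob", "size": 1000000})
        print(idx.search("/proj/climate", lambda m: m["owner"] == "alice"))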

  13. Database integration in a multimedia-modeling environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dorow, Kevin E.

    2002-09-02

    Integration of data from disparate remote sources has direct applicability to modeling, which can support Brownfield assessments. To accomplish this task, a data integration framework needs to be established. A key element in this framework is the metadata that creates the relationship between the pieces of information that are important in the multimedia modeling environment and the information that is stored in the remote data source. The design philosophy is to allow modelers and database owners to collaborate by defining this metadata in such a way that allows interaction between their components. The main parts of this framework include tools to facilitate metadata definition, database extraction plan creation, automated extraction plan execution / data retrieval, and a central clearing house for metadata and modeling / database resources. Cross-platform compatibility (using Java) and standard communications protocols (http/https) allow these parts to run in a wide variety of computing environments (Local Area Networks, Internet, etc.), and, therefore, this framework provides many benefits. Because of the specific data relationships described in the metadata, the amount of data that has to be transferred is kept to a minimum (only the data that fulfill a specific request are provided, as opposed to transferring the complete contents of a data source). This allows for real-time data extraction from the actual source. Also, the framework sets up collaborative responsibilities such that the different types of participants have control over the areas in which they have domain knowledge: the modelers are responsible for defining the data relevant to their models, while the database owners are responsible for mapping the contents of the database using the metadata definitions. Finally, the data extraction mechanism allows for the ability to control access to the data and what data are made available.
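
    A hedged sketch of the metadata-driven idea: a modeler declares the fields a model needs, a database owner maps them to actual columns, and the framework generates a narrow extraction query so only the requested data moves over the wire. All table, column, and field names below are hypothetical.

        # Sketch: generate a minimal extraction query from a metadata mapping,
        # so only the fields a model requests are transferred. Names invented.

        model_needs = ["contaminant", "concentration", "sample_date"]

        # Mapping supplied by the database owner for one remote source.
        mapping = {
            "contaminant":   ("site_samples", "analyte_name"),
            "concentration": ("site_samples", "result_value"),
            "sample_date":   ("site_samples", "collected_on"),
        }

        def build_extraction_sql(fields, mapping, where="collected_on >= '2002-01-01'"):
            table = {mapping[f][0] for f in fields}.pop()      # single-table case
            cols = ", ".join(f"{mapping[f][1]} AS {f}" for f in fields)
            return f"SELECT {cols} FROM {table} WHERE {where};"

        print(build_extraction_sql(model_needs, mapping))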

  14. Sustained Assessment Metadata as a Pathway to Trustworthiness of Climate Science Information

    NASA Astrophysics Data System (ADS)

    Champion, S. M.; Kunkel, K.

    2017-12-01

    The Sustained Assessment process has produced a suite of climate change reports: The Third National Climate Assessment (NCA3), Regional Surface Climate Conditions in CMIP3 and CMIP5 for the United States: Differences, Similarities, and Implications for the U.S. National Climate Assessment, Impacts of Climate Change on Human Health in the United States: A Scientific Assessment, The State Climate Summaries, as well as the anticipated Climate Science Special Report and Fourth National Climate Assessment. Not only are these groundbreaking reports of climate change science, they are also the first suite of climate science reports to provide access to complex metadata directly connected to the report figures and graphics products. While the basic metadata documentation requirement is federally mandated through a series of federal guidelines as a part of the Information Quality Act, Sustained Assessment products are also deemed Highly Influential Scientific Assessments, which further requires demonstration of the transparency and reproducibility of the content. To meet these requirements, the Technical Support Unit (TSU) for the Sustained Assessment embarked on building a system for not only collecting and documenting metadata to the required standards, but also providing consumers unprecedented access to the underlying data and methods. As our process and documentation have evolved, the value of both continues to grow in parallel with the consumer expectation of quality, accessible climate science information. This presentation will detail how the TSU accomplishes the mandated requirements with its metadata collection and documentation process, as well as the technical solution designed to demonstrate compliance while also providing access to the content for the general public. We will also illustrate how our accessibility platforms guide consumers through the Assessment science at a level of transparency that builds trust and confidence in the report content.

  15. Improving Earth Science Metadata: Modernizing ncISO

    NASA Astrophysics Data System (ADS)

    O'Brien, K.; Schweitzer, R.; Neufeld, D.; Burger, E. F.; Signell, R. P.; Arms, S. C.; Wilcox, K.

    2016-12-01

    ncISO is a package of tools developed at NOAA's National Center for Environmental Information (NCEI) that facilitates the generation of ISO 19115-2 metadata from NetCDF data sources. The tool currently exists in two iterations: a command line utility and a web-accessible service within the THREDDS Data Server (TDS). Several projects, including NOAA's Unified Access Framework (UAF), depend upon ncISO to generate ISO-compliant metadata from their data holdings and use the resulting information to populate discovery tools such as NCEI's ESRI Geoportal and NOAA's data.noaa.gov CKAN system. In addition to generating ISO 19115-2 metadata, the tool calculates a rubric score based on how well the dataset follows the Attribute Conventions for Dataset Discovery (ACDD). The result of this rubric calculation, along with information about what has been included and what is missing, is displayed in an HTML document generated by the ncISO software package. Recently, ncISO has fallen behind in supporting updates to conventions such as the ACDD. With the blessing of the original programmer, NOAA's UAF has been working to modernize the ncISO software base. In addition to upgrading ncISO to utilize version 1.3 of the ACDD, we have been working with partners at Unidata and IOOS to unify the tool's code base. In essence, we are merging the command line capabilities into the same software that will now be used by the TDS service, allowing easier updates when conventions such as ACDD are updated in the future. In this presentation, we will discuss the work the UAF project has done to support updated conventions within ncISO, as well as describe how the updated tool is helping to improve metadata throughout the earth and ocean sciences.
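
    The rubric idea can be illustrated with a hedged sketch: score a dataset by how many recommended ACDD discovery attributes its global metadata carries. The attribute subset and the flat scoring are illustrative assumptions, not ncISO's actual rubric.

        # Sketch of an ACDD-style rubric score over a dataset's global attributes.
        # Attribute subset and weighting are illustrative, not ncISO's rubric.

        ACDD_RECOMMENDED = [
            "title", "summary", "keywords", "Conventions",
            "creator_name", "time_coverage_start", "time_coverage_end",
            "geospatial_lat_min", "geospatial_lat_max",
        ]

        def rubric_score(global_attrs: dict) -> tuple:
            missing = [a for a in ACDD_RECOMMENDED if not global_attrs.get(a)]
            score = 1 - len(missing) / len(ACDD_RECOMMENDED)
            return score, missing

        # In practice the attributes would come from a NetCDF file, e.g. via
        # netCDF4: attrs = {k: ds.getncattr(k) for k in ds.ncattrs()}.
        attrs = {"title": "SST analysis", "summary": "Daily SST", "keywords": "ocean"}
        score, missing = rubric_score(attrs)
        print(f"rubric: {score:.0%}, missing: {missing}")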

  16. Metadata Management on the SCEC PetaSHA Project: Helping Users Describe, Discover, Understand, and Use Simulation Data in a Large-Scale Scientific Collaboration

    NASA Astrophysics Data System (ADS)

    Okaya, D.; Deelman, E.; Maechling, P.; Wong-Barnum, M.; Jordan, T. H.; Meyers, D.

    2007-12-01

    Large scientific collaborations, such as the SCEC Petascale Cyberfacility for Physics-based Seismic Hazard Analysis (PetaSHA) Project, involve interactions between many scientists who exchange ideas and research results. These groups must organize, manage, and make accessible their community materials of observational data, derivative (research) results, computational products, and community software. The integration of scientific workflows as a paradigm to solve complex computations provides advantages of efficiency, reliability, repeatability, choices, and ease of use. The underlying resource needed for a scientific workflow to function and create discoverable and exchangeable products is the construction, tracking, and preservation of metadata. In the scientific workflow environment there is a two-tier structure of metadata. Workflow-level metadata and provenance describe operational steps, identity of resources, execution status, and product locations and names. Domain-level metadata essentially define the scientific meaning of data, codes, and products. To a large degree the metadata at these two levels are separate. However, between these two levels is a subset of metadata produced at one level but needed by the other. This crossover metadata suggests that some commonality in metadata handling is needed. SCEC researchers are collaborating with computer scientists at SDSC, the USC Information Sciences Institute, and Carnegie Mellon University in order to perform earthquake science using high-performance computational resources. A primary objective of the "PetaSHA" collaboration is to perform physics-based estimations of strong ground motion associated with real and hypothetical earthquakes located within Southern California. Construction of 3D earth models, earthquake representations, and numerical simulation of seismic waves are key components of these estimations. Scientific workflows are used to orchestrate the sequences of scientific tasks and to access distributed computational facilities such as the NSF TeraGrid. Different types of metadata are produced and captured within the scientific workflows. One workflow within PetaSHA ("Earthworks") performs a linear sequence of tasks with workflow and seismological metadata preserved. Downstream scientific codes ingest the metadata produced by upstream codes. The seismological metadata use attribute-value pairing in plain text; an identified need is to use more advanced handling methods. Another workflow system within PetaSHA ("CyberShake") involves several complex workflows in order to perform statistical analysis of ground shaking due to thousands of hypothetical but plausible earthquakes. Metadata management has been challenging due to its construction around a number of legacy scientific codes. We describe difficulties arising in the scientific workflow due to the lack of this metadata and suggest corrective steps, which in some cases include the cultural shift of domain science programmers coding for metadata.
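
    A minimal sketch of the plain-text attribute-value pairing the abstract describes, so a downstream code can ingest metadata emitted by an upstream one. The key names and comment syntax are assumptions for illustration.

        # Sketch: parse 'key = value' metadata lines, ignoring blanks and comments.
        # Key names below are hypothetical examples.

        def parse_av(text: str) -> dict:
            meta = {}
            for line in text.splitlines():
                line = line.split("#", 1)[0].strip()
                if "=" in line:
                    key, value = (part.strip() for part in line.split("=", 1))
                    meta[key] = value
            return meta

        example = """
        source_model = CVM-S4          # 3D velocity model
        event_id     = hypothetical_01
        min_vs       = 500.0           # m/s
        """
        print(parse_av(example))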

  17. Metadata Creation, Management and Search System for your Scientific Data

    NASA Astrophysics Data System (ADS)

    Devarakonda, R.; Palanisamy, G.

    2012-12-01

    Mercury Search Systems is a set of tools for creating, searching, and retrieving biogeochemical metadata. The Mercury toolset provides orders-of-magnitude improvements in search speed, support for any metadata format, integration with Google Maps for spatial queries, multi-faceted search, search suggestions, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. Mercury's metadata editor provides an easy way to create metadata, and Mercury's search interface provides a single portal to search for data and information contained in disparate data management systems, each of which may use any metadata format, including FGDC, ISO-19115, Dublin-Core, Darwin-Core, DIF, ECHO, and EML. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury is being used by more than 14 different projects across 4 federal agencies. It was originally developed for NASA, with continuing development funded by NASA, USGS, and DOE for a consortium of projects. Mercury search won NASA's Earth Science Data Systems Software Reuse Award in 2008. References: R. Devarakonda, G. Palanisamy, B.E. Wilson, and J.M. Green, "Mercury: reusable metadata management, data discovery and access system", Earth Science Informatics, vol. 3, no. 1, pp. 87-94, May 2010; R. Devarakonda, G. Palanisamy, J.M. Green, and B.E. Wilson, "Data sharing and retrieval using OAI-PMH", Earth Science Informatics, DOI: 10.1007/s12145-010-0073-0, 2010.

  18. Automatic identification of comparative effectiveness research from Medline citations to support clinicians’ treatment information needs

    PubMed Central

    Zhang, Mingyuan; Fiol, Guilherme Del; Grout, Randall W.; Jonnalagadda, Siddhartha; Medlin, Richard; Mishra, Rashmi; Weir, Charlene; Liu, Hongfang; Mostafa, Javed; Fiszman, Marcelo

    2014-01-01

    Online knowledge resources such as Medline can address most clinicians’ patient care information needs. Yet, significant barriers, notably lack of time, limit the use of these sources at the point of care. The most common information needs raised by clinicians are treatment-related. Comparative effectiveness studies allow clinicians to consider multiple treatment alternatives for a particular problem. Still, solutions are needed to enable efficient and effective consumption of comparative effectiveness research at the point of care. Objective: Design and assess an algorithm for automatically identifying comparative effectiveness studies and extracting the interventions investigated in these studies. Methods: The algorithm combines semantic natural language processing, Medline citation metadata, and machine learning techniques. We assessed the algorithm in a case study of treatment alternatives for depression. Results: Both precision and recall for identifying comparative studies were 0.83. A total of 86% of the interventions extracted perfectly or partially matched the gold standard. Conclusion: Overall, the algorithm achieved reasonable performance. The method provides building blocks for the automatic summarization of comparative effectiveness research to inform point-of-care decision-making. PMID:23920677
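
    As a hedged stand-in for the general approach, the sketch below combines one citation-metadata feature (publication type) with bag-of-words text features in a scikit-learn classifier. The toy data and feature choice are assumptions for illustration; the authors' actual pipeline also used semantic natural language processing.

        # Sketch: classify citations as comparative studies from text plus a
        # metadata feature. Toy data; not the paper's actual pipeline.

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from scipy.sparse import hstack, csr_matrix

        titles = [
            "Sertraline versus placebo for major depression",
            "Comparative efficacy of SSRIs and CBT in depression",
            "Case report of an adverse reaction to fluoxetine",
            "Prevalence of depression in primary care",
        ]
        is_rct = csr_matrix([[1], [1], [0], [0]])   # e.g. from PublicationType metadata
        labels = [1, 1, 0, 0]                       # 1 = comparative study

        X = hstack([TfidfVectorizer().fit_transform(titles), is_rct])
        clf = LogisticRegression().fit(X, labels)
        print(clf.predict(X))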

  19. Facilitating Stewardship of scientific data through standards based workflows

    NASA Astrophysics Data System (ADS)

    Bastrakova, I.; Kemp, C.; Potter, A. K.

    2013-12-01

    Three main suites of standards can be used to define the fundamental scientific methodology of data, methods, and results: first, metadata standards that enable discovery of the data (ISO 19115); second, the Sensor Web Enablement (SWE) suite of standards, which includes the O&M and SensorML standards; and third, ontologies that provide vocabularies to define scientific concepts and the relationships between them. All three types of standards have to be utilised by the practicing scientist so that those who ultimately steward the data can ensure that it is preserved, curated, reused, and repurposed. Additional benefits of this approach include transparency of scientific processes from data acquisition to the creation of scientific concepts and models, and provision of context to inform data use. Collecting and recording metadata is the first step in the scientific data flow. The primary role of metadata is to provide details of geographic extent, availability, and a high-level description of data suitable for its initial discovery through common search engines. The SWE suite provides standardised patterns to describe observations and measurements taken for these data, capture detailed information about observation or analytical methods and the instruments used, and define quality determinations. This information standardises browsing capability over discrete data types. The standardised patterns of the SWE standards simplify aggregation of observation and measurement data, enabling scientists to relate disaggregated data to scientific concepts. The first two steps provide a necessary basis for reasoning about concepts of 'pure' science, building relationships between concepts of different domains (linked data), and identifying domain classifications and vocabularies. Geoscience Australia is re-examining its marine data flows, including metadata requirements and business processes, to achieve a clearer link between scientific data acquisition and analysis requirements and effective interoperable data management and delivery. This includes participating in national and international dialogue on the development of standards, embedding data management activities in business processes, and developing scientific staff as effective data stewards. A similar approach is applied to the geophysical data. By ensuring the geophysical datasets at GA strictly follow metadata and industry standards, we are able to implement a provenance-based workflow where the data is easily discoverable, geophysical processing can be applied to it, and results can be stored. The provenance-based workflow enables metadata records for the results to be produced automatically from the input dataset metadata.

  20. Ensuring the Quality of Data Packages in the LTER Network Provenance Aware Synthesis Tracking Architecture Data Management System and Archive

    NASA Astrophysics Data System (ADS)

    Servilla, M. S.; O'Brien, M.; Costa, D.

    2013-12-01

    Considerable ecological research performed today occurs through the analysis of data downloaded from various repositories and archives, often resulting in derived or synthetic products generated by automated workflows. These data are only meaningful for research if they are well documented by metadata, lest semantic or data-type errors occur in interpretation or processing. The Long Term Ecological Research (LTER) Network now screens all data packages entering its long-term archive to ensure that each package contains metadata that is complete, of high quality, and accurately describes the structure of its associated data entity, and that the data are structurally congruent with the metadata. Screening occurs prior to the upload of a data package into the Provenance Aware Synthesis Tracking Architecture (PASTA) data management system through a series of quality checks, thus preventing ambiguously or incorrectly documented data packages from entering the system. The quality checks within PASTA are designed to work specifically with the Ecological Metadata Language (EML), the metadata standard adopted by the LTER Network to describe data generated by their 26 research sites. Each quality check is codified in Java as part of the ecological community-supported Data Manager Library, which is a resource of the EML specification and used as a component of the PASTA software stack. Quality checks test for metadata quality, data integrity, or metadata-data congruence. Quality checks are further classified as either conditional or informational. Conditional checks issue a 'valid', 'warning', or 'error' response; only an 'error' response blocks the data package from upload into PASTA. Informational checks only provide descriptive content pertaining to a particular facet of the data package. Quality checks are designed by a group of LTER information managers and reviewed by the LTER community before being deployed into PASTA. A total of 32 quality checks have been deployed to date. Quality checks can be customized through a configurable template, which includes turning checks 'on' or 'off' and setting the severity of conditional checks. This feature is important to other potential users of the Data Manager Library who wish to configure its quality checks in accordance with the standards of their community. Executing the complete set of quality checks produces a report that describes the result of each check. The report is an XML document that is stored by PASTA for future reference.
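
    A hedged sketch of the check model described above: conditional checks return 'valid', 'warning', or 'error' (only 'error' blocks upload), informational checks just annotate, and checks can be toggled via configuration. The check names and logic are illustrative, not the Data Manager Library's actual checks (which are in Java).

        # Sketch of conditional vs. informational quality checks with
        # configurable enablement. Check names and logic are invented.

        from dataclasses import dataclass
        from typing import Callable

        @dataclass
        class Check:
            name: str
            kind: str                   # 'conditional' or 'informational'
            run: Callable[[dict], str]  # 'valid'/'warning'/'error', or info text
            enabled: bool = True

        def has_title(pkg):    return "valid" if pkg.get("title") else "error"
        def has_abstract(pkg): return "valid" if pkg.get("abstract") else "warning"
        def row_count(pkg):    return f"data rows: {len(pkg.get('rows', []))}"

        checks = [
            Check("metadata-title", "conditional", has_title),
            Check("metadata-abstract", "conditional", has_abstract),
            Check("data-row-count", "informational", row_count),
        ]

        def screen(pkg):
            report = [(c.name, c.run(pkg)) for c in checks if c.enabled]
            blocked = any(result == "error" for _, result in report)
            return report, blocked

        report, blocked = screen({"title": "Stream chemistry", "rows": [1, 2, 3]})
        print(report, "upload blocked:", blocked)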

  1. Panning for Gold: Utility of the World Wide Web for Metadata and Authority Control in Special Collections.

    ERIC Educational Resources Information Center

    Ellero, Nadine P.

    2002-01-01

    Describes the use of the World Wide Web as a name authority resource and tool for special collections' analytic-level cataloging, based on experiences at The Claude Moore Health Sciences Library. Highlights include primary documents and metadata; authority control and the Web as authority source information; and future possibilities. (Author/LRW)

  2. ASIST 2003: Part III: Posters.

    ERIC Educational Resources Information Center

    Proceedings of the ASIST Annual Meeting, 2003

    2003-01-01

    Twenty-three posters address topics including access to information; metadata; personal information management; scholarly information communication; online resources; content analysis; interfaces; Web queries; information evaluation; informatics; information needs; search effectiveness; digital libraries; diversity; automated indexing; e-commerce;…

  3. ASIST 2003: Part II: Panels.

    ERIC Educational Resources Information Center

    Proceedings of the ASIST Annual Meeting, 2003

    2003-01-01

    Forty-six panels address topics including women in information science; users and usability; information studies; reference services; information policies; standards; interface design; information retrieval; information networks; metadata; shared access; e-commerce in libraries; knowledge organization; information science theories; digitization;…

  4. The MPO system for automatic workflow documentation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abla, G.; Coviello, E. N.; Flanagan, S. M.

    Data from large-scale experiments and extreme-scale computing is expensive to produce and may be used for critical applications. However, it is not the mere existence of data that is important, but our ability to make use of it. Experience has shown that when metadata is better organized and more complete, the underlying data becomes more useful. Traditionally, capturing the steps of scientific workflows and metadata was the role of the lab notebook, but the digital era has resulted instead in the fragmentation of data, processing, and annotation. Here, this article presents the Metadata, Provenance, and Ontology (MPO) System, the software that can automate the documentation of scientific workflows and associated information. Based on recorded metadata, it provides explicit information about the relationships among the elements of workflows in notebook form augmented with directed acyclic graphs. A set of web-based graphical navigation tools and Application Programming Interface (API) have been created for searching and browsing, as well as programmatically accessing the workflows and data. We describe the MPO concepts and its software architecture. We also report the current status of the software as well as the initial deployment experience.
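
    The core recording idea can be illustrated with a hedged sketch: log each workflow step together with its parent steps as it happens, yielding a directed acyclic graph of provenance. This toy recorder is an assumption for illustration and is not the MPO API.

        # Sketch: record workflow steps and parent links as a provenance DAG.
        # Class, method, and step names are illustrative assumptions.

        class WorkflowRecorder:
            def __init__(self):
                self.nodes = {}      # node id -> step description
                self.edges = []      # (parent id, child id)
                self._next = 0

            def record(self, description, parents=()):
                node_id = self._next
                self._next += 1
                self.nodes[node_id] = description
                self.edges.extend((p, node_id) for p in parents)
                return node_id

        wf = WorkflowRecorder()
        raw = wf.record("load raw signal for hypothetical shot 12345")
        filt = wf.record("bandpass filter 1-50 kHz", parents=[raw])
        fit = wf.record("fit equilibrium profile", parents=[filt])
        print(wf.edges)   # [(0, 1), (1, 2)]: the provenance DAG edges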

  5. Online Metadata Directories: A way of preserving, sharing and discovering scientific information

    NASA Technical Reports Server (NTRS)

    Meaux, M.

    2005-01-01

    The Global Change Master Directory (GCMD) assists the scientific community in the discovery of and linkage to Earth Science data and provides data holders a means to advertise their data to the community through its portals, i.e. online customized subset metadata directories. These directories are effectively serving communities like the Joint Committee on Antarctic Data Management (JCADM), the Global Observing System Information Center (GOSIC), and the Global Ocean Ecosystems Dynamic Program (GLOBEC) by increasing the visibility of their data holdings. The purpose of the Gulf of Maine Ocean Data Partnership (GoMODP) is to "promote and coordinate the sharing, linking, electronic dissemination, and use of data on the Gulf of Maine region". The participants have decided that a "coordinated effort is needed to enable users throughout the Gulf of Maine region and beyond to discover and put to use the vast and growing quantities of data in their respective databases". GoMODP members have invited the GCMD to discuss potential collaborations associated with this effort. The presentation will focus on the use of the GCMD's metadata directory as a powerful tool for data discovery and sharing. An overview of the directory and its metadata authoring tools will be given.

  6. The MPO system for automatic workflow documentation

    DOE PAGES

    Abla, G.; Coviello, E. N.; Flanagan, S. M.; ...

    2016-04-18

    Data from large-scale experiments and extreme-scale computing is expensive to produce and may be used for critical applications. However, it is not the mere existence of data that is important, but our ability to make use of it. Experience has shown that when metadata is better organized and more complete, the underlying data becomes more useful. Traditionally, capturing the steps of scientific workflows and metadata was the role of the lab notebook, but the digital era has resulted instead in the fragmentation of data, processing, and annotation. Here, this article presents the Metadata, Provenance, and Ontology (MPO) System, the software that can automate the documentation of scientific workflows and associated information. Based on recorded metadata, it provides explicit information about the relationships among the elements of workflows in notebook form augmented with directed acyclic graphs. A set of web-based graphical navigation tools and Application Programming Interface (API) have been created for searching and browsing, as well as programmatically accessing the workflows and data. We describe the MPO concepts and its software architecture. We also report the current status of the software as well as the initial deployment experience.

  7. A metadata approach for clinical data management in translational genomics studies in breast cancer.

    PubMed

    Papatheodorou, Irene; Crichton, Charles; Morris, Lorna; Maccallum, Peter; Davies, Jim; Brenton, James D; Caldas, Carlos

    2009-11-30

    In molecular profiling studies of cancer patients, experimental and clinical data are combined in order to understand the clinical heterogeneity of the disease: clinical information for each subject needs to be linked to tumour samples, extracted macromolecules, and experimental results. This may involve the integration of clinical data sets from several different sources: these data sets may employ different data definitions and some may be incomplete. In this work we employ semantic web techniques developed within the CancerGrid project, in particular the use of metadata elements and logic-based inference, to annotate, integrate, and query heterogeneous clinical information. We show how this integration can be achieved automatically, following the declaration of appropriate metadata elements for each clinical data set; we demonstrate the practicality of this approach through application to experimental results and clinical data from five hospitals in the UK and Canada, undertaken as part of the METABRIC project (Molecular Taxonomy of Breast Cancer International Consortium). We describe a metadata approach for managing similarities and differences in clinical datasets in a standardized way that uses Common Data Elements (CDEs). We apply and evaluate the approach by integrating the five different clinical datasets of METABRIC.
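
    A hedged sketch of CDE-based integration: each site declares how its local fields map onto shared Common Data Elements, after which records can be pooled and queried uniformly. The field names and sites below are invented for illustration, not METABRIC's actual CDEs.

        # Sketch: map site-local clinical fields onto common data elements,
        # then pool and query the harmonized records. All names invented.

        CDE_MAP = {
            "site_a": {"er_stat": "estrogen_receptor_status", "age_dx": "age_at_diagnosis"},
            "site_b": {"ER": "estrogen_receptor_status", "AgeAtDx": "age_at_diagnosis"},
        }

        def to_cde(site, record):
            mapping = CDE_MAP[site]
            return {mapping[k]: v for k, v in record.items() if k in mapping}

        pooled = [
            to_cde("site_a", {"er_stat": "positive", "age_dx": 54}),
            to_cde("site_b", {"ER": "negative", "AgeAtDx": 61}),
        ]
        positives = [r for r in pooled if r["estrogen_receptor_status"] == "positive"]
        print(pooled, len(positives))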

  8. CDGP, the data center for deep geothermal data from Alsace

    NASA Astrophysics Data System (ADS)

    Schaming, Marc; Grunberg, Marc; Jahn, Markus; Schmittbuhl, Jean; Cuenot, Nicolas; Genter, Albert; Dalmais, Eléonore

    2016-04-01

    CDGP (Centre de données de géothermie profonde, the deep geothermal data center, http://cdgp.u-strasbg.fr) was set up by the LabEX G-EAU-THERMIE PROFONDE to archive the high-quality data collected at the Upper Rhine Graben geothermal sites and to distribute them to the scientific community for R&D activities, taking IPR (Intellectual Property Rights) into account. The collected datasets cover the whole life of geothermal projects, from exploration to drilling, stimulation, circulation, and production. They originate from the Soultz-sous-Forêts pilot plant but also include more recent projects like the ECOGI project at Rittershoffen, Alsace, France. They are historically separated into two rather independent categories: geophysical datasets, mostly related to the industrial management of the geothermal reservoir, and seismological data from the seismic monitoring during both stimulations and circulations. The geophysical datasets come mainly, so far, from the Soultz-sous-Forêts project and were stored on office shelves and old digital media. Some inventories have been made recently, and a first step of integrating these reservoir data into a PostgreSQL/PostGIS database (ISO 19107 compatible) has been performed. The database links depths, temperatures, pressures, and flows for periods (times) and locations (geometries). Other geophysical data are still stored in structured directories as a data bank and need to be included in the database. Seismological datasets are of two kinds: seismological waveforms and seismicity bulletins. The former are stored in a standardized way, both in format (miniSEED) and in file and directory structure (SDS), following international standards of the seismological community (FDSN); the latter are stored in a database following the open QuakeML standard. CDGP uses a cataloging application (GeoNetwork) to manage the metadata resources. It provides metadata editing and search functions as well as a web map viewer. The metadata editor supports the ISO 19115/119/110 standards used for spatial resources. A step forward will be to add specific metadata records as defined by the Open Geospatial Consortium to provide geophysical / geologic / reservoir information: Observations and Measurements (O&M) to describe the acquisition of information from a primary source, and SensorML to describe the sensors. Seismological metadata, which describe the full instrumental response, use the dataless SEED standard. Access to data will be handled in an additional step using the geOrchestra spatial data infrastructure (SDI). Direct access will be granted after registration and validation using a single sign-on authentication system. Access to the data will also be granted via the EPOS-IP Anthropogenic Hazards project. Access to episodes (time-correlated collections of geophysical, technological, and other relevant geo-data over a geothermal area) and application of analyses (time- and technology-dependent probabilistic seismic hazard analysis, multi-hazard and multi-risk assessment) are services accessible via a portal and will require AAAI (Authentication, Authorization, Accounting and Identification).
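
    Assuming ObsPy is available, a hedged sketch of reading waveforms from an SDS-organized miniSEED archive like the one described above; the root path, network, station, and channel codes are illustrative placeholders.

        # Sketch: read waveforms from an SDS directory tree with ObsPy's
        # filesystem SDS client. Path and codes are placeholders.

        from obspy import UTCDateTime
        from obspy.clients.filesystem.sds import Client

        client = Client("/data/sds")                  # hypothetical SDS root
        t0 = UTCDateTime("2013-01-01T00:00:00")
        stream = client.get_waveforms(network="XX", station="SSF01",
                                      location="*", channel="HH?",
                                      starttime=t0, endtime=t0 + 3600)
        print(stream)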

  9. System and method for integrating and accessing multiple data sources within a data warehouse architecture

    DOEpatents

    Musick, Charles R [Castro Valley, CA; Critchlow, Terence [Livermore, CA; Ganesh, Madhaven [San Jose, CA; Slezak, Tom [Livermore, CA; Fidelis, Krzysztof [Brentwood, CA

    2006-12-19

    A system and method is disclosed for integrating and accessing multiple data sources within a data warehouse architecture. The metadata formed by the present method provide a way to declaratively present domain specific knowledge, obtained by analyzing data sources, in a consistent and useable way. Four types of information are represented by the metadata: abstract concepts, databases, transformations and mappings. A mediator generator automatically generates data management computer code based on the metadata. The resulting code defines a translation library and a mediator class. The translation library provides a data representation for domain specific knowledge represented in a data warehouse, including "get" and "set" methods for attributes that call transformation methods and derive a value of an attribute if it is missing. The mediator class defines methods that take "distinguished" high-level objects as input and traverse their data structures and enter information into the data warehouse.
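
    The "get"/derive-if-missing pattern the patent describes can be sketched as follows; the class, attributes, and derivation rule are hypothetical stand-ins, not the patented system's generated code.

        # Sketch of a translation-library accessor: a "get" method derives a
        # missing attribute value via a transformation. Names are hypothetical.

        class GeneSummary:
            def __init__(self, symbol, sequence=None, length=None):
                self.symbol = symbol
                self._sequence = sequence
                self._length = length

            def get_length(self):
                # Derive the missing attribute from related data when possible.
                if self._length is None and self._sequence is not None:
                    self._length = len(self._sequence)
                return self._length

            def set_length(self, value):
                self._length = value

        rec = GeneSummary("TP53", sequence="ATGGAGGAGCCGCAGTCA")
        print(rec.get_length())   # derived from the sequence: 18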

  10. OAI and NASA's Scientific and Technical Information

    NASA Technical Reports Server (NTRS)

    Nelson, Michael L.; Rocker, JoAnne; Harrison, Terry L.

    2002-01-01

    The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is an evolving protocol and philosophy regarding interoperability for digital libraries (DLs). Previously, "distributed searching" models were popular for DL interoperability. However, experience has shown distributed searching systems across large numbers of DLs to be difficult to maintain in an Internet environment. The OAI-PMH is a move away from distributed searching, focusing on the arguably simpler model of "metadata harvesting". We detail NASA s involvement in defining and testing the OAI-PMH and experience to date with adapting existing NASA distributed searching DLs (such as the NASA Technical Report Server) to use the OAI-PMH and metadata harvesting. We discuss some of the entirely new DL projects that the OAI-PMH has made possible, such as the Technical Report Interchange project. We explain the strategic importance of the OAI-PMH to the mission of NASA s Scientific and Technical Information Program.
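
    A minimal harvesting sketch using the protocol's standard ListRecords verb; the endpoint URL is a placeholder, and a real harvester would also follow resumptionToken paging and handle OAI error responses.

        # Sketch: one OAI-PMH ListRecords request, printing identifier/title
        # pairs from the Dublin Core records. Endpoint is a placeholder.

        import requests
        import xml.etree.ElementTree as ET

        OAI = "{http://www.openarchives.org/OAI/2.0/}"
        DC = "{http://purl.org/dc/elements/1.1/}"

        resp = requests.get("https://example.org/oai",   # placeholder endpoint
                            params={"verb": "ListRecords",
                                    "metadataPrefix": "oai_dc"})
        root = ET.fromstring(resp.content)

        for record in root.iter(f"{OAI}record"):
            identifier = record.findtext(f".//{OAI}identifier")
            title = record.findtext(f".//{DC}title")
            print(identifier, "->", title)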

  11. WDS Knowledge Network Architecture in Support of International Science

    NASA Astrophysics Data System (ADS)

    Mokrane, M.; Minster, J. B. H.; Hugo, W.

    2014-12-01

    ICSU (International Council for Science) created the World Data System (WDS) as an interdisciplinary body at its General Assembly in Maputo in 2008, and since then the membership of the WDS has grown to include 86 members, of whom 56 are institutions or data centers focused on providing quality-assured data and services to the scientific community, and 10 more are entire networks of such data facilities and services. In addition to its objective of providing universal and equitable access to scientific data and services, WDS is also active in promoting stewardship, standards and conventions, and improved access to products derived from data and services. Whereas WDS is in the process of aggregating and harmonizing the metadata collections of its membership, it is clear that additional benefits can be obtained by supplementing such traditional metadata sources with information about members, authors, and the coverages of the data, as well as metrics such as citation indices, quality indicators, and usability. Moreover, the relationships between the actors and systems that populate this metadata landscape can be seen as a knowledge network that describes a subset of global scientific endeavor. Such a knowledge network is useful in many ways, supporting both machine-based and human requests for contextual information related to a specific data set, institution, author, topic, or other entities in the network. Specific use cases that can be realized include decision and policy support for funding agencies, identification of collaborators, ranking of data sources, availability of data for specific coverages, and many more. The paper defines the scope of and conceptual background to such a knowledge network, discusses some initial work done by WDS to establish the network, and proposes an implementation model for rapid operationalization. In this model, established interests such as DataCite, ORCID, and CrossRef have well-defined roles, and the standards, services, and registries required to build a community-maintained, scalable knowledge network are presented. We conclude with a short discussion on the feasibility and sustainability of a global data infrastructure, the role of the WDS Knowledge Network in this infrastructure, and the necessary conditions for success.

  12. Knowledge Network Architecture in Support of International Science

    NASA Astrophysics Data System (ADS)

    Hugo, Wim

    2015-04-01

    ICSU (The International Council for Science) created the World Data System (WDS) as an interdisciplinary body at its General Assembly in Maputo in 2008, and since then the membership of the WDS has grown to include 86 members, of whom 56 are institutions or data centres focused on providing quality-assured data and services to the scientific community. In addition to its objective of providing universal and equitable access to such data and services, WDS is also active in promoting stewardship, standards and conventions, and improved access to products derived from data and services. Whereas WDS is in the process of aggregating and harmonizing the meta-data collections of its membership, it is clear that additional benefits can be obtained by supplementing such traditional meta-data sources with information about members, authors, and the coverages of the data, as well as metrics such as citation indices, quality indicators, and usability. Moreover, the relationships between the actors and systems that populate this meta-data landscape can be seen as a knowledge network that describes a subset of global scientific endeavor. Such a knowledge network is useful in many ways, supporting both machine-based and human requests for contextual information related to a specific data set, institution, author, topic, or other entities in the network. Specific use cases that can be realised include decision and policy support for funding agencies, identification of collaborators, ranking of data sources, availability of data for specific coverages, and many more. The paper defines the scope of and conceptual background to such a knowledge network, discusses some initial work done by WDS to establish the network, and proposes an implementation model for rapid operationalisation. In this model, established interests such as DataCite, ORCID, and CrossRef have well-defined roles, and the standards, services, and registries required to build a community-maintained, scalable knowledge network are presented. We conclude with a short discussion on the feasibility and sustainability of a global data infrastructure, the role of the WDS Knowledge Network in this infrastructure, and the necessary conditions for success.

  13. Enterprise Information Architecture for Mission Development

    NASA Technical Reports Server (NTRS)

    Dutra, Jayne

    2007-01-01

    This slide presentation reviews the concept of an information architecture to assist in mission development. The integrated information architecture will create a unified view of the information using metadata and controlled values (i.e., a taxonomy).

  14. Social tagging in the life sciences: characterizing a new metadata resource for bioinformatics.

    PubMed

    Good, Benjamin M; Tennis, Joseph T; Wilkinson, Mark D

    2009-09-25

    Academic social tagging systems, such as Connotea and CiteULike, provide researchers with a means to organize personal collections of online references with keywords (tags) and to share these collections with others. One of the side-effects of the operation of these systems is the generation of large, publicly accessible metadata repositories describing the resources in the collections. In light of the well-known expansion of information in the life sciences and the need for metadata to enhance its value, these repositories present a potentially valuable new resource for application developers. Here we characterize the current contents of two scientifically relevant metadata repositories created through social tagging. This investigation helps to establish how such socially constructed metadata might be used as it stands currently and to suggest ways that new social tagging systems might be designed that would yield better aggregate products. We assessed the metadata that users of CiteULike and Connotea associated with citations in PubMed with the following metrics: coverage of the document space, density of metadata (tags) per document, rates of inter-annotator agreement, and rates of agreement with MeSH indexing. CiteULike and Connotea were very similar on all of the measurements. In comparison to PubMed, document coverage and per-document metadata density were much lower for the social tagging systems. Inter-annotator agreement within the social tagging systems and the agreement between the aggregated social tagging metadata and MeSH indexing were low, though the latter could be increased through voting. The most promising uses of metadata from current academic social tagging repositories will be those that find ways to utilize the novel relationships between users, tags, and documents exposed through these systems. For more traditional kinds of indexing-based applications (such as keyword-based search) to benefit substantially from socially generated metadata in the life sciences, more documents need to be tagged and more tags are needed for each document. These issues may be addressed both by finding ways to attract more users to current systems and by creating new user interfaces that encourage more collectively useful individual tagging behaviour.
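
    Two of the metrics above can be sketched concretely; the snippet below computes per-document tag density and a simple inter-annotator agreement measured as mean pairwise Jaccard overlap between users' tag sets. The toy data and the Jaccard choice are illustrative assumptions, not the paper's exact metric definitions.

        # Sketch: tag density and pairwise Jaccard agreement on toy tagging data.

        from itertools import combinations

        # document -> {user: set of tags}
        tags = {
            "pmid:101": {"u1": {"genomics", "p53"}, "u2": {"p53", "cancer"}},
            "pmid:102": {"u1": {"ontology"}},
        }

        def density(doc):
            return sum(len(t) for t in tags[doc].values()) / len(tags[doc])

        def agreement(doc):
            pairs = list(combinations(tags[doc].values(), 2))
            if not pairs:
                return None   # a single annotator: agreement undefined
            return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

        for doc in tags:
            print(doc, "density:", density(doc), "agreement:", agreement(doc))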

  15. Medical Content Searching, Retrieving, and Sharing Over the Internet: Lessons Learned From the mEducator Through a Scenario-Based Evaluation

    PubMed Central

    Spachos, Dimitris; Mylläri, Jarkko; Giordano, Daniela; Dafli, Eleni; Mitsopoulou, Evangelia; Schizas, Christos N; Pattichis, Constantinos; Nikolaidou, Maria

    2015-01-01

    Background: The mEducator Best Practice Network (BPN) implemented and extended standards and reference models in e-learning to develop innovative frameworks as well as solutions that enable specialized state-of-the-art medical educational content to be discovered, retrieved, shared, and re-purposed across European Institutions, targeting medical students, doctors, educators, and health care professionals. Scenario-based evaluation for usability testing, complemented with data from online questionnaires and field notes of users’ performance, was designed and utilized for the evaluation of these solutions. Objective: The objective of this work is twofold: (1) to describe one instantiation of the mEducator BPN solutions (mEducator3.0 - “MEdical Education LINnked Arena” MELINA+) with a focus on the metadata schema used, as well as on other aspects of the system that pertain to usability and acceptance, and (2) to present evaluation results on the suitability of the proposed metadata schema for searching, retrieving, and sharing of medical content and with respect to the overall usability and acceptance of the system from the target users. Methods: A comprehensive evaluation methodology framework was developed and applied to four case studies, which were conducted in four different countries (i.e., Greece, Cyprus, Bulgaria, and Romania), with a total of 126 participants. In these case studies, scenarios referring to creating, sharing, and retrieving medical educational content using mEducator3.0 were used. The data were collected through two online questionnaires, consisting of 36 closed-ended questions and two open-ended questions that referred to mEducator 3.0, and through the use of field notes during scenario-based evaluations. Results: The main findings of the study showed that even though the informational needs of the mEducator target groups were addressed to a satisfactory extent and the metadata schema supported content creation, sharing, and retrieval from an end-user perspective, users faced difficulties in achieving a shared understanding of the meaning of some metadata fields and in correctly managing the intellectual property rights of repurposed content. Conclusions: The results of this evaluation impact researchers, medical professionals, and designers interested in using similar systems for educational content sharing in medical and other domains. Recommendations on how to improve the search, retrieval, identification, and obtaining of medical resources are provided, by addressing issues of content description metadata, content description procedures, and intellectual property rights for re-purposed content. PMID:26453250

  16. Metadata Standard and Data Exchange Specifications to Describe, Model, and Integrate Complex and Diverse High-Throughput Screening Data from the Library of Integrated Network-based Cellular Signatures (LINCS).

    PubMed

    Vempati, Uma D; Chung, Caty; Mader, Chris; Koleti, Amar; Datar, Nakul; Vidović, Dušica; Wrobel, David; Erickson, Sean; Muhlich, Jeremy L; Berriz, Gabriel; Benes, Cyril H; Subramanian, Aravind; Pillai, Ajay; Shamu, Caroline E; Schürer, Stephan C

    2014-06-01

    The National Institutes of Health Library of Integrated Network-based Cellular Signatures (LINCS) program is generating extensive multidimensional data sets, including biochemical, genome-wide transcriptional, and phenotypic cellular response signatures to a variety of small-molecule and genetic perturbations with the goal of creating a sustainable, widely applicable, and readily accessible systems biology knowledge resource. Integration and analysis of diverse LINCS data sets depend on the availability of sufficient metadata to describe the assays and screening results and on their syntactic, structural, and semantic consistency. Here we report metadata specifications for the most important molecular and cellular components and recommend them for adoption beyond the LINCS project. We focus on the minimum required information to model LINCS assays and results based on a number of use cases, and we recommend controlled terminologies and ontologies to annotate assays with syntactic consistency and semantic integrity. We also report specifications for a simple annotation format (SAF) to describe assays and screening results based on our metadata specifications with explicit controlled vocabularies. SAF specifically serves to programmatically access and exchange LINCS data as a prerequisite for a distributed information management infrastructure. We applied the metadata specifications to annotate large numbers of LINCS cell lines, proteins, and small molecules. The resources generated and presented here are freely available. © 2014 Society for Laboratory Automation and Screening.

  17. mzML2ISA & nmrML2ISA: generating enriched ISA-Tab metadata files from metabolomics XML data

    PubMed Central

    Larralde, Martin; Lawson, Thomas N.; Weber, Ralf J. M.; Moreno, Pablo; Haug, Kenneth; Rocca-Serra, Philippe; Viant, Mark R.; Steinbeck, Christoph; Salek, Reza M.

    2017-01-01

    Summary: Submission to the MetaboLights repository for metabolomics data currently places the burden of reporting instrument and acquisition parameters in ISA-Tab format on users, who have to do it manually, a process that is time-consuming and prone to user input error. Since the large majority of these parameters are embedded in instrument raw data files, an opportunity exists to capture this metadata more accurately. Here we report a set of Python packages that can automatically generate ISA-Tab metadata file stubs from raw XML metabolomics data files. The parsing packages are separated into mzML2ISA (encompassing mzML and imzML formats) and nmrML2ISA (nmrML format only). Overall, the use of mzML2ISA & nmrML2ISA reduces the time needed to capture metadata substantially (capturing 90% of metadata on assay and sample levels), is much less prone to user input errors, improves compliance with minimum information reporting guidelines, and facilitates more finely grained data exploration and querying of datasets. Availability and Implementation: mzML2ISA & nmrML2ISA are available under version 3 of the GNU General Public Licence at https://github.com/ISA-tools. Documentation is available from http://2isa.readthedocs.io/en/latest/. Contact: reza.salek@ebi.ac.uk or isatools@googlegroups.com. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28402395
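
    The underlying idea, that instrument and acquisition parameters live as cvParam elements inside mzML and can be pulled out programmatically rather than retyped, can be sketched with plain XML parsing. This is a generic illustration, not the mzML2ISA package's own API; the file path is a placeholder.

        # Sketch: collect cvParam name/value pairs from an mzML file.

        import xml.etree.ElementTree as ET

        MZML = "{http://psi.hupo.org/ms/mzml}"

        def instrument_params(path):
            root = ET.parse(path).getroot()
            params = {}
            for cv in root.iter(f"{MZML}cvParam"):
                params[cv.get("name")] = cv.get("value", "")
            return params

        # e.g. params = instrument_params("sample.mzML")  # placeholder path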

  18. docBUILDER - Building Your Useful Metadata for Earth Science Data and Services.

    NASA Astrophysics Data System (ADS)

    Weir, H. M.; Pollack, J.; Olsen, L. M.; Major, G. R.

    2005-12-01

    The docBUILDER tool, created by NASA's Global Change Master Directory (GCMD), assists the scientific community in efficiently creating quality data and services metadata. Metadata authors are asked to complete five required fields to ensure enough information is provided for users to discover the data and related services they seek. After the metadata record is submitted to the GCMD, it is reviewed for semantic and syntactic consistency. Currently, two versions are available - a Web-based tool accessible with most browsers (docBUILDERweb) and a stand-alone desktop application (docBUILDERsolo). The Web version is available through the GCMD website, at http://gcmd.nasa.gov/User/authoring.html. This version has been updated and now offers: personalized templates to ease entering similar information for multiple data sets/services; automatic population of Data Center/Service Provider URLs based on the selected center/provider; three-color support to indicate required, recommended, and optional fields; an editable text window containing the XML record, to allow for quick editing; and improved overall performance and presentation. The docBUILDERsolo version offers the ability to create metadata records on a computer wherever you are. Except for installation and the occasional update of keywords, data/service providers are not required to have an Internet connection. This freedom will allow users with portable computers (Windows, Mac, and Linux) to create records in field campaigns, whether in Antarctica or the Australian Outback. This version also offers a spell-checker, in addition to all of the features found in the Web version.

  19. OpenFlow arbitrated programmable network channels for managing quantum metadata

    DOE PAGES

    Dasari, Venkat R.; Humble, Travis S.

    2016-10-10

    Quantum networks must classically exchange complex metadata between devices in order to carry out protocols such as teleportation, super-dense coding, and quantum key distribution. Demonstrating the integration of these new communication methods with existing network protocols, channels, and data forwarding mechanisms remains an open challenge. Software-defined networking (SDN) offers robust and flexible strategies for managing diverse network devices and uses. We adapt the principles of SDN to the deployment of quantum networks, which are composed from unique devices that operate according to the laws of quantum mechanics. We show how quantum metadata can be managed within a software-defined network using the OpenFlow protocol, and we describe how OpenFlow management of classical optical channels is compatible with emerging quantum communication protocols. We next give an example specification of the metadata needed to manage and control quantum physical layer (QPHY) behavior and we extend the OpenFlow interface to accommodate this quantum metadata. We conclude by discussing near-term experimental efforts that can realize SDN’s principles for quantum communication.
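
    As a hedged illustration of the kind of QPHY metadata message an SDN controller might carry alongside classical flow rules, the sketch below serializes a small record for exchange over a classical channel. The field set is an illustrative assumption, not the paper's actual specification or an OpenFlow extension.

        # Sketch: a hypothetical quantum physical-layer metadata record,
        # serialized for exchange over a classical management channel.

        from dataclasses import dataclass, asdict
        import json

        @dataclass
        class QPhyMetadata:
            channel_id: str          # classical channel carrying the metadata
            wavelength_nm: float     # optical carrier for the quantum channel
            protocol: str            # e.g. "BB84", "teleportation"
            basis_schedule: str      # agreed measurement-basis ordering
            qber_threshold: float    # abort threshold on quantum bit error rate

        msg = QPhyMetadata("ch-7", 1550.12, "BB84", "random-seed:42", 0.11)
        print(json.dumps(asdict(msg)))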

  20. OpenFlow arbitrated programmable network channels for managing quantum metadata

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dasari, Venkat R.; Humble, Travis S.

    Quantum networks must classically exchange complex metadata between devices in order to carry out protocols such as teleportation, super-dense coding, and quantum key distribution. Demonstrating the integration of these new communication methods with existing network protocols, channels, and data forwarding mechanisms remains an open challenge. Software-defined networking (SDN) offers robust and flexible strategies for managing diverse network devices and uses. We adapt the principles of SDN to the deployment of quantum networks, which are composed from unique devices that operate according to the laws of quantum mechanics. We show how quantum metadata can be managed within a software-defined network using the OpenFlow protocol, and we describe how OpenFlow management of classical optical channels is compatible with emerging quantum communication protocols. We next give an example specification of the metadata needed to manage and control quantum physical layer (QPHY) behavior and we extend the OpenFlow interface to accommodate this quantum metadata. We conclude by discussing near-term experimental efforts that can realize SDN’s principles for quantum communication.

  1. File level metadata generation and use for diverse airborne and in situ data: Experiences with Operation IceBridge and SnowEx

    NASA Astrophysics Data System (ADS)

    Tanner, S.; Schwab, M.; Beam, K.; Skaug, M.

    2017-12-01

    Operation IceBridge has been flying campaigns in the Arctic and Antarctic for nearly 10 years and will soon be a decadal mission. During that time, the generation and use of file-level metadata has evolved from nearly non-existent to robust spatio-temporal support. This evolution has been difficult at times, but the results speak for themselves in the form of production tools for search, discovery, access, and analysis. The lessons learned from this experience are now being incorporated into SnowEx, a new mission to measure snow cover using airborne and ground-based measurements. This presentation will focus on techniques for generating metadata for such a diverse set of measurements, as well as the resulting tools that utilize this information. This includes the development and deployment of MetGen, a semi-automated metadata generation capability that relies on collaboration between data producers and data archivers; the newly deployed IceBridge data portal, which incorporates data browse capabilities and limited in-line analysis; and programmatic access to metadata and data for incorporation into larger automated workflows.
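
    The core of file-level spatio-temporal metadata generation can be sketched simply: derive a bounding box and temporal extent from a file's point records. The record layout and values below are assumptions for illustration, not MetGen's actual inputs or outputs.

        # Sketch: derive file-level spatio-temporal metadata from point records.

        from datetime import datetime

        points = [  # (time, lat, lon) samples from one hypothetical flight file
            (datetime(2017, 4, 3, 14, 0), 69.2, -49.5),
            (datetime(2017, 4, 3, 14, 5), 69.4, -49.1),
            (datetime(2017, 4, 3, 14, 9), 69.7, -48.8),
        ]

        def file_level_metadata(points):
            times, lats, lons = zip(*points)
            return {
                "time_start": min(times).isoformat(),
                "time_end": max(times).isoformat(),
                "bbox": [min(lons), min(lats), max(lons), max(lats)],  # W, S, E, N
            }

        print(file_level_metadata(points))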

  2. Automatic meta-data collection of STP observation data

    NASA Astrophysics Data System (ADS)

    Ishikura, S.; Kimura, E.; Murata, K.; Kubo, T.; Shinohara, I.

    2006-12-01

    For geoscience and STP (Solar-Terrestrial Physics) studies, various observations have been made by satellites and ground-based observatories. These data are saved and managed by many organizations, but there is no common procedure or rule for providing and/or sharing the data files. Researchers have had difficulty in searching and analyzing such different types of data distributed over the Internet. To support such cross-over analyses of observation data, we have developed the STARS (Solar-Terrestrial data Analysis and Reference System). The STARS consists of a client application (STARS-app), the meta-database (STARS-DB), the portal Web service (STARS-WS), and the download agent Web service (STARS DLAgent-WS). The STARS-DB includes directory information, access permissions, protocol information to retrieve data files, hierarchy information of mission/team/data, and user information. Users of the STARS are able to download observation data files without knowing the locations of the files by using the STARS-DB. We have implemented the Portal-WS to retrieve meta-data from the meta-database. One reason we use a Web service is to overcome the variety of firewall restrictions, which have become stricter in recent years and make it difficult for the STARS client application to access the STARS-DB by sending SQL queries directly. Using the Web service, we succeeded in placing the STARS-DB behind the Portal-WS and preventing it from being exposed on the Internet. The STARS accesses the Portal-WS by sending a SOAP (Simple Object Access Protocol) request over HTTP; meta-data are received as a SOAP response. The STARS DLAgent-WS provides clients with data files downloaded from data sites. The data files are provided with a variety of protocols (e.g., FTP, HTTP, FTPS and SFTP), individually selected at each site. The clients send a SOAP request with download request messages and receive observation data files as a SOAP response with a DIME attachment. By introducing the DLAgent-WS, we overcame the problem that the data management policies of each data site are independent. Another important issue to be overcome is how to collect the meta-data of observation data files. So far, STARS-DB managers have added new records to the meta-database and updated them manually. We have had a lot of trouble maintaining the meta-database because observation data are generated every day and the quantity of data files increases explosively. For that purpose, we have attempted to automate collection of the meta-data. In this research, we adopted RSS 1.0 (RDF Site Summary) as a format to exchange meta-data in the STP fields. RSS is an RDF vocabulary that provides a multipurpose, extensible meta-data description and is suitable for syndication of meta-data. Most of the data in the present study are described in the CDF (Common Data Format), a self-describing data format. We have converted meta-information extracted from the CDF data files into RSS files. The program that generates the RSS files is executed on each data site server once a day, and the RSS files provide information on new data files. The RSS files are collected by an RSS collection server once a day, and the meta-data are stored in the STARS-DB.
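
    A hedged sketch of the daily RSS 1.0 export described above: advertise newly generated data files as RSS items so a collector can harvest their metadata. The element set is simplified and the file URL is a placeholder; real STARS feeds would carry richer CDF-derived attributes.

        # Sketch: emit a minimal RSS 1.0 (RDF) feed listing new data files.

        import xml.etree.ElementTree as ET

        RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        RSS = "http://purl.org/rss/1.0/"
        ET.register_namespace("rdf", RDF)
        ET.register_namespace("", RSS)

        def feed_for(files):
            root = ET.Element(f"{{{RDF}}}RDF")
            for url, title in files:
                item = ET.SubElement(root, f"{{{RSS}}}item", {f"{{{RDF}}}about": url})
                ET.SubElement(item, f"{{{RSS}}}title").text = title
                ET.SubElement(item, f"{{{RSS}}}link").text = url
            return ET.tostring(root, encoding="unicode")

        print(feed_for([("https://example.org/data/ae20061201.cdf",
                         "hypothetical satellite CDF, 2006-12-01")]))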

  3. CINERGI: Community Inventory of EarthCube Resources for Geoscience Interoperability

    NASA Astrophysics Data System (ADS)

    Zaslavsky, Ilya; Bermudez, Luis; Grethe, Jeffrey; Gupta, Amarnath; Hsu, Leslie; Lehnert, Kerstin; Malik, Tanu; Richard, Stephen; Valentine, David; Whitenack, Thomas

    2014-05-01

    Organizing geoscience data resources to support cross-disciplinary data discovery, interpretation, analysis and integration is challenging because of different information models, semantic frameworks, metadata profiles, catalogs, and services used in different geoscience domains, not to mention different research paradigms and methodologies. The central goal of CINERGI, a new project supported by the US National Science Foundation through its EarthCube Building Blocks program, is to create a methodology and assemble a large inventory of high-quality information resources capable of supporting data discovery needs of researchers in a wide range of geoscience domains. The key characteristics of the inventory are: 1) collaboration with and integration of metadata resources from a number of large data facilities; 2) reliance on international metadata and catalog service standards; 3) assessment of resource "interoperability-readiness"; 4) ability to cross-link and navigate data resources, projects, models, researcher directories, publications, usage information, etc.; 5) efficient inclusion of "long-tail" data, which do not appear in existing domain repositories; 6) data registration at feature level where appropriate, in addition to common dataset-level registration; and 7) integration with parallel EarthCube efforts, in particular those focused on EarthCube governance, information brokering, service-oriented architecture design and management of semantic information. We discuss challenges associated with accomplishing CINERGI goals, including defining the inventory scope; managing different granularity levels of resource registration; interaction with search systems of domain repositories; explicating domain semantics; metadata brokering, harvesting and pruning; managing provenance of the harvested metadata; and cross-linking resources based on linked open data (LOD) approaches. At the higher level of the inventory, we register domain-wide resources such as domain catalogs, vocabularies, information models, data service specifications, identifier systems, and assess their conformance with international standards (such as those adopted by ISO and OGC, and used by INSPIRE) or de facto community standards using, in part, automatic validation techniques. The main level in CINERGI leverages a metadata aggregation platform (currently Geoportal Server) to organize harvested resources from multiple collections and contributed by community members during EarthCube end-user domain workshops or suggested online. The latter mechanism uses the SciCrunch toolkit originally developed within the Neuroscience Information Framework (NIF) project and now being extended to other communities. The inventory is designed to support requests such as "Find resources with theme X in geographic area S", "Find datasets with subject Y using query concept expansion", "Find geographic regions having data of type Z", "Find datasets that contain property P". With the added LOD support, additional types of requests, such as "Find example implementations of specification X", "Find researchers who have worked in Domain X, dataset Y, location L", "Find resources annotated by person X", will be supported. The project's website (http://workspace.earthcube.org/cinergi) provides access to the initial resource inventory, a gallery of EarthCube researchers, collections of geoscience models, metadata entry forms, and other software modules and inventories being integrated into the CINERGI system. Support from the US National Science Foundation under award NSF ICER-1343816 is gratefully acknowledged.
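
    As a concrete illustration of the first kind of request listed above ("Find resources with theme X in geographic area S"), here is a minimal sketch of a standards-based catalogue query using the OWSLib client; the CSW endpoint URL, theme keyword, and bounding box are hypothetical:

    ```python
    from owslib.csw import CatalogueServiceWeb
    from owslib.fes import PropertyIsLike, BBox, And

    # Hypothetical CSW 2.0.2 endpoint of the sort a Geoportal Server exposes.
    csw = CatalogueServiceWeb("http://example.org/geoportal/csw")

    # "Find resources with theme 'hydrology' in a geographic area":
    theme = PropertyIsLike("csw:AnyText", "%hydrology%")
    area = BBox([32.0, -125.0, 42.0, -110.0])  # min/max corner coordinates
    csw.getrecords2(constraints=[And([theme, area])], maxrecords=10)

    for rec in csw.records.values():
        print(rec.identifier, "-", rec.title)
    ```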

  4. Operational Interoperability Challenges on the Example of GEOSS and WIS

    NASA Astrophysics Data System (ADS)

    Heene, M.; Buesselberg, T.; Schroeder, D.; Brotzer, A.; Nativi, S.

    2015-12-01

    The following poster highlights the operational interoperability challenges using the example of the Global Earth Observation System of Systems (GEOSS) and the World Meteorological Organization Information System (WIS). At the heart of both systems is a catalogue of earth observation data, products and services, but with different metadata management concepts. While WIS maintains strong governance, with its own metadata profile for its hundreds of thousands of metadata records, GEOSS has adopted a more open approach for its roughly ten million records. Furthermore, the development of WIS - as an operational system - follows a roadmap with committed backward compatibility, while the GEOSS development process is more agile. The poster discusses how interoperability can be achieved across these different metadata management concepts and how a proxy concept helps to couple two systems that follow different development methodologies. Furthermore, the poster highlights the importance of monitoring and backup concepts as a verification method for operational interoperability.

  5. New Solutions for Enabling Discovery of User-Centric Virtual Data Products in NASA's Common Metadata Repository

    NASA Astrophysics Data System (ADS)

    Pilone, D.; Gilman, J.; Baynes, K.; Shum, D.

    2015-12-01

    This talk introduces a new NASA Earth Observing System Data and Information System (EOSDIS) capability to automatically generate and maintain derived, Virtual Product information, allowing DAACs and Data Providers to create tailored and more discoverable variations of their products. After this talk the audience will be aware of the new EOSDIS Virtual Product capability, applications of it, and how to take advantage of it. Much of the data made available in EOSDIS are organized for generation and archival rather than for discovery and use. The EOSDIS Common Metadata Repository (CMR) is launching a new capability providing automated generation and maintenance of user-oriented Virtual Product information. DAACs can easily surface variations on established data products tailored to specific use cases and users, leveraging DAAC-exposed services such as custom ordering or access services like OPeNDAP for on-demand product generation and distribution. Virtual Data Products enjoy support for spatial and temporal information, keyword discovery, association with imagery, and are fully discoverable by tools such as NASA Earthdata Search, Worldview, and Reverb. Virtual Product generation has applicability across many use cases:
    - Describing derived products such as Surface Kinetic Temperature information (AST_08) from source products (ASTER L1A)
    - Providing streamlined access to data products (e.g. AIRS) containing many (>800) data variables covering an enormous variety of physical measurements
    - Attaching additional EOSDIS offerings such as Visual Metadata, external services, and documentation metadata
    - Publishing alternate formats for a product (e.g. netCDF for HDF products) with the actual conversion happening on request
    - Publishing granules to be modified by on-the-fly services, like GES-DISC's Data Quality Screening Service
    - Publishing "bundled" products where granules from one product correspond to granules from one or more other related products

  6. Data Citation in Neuroimaging: Proposed Best Practices for Data Identification and Attribution

    PubMed Central

    Honor, Leah B.; Haselgrove, Christian; Frazier, Jean A.; Kennedy, David N.

    2016-01-01

    Data sharing and reuse, while widely accepted as good ideas, have been slow to catch on in any concrete and consistent way. One major hurdle within the scientific community has been the lack of widely accepted standards for citing that data, making it difficult to track usage and measure impact. Within the neuroimaging community, there is a need for a way to not only clearly identify and cite datasets, but also to derive new aggregate sets from multiple sources while clearly maintaining lines of attribution. This work presents a functional prototype of a system to integrate Digital Object Identifiers (DOI) and a standardized metadata schema into an XNAT-based repository workflow, allowing for identification of data at both the project and image level. These item and source level identifiers allow any newly defined combination of images, from any number of projects, to be tagged with a new group-level DOI that automatically inherits the individual attributes and provenance information of its constituent parts. This system enables the tracking of data reuse down to the level of individual images. The implementation of this type of data identification system would impact researchers and data creators, data hosting facilities, and data publishers, but the benefit of having widely accepted standards for data identification and attribution would go far toward making data citation practical and advantageous. PMID:27570508
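
    To illustrate the attribution-inheritance idea, here is a minimal sketch of group-level identifier metadata in the spirit described here, using the DataCite metadata model's relatedIdentifier mechanism; the DOIs are placeholders, and the use of DataCite fields (rather than the paper's XNAT-specific schema) is an assumption for illustration:

    ```python
    # Construct metadata for a new aggregate dataset whose DOI must carry
    # provenance links to the source datasets it was derived from.
    # DOIs below are placeholders, not real identifiers.
    source_dois = ["10.0000/project-a.image-0042", "10.0000/project-b.image-0107"]

    aggregate_metadata = {
        "identifier": {"identifier": "10.0000/aggregate-set-001",
                       "identifierType": "DOI"},
        "titles": [{"title": "Aggregate neuroimaging set (illustrative)"}],
        "relatedIdentifiers": [
            {"relatedIdentifier": doi,
             "relatedIdentifierType": "DOI",
             "relationType": "IsDerivedFrom"}   # line of attribution per source
            for doi in source_dois
        ],
    }

    # Anyone resolving the aggregate DOI can walk relatedIdentifiers to
    # credit each constituent image or project, down to individual images.
    for link in aggregate_metadata["relatedIdentifiers"]:
        print("derived from", link["relatedIdentifier"])
    ```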

  8. panMetaDocs, eSciDoc, and DOIDB - an infrastructure for the curation and publication of file-based datasets for 'GFZ Data Services'

    NASA Astrophysics Data System (ADS)

    Ulbricht, Damian; Elger, Kirsten; Bertelmann, Roland; Klump, Jens

    2016-04-01

    With the foundation of DataCite in 2009 and the technical infrastructure installed in the last six years, it has become very easy to create citable dataset DOIs. Nowadays, dataset DOIs are increasingly accepted and required by journals in reference lists of manuscripts. In addition, DataCite provides usage statistics [1] of assigned DOIs and offers a public search API to make research data count. By linking related information to the data, datasets become more useful for future generations of scientists. For this purpose, several identifier systems, such as ISBN for books, ISSN for journals, DOI for articles or related data, ORCID for authors, and IGSN for physical samples, can be attached to DOIs using the DataCite metadata schema [2]. While these are good preconditions for publishing data, free and open solutions that help with the curation of data, the publication of research data, and the assignment of DOIs in one software package seem to be rare. At GFZ Potsdam we built a modular software stack made of several free and open software solutions, and we established 'GFZ Data Services'. 'GFZ Data Services' provides storage, a metadata editor for publication, and a facility to moderate minted DOIs. All software solutions are connected through web APIs, which makes it possible to reuse and integrate established software. The core component of 'GFZ Data Services' is an eSciDoc [3] middleware that is used as central storage and has been designed along the OAIS reference model for digital preservation. Thus, data are stored in self-contained packages made of binary file-based data and XML-based metadata. The eSciDoc infrastructure provides access control to data and is able to handle half-open datasets, which is useful in embargo situations when a subset of the research data is released after an adequate period. The data exchange platform panMetaDocs [4] makes use of eSciDoc's REST API to upload file-based data into eSciDoc and uses a metadata editor [5] to annotate the files with metadata. The metadata editor has a user-friendly interface with nominal lists, extensive explanations, and an interactive mapping tool to assist scientists in describing the data. It is possible to deposit metadata templates to fill certain fields with default values. The metadata editor generates metadata in the schemas ISO 19139, NASA GCMD DIF, and DataCite, and could be extended to other schemas. panMetaDocs is able to mint dataset DOIs through DOIDB, which is our component to moderate dataset DOIs issued through 'GFZ Data Services'. DOIDB accepts metadata in the schemas ISO 19139, DIF, and DataCite. In addition, DOIDB provides an OAI-PMH interface to disseminate all deposited metadata to data portals. The presentation of datasets on DOI landing pages is done through XSLT stylesheet transformation of the XML-based metadata. The landing pages have been designed to meet the needs of scientists, and we are able to render the metadata to different layouts. Furthermore, additional information about datasets and publications is assembled into the webpage by querying public databases on the Internet. The work presented here will focus on technical details of the software stack. [1] http://stats.datacite.org [2] http://www.dlib.org/dlib/january11/starr/01starr.html [3] http://www.escidoc.org [4] http://panmetadocs.sf.net [5] http://github.com/ulbricht
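
    For a sense of the dissemination interface, here is a minimal harvesting sketch against an OAI-PMH endpoint of the kind DOIDB provides, assuming the third-party Sickle client; the endpoint URL is hypothetical and the metadata prefix is an assumption:

    ```python
    from sickle import Sickle  # lightweight OAI-PMH client

    # Hypothetical OAI-PMH endpoint exposed by a DOIDB-like component.
    harvester = Sickle("https://doidb.example.org/oaip/oai")

    # Ask the endpoint which metadata formats it can disseminate.
    for fmt in harvester.ListMetadataFormats():
        print(fmt.metadataPrefix)

    # Incrementally harvest DataCite-flavoured records for a portal index.
    records = harvester.ListRecords(metadataPrefix="oai_datacite",
                                    ignore_deleted=True)
    for record in records:
        header = record.header
        print(header.identifier, header.datestamp)
    ```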

  9. Scientific Workflows + Provenance = Better (Meta-)Data Management

    NASA Astrophysics Data System (ADS)

    Ludaescher, B.; Cuevas-Vicenttín, V.; Missier, P.; Dey, S.; Kianmajd, P.; Wei, Y.; Koop, D.; Chirigati, F.; Altintas, I.; Belhajjame, K.; Bowers, S.

    2013-12-01

    The origin and processing history of an artifact is known as its provenance. Data provenance is an important form of metadata that explains how a particular data product came about, e.g., how and when it was derived in a computational process, which parameter settings and input data were used, etc. Provenance information provides transparency and helps to explain and interpret data products. Other common uses and applications of provenance include quality control, data curation, result debugging, and, more generally, 'reproducible science'. Scientific workflow systems (e.g. Kepler, Taverna, VisTrails, and others) provide controlled environments for developing computational pipelines with built-in provenance support. Workflow results can then be explained in terms of workflow steps, parameter settings, input data, etc. using provenance that is automatically captured by the system. Scientific workflows themselves provide a user-friendly abstraction of the computational process and are thus a form of ('prospective') provenance in their own right. The full potential of provenance information is realized when combining workflow-level information (prospective provenance) with trace-level information (retrospective provenance). To this end, the DataONE Provenance Working Group (ProvWG) has developed an extension of the W3C PROV standard, called D-PROV. Whereas PROV provides a 'least common denominator' for exchanging and integrating provenance information, D-PROV adds new 'observables' that describe workflow-level information (e.g., the functional steps in a pipeline) as well as workflow-specific trace-level information (timestamps for each workflow step executed, the inputs and outputs used, etc.). Using examples, we will demonstrate how the combination of prospective and retrospective provenance provides added value in managing scientific data. The DataONE ProvWG is also developing tools based on D-PROV that allow scientists to get more mileage from provenance metadata. DataONE is a federation of member nodes that store data and metadata for discovery and access. By enriching metadata with provenance information, search and reuse of data are enhanced, and the 'social life' of data (being the product of many workflow runs, different people, etc.) is revealed. We are currently prototyping a provenance repository (PBase) to demonstrate what can be achieved with advanced provenance queries. The ProvExplorer and ProPub tools support advanced ad-hoc querying and visualization of provenance as well as customized provenance publications (e.g., to address privacy issues, or to focus provenance on relevant details). In a parallel line of work, we are exploring ways to add provenance support to widely-used scripting platforms (e.g. R and Python) and then expose that information via D-PROV.
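
    As a small illustration of the retrospective side, here is a sketch using the Python prov package to record that a data product was generated by one executed workflow step; the ex: namespace, step identifier, and file names are hypothetical:

    ```python
    from datetime import datetime
    from prov.model import ProvDocument

    doc = ProvDocument()
    doc.add_namespace("ex", "http://example.org/workflow/")

    # Retrospective provenance: one executed step of a pipeline run.
    step = doc.activity("ex:regrid_step_run17",
                        startTime=datetime(2013, 9, 1, 10, 0),
                        endTime=datetime(2013, 9, 1, 10, 5))
    infile = doc.entity("ex:raw_temperature.nc")
    outfile = doc.entity("ex:regridded_temperature.nc")

    doc.used(step, infile)                  # input consumed by the step
    doc.wasGeneratedBy(outfile, step)       # output produced by the step
    doc.wasDerivedFrom(outfile, infile)     # data lineage across the step

    print(doc.get_provn())  # human-readable PROV-N serialization
    ```

    Prospective provenance, in D-PROV terms, would additionally describe the regridding step as part of the workflow definition itself, before any run takes place.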

  10. Joint Data Analysis in Nutritional Epidemiology: Identification of Observational Studies and Minimal Requirements.

    PubMed

    Pinart, Mariona; Nimptsch, Katharina; Bouwman, Jildau; Dragsted, Lars O; Yang, Chen; De Cock, Nathalie; Lachat, Carl; Perozzi, Giuditta; Canali, Raffaella; Lombardo, Rosario; D'Archivio, Massimo; Guillaume, Michèle; Donneau, Anne-Françoise; Jeran, Stephanie; Linseisen, Jakob; Kleiser, Christina; Nöthlings, Ute; Barbaresko, Janett; Boeing, Heiner; Stelmach-Mardas, Marta; Heuer, Thorsten; Laird, Eamon; Walton, Janette; Gasparini, Paolo; Robino, Antonietta; Castaño, Luis; Rojo-Martínez, Gemma; Merino, Jordi; Masana, Luis; Standl, Marie; Schulz, Holger; Biagi, Elena; Nurk, Eha; Matthys, Christophe; Gobbetti, Marco; de Angelis, Maria; Windler, Eberhard; Zyriax, Birgit-Christiane; Tafforeau, Jean; Pischon, Tobias

    2018-02-01

    Joint data analysis from multiple nutrition studies may improve the ability to answer complex questions regarding the role of nutritional status and diet in health and disease. The objective was to identify nutritional observational studies from partners participating in the European Nutritional Phenotype Assessment and Data Sharing Initiative (ENPADASI) Consortium, as well as minimal requirements for joint data analysis. A predefined template containing information on study design, exposure measurements (dietary intake, alcohol and tobacco consumption, physical activity, sedentary behavior, anthropometric measures, and sociodemographic and health status), main health-related outcomes, and laboratory measurements (traditional and omics biomarkers) was developed and circulated to those European research groups participating in ENPADASI under the strategic research area of "diet-related chronic diseases." Information about raw data disposition and metadata sharing was requested. A set of minimal requirements was abstracted from the gathered information. In total, 26 studies (12 cohort, 12 cross-sectional, and 2 case-control) were identified. Two studies recruited children only and the rest recruited adults. All studies included dietary intake data. Twenty studies collected blood samples. Data on traditional biomarkers were available for 20 studies, of which 17 measured lipoproteins, glucose, and insulin and 13 measured inflammatory biomarkers. Metabolomics, proteomics, and genomics or transcriptomics data were available in 5, 3, and 12 studies, respectively. Although the study authors were willing to share metadata, most were hesitant or refused to share raw data, or faced legal or ethical constraints. Forty-one descriptors of minimal requirements for the study data were identified to facilitate data integration. Combining study data sets will enable sufficiently powered, refined investigations to increase the knowledge and understanding of the relation between food, nutrition, and human health. Furthermore, the minimal requirements for study data may encourage more efficient secondary usage of existing data and provide sufficient information for researchers to draft future multicenter research proposals in nutrition.

  11. The role of digital sample information within the digital geoscience infrastructure: a pragmatic approach

    NASA Astrophysics Data System (ADS)

    Howe, Michael

    2014-05-01

    Much of the digital geological information on the composition, properties and dynamics of the subsurface is based ultimately on physical samples, many of which are archived to provide a basis for the information. Online metadata catalogues of these collections have now been available for many years. Many of these are institutional and tightly focussed, with UK examples including the British Geological Survey's (BGS) palaeontological samples database, PalaeoSaurus (http://www.bgs.ac.uk/palaeosaurus/), and mineralogical and petrological sample database, Britrocks (http://www.bgs.ac.uk/data/britrocks.html). There is now a growing number of international sample metadata databases, including The Paleobiology Database (http://paleobiodb.org/) and SESAR, the IGSN (International Geo Sample Number) database (http://www.geosamples.org/catalogsearch/). More recently the emphasis has moved beyond metadata (locality, identification, age, citations, etc.) to digital imagery, with the intention of providing the user with at least enough information to determine whether viewing the sample would be worthwhile. Recent BGS examples include high-resolution (e.g. 7216 x 5412 pixel) hydrocarbon well core images (http://www.bgs.ac.uk/data/offshoreWells/wells.cfc?method=searchWells), high-resolution rock thin section images (e.g. http://www.largeimages.bgs.ac.uk/iip/britrocks.html?id=290000/291739) and building stone images (http://geoscenic.bgs.ac.uk/asset-bank/action/browseItems?categoryId=1547&categoryTypeId=1). This has been developed further with high-resolution stereo images. The Jisc-funded GB3D type fossils online project delivers these as red-cyan anaglyphs (http://www.3d-fossils.ac.uk/). More innovatively, the GB3D type fossils project has laser-scanned several thousand type fossils, and the resulting 3D digital models are now being delivered through the online portal. Importantly, this project also represents collaboration between the BGS, Oxford and Cambridge Universities, the National Museums of Wales, and numerous other national, local and regional museums. The lack of currently accepted international standards and infrastructures for the delivery of high-resolution images and 3D digital models has required the BGS to develop or select its own. Most high-resolution images have been delivered using the JPEG 2000 format because of its quality and speed. Digital models have been made available in both .PLY and .OBJ formats because of their efficient file size and flexibility, respectively. Consideration must now be given to European and international standards and infrastructures for the delivery of high-resolution images and 3D digital models.

  12. Tidying Up International Nucleotide Sequence Databases: Ecological, Geographical and Sequence Quality Annotation of ITS Sequences of Mycorrhizal Fungi

    PubMed Central

    Tedersoo, Leho; Abarenkov, Kessy; Nilsson, R. Henrik; Schüssler, Arthur; Grelet, Gwen-Aëlle; Kohout, Petr; Oja, Jane; Bonito, Gregory M.; Veldre, Vilmar; Jairus, Teele; Ryberg, Martin; Larsson, Karl-Henrik; Kõljalg, Urmas

    2011-01-01

    Sequence analysis of the ribosomal RNA operon, particularly the internal transcribed spacer (ITS) region, provides a powerful tool for identification of mycorrhizal fungi. The sequence data deposited in the International Nucleotide Sequence Databases (INSD) are, however, unfiltered for quality and are often poorly annotated with metadata. To detect chimeric and low-quality sequences and assign the ectomycorrhizal fungi to phylogenetic lineages, fungal ITS sequences were downloaded from INSD, aligned within family-level groups, and examined through phylogenetic analyses and BLAST searches. By combining the fungal sequence database UNITE and the annotation and search tool PlutoF, we also added metadata from the literature to these accessions. Altogether 35,632 sequences belonged to mycorrhizal fungi or originated from ericoid and orchid mycorrhizal roots. Of these sequences, 677 were considered chimeric and 2,174 of low read quality. Information detailing country of collection, geographical coordinates, interacting taxon and isolation source was supplemented to cover 78.0%, 33.0%, 41.7% and 96.4% of the sequences, respectively. These annotated sequences are publicly available via UNITE (http://unite.ut.ee/) for downstream biogeographic, ecological and taxonomic analyses. In the European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/), the annotated sequences have a special link-out to UNITE. We intend to expand the data annotation to additional genes and all taxonomic groups and functional guilds of fungi. PMID:21949797

  13. EnviroAtlas Tree Cover Configuration and Connectivity, Water Background Web Service

    EPA Pesticide Factsheets

    This EnviroAtlas web service supports research and online mapping activities related to EnviroAtlas (https://www.epa.gov/enviroatlas). The 1-meter resolution tree cover configuration and connectivity map categorizes tree cover into structural elements (e.g. core, edge, connector, etc.). Source imagery varies by community. For specific information about methods and accuracy of each community's tree cover configuration and connectivity classification, consult their individual metadata records: Austin, TX (https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=%7B29D2B039-905C-4825-B0B4-9315122D6A9F%7D); Cleveland, OH (https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=%7B03cd54e1-4328-402e-ba75-e198ea9fbdc7%7D); Des Moines, IA (https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=%7B350A83E6-10A2-4D5D-97E6-F7F368D268BB%7D); Durham, NC (https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=%7BC337BA5F-8275-4BA8-9647-F63C443F317D%7D); Fresno, CA (https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=%7B84B98749-9C1C-4679-AE24-9B9C0998EBA5%7D); Green Bay, WI (https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=%7B69E48A44-3D30-4E84-A764-38FBDCCAC3D0%7D); Memphis, TN (https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=%7BB7313ADA-04F7-4D80-ABBA-77E753AAD002%7D); Milwaukee, WI (https://edg.epa.gov/metadata/catalog/search/resource/details.page?u

  14. OlyMPUS - The Ontology-based Metadata Portal for Unified Semantics

    NASA Astrophysics Data System (ADS)

    Huffer, E.; Gleason, J. L.

    2015-12-01

    The Ontology-based Metadata Portal for Unified Semantics (OlyMPUS), funded by the NASA Earth Science Technology Office Advanced Information Systems Technology program, is an end-to-end system designed to support data consumers and data providers, enabling the latter to register their data sets and provision them with the semantically rich metadata that drives the Ontology-Driven Interactive Search Environment for Earth Sciences (ODISEES). OlyMPUS leverages the semantics and reasoning capabilities of ODISEES to provide data producers with a semi-automated interface for producing the semantically rich metadata needed to support ODISEES' data discovery and access services. It integrates the ODISEES metadata search system with multiple NASA data delivery tools to enable data consumers to create customized data sets for download to their computers or, for registered users of the NASA Advanced Supercomputing (NAS) facility, directly to NAS storage resources for access by applications running on NAS supercomputers. A core function of NASA's Earth Science Division is research and analysis that uses the full spectrum of data products available in NASA archives. Scientists need to perform complex analyses that identify correlations and non-obvious relationships across all types of Earth System phenomena. Comprehensive analytics are hindered, however, by the fact that many Earth science data products are disparate and hard to synthesize. Variations in how data are collected, processed, gridded, and stored create challenges for data interoperability and synthesis, which are exacerbated by the sheer volume of available data. Robust, semantically rich metadata can support tools for data discovery and facilitate machine-to-machine transactions with services such as data subsetting, regridding, and reformatting. Such capabilities are critical to enabling the research activities integral to NASA's strategic plans. However, as metadata requirements increase and competing standards emerge, metadata provisioning becomes increasingly burdensome to data producers. The OlyMPUS system helps data providers produce semantically rich metadata, making their data more accessible to data consumers, and helps data consumers quickly discover and download the right data for their research.

  15. ENVIRONMENTAL INFORMATION MANAGEMENT SYSTEM (EIMS)

    EPA Science Inventory

    The Environmental Information Management System (EIMS) organizes descriptive information (metadata) for data sets, databases, documents, models, projects, and spatial data. The EIMS design provides a repository for scientific documentation that can be easily accessed with standar...

  16. A Common Metadata System for Marine Data Portals

    NASA Astrophysics Data System (ADS)

    Wosniok, C.; Breitbach, G.; Lehfeldt, R.

    2012-04-01

    Processing and allocation of marine datasets depend on the nature of the data resulting from field campaigns, continuous monitoring and numerical modelling. Two research and development projects in northern Germany manage different types of marine data; due to the different data characteristics and institutional frameworks, separate data portals are required. This paper describes the integration of distributed marine data in Germany. The Marine Data Infrastructure of Germany (MDI-DE) supports public authorities in the German coastal zone with the implementation of European directives like INSPIRE or the Marine Strategy Framework Directive. This is carried out by setting up standardized web services within a network of participating coastal agencies and by installing a common data portal (http://www.mdi-de.org), which integrates distributed marine data concerning coastal engineering, coastal water protection and nature conservation in an interoperable and harmonized manner for administrative and scientific purposes as well as for information of the general public. The Coastal Observation System for Northern and Arctic Seas (COSYNA) aims at developing and testing analysis systems for the operational synoptic description of the environmental status of the North Sea and of Arctic coastal waters. This is done by establishing a network of monitoring facilities and providing their data in near real time. In situ measurements with poles, ferry boxes, and buoys, together with remote sensing measurements and the assimilation of these data into simulation results, enable COSYNA to provide pre-operational 'products' that go beyond the presently routine techniques in observation and modelling. Near-real-time data provision requires thorough data validation, which is processed on the fly before data are passed on to the COSYNA portal (http://kofserver2.hzg.de/codm/). Both projects apply OGC standards such as the Web Map Service (WMS), Web Feature Service (WFS) and Sensor Observation Service (SOS), which ensures interoperability and extensibility. In addition, metadata, a crucial component for searching and finding information in large data infrastructures, are provided via the Catalogue Service for the Web (CSW). MDI-DE and COSYNA rely on NOKIS, a metadata information system for marine metadata that reflects a metadata profile tailored for marine data according to the specifications of German coastal authorities. In spite of this common software base, interoperability between the two data collections requires constant alignment of the diverse data processed by the two portals. While monitoring data in the MDI-DE are currently rather campaign-based, COSYNA has to fit constantly evolving time series into metadata sets. With all data following the same metadata profile, we now reach full interoperability between the different data collections. The distributed marine information system provides options to search, find and visualise the harmonised results from continuous monitoring, field campaigns, numerical modelling and other data in one web client.
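
    To illustrate the service layer both portals build on, here is a minimal OGC WMS request sketch using the OWSLib client; the endpoint URL, layer name, and extent are hypothetical assumptions:

    ```python
    from owslib.wms import WebMapService

    # Hypothetical WMS endpoint of a coastal-data node in the network.
    wms = WebMapService("https://portal.example.org/wms", version="1.3.0")

    # Discover advertised layers from the service capabilities document.
    for name, layer in wms.contents.items():
        print(name, "-", layer.title)

    # Render a map of an (assumed) sea-surface-temperature layer
    # over the German Bight for display in a portal client.
    img = wms.getmap(layers=["sea_surface_temperature"],
                     srs="EPSG:4326",
                     bbox=(6.0, 53.0, 9.0, 55.5),   # lon/lat extent
                     size=(800, 600),
                     format="image/png")
    with open("sst_german_bight.png", "wb") as f:
        f.write(img.read())
    ```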

  17. A Framework for Collaborative Review of Candidate Events in High Data Rate Streams: the V-Fastr Experiment as a Case Study

    NASA Astrophysics Data System (ADS)

    Hart, Andrew F.; Cinquini, Luca; Khudikyan, Shakeh E.; Thompson, David R.; Mattmann, Chris A.; Wagstaff, Kiri; Lazio, Joseph; Jones, Dayton

    2015-01-01

    “Fast radio transients” are defined here as bright millisecond pulses of radio-frequency energy. These short-duration pulses can be produced by known objects such as pulsars or potentially by more exotic objects such as evaporating black holes. The identification and verification of such an event would be of great scientific value. This is one major goal of the Very Long Baseline Array (VLBA) Fast Transient Experiment (V-FASTR), a software-based detection system installed at the VLBA. V-FASTR uses a “commensal” (piggy-back) approach, analyzing all array data continually during routine VLBA observations and identifying candidate fast transient events. Raw data can be stored from a buffer memory, which enables a comprehensive off-line analysis. This is invaluable for validating the astrophysical origin of any detection. Candidates discovered by the automatic system must be reviewed each day by analysts to identify any promising signals that warrant a more in-depth investigation. To support the timely analysis of fast transient detection candidates by V-FASTR scientists, we have developed a metadata-driven, collaborative candidate review framework. The framework consists of a software pipeline for metadata processing composed of both open source software components and project-specific code written expressly to extract and catalog metadata from the incoming V-FASTR data products, and a web-based data portal that facilitates browsing and inspection of the available metadata for candidate events extracted from the VLBA radio data.

  18. An asynchronous traversal engine for graph-based rich metadata management

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dai, Dong; Carns, Philip; Ross, Robert B.

    Rich metadata in high-performance computing (HPC) systems contains extended information about users, jobs, data files, and their relationships. Property graphs are a promising data model to represent heterogeneous rich metadata flexibly. Specifically, a property graph can use vertices to represent different entities and edges to record the relationships between vertices with unique annotations. The high-volume HPC use case, with millions of entities and relationships, naturally requires an out-of-core distributed property graph database, which must support live updates (to ingest production information in real time), low-latency point queries (for frequent metadata operations such as permission checking), and large-scale traversals (for provenance data mining). Among these needs, large-scale property graph traversals are particularly challenging for distributed graph storage systems. Most existing graph systems implement a "level synchronous" breadth-first search algorithm that relies on global synchronization in each traversal step. This performs well in many problem domains; but a rich metadata management system is characterized by imbalanced graphs, long traversal lengths, and concurrent workloads, each of which has the potential to introduce or exacerbate stragglers (i.e., abnormally slow steps or servers in a graph traversal) that lead to low overall throughput for synchronous traversal algorithms. Previous research indicated that the straggler problem can be mitigated by using asynchronous traversal algorithms, and many graph-processing frameworks have successfully demonstrated this approach. Such systems require the graph to be loaded into a separate batch-processing framework instead of being iteratively accessed, however. In this work, we investigate a general asynchronous graph traversal engine that can operate atop a rich metadata graph in its native format. We outline a traversal-aware query language and key optimizations (traversal-affiliate caching and execution merging) necessary for efficient performance. We further explore the effect of different graph partitioning strategies on the traversal performance for both synchronous and asynchronous traversal engines. Our experiments show that the asynchronous graph traversal engine is more efficient than its synchronous counterpart in the case of HPC rich metadata processing, where more servers are involved and larger traversals are needed. Furthermore, the asynchronous traversal engine is more adaptive to different graph partitioning strategies.
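
    A toy sketch of the contrast the abstract draws, in Python for illustration; the paper's engine is a distributed system, so this single-process version only shows level-synchronous versus work-queue traversal over a small, made-up property graph:

    ```python
    from collections import deque

    # A tiny in-memory property graph: vertex -> list of (edge_label, vertex).
    graph = {
        "job:42":       [("ran_by", "user:alice"), ("wrote", "file:out.dat")],
        "user:alice":   [("member_of", "proj:climate")],
        "file:out.dat": [("read_by", "job:57")],
        "job:57":       [("ran_by", "user:bob")],
        "proj:climate": [],
        "user:bob":     [],
    }

    def bfs_level_sync(start):
        """Level-synchronous BFS: a global barrier between frontiers, so one
        slow vertex (a straggler) stalls the whole level."""
        visited, frontier = {start}, [start]
        while frontier:
            nxt = []
            for v in frontier:                 # the entire level must finish...
                for _, w in graph[v]:
                    if w not in visited:
                        visited.add(w)
                        nxt.append(w)
            frontier = nxt                     # ...before the next level starts
        return visited

    def traverse_async(start):
        """Asynchronous traversal: vertices are processed as they become
        available, with no per-level barrier to amplify stragglers."""
        visited, work = {start}, deque([start])
        while work:
            v = work.popleft()                 # any ready vertex proceeds
            for _, w in graph[v]:
                if w not in visited:
                    visited.add(w)
                    work.append(w)
        return visited

    assert bfs_level_sync("job:42") == traverse_async("job:42")
    ```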

  19. A New Browser-based, Ontology-driven Tool for Generating Standardized, Deep Descriptions of Geoscience Models

    NASA Astrophysics Data System (ADS)

    Peckham, S. D.; Kelbert, A.; Rudan, S.; Stoica, M.

    2016-12-01

    Standardized metadata for models is the key to reliable and greatly simplified coupling in model coupling frameworks like CSDMS (Community Surface Dynamics Modeling System). This model metadata also helps model users to understand the important details that underpin computational models and to compare the capabilities of different models. These details include simplifying assumptions on the physics, governing equations and the numerical methods used to solve them, discretization of space (the grid) and time (the time-stepping scheme), state variables (input or output), and model configuration parameters. This kind of metadata provides a "deep description" of a computational model that goes well beyond other types of metadata (e.g. author, purpose, scientific domain, programming language, digital rights, provenance, execution) and captures the science that underpins a model. While having this kind of standardized metadata for each model in a repository opens up a wide range of exciting possibilities, it is difficult to collect this information, and a carefully conceived "data model" or schema is needed to store it. Automated harvesting and scraping methods can provide some useful information, but they often result in metadata that is inaccurate or incomplete, which is not sufficient to enable the desired capabilities. In order to address this problem, we have developed a browser-based tool called the MCM Tool (Model Component Metadata) which runs on notebooks, tablets and smart phones. This tool was partially inspired by the TurboTax software, which greatly simplifies the necessary task of preparing tax documents. It allows a model developer or advanced user to provide a standardized, deep description of a computational geoscience model, including hydrologic models. Under the hood, the tool uses a new ontology for models built on the CSDMS Standard Names, expressed as a collection of RDF (Resource Description Framework) files. This ontology is based on core concepts such as variables, objects, quantities, operations, processes and assumptions. The purpose of this talk is to present details of the new ontology and then to demonstrate the MCM Tool for several hydrologic models.
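
    For flavour, here is a minimal sketch of what such RDF model metadata might look like via rdflib; the namespace, class and property names are invented stand-ins, not the actual CSDMS ontology terms:

    ```python
    from rdflib import RDF, Graph, Literal, Namespace

    # Invented namespace standing in for the real model-metadata ontology.
    MCM = Namespace("http://example.org/mcm#")

    g = Graph()
    g.bind("mcm", MCM)

    model = MCM["SnowMelt17"]
    g.add((model, RDF.type, MCM.ComputationalModel))
    g.add((model, MCM.assumesProcess, MCM.degree_day_melt))
    g.add((model, MCM.spatialDiscretization, Literal("uniform rectilinear grid")))
    g.add((model, MCM.timeSteppingScheme, Literal("explicit Euler")))
    # A state variable named in the style of the CSDMS Standard Names.
    g.add((model, MCM.outputVariable, Literal("snowpack__melt_volume_flux")))

    print(g.serialize(format="turtle"))
    ```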

  1. Building an Internet of Samples: The Australian Contribution

    NASA Astrophysics Data System (ADS)

    Wyborn, Lesley; Klump, Jens; Bastrakova, Irina; Devaraju, Anusuriya; McInnes, Brent; Cox, Simon; Karssies, Linda; Martin, Julia; Ross, Shawn; Morrissey, John; Fraser, Ryan

    2017-04-01

    Physical samples are often the ground truth for research reported in the scientific literature across multiple domains. They are collected by many different entities (individual researchers, laboratories, government agencies, mining companies, citizens, museums, etc.). Samples must be curated over the long term both to ensure that their existence is known and to allow any data derived from them through laboratory and field tests to be linked to the physical samples. For example, unique identifiers that link ground-truth data back to the original sample help calibrate large volumes of remotely sensed data. Access to catalogues of reliably identified samples from several collections promotes collaboration across all Earth Science disciplines. It also increases the cost effectiveness of research by reducing the need to re-collect samples in the field. The assignment of web identifiers to the digital representations of these physical objects allows us to link to data, literature, investigators and institutions, thus creating an "Internet of Samples". An Australian implementation of the "Internet of Samples" is using the IGSN (International Geo Sample Number, http://igsn.github.io) to identify samples in a globally unique and persistent way. IGSN was developed in the solid earth science community and is recommended for sample identification by the Coalition for Publishing Data in the Earth and Space Sciences (COPDESS). IGSN is interoperable with other persistent identifier systems such as DataCite. Furthermore, the basic IGSN description metadata schema is compatible with existing schemas such as OGC Observations and Measurements (O&M) and the DataCite Metadata Schema, which makes crosswalks to other metadata schemas easy. IGSN metadata is disseminated through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), allowing it to be aggregated in other applications such as portals (e.g. the Australian IGSN catalogue http://igsn2.csiro.au), and is available in more than one format. The software for IGSN web services is based on components developed for DataCite and adapted to the specific requirements of IGSN. This cooperation in open source development ensures sustainable implementation and faster turnaround times for updates. IGSN, in particular in its Australian implementation, is characterised by a federated approach to system architecture and organisational governance, giving it the necessary flexibility to adapt to particular local practices within multiple domains whilst maintaining an overarching international standard. The three current IGSN allocation agents in Australia, Geoscience Australia, CSIRO and Curtin University, represent different sectors. Through funding from the Australian Research Data Services Program they have combined to develop a common web portal that allows discovery of physical samples and sample collections at a national level. International governance then ensures that we can link to an international community while acting locally to ensure the services offered are relevant to the needs of Australian researchers. This flexibility aids the integration of new disciplines into a global physical-samples information network.

  2. Mapping and converting essential Federal Geographic Data Committee (FGDC) metadata into MARC21 and Dublin Core: towards an alternative to the FGDC Clearinghouse

    USGS Publications Warehouse

    Chandler, A.; Foley, D.; Hafez, A.M.

    2000-01-01

    The purpose of this article is to raise and address a number of issues related to the conversion of Federal Geographic Data Committee metadata into MARC21 and Dublin Core. We present an analysis of 466 FGDC metadata records housed in the National Biological Information Infrastructure (NBII) node of the FGDC Clearinghouse, with special emphasis on the length of fields and the total length of records in this set. One of our contributions is a 34-element crosswalk, a proposal that takes into consideration the constraints of the MARC21 standard as implemented in OCLC's WorldCat and the realities of user behavior.
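
    To make the crosswalk idea concrete, here is a small sketch mapping a few FGDC CSDGM elements to Dublin Core; the element paths follow the CSDGM structure, but the selection and target fields are illustrative and do not reproduce the article's 34-element proposal:

    ```python
    import xml.etree.ElementTree as ET

    # A few illustrative FGDC CSDGM paths mapped to Dublin Core terms;
    # the article's actual crosswalk covers 34 elements.
    CROSSWALK = {
        "idinfo/citation/citeinfo/title":          "dc:title",
        "idinfo/citation/citeinfo/origin":         "dc:creator",
        "idinfo/descript/abstract":                "dc:description",
        "idinfo/keywords/theme/themekey":          "dc:subject",
        "distinfo/distrib/cntinfo/cntorgp/cntorg": "dc:publisher",
    }

    def fgdc_to_dublin_core(fgdc_xml):
        """Convert one FGDC metadata record into a flat Dublin Core dict."""
        root = ET.fromstring(fgdc_xml)
        record = {}
        for path, dc_field in CROSSWALK.items():
            # Repeatable elements (e.g. themekey) become multi-valued fields.
            values = [el.text.strip() for el in root.findall(path) if el.text]
            if values:
                record.setdefault(dc_field, []).extend(values)
        return record

    sample = """<metadata><idinfo><citation><citeinfo>
      <title>Example biological dataset</title>
      <origin>Example Survey</origin>
    </citeinfo></citation>
    <descript><abstract>Illustrative record only.</abstract></descript>
    </idinfo></metadata>"""
    print(fgdc_to_dublin_core(sample))
    ```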

  3. GIS, geostatistics, metadata banking, and tree-based models for data analysis and mapping in environmental monitoring and epidemiology.

    PubMed

    Schröder, Winfried

    2006-05-01

    Using the example of environmental monitoring, some applications of geographic information systems (GIS), geostatistics, metadata banking, and Classification and Regression Trees (CART) are presented. These tools are recommended for mapping statistically estimated hot spots of vectors and pathogens. GIS were introduced as tools for spatially modelling the real world. The modelling can be done by mapping objects according to the spatial information content of data, and can additionally be supported by geostatistical and multivariate statistical modelling. This is demonstrated by the example of modelling marine habitats of benthic communities and of terrestrial ecoregions. Such ecoregionalisations may be used to predict phenomena based on the statistical relation between measurements of a phenomenon of interest, such as the incidence of medically relevant species, and correlated characteristics of the ecoregions. The combination of meteorological data and data on plant phenology can enhance the spatial resolution of information on climate change. To this end, meteorological and phenological data have to be correlated; to enable this, both data sets, which come from disparate monitoring networks, have to be spatially connected by means of geostatistical estimation. This is demonstrated by the example of the transformation of site-specific data on plant phenology into surface data. The analysis allows for spatial comparison of the phenology during the two periods 1961-1990 and 1991-2002 covering the whole of Germany. The changes in both plant phenology and air temperature were proved to be statistically significant. Thus, they can be combined by GIS overlay technique to enhance the spatial resolution of the information on climate change and be used for the prediction of vector incidences at the regional scale. The localisation of such risk hot spots can be done by geometrically merging surface data on promoting factors. This is demonstrated by the example of the transfer of heavy metals through soils. The predicted hot spots of heavy metal transfer can be validated empirically against measurement data, which can be queried from a metadata base linked with a geographic information system. A corresponding strategy for the detection of vector hot spots in medical epidemiology is recommended. Data on incidences and habitats of the Anophelinae in the marsh regions of Lower Saxony (Germany) were used to calculate a habitat model by CART, which, together with climate data and data on ecoregions, can be further used for the prediction of habitats of medically relevant vector species. In the future, this approach should be supported by an internet-based information system consisting of three components: a metadata questionnaire, a metadata base, and a GIS to link metadata, surface data, and measurement data on incidences and habitats of medically relevant species with related data on climate, phenology, and ecoregional conditions.
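
    As a schematic of the CART habitat-modelling step, here is a minimal sketch with scikit-learn's decision-tree classifier on made-up predictor variables; the real model used ecoregion and climate characteristics from the monitoring networks, and the feature names and thresholds below are invented:

    ```python
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(0)

    # Made-up site predictors: mean air temperature (deg C), distance to
    # standing water (m), and a soil moisture index for 200 sampling sites.
    X = np.column_stack([
        rng.normal(9.0, 2.0, 200),
        rng.uniform(0, 5000, 200),
        rng.uniform(0, 1, 200),
    ])
    # Synthetic presence/absence of a vector species (e.g. Anophelinae):
    # warm, moist sites near water are labelled as habitat.
    y = ((X[:, 0] > 9) & (X[:, 1] < 1500) & (X[:, 2] > 0.5)).astype(int)

    # CART: recursive binary partitioning of the predictor space.
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

    # The fitted rules could then be mapped onto ecoregions via GIS overlay.
    print(export_text(tree, feature_names=[
        "temperature", "dist_to_water", "soil_moisture"]))
    ```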

  4. IMS

    Atmospheric Science Data Center

    2012-11-30

    Information Management System An online user interface which provides data and metadata ... on a 24-hour basis; accepts user orders for data; provides information about future data acquisition and processing schedules and maintains ...

  5. A Metadata description of the data in "A metabolomic comparison of urinary changes in type 2 diabetes in mouse, rat, and human."

    PubMed Central

    2011-01-01

    Background: Metabolomics is a rapidly developing functional genomic tool that has a wide range of applications in diverse fields in biology and medicine. However, unlike transcriptomics and proteomics, there is currently no central repository for depositing data, despite efforts by the Metabolomics Standard Initiative (MSI) to develop a standardised description of a metabolomic experiment. Findings: In this manuscript we describe how the MSI description has been applied to a published dataset involving the identification of cross-species metabolic biomarkers associated with type II diabetes. The study describes sample collection of urine from mice, rats and human volunteers, and the subsequent acquisition of data by high-resolution 1H NMR spectroscopy. The metadata are described to demonstrate how the MSI descriptions could be applied in a manuscript, and the spectra have also been made available for the mouse and rat studies to allow others to process the data. Conclusions: The intention of this manuscript is to stimulate discussion as to whether the MSI description is sufficient to describe the metadata associated with metabolomic experiments and to encourage others to make their data available to other researchers. PMID:21801423

  6. CHIME: A Metadata-Based Distributed Software Development Environment

    DTIC Science & Technology

    2005-01-01

    structures by using typography, graphics, and animation. The Software Immersion in our conceptual model for CHIME can be seen as a form of Software... Even small- to medium-sized development efforts may involve hundreds of artifacts -- design documents, change requests, test cases and results, code... for managing and organizing information from all phases of the software lifecycle. CHIME is designed around an XML-based metadata architecture, in...

  7. DoD Net-Centric Services Strategy Implementation in the C2 Domain

    DTIC Science & Technology

    2010-02-01

    those for monolingual thesauri indicated in ANSI/NISO Z39.19-2005 and ISO 2788-1986. Also, the versioning regimen in the KOS must be robust, a... Metadata Registry: Repository of all metadata related to data structures, models, dictionaries, taxonomies, schema, and other engineering artifacts that... access information, schemas, style sheets, controlled vocabularies, dictionaries, and other work products. It would normally be discovered via a...

  8. Leveraging Metadata to Create Interactive Images... Today!

    NASA Astrophysics Data System (ADS)

    Hurt, Robert L.; Squires, G. K.; Llamas, J.; Rosenthal, C.; Brinkworth, C.; Fay, J.

    2011-01-01

    The image gallery for NASA's Spitzer Space Telescope has been newly rebuilt to fully support the Astronomy Visualization Metadata (AVM) standard, creating a new user experience both on the website and in other applications. We encapsulate all the key descriptive information for a public image, including color representations and astronomical and sky coordinates, and make it accessible in a user-friendly form on the website; we also embed the same metadata within the image files themselves. Thus, images downloaded from the site carry with them all their descriptive information. Real-world benefits include display of general metadata when such images are imported into image editing software (e.g. Photoshop) or image catalog software (e.g. iPhoto). More advanced support in Microsoft's WorldWide Telescope can open a tagged image after it has been downloaded and display it in its correct sky position, allowing comparison with observations from other observatories. An increasing number of software developers are implementing AVM support in applications, and an online image archive for tagged images is under development at the Spitzer Science Center. Tagging images following the AVM offers ever-increasing benefits to public-friendly imagery in all its standard forms (JPEG, TIFF, PNG). The AVM standard is one part of the Virtual Astronomy Multimedia Project (VAMP); http://www.communicatingastronomy.org
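
    Since AVM tags travel inside the image as an XMP packet, here is a minimal sketch of pulling that packet out of a downloaded JPEG with nothing but the standard library; the marker-scanning shortcut is a simplification of full XMP parsing, and the file name and the avm:Spatial.ReferenceValue field mentioned in the comment are assumptions:

    ```python
    def extract_xmp_packet(path):
        """Return the embedded XMP packet (where AVM tags live) from an
        image file, or None. Scans for the standard XMP packet markers
        rather than doing full JPEG segment parsing -- good enough for a
        quick look at AVM-tagged public imagery."""
        with open(path, "rb") as f:
            data = f.read()
        start = data.find(b"<x:xmpmeta")
        end = data.find(b"</x:xmpmeta>")
        if start == -1 or end == -1:
            return None
        return data[start:end + len(b"</x:xmpmeta>")].decode(
            "utf-8", errors="replace")

    xmp = extract_xmp_packet("spitzer_image.jpg")  # hypothetical download
    if xmp and "avm:" in xmp:
        # AVM fields such as avm:Spatial.ReferenceValue carry the sky
        # coordinates that applications like WorldWide Telescope use.
        print(xmp[:400])
    ```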

  9. The eGenVar data management system—cataloguing and sharing sensitive data and metadata for the life sciences

    PubMed Central

    Razick, Sabry; Močnik, Rok; Thomas, Laurent F.; Ryeng, Einar; Drabløs, Finn; Sætrom, Pål

    2014-01-01

    Systematic data management and controlled data sharing aim at increasing reproducibility, reducing redundancy in work, and providing a way to efficiently locate complementing or contradicting information. One method of achieving this is collecting data in a central repository or in a location that is part of a federated system and providing interfaces to the data. However, certain data, such as data from biobanks or clinical studies, may, for legal and privacy reasons, often not be stored in public repositories. Instead, we describe a metadata cataloguing system and a software suite for reporting the presence of data from the life sciences domain. The system stores three types of metadata: file information, file provenance and data lineage, and content descriptions. Our software suite includes both graphical and command line interfaces that allow users to report and tag files with these different metadata types. Importantly, the files remain in their original locations with their existing access-control mechanisms in place, while our system provides descriptions of their contents and relationships. Our system and software suite thereby provide a common framework for cataloguing and sharing both public and private data. Database URL: http://bigr.medisin.ntnu.no/data/eGenVar/ PMID:24682735

  10. Study on Information Management for the Conservation of Traditional Chinese Architectural Heritage - 3d Modelling and Metadata Representation

    NASA Astrophysics Data System (ADS)

    Yen, Y. N.; Weng, K. H.; Huang, H. Y.

    2013-07-01

    After over 30 years of practise and development, Taiwan's architectural conservation field is moving rapidly into digitalization and its applications. Compared to modern buildings, traditional Chinese architecture has considerably more complex elements and forms. To document and digitize these unique heritages in their conservation lifecycle is a new and important issue. This article takes the caisson ceiling of the Taipei Confucius Temple, octagonal with 333 elements in 8 types, as a case study for digitization practise. Metadata representation and 3D modelling are the two key issues discussed. Both Revit and SketchUp were applied in this research to compare their effectiveness for metadata representation. Due to limitations of the Revit database, the final 3D models were built with SketchUp. The research found that, firstly, cultural heritage databases must convey that while many elements are similar in appearance, they are unique in value; although 3D simulations help the general understanding of architectural heritage, software such as Revit and SketchUp, at this stage, could only be used to model basic visual representations, and is ineffective in documenting additional critical data of individually unique elements. Secondly, when establishing conservation lifecycle information for application in management systems, a full and detailed presentation of the metadata must also be implemented; the existing applications of BIM in managing conservation lifecycles are still insufficient. The research recommends SketchUp as a tool for present modelling needs, and BIM for sharing data between users, but the implementation of metadata representation is of the utmost importance.

  11. Mercury- Distributed Metadata Management, Data Discovery and Access System

    NASA Astrophysics Data System (ADS)

    Palanisamy, Giri; Wilson, Bruce E.; Devarakonda, Ranjeet; Green, James M.

    2007-12-01

    Mercury is a federated metadata harvesting, search and retrieval tool based on both open source and ORNL-developed software. It was originally developed for NASA, and the Mercury development consortium now includes funding from NASA, USGS, and DOE. Mercury supports various metadata standards including XML, Z39.50, FGDC, Dublin-Core, Darwin-Core, EML, and ISO-19115 (under development). Mercury provides a single portal to information contained in disparate data management systems. It collects metadata and key data from contributing project servers distributed around the world and builds a centralized index. The Mercury search interfaces then allow users to perform simple, fielded, spatial and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury supports various projects including: ORNL DAAC, NBII, DADDI, LBA, NARSTO, CDIAC, OCEAN, I3N, IAI, ESIP and ARM. The new Mercury system is based on a Service Oriented Architecture and supports various services such as Thesaurus Service, Gazetteer Web Service and UDDI Directory Services. The system also provides various search services including RSS, Geo-RSS, OpenSearch, Web Services and Portlets. Other features include filtering and dynamic sorting of search results, bookmarkable search results, and the ability to save, retrieve, and modify search criteria.
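
    The core harvest-then-index pattern can be shown in a few lines. The sketch below (provider names, records, and URLs invented) flattens metadata pulled from distributed servers into one local keyword index, so searches are fast and central while the data stay with the providers.

    ```python
    # Sketch of the harvest-then-index pattern: metadata is pulled from
    # distributed providers into one centralized index; data files stay remote.
    from collections import defaultdict

    def harvest(provider_records: dict[str, list[dict]]) -> list[dict]:
        """Flatten per-provider metadata records, remembering their origin."""
        records = []
        for provider, recs in provider_records.items():
            for rec in recs:
                records.append({**rec, "provider": provider})
        return records

    def build_index(records: list[dict]) -> dict[str, list[dict]]:
        """Centralized keyword index: search is local, data ownership stays remote."""
        index = defaultdict(list)
        for rec in records:
            for word in rec.get("title", "").lower().split():
                index[word].append(rec)
        return index

    index = build_index(harvest({
        "ornl-daac": [{"title": "Net primary productivity", "url": "https://..."}],
        "narsto":    [{"title": "Ozone field campaign",     "url": "https://..."}],
    }))
    print([r["provider"] for r in index["ozone"]])   # -> ['narsto']
    ```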

  12. Virtual Environments for Visualizing Structural Health Monitoring Sensor Networks, Data, and Metadata.

    PubMed

    Napolitano, Rebecca; Blyth, Anna; Glisic, Branko

    2018-01-16

    Visualization of sensor networks, data, and metadata is becoming one of the most pivotal aspects of the structural health monitoring (SHM) process. Without the ability to communicate efficiently and effectively between disparate groups working on a project, an SHM system can be underused, misunderstood, or even abandoned. For this reason, this work seeks to evaluate visualization techniques in the field, identify flaws in current practices, and devise a new method for visualizing and accessing SHM data and metadata in 3D. More precisely, the work presented here reflects a method and digital workflow for integrating SHM sensor networks, data, and metadata into a virtual reality environment by combining spherical imaging and informational modeling. Both intuitive and interactive, this method fosters communication on a project, enabling diverse practitioners of SHM to efficiently consult and use the sensor networks, data, and metadata. The method is presented through its implementation on a case study, Streicker Bridge on the Princeton University campus. To illustrate the efficiency of the new method, the time and data file size were compared to other potential methods used for visualizing and accessing SHM sensor networks, data, and metadata in 3D. Additionally, feedback from civil engineering students familiar with SHM is used for validation. Recommendations on how different groups working together on an SHM project can create an SHM virtual environment and convey data to the proper audiences are also included.

  13. Virtual Environments for Visualizing Structural Health Monitoring Sensor Networks, Data, and Metadata

    PubMed Central

    Napolitano, Rebecca; Blyth, Anna; Glisic, Branko

    2018-01-01

    Visualization of sensor networks, data, and metadata is becoming one of the most pivotal aspects of the structural health monitoring (SHM) process. Without the ability to communicate efficiently and effectively between disparate groups working on a project, an SHM system can be underused, misunderstood, or even abandoned. For this reason, this work seeks to evaluate visualization techniques in the field, identify flaws in current practices, and devise a new method for visualizing and accessing SHM data and metadata in 3D. More precisely, the work presented here reflects a method and digital workflow for integrating SHM sensor networks, data, and metadata into a virtual reality environment by combining spherical imaging and informational modeling. Both intuitive and interactive, this method fosters communication on a project, enabling diverse practitioners of SHM to efficiently consult and use the sensor networks, data, and metadata. The method is presented through its implementation on a case study, Streicker Bridge on the Princeton University campus. To illustrate the efficiency of the new method, the time and data file size were compared to other potential methods used for visualizing and accessing SHM sensor networks, data, and metadata in 3D. Additionally, feedback from civil engineering students familiar with SHM is used for validation. Recommendations on how different groups working together on an SHM project can create an SHM virtual environment and convey data to the proper audiences are also included. PMID:29337877

  14. An Interactive, Web-Based Approach to Metadata Authoring

    NASA Technical Reports Server (NTRS)

    Pollack, Janine; Wharton, Stephen W. (Technical Monitor)

    2001-01-01

    NASA's Global Change Master Directory (GCMD) serves a growing number of users by assisting the scientific community in the discovery of and linkage to Earth science data sets and related services. The GCMD holds over 8000 data set descriptions in Directory Interchange Format (DIF) and 200 data service descriptions in Service Entry Resource Format (SERF), encompassing the disciplines of geology, hydrology, oceanography, meteorology, and ecology. Data descriptions also contain geographic coverage information, thus allowing researchers to discover data pertaining to a particular geographic location, as well as a subject of interest. The GCMD strives to be the preeminent data locator for world-wide directory-level metadata. In this vein, scientists and data providers must have access to intuitive and efficient metadata authoring tools. Existing GCMD tools are not currently attracting widespread usage. With usage being the prime indicator of utility, it has become apparent that current tools must be improved. As a result, the GCMD has released a new suite of web-based authoring tools that enable a user to create new data and service entries, as well as modify existing data entries. With these tools, a more interactive approach to metadata authoring is taken, as they feature a visual "checklist" of data/service fields that automatically updates when a field is completed. In this way, the user can quickly gauge which of the required and optional fields have not been populated. With the release of these tools, the Earth science community will be further assisted in efficiently creating quality data and services metadata. Keywords: metadata, Earth science, metadata authoring tools
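
    The checklist mechanic reduces to recomputing, after each edit, which required fields are still empty. A minimal sketch follows; the required-field list is illustrative, not the full DIF specification.

    ```python
    # Sketch of the checklist idea: as the author fills fields, recompute which
    # required DIF fields remain empty (field names illustrative, not the full spec).
    REQUIRED = ["Entry_ID", "Entry_Title", "Parameters", "Summary", "Metadata_Name"]

    def unfilled(entry: dict[str, str]) -> list[str]:
        """Return the required fields that are still blank in the draft entry."""
        return [f for f in REQUIRED if not entry.get(f, "").strip()]

    draft = {"Entry_ID": "GCMD-0001", "Entry_Title": "Sea surface temperature"}
    print(unfilled(draft))  # -> ['Parameters', 'Summary', 'Metadata_Name']
    ```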

  15. Improving data management and dissemination in web based information systems by semantic enrichment of descriptive data aspects

    NASA Astrophysics Data System (ADS)

    Gebhardt, Steffen; Wehrmann, Thilo; Klinger, Verena; Schettler, Ingo; Huth, Juliane; Künzer, Claudia; Dech, Stefan

    2010-10-01

    The German-Vietnamese water-related information system for the Mekong Delta (WISDOM) project supports business processes in Integrated Water Resources Management in Vietnam. Multiple disciplines bring together earth and ground based observation themes, such as environmental monitoring, water management, demographics, economy, information technology, and infrastructural systems. This paper introduces the components of the web-based WISDOM system, including its data, logic and presentation tiers. It focuses on the data models upon which the database management system is built, including techniques for tagging or linking metadata with the stored information. The model also uses ordered groupings of spatial, thematic and temporal reference objects to semantically tag datasets to enable fast data retrieval, such as finding all data in a specific administrative unit belonging to a specific theme. A spatial database extension to PostgreSQL is employed; this object-relational database was chosen to tag spatial objects to tabular data, improving the retrieval of census and observational data at regional, provincial, and local levels. While the spatial database hinders processing of raster data, a "work-around" was built into WISDOM to permit efficient management of both raster and vector data. The data model also incorporates styling aspects of the spatial datasets through styled layer descriptors (SLD) and web mapping service (WMS) layer specifications, allowing retrieval of rendered maps. Metadata elements of the spatial data are based on the ISO 19115 standard. XML-structured information of the SLD and metadata is stored in an XML database. The data models and the data management system are robust for managing the large quantity of spatial objects, sensor observations, census and document data. The operational WISDOM information system prototype contains modules for data management, automatic data integration, and web services for data retrieval, analysis, and distribution. The graphical user interfaces facilitate metadata cataloguing, data warehousing, web sensor data analysis and thematic mapping.
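
    The kind of retrieval this tagging enables, e.g. "all datasets in a given administrative unit belonging to a given theme", can be sketched as a PostGIS query; table, column, and connection names below are hypothetical.

    ```python
    # Sketch of a spatially tagged retrieval in PostGIS (all names hypothetical):
    # datasets carry a footprint geometry and theme tags, so one query answers
    # "all data in this administrative unit for this theme".
    import psycopg2

    SQL = """
    SELECT d.id, d.title
    FROM   dataset    d
    JOIN   theme_tag  t ON t.dataset_id = d.id
    JOIN   admin_unit a ON a.name = %s
    WHERE  t.theme = %s
    AND    ST_Intersects(d.footprint, a.geom);
    """

    with psycopg2.connect("dbname=wisdom") as conn:   # connection string assumed
        with conn.cursor() as cur:
            cur.execute(SQL, ("Can Tho", "water management"))
            for dataset_id, title in cur.fetchall():
                print(dataset_id, title)
    ```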

  16. NeuroTransDB: highly curated and structured transcriptomic metadata for neurodegenerative diseases.

    PubMed

    Bagewadi, Shweta; Adhikari, Subash; Dhrangadhariya, Anjani; Irin, Afroza Khanam; Ebeling, Christian; Namasivayam, Aishwarya Alex; Page, Matthew; Hofmann-Apitius, Martin; Senger, Philipp

    2015-01-01

    Neurodegenerative diseases are chronic debilitating conditions characterized by progressive loss of neurons; they represent a significant health care burden as the global elderly population continues to grow. Over the past decade, high-throughput technologies such as the Affymetrix GeneChip microarrays have provided new perspectives into the pathomechanisms underlying neurodegeneration. Public transcriptomic data repositories, namely Gene Expression Omnibus and the curated ArrayExpress, enable researchers to conduct integrative meta-analysis, increasing the power to detect differentially regulated genes in disease and to explore patterns of gene dysregulation across biologically related studies. The reliability of retrospective, large-scale integrative analyses depends on an appropriate combination of related datasets, in turn requiring detailed meta-annotations capturing the experimental setup. In most cases, we observe huge variation in compliance to defined standards for submitted metadata in public databases. Much of the information needed to complete or refine meta-annotations is distributed across the associated publications. For example, tissue preparation or comorbidity information is frequently described in an article's supplementary tables. Several value-added databases have employed additional manual efforts to overcome this limitation. However, none of these databases explicate annotations that distinguish human and animal models in the neurodegeneration context. Therefore, adopting a more specific disease focus, in combination with dedicated disease ontologies, will better empower the selection of comparable studies with refined annotations to address the research question at hand. In this article, we describe the detailed development of NeuroTransDB, a manually curated database containing metadata annotations for neurodegenerative studies. The database contains more than 20 dimensions of metadata annotations within 31 mouse, 5 rat and 45 human studies, defined in collaboration with domain disease experts. We elucidate the step-by-step guidelines used to critically prioritize studies from public archives and to curate their metadata, and discuss the key challenges encountered. Curated metadata for Alzheimer's disease gene expression studies are available for download. Database URL: www.scai.fraunhofer.de/NeuroTransDB.html. © The Author(s) 2015. Published by Oxford University Press.

  17. NeuroTransDB: highly curated and structured transcriptomic metadata for neurodegenerative diseases

    PubMed Central

    Bagewadi, Shweta; Adhikari, Subash; Dhrangadhariya, Anjani; Irin, Afroza Khanam; Ebeling, Christian; Namasivayam, Aishwarya Alex; Page, Matthew; Hofmann-Apitius, Martin

    2015-01-01

    Neurodegenerative diseases are chronic debilitating conditions characterized by progressive loss of neurons; they represent a significant health care burden as the global elderly population continues to grow. Over the past decade, high-throughput technologies such as the Affymetrix GeneChip microarrays have provided new perspectives into the pathomechanisms underlying neurodegeneration. Public transcriptomic data repositories, namely Gene Expression Omnibus and the curated ArrayExpress, enable researchers to conduct integrative meta-analysis, increasing the power to detect differentially regulated genes in disease and to explore patterns of gene dysregulation across biologically related studies. The reliability of retrospective, large-scale integrative analyses depends on an appropriate combination of related datasets, in turn requiring detailed meta-annotations capturing the experimental setup. In most cases, we observe huge variation in compliance to defined standards for submitted metadata in public databases. Much of the information needed to complete or refine meta-annotations is distributed across the associated publications. For example, tissue preparation or comorbidity information is frequently described in an article's supplementary tables. Several value-added databases have employed additional manual efforts to overcome this limitation. However, none of these databases explicate annotations that distinguish human and animal models in the neurodegeneration context. Therefore, adopting a more specific disease focus, in combination with dedicated disease ontologies, will better empower the selection of comparable studies with refined annotations to address the research question at hand. In this article, we describe the detailed development of NeuroTransDB, a manually curated database containing metadata annotations for neurodegenerative studies. The database contains more than 20 dimensions of metadata annotations within 31 mouse, 5 rat and 45 human studies, defined in collaboration with domain disease experts. We elucidate the step-by-step guidelines used to critically prioritize studies from public archives and to curate their metadata, and discuss the key challenges encountered. Curated metadata for Alzheimer's disease gene expression studies are available for download. Database URL: www.scai.fraunhofer.de/NeuroTransDB.html PMID:26475471

  18. A Multi-Purpose Data Dissemination Infrastructure for the Marine-Earth Observations

    NASA Astrophysics Data System (ADS)

    Hanafusa, Y.; Saito, H.; Kayo, M.; Suzuki, H.

    2015-12-01

    To open the data from a variety of observations, the Japan Agency for Marine-Earth Science and Technology (JAMSTEC) has developed a multi-purpose data dissemination infrastructure. Although many observations have been made in the earth sciences, not all the data are fully open. We think data centers may provide researchers with a universal data dissemination service which can handle various kinds of observation data with little effort. For this purpose the JAMSTEC Data Management Office has developed the "Information Catalog Infrastructure System (Catalog System)". This is a kind of catalog management system which can create, renew and delete catalogs (= databases) and has the following features: - The Catalog System does not depend on data types or the granularity of data records. - By registering a new metadata schema to the system, a new database can be created on the same system without system modification. - As web pages are defined by cascading style sheets, databases can have a different look and feel, and different operability. - The Catalog System provides databases with basic search tools: search by text, selection from a category tree, and selection from a time line chart. - For domestic users it creates the Japanese and English pages at the same time and has a dictionary to control terminology and proper nouns. As of August 2015 JAMSTEC operates 7 databases on the Catalog System. We expect to transfer existing databases to this system, or create new databases on it. In comparison with a dedicated database developed for a specific dataset, the Catalog System is suitable for the dissemination of small datasets with minimum cost. Metadata held in the catalogs may be transformed to other metadata schemas for exchange with global databases or portals. Examples: JAMSTEC Data Catalog: http://www.godac.jamstec.go.jp/catalog/data_catalog/metadataList?lang=en JAMSTEC Document Catalog: http://www.godac.jamstec.go.jp/catalog/doc_catalog/metadataList?lang=en&tab=category Research Information and Data Access Site of TEAMS: http://www.i-teams.jp/catalog/rias/metadataList?lang=en&tab=list
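
    The schema-registration idea can be sketched in a few lines (all names invented): registering a schema brings up a new catalog on the same system, and the generic text search works against whatever fields the schema declares.

    ```python
    # Sketch of a schema-driven catalog system (names invented): a new metadata
    # schema creates a new catalog without code changes, and generic search
    # tools operate over whatever fields each schema declares.
    class CatalogSystem:
        def __init__(self) -> None:
            self.schemas: dict[str, list[str]] = {}
            self.catalogs: dict[str, list[dict]] = {}

        def register_schema(self, name: str, fields: list[str]) -> None:
            self.schemas[name] = fields          # no system modification needed
            self.catalogs[name] = []

        def add_record(self, catalog: str, record: dict) -> None:
            assert set(record) <= set(self.schemas[catalog]), "record must match schema"
            self.catalogs[catalog].append(record)

        def search_text(self, catalog: str, query: str) -> list[dict]:
            q = query.lower()
            return [r for r in self.catalogs[catalog]
                    if any(q in str(v).lower() for v in r.values())]

    cs = CatalogSystem()
    cs.register_schema("data_catalog", ["title", "category", "period"])
    cs.add_record("data_catalog", {"title": "Deep-sea CTD casts", "category": "ocean"})
    print(cs.search_text("data_catalog", "ctd"))
    ```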

  19. GeoBoost: accelerating research involving the geospatial metadata of virus GenBank records.

    PubMed

    Tahsin, Tasnia; Weissenbacher, Davy; O'Connor, Karen; Magge, Arjun; Scotch, Matthew; Gonzalez-Hernandez, Graciela

    2018-05-01

    GeoBoost is a command-line software package developed to address sparse or incomplete metadata in GenBank sequence records that relate to the location of the infected host (LOIH) of viruses. Given a set of GenBank accession numbers corresponding to virus GenBank records, GeoBoost extracts, integrates and normalizes geographic information reflecting the LOIH of the viruses using integrated information from GenBank metadata and related full-text publications. In addition, to facilitate probabilistic geospatial modeling, GeoBoost assigns probability scores for each possible LOIH. Binaries and resources required for running GeoBoost are packed into a single zipped file and freely available for download at https://tinyurl.com/geoboost. A video tutorial is included to help users quickly and easily install and run the software. The software is implemented in Java 1.8, and supported on MS Windows and Linux platforms. gragon@upenn.edu. Supplementary data are available at Bioinformatics online.

  20. Documenting Climate Models and Their Simulations

    DOE PAGES

    Guilyardi, Eric; Balaji, V.; Lawrence, Bryan; ...

    2013-05-01

    The results of climate models are of increasing and widespread importance. No longer is climate model output of sole interest to climate scientists and researchers in the climate change impacts and adaptation fields. Now nonspecialists such as government officials, policy makers, and the general public all have an increasing need to access climate model output and understand its implications. For this host of users, accurate and complete metadata (i.e., information about how and why the data were produced) is required to document the climate modeling results. We describe a pilot community initiative to collect and make available documentation of climate models and their simulations. In an initial application, a metadata repository is being established to provide information of this kind for a major internationally coordinated modeling activity known as CMIP5 (Coupled Model Intercomparison Project, Phase 5). We expect that for a wide range of stakeholders, this and similar community-managed metadata repositories will spur development of analysis tools that facilitate discovery and exploitation of Earth system simulations.

  1. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    PubMed Central

    Reddy, T.B.K.; Thomas, Alex D.; Stamatis, Dimitri; Bertsch, Jon; Isbandi, Michelle; Jansson, Jakob; Mallajosyula, Jyothi; Pagani, Ioanna; Lobos, Elizabeth A.; Kyrpides, Nikos C.

    2015-01-01

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource to catalog and monitor genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. Here we report version 5 (v.5) of the database. The newly designed database schema and web user interface support several new features including the implementation of a four-level (meta)genome project classification system and a simplified, intuitive web interface to access reports and launch search tools. The database currently hosts information for about 19,200 studies, 56,000 Biosamples, 56,000 sequencing projects and 39,400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate data of varying quality into GOLD are briefly highlighted. GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards. PMID:25348402

  2. Data Management and the National Climate Assessment: Best Practices, Lessons Learned, and Future Applications: A Data Quality Solution

    NASA Astrophysics Data System (ADS)

    Kunkel, K.; Champion, S.

    2015-12-01

    The Third National Climate Assessment (NCA), anticipated for its authoritative climate change analysis, was also a vanguard in climate communication. From the cutting-edge website to the organization of information, the Assessment content appealed to, and could be accessed by, many demographics. One pivotal presentation of information in the NCA was the availability of complex metadata directly connected to graphical products. While the basic metadata requirement is federally mandated through a series of federal guidelines as part of the Information Quality Act, the NCA is also deemed a Highly Influential Scientific Assessment, which requires demonstration of the transparency and reproducibility of the content. To meet these requirements, the Technical Support Unit (TSU) for the NCA embarked on building a system for collecting and presenting metadata that not only met these requirements, but has since been employed in support of additional Assessments. The metadata effort for this NCA proved invaluable for many reasons, one of which is that it showcased a critical need for a culture change within the scientific community to support collection and transparency of data and methods to the level produced with the NCA. Regardless of the federal mandate, detailed metadata is simply good practice in science communication. This presentation will detail the collection system built by the TSU and the improvements employed with additional Assessment products, and will illustrate examples of successful transparency. Through this presentation, we hope to advance the discussion in support of detailed metadata becoming the cultural norm within the scientific community, supporting influential and highly policy-relevant documents such as the NCA.

  3. Evolving the Living With a Star Data System Definition

    NASA Astrophysics Data System (ADS)

    Otranto, J. F.; Dijoseph, M.

    2003-12-01

    NASA's Living With a Star (LWS) Program is a space weather-focused and applications-driven research program. The LWS Program is soliciting input from the solar, space physics, space weather, and climate science communities to develop a system that enables access to science data associated with these disciplines, and advances the development of discipline and interdisciplinary findings. The LWS Program will implement a data system that builds upon the existing and planned data capture, processing, and storage components put in place by individual spacecraft missions and also inter-project data management systems, including active and deep archives, and multi-mission data repositories. It is technically feasible for the LWS Program to integrate data from a broad set of resources, assuming they are either publicly accessible or allow access by permission. The LWS Program data system will work in coordination with spacecraft mission data systems and science data repositories, integrating their holdings using a common metadata representation. This common representation relies on a robust metadata definition that provides journalistic and technical data descriptions, plus linkages to supporting data products and tools. The LWS Program intends to become an enabling resource to PIs, interdisciplinary scientists, researchers, and students facilitating both access to a broad collection of science data, as well as the necessary supporting components to understand and make productive use of these data. For the LWS Program to represent science data that are physically distributed across various ground system elements, information will be collected about these distributed data products through a series of LWS Program-created agents. These agents will be customized to interface or interact with each one of these data systems, collect information, and forward any new metadata records to a LWS Program-developed metadata library. A populated LWS metadata library will function as a single point-of-contact that serves the entire science community as a first stop for data availability, whether or not science data are physically stored in an LWS-operated repository. Further, this metadata library will provide the user access to information for understanding these data including descriptions of the associated spacecraft and instrument, data format, calibration and operations issues, links to ancillary and correlative data products, links to processing tools and models associated with these data, and any corresponding findings produced using these data. The LWS may also support an active archive for solar, space physics, space weather, and climate data when these data would otherwise be discarded or archived off-line. This archive could potentially serve also as a data storage backup facility for LWS missions. The plan for the LWS Program metadata library is developed based upon input received from the solar and geospace science communities; the library's architecture is based on existing systems developed for serving science metadata. The LWS Program continues to seek constructive input from the science community, examples of both successes and failures in dealing with science data systems, and insights regarding the obstacles between the current state-of-the-practice and this vision for the LWS Program metadata library.

  4. A case for user-generated sensor metadata

    NASA Astrophysics Data System (ADS)

    Nüst, Daniel

    2015-04-01

    Cheap and easy to use sensing technology and new developments in ICT towards a global network of sensors and actuators promise previously unthought of changes for our understanding of the environment. Large professional as well as amateur sensor networks exist, and they are used for specific yet diverse applications across domains such as hydrology, meteorology or early warning systems. However, the impact this "abundance of sensors" has had so far is somewhat disappointing. There is a gap between (community-driven) sensor networks that could provide very useful data and the users of the data. In our presentation, we argue this is due to a lack of metadata which allows determining the fitness for use of a dataset. Work on syntactic and semantic interoperability for sensor webs has made great progress and continues to be an active field of research, yet the resulting approaches are often quite complex, which is of course due to the complexity of the problem at hand. Still, we see the most generic information for determining fitness for use as a dataset's provenance, because it allows users to make up their own minds independently of existing classification schemes for data quality. In this work we make the case that curated user-contributed metadata has the potential to improve this situation. This especially applies to scenarios in which an observed property is applicable in different domains, and to set-ups where the understanding of metadata concepts and (meta-)data quality differs between data provider and user. On the one hand, a citizen does not understand ISO provenance metadata. On the other hand, a researcher might find issues in publicly accessible time series published by citizens, which the latter might not be aware of or care about. Because users will have to determine fitness for use for each application on their own anyway, we suggest an online collaboration platform for user-generated metadata based on an extremely simplified data model. In the most basic fashion, metadata generated by users can be boiled down to a basic property of the world wide web: many information items, such as news or blog posts, allow users to create comments and rate the content. Therefore we argue to focus a core data model on one text field for a textual comment, one optional numerical field for a rating, and a resolvable identifier for the dataset that is commented on (see the sketch after this abstract). We present a conceptual framework that integrates user comments into existing standards and relevant applications of online sensor networks and discuss possible approaches, such as linked data, brokering, or standalone metadata portals. We relate this framework to existing work on user-generated content, such as proprietary rating systems on commercial websites, microformats, the GeoViQua User Quality Model, the CHARMe annotations, and W3C Open Annotation. These systems are also explored for commonalities and, building on their concepts and ideas, we present an outline for future extensions of the minimal model. Building on this framework, we present a concept for how a simplistic comment-rating system can be extended to capture provenance information for spatio-temporal observations in the sensor web, and how this framework can be evaluated.
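
    The proposed core data model is small enough to state directly. A sketch using the three elements named in the abstract (comment, optional rating, dataset identifier); the field names and example values are illustrative.

    ```python
    # The minimal core model from the abstract, as a sketch: one comment, one
    # optional rating, and a resolvable identifier for the dataset annotated.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class UserMetadata:
        dataset_id: str               # resolvable identifier (e.g. a URI) of the dataset
        comment: str                  # free-text observation about fitness for use
        rating: Optional[int] = None  # optional numerical rating

    notes = [
        UserMetadata("https://example.org/timeseries/42",
                     "Gauge was relocated in 2012; earlier values not comparable.",
                     rating=2),
    ]
    ```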

  5. NASA Reverb: Standards-Driven Earth Science Data and Service Discovery

    NASA Astrophysics Data System (ADS)

    Cechini, M. F.; Mitchell, A.; Pilone, D.

    2011-12-01

    NASA's Earth Observing System Data and Information System (EOSDIS) is a core capability in NASA's Earth Science Data Systems Program. NASA's EOS ClearingHOuse (ECHO) is a metadata catalog for the EOSDIS, providing a centralized catalog of data products and registry of related data services. Working closely with the EOSDIS community, the ECHO team identified a need to develop the next generation EOS data and service discovery tool. This development effort relied on the following principles: + Metadata Driven User Interface - Users should be presented with data and service discovery capabilities based on dynamic processing of metadata describing the targeted data. + Integrated Data & Service Discovery - Users should be able to discover data and associated data services that facilitate their research objectives. + Leverage Common Standards - Users should be able to discover and invoke services that utilize common interface standards. Metadata plays a vital role facilitating data discovery and access. As data providers enhance their metadata, more advanced search capabilities become available, enriching a user's search experience. Maturing metadata formats such as ISO 19115 provide the necessary depth of metadata that facilitates advanced data discovery capabilities. Data discovery and access is not limited to simply the retrieval of data granules, but is growing into the more complex discovery of data services. These services include, but are not limited to, services facilitating additional data discovery, subsetting, reformatting, and re-projecting. The discovery and invocation of these data services is made significantly simpler through the use of consistent and interoperable standards. By utilizing an adopted standard, standard-specific adapters can be developed to communicate with multiple services implementing a specific protocol. The emergence of metadata standards such as ISO 19119 plays a similarly important role in service discovery as ISO 19115 does in data discovery. After a yearlong design, development, and testing process, the ECHO team successfully released "Reverb - The Next Generation Earth Science Discovery Tool." Reverb relies heavily on the information contained in dataset and granule metadata, such as ISO 19115, to provide a dynamic experience to users based on identified search facet values extracted from science metadata. Such an approach allows users to perform cross-dataset correlation and searches, discovering additional data that they may not previously have been aware of. In addition to data discovery, Reverb users may discover services associated with their data of interest. When services utilize supported standards and/or protocols, Reverb can facilitate the invocation of both synchronous and asynchronous data processing services. This greatly enhances a user's ability to discover data of interest and accomplish their research goals. Extrapolating from the current movement towards interoperable standards and an increase in available services, data service invocation and chaining will become a natural part of data discovery. Reverb is one example of a discovery tool that provides a mechanism for transforming the earth science data discovery paradigm.

  6. ASIST 2001. Information in a Networked World: Harnessing the Flow. Part III: Poster Presentations.

    ERIC Educational Resources Information Center

    Proceedings of the ASIST Annual Meeting, 2001

    2001-01-01

    Topics of Poster Presentations include: electronic preprints; intranets; poster session abstracts; metadata; information retrieval; watermark images; video games; distributed information retrieval; subject domain knowledge; data mining; information theory; course development; historians' use of pictorial images; information retrieval software;…

  7. A multi-service data management platform for scientific oceanographic products

    NASA Astrophysics Data System (ADS)

    D'Anca, Alessandro; Conte, Laura; Nassisi, Paola; Palazzo, Cosimo; Lecci, Rita; Cretì, Sergio; Mancini, Marco; Nuzzo, Alessandra; Mirto, Maria; Mannarini, Gianandrea; Coppini, Giovanni; Fiore, Sandro; Aloisio, Giovanni

    2017-02-01

    An efficient, secure and interoperable data platform solution has been developed in the TESSA project to provide fast navigation and access to the data stored in the data archive, as well as standards-based metadata management support. The platform mainly targets scientific users and high-level situational sea awareness services such as decision support systems (DSS). The datasets are accessible through the following three main components: the Data Access Service (DAS), the Metadata Service and the Complex Data Analysis Module (CDAM). The DAS allows access to data stored in the archive by providing interfaces for different protocols and services for downloading, variable selection, data subsetting and map generation. The Metadata Service is the heart of the information system for the TESSA products and completes the overall infrastructure for data and metadata management. This component enables data search and discovery and addresses interoperability by exploiting widely adopted standards for geospatial data. Finally, the CDAM represents the back-end of the TESSA DSS by performing on-demand complex data analysis tasks.

  8. Semantics-based distributed I/O with the ParaMEDIC framework.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Balaji, P.; Feng, W.; Lin, H.

    2008-01-01

    Many large-scale applications simultaneously rely on multiple resources for efficient execution. For example, such applications may require both large compute and storage resources; however, very few supercomputing centers can provide large quantities of both. Thus, data generated at the compute site oftentimes has to be moved to a remote storage site for either storage or visualization and analysis. Clearly, this is not an efficient model, especially when the two sites are distributed over a wide-area network. Thus, we present a framework called 'ParaMEDIC: Parallel Metadata Environment for Distributed I/O and Computing' which uses application-specific semantic information to convert the generated data to orders-of-magnitude smaller metadata at the compute site, transfer the metadata to the storage site, and re-process the metadata at the storage site to regenerate the output. Specifically, ParaMEDIC trades a small amount of additional computation (in the form of data post-processing) for a potentially significant reduction in data that needs to be transferred in distributed environments.
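
    A toy illustration of the trade-off follows (not the ParaMEDIC code): when both sites hold the same reference data, the compute site can ship indices into that reference instead of the bulky records themselves, and the storage site regenerates the full output. All data and names are invented.

    ```python
    # Toy illustration of semantics-based I/O reduction: trade a little CPU at
    # the storage site for far fewer bytes moved over the wide-area network.
    REFERENCE = ["ACGT" * 1000, "TTGA" * 1000, "GGCC" * 1000]  # present at both sites

    def to_metadata(full_output: list[tuple[str, float]]) -> list[tuple[int, float]]:
        """Compute site: replace each bulky sequence by its index in the reference."""
        return [(REFERENCE.index(seq), score) for seq, score in full_output]

    def regenerate(metadata: list[tuple[int, float]]) -> list[tuple[str, float]]:
        """Storage site: rebuild the full output from indices plus the shared reference."""
        return [(REFERENCE[i], score) for i, score in metadata]

    full = [(REFERENCE[0], 98.5), (REFERENCE[2], 77.1)]
    wire = to_metadata(full)                 # what actually crosses the WAN
    assert regenerate(wire) == full          # identical output, far fewer bytes sent
    ```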

  9. Metadata tables to enable dynamic data modeling and web interface design: the SEER example.

    PubMed

    Weiner, Mark; Sherr, Micah; Cohen, Abigail

    2002-04-01

    A wealth of information addressing health status, outcomes and resource utilization is compiled and made available by various government agencies. While exploration of the data is possible using existing tools, in general, would-be users of the resources must acquire CD-ROMs or download data from the web and upload the data into their own database. Where web interfaces exist, they are highly structured, limiting the kinds of queries that can be executed. This work develops a web-based database interface engine whose content and structure are generated through interaction with a metadata table. The result is a dynamically generated web interface that can easily accommodate changes in the underlying data model by altering the metadata table, rather than requiring changes to the interface code. This paper discusses the background and implementation of the metadata table and web-based front end and provides examples of its use with the NCI's Surveillance, Epidemiology and End-Results (SEER) database.
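
    The metadata-table idea can be sketched directly (column definitions hypothetical): the form is emitted from rows describing each queryable field, so changing the data model means editing rows, not interface code.

    ```python
    # Sketch of a metadata-driven interface: each row of the metadata table
    # describes one queryable field; the HTML form is generated from the rows.
    METADATA_TABLE = [
        {"field": "site",      "label": "Cancer site",       "type": "choice",
         "choices": ["breast", "lung", "colon"]},
        {"field": "diag_year", "label": "Year of diagnosis", "type": "int"},
    ]

    def render_form(meta: list[dict]) -> str:
        """Emit a crude HTML form straight from the metadata rows."""
        parts = []
        for row in meta:
            if row["type"] == "choice":
                opts = "".join(f"<option>{c}</option>" for c in row["choices"])
                parts.append(f"<label>{row['label']} "
                             f"<select name='{row['field']}'>{opts}</select></label>")
            else:
                parts.append(f"<label>{row['label']} "
                             f"<input name='{row['field']}'></label>")
        return "\n".join(parts)

    print(render_form(METADATA_TABLE))  # adding a field = adding a row, no new code
    ```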

  10. Automated software system for checking the structure and format of ACM SIG documents

    NASA Astrophysics Data System (ADS)

    Mirza, Arsalan Rahman; Sah, Melike

    2017-04-01

    Microsoft (MS) Office Word is one of the most commonly used software tools for creating documents. MS Word 2007 and above uses XML to represent the structure of MS Word documents. Metadata about the documents are automatically created using Office Open XML (OOXML) syntax. We develop a new framework, called ADFCS (Automated Document Format Checking System), that takes advantage of the OOXML metadata in order to extract semantic information from MS Office Word documents. In particular, we develop a new ontology for Association for Computing Machinery (ACM) Special Interest Group (SIG) documents, representing the structure and format of these documents using OWL (Web Ontology Language). The metadata are then extracted automatically in RDF (Resource Description Framework) according to this ontology using the developed software. Finally, we generate extensive rules in order to infer whether the documents are formatted according to ACM SIG standards. This paper introduces the ACM SIG ontology, the metadata extraction process, the inference engine, the ADFCS online user interface, the system evaluation and the user study evaluations.
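
    The extraction step such a system builds on is plain ZIP-and-XML work, since a .docx file is a ZIP archive of OOXML parts. The sketch below reads paragraph styles from word/document.xml with the standard library; the conformance rule at the end is illustrative, not an actual ACM SIG rule.

    ```python
    # Sketch: a .docx file is a ZIP of XML parts, so document structure can be
    # read with the standard library alone (namespace is the OOXML main schema).
    import zipfile
    import xml.etree.ElementTree as ET

    W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

    def paragraph_styles(path: str) -> list[str]:
        """Return the style id of every paragraph in word/document.xml."""
        with zipfile.ZipFile(path) as docx:
            root = ET.fromstring(docx.read("word/document.xml"))
        styles = []
        for p in root.iter(f"{W}p"):
            style = p.find(f"{W}pPr/{W}pStyle")
            styles.append(style.get(f"{W}val") if style is not None else "Normal")
        return styles

    # Illustrative check: a conforming paper should start with a Title-styled paragraph.
    styles = paragraph_styles("paper.docx")  # hypothetical input file
    print("OK" if styles and styles[0] == "Title" else "Missing title paragraph")
    ```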

  11. Geologic map and digital database of the Romoland 7.5' quadrangle, Riverside County, California

    USGS Publications Warehouse

    Morton, Douglas M.; Digital preparation by Bovard, Kelly R.; Morton, Gregory

    2003-01-01

    Portable Document Format (.pdf) files of: this Readme (which includes, in Appendix I, the data contained in rom_met.txt) and the same graphic as plotted in 2 above. Test plots have not produced precise 1:24,000-scale map sheets; the Adobe Acrobat page size setting influences map scale. The Correlation of Map Units and Description of Map Units is in the editorial format of USGS Geologic Investigations Series (I-series) maps but has not been edited to comply with I-map standards. Within the geologic map data package, map units are identified by standard geologic map criteria such as formation name, age, and lithology. Where known, grain size is indicated on the map by a subscripted letter or letters following the unit symbols as follows: lg, large boulders; b, boulder; g, gravel; a, arenaceous; s, silt; c, clay; e.g. Qyfa is a predominantly young alluvial fan deposit that is arenaceous. Multiple letters are used for more specific identification or for mixed units, e.g., Qyfsa is a silty sand. In some cases, mixed units are indicated by a compound symbol; e.g., Qyf2sc. Even though this is an Open-File Report and includes the standard USGS Open-File disclaimer, the report closely adheres to the stratigraphic nomenclature of the U.S. Geological Survey. Descriptions of units can be obtained by viewing or plotting the .pdf file (3b above) or plotting the postscript file (2 above). This Readme file describes the digital data, such as types and general contents of files making up the database, and includes information on how to extract and plot the map and accompanying graphic file. Metadata information can be accessed at http://geo-nsdi.er.usgs.gov/metadata/open-file/03-102 and is included in Appendix I of this Readme.

  12. Automated DICOM metadata and volumetric anatomical information extraction for radiation dosimetry

    NASA Astrophysics Data System (ADS)

    Papamichail, D.; Ploussi, A.; Kordolaimi, S.; Karavasilis, E.; Papadimitroulas, P.; Syrgiamiotis, V.; Efstathopoulos, E.

    2015-09-01

    Patient-specific dosimetry calculations based on simulation techniques have as a prerequisite the modeling of the modality system and the creation of voxelized phantoms. This procedure requires knowledge of the scanning parameters and patients’ information included in a DICOM file, as well as image segmentation. However, the extraction of this information is complicated and time-consuming. The objective of this study was to develop a simple graphical user interface (GUI) to (i) automatically extract metadata from every slice image of a DICOM file in a single query and (ii) interactively specify the regions of interest (ROI) without explicit access to the radiology information system. The user-friendly application was developed in the Matlab environment. The user can select a series of DICOM files and manage their text and graphical data. The metadata are automatically formatted and presented to the user as a Microsoft Excel file. The volumetric maps are formed by interactively specifying the ROIs and by assigning a specific value to every ROI. The result is stored in DICOM format for data and trend analysis. The developed GUI is easy to use, fast, and constitutes a very useful tool for individualized dosimetry. One of the future goals is to incorporate remote access to a PACS server.
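
    The same metadata-extraction step can be sketched in Python with pydicom (the GUI described above was built in Matlab; this is an independent illustration with hypothetical file paths):

    ```python
    # Sketch: read dosimetry-relevant metadata from every slice of a DICOM
    # series without loading pixel data. Tag names are standard DICOM attributes.
    from pathlib import Path
    import pydicom

    rows = []
    for f in sorted(Path("series/").glob("*.dcm")):   # hypothetical directory
        ds = pydicom.dcmread(f, stop_before_pixels=True)  # metadata only
        rows.append({
            "file": f.name,
            "patient_id": ds.get("PatientID", ""),
            "slice_thickness": ds.get("SliceThickness", ""),
            "kvp": ds.get("KVP", ""),    # scanning parameter needed for dosimetry
        })
    print(rows)   # one dict per slice, ready to export as a spreadsheet
    ```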

  13. Information integration for a sky survey by data warehousing

    NASA Astrophysics Data System (ADS)

    Luo, A.; Zhang, Y.; Zhao, Y.

    The virtualization service of the data system for the sky survey LAMOST is very important for astronomers. The service needs to integrate information from data collections, catalogs and references, and support simple federation of a set of distributed files and associated metadata. Data warehousing has been in existence for several years and has demonstrated superiority over traditional relational database management systems by providing novel indexing schemes that support efficient on-line analytical processing (OLAP) of large databases. Now relational database systems such as Oracle support the warehouse capability, including extensions to the SQL language to support OLAP operations, and a number of metadata management tools have been created. The information integration of LAMOST by applying data warehousing is intended to effectively provide data and knowledge on-line.

  14. Distributed metadata servers for cluster file systems using shared low latency persistent key-value metadata store

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bent, John M.; Faibish, Sorin; Pedone, Jr., James M.

    A cluster file system is provided having a plurality of distributed metadata servers with shared access to one or more shared low latency persistent key-value metadata stores. A metadata server comprises an abstract storage interface comprising a software interface module that communicates with at least one shared persistent key-value metadata store providing a key-value interface for persistent storage of key-value metadata. The software interface module provides the key-value metadata to the at least one shared persistent key-value metadata store in a key-value format. The shared persistent key-value metadata store is accessed by a plurality of metadata servers. A metadata request can be processed by a given metadata server independently of other metadata servers in the cluster file system. A distributed metadata storage environment is also disclosed that comprises a plurality of metadata servers having an abstract storage interface to at least one shared persistent key-value metadata store.
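
    The abstraction can be sketched as a small interface (invented here for illustration; the patent text defines no API): metadata servers talk to any shared key-value store through one software interface module, so the store can be swapped without touching the servers.

    ```python
    # Sketch of an abstract key-value metadata storage interface (illustrative):
    # every metadata server programs against the interface, not a concrete store.
    from abc import ABC, abstractmethod

    class KVMetadataStore(ABC):
        """Abstract storage interface presented to every metadata server."""
        @abstractmethod
        def put(self, key: str, value: bytes) -> None: ...
        @abstractmethod
        def get(self, key: str) -> bytes: ...

    class InMemoryStore(KVMetadataStore):
        """Stand-in for a shared low-latency persistent store."""
        def __init__(self) -> None:
            self._data: dict[str, bytes] = {}
        def put(self, key: str, value: bytes) -> None:
            self._data[key] = value
        def get(self, key: str) -> bytes:
            return self._data[key]

    shared = InMemoryStore()     # one shared store, many independent servers
    shared.put("/fs1/home/alice/file1:inode", b'{"size": 4096, "mode": "0644"}')
    print(shared.get("/fs1/home/alice/file1:inode"))
    ```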

  15. Standards-based curation of a decade-old digital repository dataset of molecular information.

    PubMed

    Harvey, Matthew J; Mason, Nicholas J; McLean, Andrew; Murray-Rust, Peter; Rzepa, Henry S; Stewart, James J P

    2015-01-01

    We report the curation of 158,122 molecular geometries derived from the NCI set of reference molecules, together with associated properties computed using the MOPAC semi-empirical quantum mechanical method, originally deposited in 2005 into the Cambridge DSpace repository as a data collection. The procedures involved in the curation included annotating the original data using new MOPAC methods, updating the syntax of the CML documents used to express the data to ensure schema conformance, and adding new metadata describing the entries, together with an XML schema transformation to map the metadata schema to that used by the DataCite organisation. We have adopted a granularity model in which a DataCite persistent identifier (DOI) is created for each individual molecule to enable data discovery and data metrics at this level using DataCite tools. We recommend that the future research data management (RDM) of the scientific and chemical data components associated with journal articles (the "supporting information") should be conducted in a manner that facilitates automatic periodic curation.

  16. EnviroAtlas One Meter Resolution Urban Land Cover Data (2008-2012) Web Service

    EPA Pesticide Factsheets

    This EnviroAtlas web service supports research and online mapping activities related to EnviroAtlas (https://www.epa.gov/enviroatlas ). The EnviroAtlas One Meter-scale Urban Land Cover (MULC) Data were generated individually for each EnviroAtlas community. Source imagery varies by community. Land cover classes mapped also vary by community and include the following: water, impervious surfaces, soil and barren land, trees, shrub, grass and herbaceous, agriculture, orchards, woody wetlands, and emergent wetlands. Accuracy assessments were completed for each community's classification. For specific information about methods and accuracy of each community's land cover classification, consult their individual metadata records: Austin, TX (https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=%7B91A32A9D-96F5-4FA0-BC97-73BAD5D1F158%7D); Cleveland, OH (https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=%7B82ab1edf-8fc8-4667-9c52-5a5acffffa34%7D); Des Moines, IA (https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=%7BA4152198-978D-4C0B-959F-42EABA9C4E1B%7D); Durham, NC (https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=%7B2FF66877-A037-4693-9718-D1870AA3F084%7D); Fresno, CA (https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=%7B87041CF3-05BC-43C3-82DA-F066267C9871%7D); Green Bay, WI (https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=%7BD602E7C9-7F53-4C24

  17. The LTER Network Information System: Improving Data Quality and Synthesis through Community Collaboration

    NASA Astrophysics Data System (ADS)

    Servilla, M.; Brunt, J.

    2011-12-01

    Emerging in the 1980s as a U.S. National Science Foundation funded research network, the Long Term Ecological Research (LTER) Network began with six sites and with the goal of performing comparative data collection and analysis of major biotic regions of North America. Today, the LTER Network includes 26 sites located in North America, Antarctica, Puerto Rico, and French Polynesia and has contributed a corpus of over 7,000 data sets to the public domain. The diversity of LTER research has led to a wealth of scientific data derived from atmospheric to terrestrial to oceanographic to anthropogenic studies. Such diversity, however, is a contributing factor to data being published with poor or inconsistent quality or to data lacking descriptive documentation sufficient for understanding their origin or performing derivative studies. It is for these reasons that the LTER community, in collaboration with the LTER Network Office, has embarked on the development of the LTER Network Information System (NIS) - an integrative data management approach to improve the process by which quality LTER data and metadata are assembled into a central archive, thereby enabling better discovery, analysis, and synthesis of derived data products. The mission of the LTER NIS is to promote advances in collaborative and synthetic ecological science at multiple temporal and spatial scales by providing the information management and technology infrastructure to increase: (1) the availability and quality of data from LTER sites, by the use and support of standardized approaches to metadata management and access to data; (2) the timeliness and number of LTER-derived data products, by creating a suite of middleware programs and workflows that make it easy to create and maintain integrated data sets derived from LTER data; and (3) the knowledge generated from the synthesis of LTER data, by creating standardized access and easy-to-use applications to discover, access, and use LTER data. The LTER NIS will utilize the Provenance Aware Synthesis Tracking Architecture (PASTA), which will provide the LTER community a metadata-driven data-flow framework to automatically harvest data from LTER research sites and make it available through a well-defined software interface. We distinguish PASTA from the more generalized NIS by classifying framework components as critical and enabling cyberinfrastructure that, collectively, provide the services defined by the above mission. Data and metadata will have to pass a set of community-defined quality criteria before entry into PASTA, including the use of semantically informative metadata elements and the conformance of data to their structural descriptions provided by metadata. As a result, consumers of data products from PASTA will be assured that metadata are complete and include provenance information where applicable, and that the data are of the highest quality. Development of the NIS is being performed through community participation. Advisory groups, called "Tiger Teams", are enlisted from the general LTER membership to provide input to the design of the NIS. Other LTER working groups contribute community-based software into the NIS; these include modules for controlled vocabularies, scientific units, and personnel. We anticipate a 2014 release of the LTER NIS.

  18. NASA's Earth Observing System Data and Information System - Many Mechanisms for On-Going Evolution

    NASA Astrophysics Data System (ADS)

    Ramapriyan, H. K.

    2012-12-01

    NASA's Earth Observing System Data and Information System has been serving a broad user community since August 1994. As a long-lived multi-mission system serving multiple scientific disciplines and a diverse user community, EOSDIS has been evolving continuously. It has had and continues to have many forms of community input to help with this evolution. Early in its history, it had inputs from the EOSDIS Advisory Panel, benefited from the reviews by various external committees and evolved into the present distributed architecture with discipline-based Distributed Active Archive Centers (DAACs), Science Investigator-led Processing Systems and a cross-DAAC search and data access capability. EOSDIS evolution has been helped by advances in computer technology, moving from an initially planned supercomputing environment to SGI workstations to Linux Clusters for computation and from near-line archives of robotic silos with tape cassettes to RAID-disk-based on-line archives for storage. The network capacities have increased steadily over the years, making delivery of data on media almost obsolete. The advances in information systems technologies have been having an even greater impact on the evolution of EOSDIS. In the early days, the advent of the World Wide Web came as a game-changer in the operation of EOSDIS. The metadata model developed for the EOSDIS Core System for representing metadata from EOS standard data products has had an influence on the Federal Geographic Data Committee's metadata content standard and the ISO metadata standards. The influence works both ways. As the ISO 19115 metadata standard has developed in recent years, EOSDIS is reviewing its metadata to ensure compliance with the standard. Improvements have been made in the cross-DAAC search and access of data using the centralized metadata clearing house (EOS Clearing House - ECHO) and the client Reverb. Given the diversity of the Earth science disciplines served by the DAACs, the DAACs have developed a number of software tools tailored to their respective user communities. Web services play an important part in improved access to data products including some basic analysis and visualization capabilities. A coherent view into all capabilities available from EOSDIS is evolving through the "Coherent Web" effort. Data are being made available in near real-time for scientific research as well as time-critical applications. On-going community inputs that maintain the vitality of EOSDIS come from technology developments by NASA-sponsored community data system programs - Advancing Collaborative Connections for Earth System Science (ACCESS), Making Earth System Data Records for Use in Research Environments (MEaSUREs) and Applied Information System Technology (AIST), as well as participation in Earth Science Data System Working Groups, the Earth Science Information Partners Federation and other interagency/international activities. An important source of community needs is the annual American Customer Satisfaction Index survey of EOSDIS users. Some of the key areas in which improvements are required and incremental progress is being made are: ease of discovery and access; cross-organizational interoperability; data inter-use; ease of collaboration; ease of citation of datasets; preservation of provenance and context and making them conveniently available to users.

  19. Persistent identifiers for CMIP6 data in the Earth System Grid Federation

    NASA Astrophysics Data System (ADS)

    Buurman, Merret; Weigel, Tobias; Juckes, Martin; Lautenschlager, Michael; Kindermann, Stephan

    2016-04-01

    The Earth System Grid Federation (ESGF) is a distributed data infrastructure that will provide access to the CMIP6 experiment data. The data consist of thousands of datasets composed of millions of files. Over the course of the CMIP6 operational phase, datasets may be retracted and replaced by newer versions that consist of completely or partly new files. Each dataset is hosted at a single data centre, but can have one or several backups (replicas) at other data centres. To keep track of the different data entities and the relationships between them, to ensure their consistency and to improve the exchange of information about them, Persistent Identifiers (PIDs) are used. These are unique identifiers registered at a globally accessible server, along with some metadata (the PID record). While a PID usually provides access to the data object it refers to for as long as that object exists, the metadata record remains available even beyond the object's lifetime. Besides providing access to data and metadata, PIDs will allow scientists to communicate effectively and at a fine granularity about CMIP6 data. The initiative to introduce PIDs in the ESGF infrastructure has been described and agreed upon through a series of white papers governed by the WGCM Infrastructure Panel (WIP). In CMIP6, each dataset and each file is assigned a PID that keeps track of the data object's physical copies throughout the object lifetime. In addition, its relationships with other data objects are stored in the PID record. A human-readable version of this information is available on an information page also linked in the PID record. A possible application that exploits the information available from the PID records is a smart information tool, which a scientific user can call to find out whether their version has been replaced by a newer one, to view and browse the related datasets and files, and to get access to the various copies or to additional metadata on a dedicated website. The PID registration process is embedded in the ESGF data publication process. During their first publication, the PID records are populated with metadata including the parent dataset(s), other existing versions and physical location. Every subsequent publication, un-publication or replica publication of a dataset or file then updates the PID records to keep track of changing physical locations of the data (or lack thereof) and of reported errors in the data. Assembling the metadata records and registering the PIDs on a central server is a potential performance bottleneck, as millions of data objects may be published in a short timeframe when the CMIP6 experiment phase begins. For this reason, the PID registration and metadata update tasks are pushed to a message queueing system, which facilitates high availability and scalability, and are processed asynchronously. This leads to a slight delay in PID registration but avoids blocking resources at the data centres and slowing down the publication of the data so eagerly awaited by the scientists.
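
    As an illustration of the asynchronous registration step described above, the following Python sketch assembles a minimal PID record and hands it to a message queue instead of registering it synchronously. The queue name, broker host, and record fields are assumptions for illustration, not the actual ESGF configuration.

    ```python
    # Hedged sketch: a publication step hands PID work to a message queue
    # so registration happens asynchronously. Queue name, broker host, and
    # record fields are illustrative assumptions only.
    import json
    import pika  # RabbitMQ client library

    def queue_pid_registration(dataset_id, version, locations, replica_of=None):
        """Assemble a minimal PID record and push it to a task queue,
        so data publication is not blocked by the central PID server."""
        record = {
            "aggregation_level": "dataset",
            "dataset_id": dataset_id,
            "version": version,
            "locations": locations,    # physical copies tracked over the lifetime
            "replica_of": replica_of,  # link to the original if this is a replica
        }
        connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
        channel = connection.channel()
        channel.queue_declare(queue="pid-tasks", durable=True)
        channel.basic_publish(exchange="", routing_key="pid-tasks",
                              body=json.dumps(record))
        connection.close()

    queue_pid_registration("cmip6.model-x.historical.r1i1p1f1", "v20160401",
                           ["https://data-centre-a.example/data/tas_v1.nc"])
    ```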

  20. EnviroAtlas Estimated Percent Tree Cover Along Walkable Roads Web Service

    EPA Pesticide Factsheets

    This EnviroAtlas dataset estimates tree cover along walkable roads. The road width is estimated for each road and percent tree cover is calculated in an 8.5-meter strip beginning at the estimated road edge. Percent tree cover is calculated for each block between road intersections. Tree cover provides valuable benefits to neighborhood residents and walkers by providing shade, improved aesthetics, and outdoor gathering spaces. For specific information about each community's Estimated Percent Tree Cover Along Walkable Roads layer, consult their individual metadata records: Austin, TX (https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=%7B4876FD99-C14A-464A-9E31-5CB5F2225687%7D); Cleveland, OH (https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=%7B28e3f937-6f22-45c5-98cf-1707b0fc92df%7D); Des Moines, IA (https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=%7B09FE7D60-B636-405C-BB07-68147DFE8CAF%7D); Durham, NC (https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=%7BF341A26B-4972-4C6B-B675-9B5E02F4F25F%7D); Fresno, CA (https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=%7BB71334B9-C53A-4674-A739-1031969E5163%7D); Green Bay, WI (https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=%7BB9AFEBED-9C29-4DB0-8B54-0CAF58BE5A2D%7D); Memphis, TN (https://edg.epa.gov/metadata/catalog/search/resource/details.page?uuid=%7BBE552E7A-A789-4AA9-ADF9-234109C6517E%7D); Mi

  1. Predicting age groups of Twitter users based on language and metadata features

    PubMed Central

    Morgan-Lopez, Antonio A.; Chew, Robert F.; Ruddle, Paul

    2017-01-01

    Health organizations are increasingly using social media, such as Twitter, to disseminate health messages to target audiences. Determining the extent to which the target audience (e.g., age groups) was reached is critical to evaluating the impact of social media education campaigns. The main objective of this study was to examine the separate and joint predictive validity of linguistic and metadata features in predicting the age of Twitter users. We created a labeled dataset of Twitter users across different age groups (youth, young adults, adults) by collecting publicly available birthday announcement tweets using the Twitter Search application programming interface. We manually reviewed results and, for each age-labeled handle, collected the 200 most recent publicly available tweets and user handles’ metadata. The labeled data were split into training and test datasets. We created separate models to examine the predictive validity of language features only, metadata features only, language and metadata features, and words/phrases from another age-validated dataset. We estimated accuracy, precision, recall, and F1 metrics for each model. An L1-regularized logistic regression model was fit for each age group, and predicted probabilities between the training and test sets were compared for each age group. Cohen’s d effect sizes were calculated to examine the relative importance of significant features. Models containing both tweet language features and metadata features performed the best (74% precision, 74% recall, 74% F1), while the model containing only Twitter metadata features was least accurate (58% precision, 60% recall, and 57% F1 score). Top predictive features included use of terms such as “school” for youth and “college” for young adults. Overall, it was more challenging to predict older adults accurately. These results suggest that examining linguistic and Twitter metadata features to predict youth and young adult Twitter users may be helpful for informing public health surveillance and evaluation research. PMID:28850620

  2. Predicting age groups of Twitter users based on language and metadata features.

    PubMed

    Morgan-Lopez, Antonio A; Kim, Annice E; Chew, Robert F; Ruddle, Paul

    2017-01-01

    Health organizations are increasingly using social media, such as Twitter, to disseminate health messages to target audiences. Determining the extent to which the target audience (e.g., age groups) was reached is critical to evaluating the impact of social media education campaigns. The main objective of this study was to examine the separate and joint predictive validity of linguistic and metadata features in predicting the age of Twitter users. We created a labeled dataset of Twitter users across different age groups (youth, young adults, adults) by collecting publicly available birthday announcement tweets using the Twitter Search application programming interface. We manually reviewed results and, for each age-labeled handle, collected the 200 most recent publicly available tweets and user handles' metadata. The labeled data were split into training and test datasets. We created separate models to examine the predictive validity of language features only, metadata features only, language and metadata features, and words/phrases from another age-validated dataset. We estimated accuracy, precision, recall, and F1 metrics for each model. An L1-regularized logistic regression model was fit for each age group, and predicted probabilities between the training and test sets were compared for each age group. Cohen's d effect sizes were calculated to examine the relative importance of significant features. Models containing both tweet language features and metadata features performed the best (74% precision, 74% recall, 74% F1), while the model containing only Twitter metadata features was least accurate (58% precision, 60% recall, and 57% F1 score). Top predictive features included use of terms such as "school" for youth and "college" for young adults. Overall, it was more challenging to predict older adults accurately. These results suggest that examining linguistic and Twitter metadata features to predict youth and young adult Twitter users may be helpful for informing public health surveillance and evaluation research.
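
    The modeling approach this study describes, an L1-regularized logistic regression over combined tweet-language and account-metadata features, could be sketched in scikit-learn roughly as follows. The column names and toy data are illustrative, not the study's actual dataset or feature set.

    ```python
    # Hedged sketch: one-vs-rest L1-regularized logistic regression combining
    # tweet text (TF-IDF) with an account-metadata feature. Toy data only.
    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    df = pd.DataFrame({
        "tweets":    ["back to school tomorrow", "college finals week", "meeting at work"],
        "followers": [150, 420, 980],   # illustrative metadata feature
        "is_youth":  [1, 0, 0],         # binary label for one age group
    })

    features = ColumnTransformer([
        ("language", TfidfVectorizer(ngram_range=(1, 2)), "tweets"),
        ("metadata", "passthrough", ["followers"]),
    ])
    model = Pipeline([
        ("features", features),
        ("clf", LogisticRegression(penalty="l1", solver="liblinear")),
    ])
    model.fit(df[["tweets", "followers"]], df["is_youth"])
    print(model.predict_proba(df[["tweets", "followers"]]))
    ```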

  3. 3D mapping of existing observing capabilities in the frame of GAIA-CLIM H2020 project

    NASA Astrophysics Data System (ADS)

    Emanuele, Tramutola; Madonna, Fabio; Marco, Rosoldi; Francesco, Amato

    2017-04-01

    The aim of the Gap Analysis for Integrated Atmospheric ECV CLImate Monitoring (GAIA-CLIM) project is to improve our ability to use ground-based and sub-orbital observations to characterise satellite observations for a number of atmospheric Essential Climate Variables (ECVs). The key outcomes will be a "Virtual Observatory" (VO) facility of co-locations and their uncertainties and a report on gaps in capabilities or understanding, which shall be used to inform subsequent Horizon 2020 activities. In particular, Work Package 1 (WP1) of the GAIA-CLIM project is devoted to the geographical mapping of existing non-satellite measurement capabilities for a number of ECVs in the atmospheric, oceanic and terrestrial domains. The work carried out within WP1 has provided users with an up-to-date geographical identification, at the European and global scales, of current surface-based, balloon-based and oceanic (float) observing capabilities, on an ECV-by-ECV basis, for several parameters which can also be obtained using space-based observations from past, present and planned satellite missions. Having alighted on a set of metadata schemas to follow, a consistent collection of discovery metadata has been assembled into a common structure and will be made available to users through the GAIA-CLIM VO in 2018. Metadata can be interactively visualized through a 3D Graphical User Interface. The metadataset includes 54 plausible networks and 2 permanent aircraft infrastructures for EO characterisation in the context of GAIA-CLIM, currently operating over different spatial domains and measuring different ECVs using one or more measurement techniques. Each classified network has in addition been assessed for suitability against metrological criteria to identify those with a level of maturity which enables closure on a comparison with satellite measurements. The metadata GUI is based on Cesium, a free and open-source virtual globe library written in JavaScript. It allows users to apply different filters to the data displayed on the globe, selecting data per ECV, network, measurement type and level of maturity. Filtering is implemented as a query to a GeoServer web application through the WFS interface, on a data layer configured in our PostgreSQL database with the PostGIS extension; filters set in the GUI are expressed using ECQL (Extended Common Query Language), as sketched below. The GUI allows users to visualize in real time the current non-satellite observing capabilities along with the satellite platforms measuring the same ECVs. Satellite ground tracks and the footprints of the instruments on board can also be visualized. This work contributes to improving metadata and web map services and to facilitating users' experience in the spatio-temporal analysis of Earth Observation data.
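
    A minimal sketch of the GUI-to-GeoServer filtering path described above: a WFS GetFeature request carrying an ECQL filter. The endpoint, layer name, and attribute names are illustrative assumptions, not the actual GAIA-CLIM configuration.

    ```python
    # Hedged sketch: WFS GetFeature with an ECQL filter, as GeoServer accepts
    # via its CQL_FILTER vendor parameter. Endpoint and names are hypothetical.
    import requests

    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": "gaiaclim:stations",   # hypothetical layer name
        "outputFormat": "application/json",
        # ECQL: only mature ozone-measuring stations of a given network
        "CQL_FILTER": "ecv = 'ozone' AND network = 'NDACC' AND maturity >= 3",
    }
    resp = requests.get("https://example.org/geoserver/wfs", params=params, timeout=30)
    resp.raise_for_status()
    stations = resp.json()["features"]
    print(f"{len(stations)} stations matched the filter")
    ```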

  4. SAS- Semantic Annotation Service for Geoscience resources on the web

    NASA Astrophysics Data System (ADS)

    Elag, M.; Kumar, P.; Marini, L.; Li, R.; Jiang, P.

    2015-12-01

    There is a growing need for increased integration across the data and model resources that are disseminated on the web to advance their reuse across different earth science applications. Meaningful reuse of resources requires semantic metadata to realize the semantic web vision of pragmatic linkage and integration among resources. Semantic metadata associates standard metadata with resources to turn them into semantically enabled resources on the web. However, the lack of a common standardized metadata framework, as well as the uncoordinated use of metadata fields across different geo-information systems, has led to a situation in which standards and related Standard Names abound. To address this need, we have designed SAS to provide a bridge between the core ontologies required to annotate resources and information systems, in order to enable queries and analysis over annotations from a single environment (the web). SAS is one of the services provided by the Geosemantic framework, a decentralized semantic framework that supports integration between models and data and allows semantically heterogeneous resources to interact with minimal human intervention. Here we present the design of SAS and demonstrate its application for annotating data and models. First we describe how predicates and their attributes are extracted from standards and ingested into the knowledge base of the Geosemantic framework. Then we illustrate the application of SAS in annotating data managed by SEAD and annotating simulation models that have a web interface (a minimal annotation sketch follows below). SAS is a step in a broader approach to raise the quality of geoscience data and models published on the web and to allow users to better search, access, and use existing resources, based on standard vocabularies encoded and published using semantic technologies.
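
    The core annotation operation, attaching a standard-vocabulary predicate to a web resource as an RDF triple, might look roughly like this rdflib sketch. The URIs and the standard-name vocabulary are illustrative assumptions, not SAS's actual ontologies.

    ```python
    # Hedged sketch: semantic annotation of a web resource as RDF triples.
    # Vocabulary namespace and resource URIs are hypothetical.
    from rdflib import Graph, URIRef, Literal, Namespace

    CSN = Namespace("http://example.org/standard_names/")  # hypothetical vocabulary
    g = Graph()

    dataset = URIRef("http://example.org/data/streamflow-2015")
    g.add((dataset, CSN.observesProperty, CSN.channel_water_flow_rate))
    g.add((dataset, CSN.label, Literal("Daily streamflow, 2015")))

    # Serialize the annotations so they can be queried from a single environment
    print(g.serialize(format="turtle"))
    ```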

  5. SIOExplorer: Modern IT Methods and Tools for Digital Library Management

    NASA Astrophysics Data System (ADS)

    Sutton, D. W.; Helly, J.; Miller, S.; Chase, A.; Clarck, D.

    2003-12-01

    With more geoscience disciplines becoming data-driven, it is increasingly important to utilize modern techniques for data, information and knowledge management. SIOExplorer is a new digital library project with 2 terabytes of oceanographic data collected over the last 50 years on 700 cruises by the Scripps Institution of Oceanography. It is built using a suite of information technology tools and methods that allow for an efficient and effective digital library management system. The library consists of a number of independent collections, each with corresponding metadata formats. The system architecture allows each collection to be built and uploaded based on a collection-dependent metadata template file (MTF). This file is used to create the hierarchical structure of the collection, create metadata tables in a relational database, and populate object metadata files and the collection as a whole. Collections are comprised of arbitrary digital objects stored at the San Diego Supercomputer Center (SDSC) High Performance Storage System (HPSS) and managed using the Storage Resource Broker (SRB), data-handling middleware developed at SDSC. SIOExplorer interoperates with other collections as a data provider through the Open Archives Initiative (OAI) protocol. The user services for SIOExplorer are accessed from CruiseViewer, a Java application served using Java Web Start from the SIOExplorer home page. CruiseViewer is an advanced tool for data discovery and access. It implements general keyword and interactive geospatial search methods for the collections, georeferencing search results on user-selected basemaps such as global topography or crustal age. User services include metadata viewing, opening of selected mime-type digital objects (such as images, documents and grid files), and downloading of objects (including the brokering of proprietary hold restrictions).

  6. An Observation Knowledgebase for Hinode Data

    NASA Astrophysics Data System (ADS)

    Hurlburt, Neal E.; Freeland, S.; Green, S.; Schiff, D.; Seguin, R.; Slater, G.; Cirtain, J.

    2007-05-01

    We have developed a standards-based system for the Solar Optical and X-Ray Telescopes on the Hinode orbiting solar observatory which can serve as part of a developing Heliophysics informatics system. Our goal is to make the scientific data acquired by Hinode more accessible and useful to scientists by allowing them to do reasoning and flexible searches on observation metadata and to ask higher-level questions of the system than previously allowed. The Hinode Observation Knowledgebase relates the intentions and goals of the observation planners (as-planned metadata) with actual observational data (as-run metadata), along with connections to related models, data products and identified features (follow-up metadata) through a citation system. Summaries of the data (both as image thumbnails and short "film strips") serve to guide researchers to the observations appropriate for their research, and these are linked directly to the data catalog for easy extraction and delivery. The semantic information of the observation (field of view, wavelength, type of observable, average cadence, etc.) is captured through simple user interfaces and encoded using the VOEvent XML standard, with the addition of some solar-related extensions (a minimal sketch follows below). These interfaces merge metadata acquired automatically during both the mission planning and data analysis phases (see Seguin et al. 2007 at this meeting) with that obtained directly from the planner/analyst, and send them to be incorporated into the knowledgebase. The resulting information is automatically rendered into standard categories based on planned and recent observations, as well as by popularity and recommendations by the science team. It is also directly searchable through both web-based searches and direct calls to the API. Observation details can also be rendered through RSS, iTunes and Google Earth interfaces. The resulting system provides a useful tool to researchers and can act as a demonstration for larger, more complex systems.
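
    A record of the kind described, observation semantics captured as VOEvent parameters, might be built roughly as in the following sketch. The identifier, parameter names, and values are illustrative, and the solar-related extensions mentioned in the abstract are not reproduced here.

    ```python
    # Hedged sketch: encoding as-run observation metadata in a VOEvent-style
    # XML document using What/Param elements. All values are illustrative.
    import xml.etree.ElementTree as ET

    voevent = ET.Element("voe:VOEvent", {
        "xmlns:voe": "http://www.ivoa.net/xml/VOEvent/v1.1",
        "ivorn": "ivo://example/hinode#obs-0001",  # hypothetical identifier
        "role": "observation",
    })
    what = ET.SubElement(voevent, "What")
    ET.SubElement(what, "Param", name="instrument", value="SOT")
    ET.SubElement(what, "Param", name="wavelength_nm", value="396.8")
    ET.SubElement(what, "Param", name="cadence_s", value="60")

    print(ET.tostring(voevent, encoding="unicode"))
    ```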

  7. Interoperable Solar Data and Metadata via LISIRD 3

    NASA Astrophysics Data System (ADS)

    Wilson, A.; Lindholm, D. M.; Pankratz, C. K.; Snow, M. A.; Woods, T. N.

    2015-12-01

    LISIRD 3 is a major upgrade of the LASP Interactive Solar Irradiance Data Center (LISIRD), which serves several dozen space-based solar irradiance and related data products to the public. Through interactive plots, LISIRD 3 provides data browsing supported by data subsetting and aggregation. By incorporating a semantically enabled metadata repository, LISIRD 3 shows users current, vetted, consistent information about the datasets offered. Users can now also search for datasets based on metadata fields such as dataset type and/or spectral or temporal range. This semantic database enables metadata browsing, so users can discover the relationships between datasets, instruments, spacecraft, missions and PIs. The database also enables creation and publication of metadata records in a variety of formats, such as SPASE or ISO, making these datasets more discoverable. The database also enables the possibility of a public SPARQL endpoint, making the metadata browsable in an automated fashion. LISIRD 3's data access middleware, LaTiS, provides dynamic, on-demand reformatting of data and timestamps, subsetting and aggregation, and other server-side functionality via a RESTful, OPeNDAP-compliant API, enabling interoperability between LASP datasets and many common tools (a request sketch follows below). LISIRD 3's templated front-end design, coupled with the uniform data interface offered by LaTiS, allows easy integration of new datasets. Consequently, the number and variety of datasets offered by LISIRD has grown to encompass several dozen, with many more to come. This poster will discuss the design and implementation of LISIRD 3, including tools used, capabilities enabled, and issues encountered.
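
    Server-side subsetting through a LaTiS-style RESTful API might look roughly like the following. The base URL, dataset name, and variable names here are assumptions for illustration following the general LaTiS request pattern, not a documented LISIRD endpoint.

    ```python
    # Hedged sketch: projection and time subsetting pushed to the server via
    # a LaTiS-style URL. Endpoint, dataset, and variable names are assumed.
    import requests

    base = "https://lasp.colorado.edu/lisird/latis/dap"  # assumed endpoint
    # Project two variables and select a time range, letting the server
    # do the subsetting and reformat the result as JSON.
    url = f"{base}/sorce_tsi_24hr.json?time,tsi_1au&time>=2010-01-01&time<2010-02-01"
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    data = resp.json()
    ```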

  8. Metadata Authoring with Versatility and Extensibility

    NASA Technical Reports Server (NTRS)

    Pollack, Janine; Olsen, Lola

    2004-01-01

    NASA's Global Change Master Directory (GCMD) assists the scientific community in the discovery of and linkage to Earth science data sets and related services. The GCMD holds over 13,800 data set descriptions in Directory Interchange Format (DIF) and 700 data service descriptions in Service Entry Resource Format (SERF), encompassing the disciplines of geology, hydrology, oceanography, meteorology, and ecology. Data descriptions also contain geographic coverage information and direct links to the data, thus allowing researchers to discover data pertaining to a geographic location of interest, then quickly acquire those data. The GCMD strives to be the preferred data locator for world-wide directory-level metadata. In this vein, scientists and data providers must have access to intuitive and efficient metadata authoring tools. Existing GCMD tools are attracting widespread usage; however, a need for tools that are portable, customizable and versatile still exists. With tool usage directly influencing metadata population, it has become apparent that new tools are needed to fill these voids. As a result, the GCMD has released a new authoring tool allowing for both web-based and stand-alone authoring of descriptions. Furthermore, this tool incorporates the ability to plug-and-play the metadata format of choice, offering users options of DIF, SERF, FGDC, ISO or any other defined standard. Allowing data holders to work with their preferred format, as well as an option of a stand-alone application or web-based environment, docBUILDER will assist the scientific community in efficiently creating quality data and services metadata.

  9. A framework for collaborative review of candidate events in high data rate streams: The V-FASTR experiment as a case study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hart, Andrew F.; Cinquini, Luca; Khudikyan, Shakeh E.

    2015-01-01

    “Fast radio transients” are defined here as bright millisecond pulses of radio-frequency energy. These short-duration pulses can be produced by known objects such as pulsars or potentially by more exotic objects such as evaporating black holes. The identification and verification of such an event would be of great scientific value. This is one major goal of the Very Long Baseline Array (VLBA) Fast Transient Experiment (V-FASTR), a software-based detection system installed at the VLBA. V-FASTR uses a “commensal” (piggy-back) approach, analyzing all array data continually during routine VLBA observations and identifying candidate fast transient events. Raw data can be stored from a buffer memory, which enables a comprehensive off-line analysis. This is invaluable for validating the astrophysical origin of any detection. Candidates discovered by the automatic system must be reviewed each day by analysts to identify any promising signals that warrant a more in-depth investigation. To support the timely analysis of fast transient detection candidates by V-FASTR scientists, we have developed a metadata-driven, collaborative candidate review framework. The framework consists of a software pipeline for metadata processing composed of both open source software components and project-specific code written expressly to extract and catalog metadata from the incoming V-FASTR data products, and a web-based data portal that facilitates browsing and inspection of the available metadata for candidate events extracted from the VLBA radio data.

  10. Multilingual Information Discovery and AccesS (MIDAS): A Joint ACM DL'99/ ACM SIGIR'99 Workshop.

    ERIC Educational Resources Information Center

    Oard, Douglas; Peters, Carol; Ruiz, Miguel; Frederking, Robert; Klavans, Judith; Sheridan, Paraic

    1999-01-01

    Discusses a multidisciplinary workshop that addressed issues concerning internationally distributed information networks. Highlights include multilingual information access in media other than character-coded text; cross-language information retrieval and multilingual metadata; and evaluation of multilingual systems. (LRW)

  11. The ANSS Station Information System: A Centralized Station Metadata Repository for Populating, Managing and Distributing Seismic Station Metadata

    NASA Astrophysics Data System (ADS)

    Thomas, V. I.; Yu, E.; Acharya, P.; Jaramillo, J.; Chowdhury, F.

    2015-12-01

    Maintaining and archiving accurate site metadata is critical for seismic network operations. The Advanced National Seismic System (ANSS) Station Information System (SIS) is a repository of seismic network field equipment, equipment response, and other site information. Currently, there are 187 different sensor models and 114 data-logger models in SIS. SIS has a web-based user interface that allows network operators to enter information about seismic equipment and assign response parameters to it. It allows users to log entries for sites, equipment, and data streams. Users can also track when equipment is installed, updated, and/or removed from sites. When seismic equipment configurations change for a site, SIS computes the overall gain of a data channel by combining the response parameters of the underlying hardware components (sketched below). Users can then distribute this metadata in standardized formats such as FDSN StationXML or dataless SEED. One powerful advantage of SIS is that existing data in the repository can be leveraged: e.g., new instruments can be assigned response parameters from the Incorporated Research Institutions for Seismology (IRIS) Nominal Response Library (NRL), or from a similar instrument already in the inventory, thereby reducing the amount of time needed to determine parameters when new equipment (or new models) is introduced into a network. SIS is also useful for managing field equipment that does not produce seismic data (e.g., power systems, telemetry devices or GPS receivers) and gives the network operator a comprehensive view of site field work. SIS allows users to generate field logs to document activities and inventory at sites. Operators can thus also use SIS reporting capabilities to improve planning and maintenance of the network. Queries such as how many sensors of a certain model are installed, or what pieces of equipment have active problem reports, are just a few examples of the type of information that is available to SIS users.
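
    The gain computation described above amounts to multiplying per-stage gains into one channel sensitivity. The sketch below shows this under illustrative stage values; the numbers are not from the SIS inventory.

    ```python
    # Hedged sketch: overall channel gain as the product of the gains of the
    # hardware stages (sensor, amplifier, digitizer). Values are illustrative.
    from math import prod

    def overall_channel_gain(stage_gains):
        """Multiply the gains of the hardware stages in a data channel,
        e.g. sensor (V per m/s), amplifier (V/V), digitizer (counts/V)."""
        return prod(stage_gains)

    # e.g. sensor 1500 V/(m/s) * amplifier 32 V/V * digitizer 419430 counts/V
    sensitivity = overall_channel_gain([1500.0, 32.0, 419430.0])
    print(f"channel sensitivity: {sensitivity:.4g} counts/(m/s)")
    ```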

  12. Abstracts of SIG Sessions.

    ERIC Educational Resources Information Center

    Proceedings of the ASIS Annual Meeting, 1997

    1997-01-01

    Presents abstracts of SIG Sessions. Highlights include digital collections; information retrieval methods; public interest/fair use; classification and indexing; electronic publication; funding; globalization; information technology projects; interface design; networking in developing countries; metadata; multilingual databases; networked…

  13. A Semantically Enabled Metadata Repository for Solar Irradiance Data Products

    NASA Astrophysics Data System (ADS)

    Wilson, A.; Cox, M.; Lindholm, D. M.; Nadiadi, I.; Traver, T.

    2014-12-01

    The Laboratory for Atmospheric and Space Physics, LASP, has been conducting research in atmospheric and space science for over 60 years, and providing the associated data products to the public. LASP has a long history, in particular, of making space-based measurements of the solar irradiance, which serves as crucial input to several areas of scientific research, including solar-terrestrial interactions, atmospheric science, and climate studies. LISIRD, the LASP Interactive Solar Irradiance Data Center, serves these datasets to the public, including solar spectral irradiance (SSI) and total solar irradiance (TSI) data. The LASP extended metadata repository, LEMR, is a database of information about the datasets served by LASP, such as parameters, uncertainties, temporal and spectral ranges, current version, alerts, etc. It serves as the definitive, single source of truth for that information. The database is populated with information garnered via web forms and automated processes. Dataset owners keep the information current and verified for datasets under their purview. This information can be pulled dynamically for many purposes. Web sites such as LISIRD can include this information in web page content as it is rendered, ensuring users get current, accurate information. It can also be pulled to create metadata records in various metadata formats, such as SPASE (for heliophysics) and ISO 19115. Once these records are made available to the appropriate registries, our data will be discoverable by users coming in via those organizations. The database is implemented as an RDF triplestore, a collection of subject-predicate-object statements whose entities are identifiable with URIs. This capability, coupled with SPARQL-over-HTTP read access, enables semantic queries over the repository contents (see the sketch below). To create the repository we leveraged VIVO, an open-source semantic web application, to manage and create new ontologies and populate repository content. A variety of ontologies were used in creating the triplestore, including ontologies that came with VIVO, such as FOAF. Also, the W3C DCAT ontology was integrated and extended to describe properties of our data products that we needed to capture, such as spectral range. The presentation will describe the architecture, ontology issues, and tools used to create LEMR, and plans for its evolution.
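
    A semantic query of the kind enabled by SPARQL-over-HTTP read access might look roughly like the following. The endpoint URL and the extended-DCAT property names are illustrative assumptions, not LEMR's published ontology.

    ```python
    # Hedged sketch: querying a triplestore for datasets and their spectral
    # ranges via SPARQL over HTTP. Endpoint and property names are assumed.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://example.org/lemr/sparql")  # hypothetical endpoint
    sparql.setQuery("""
        PREFIX dcat: <http://www.w3.org/ns/dcat#>
        PREFIX ex:   <http://example.org/lemr#>
        SELECT ?dataset ?minWavelength ?maxWavelength WHERE {
            ?dataset a dcat:Dataset ;
                     ex:spectralRangeMin ?minWavelength ;
                     ex:spectralRangeMax ?maxWavelength .
        }
    """)
    sparql.setReturnFormat(JSON)
    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["dataset"]["value"], row["minWavelength"]["value"])
    ```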

  14. International cooperation between Japanese IUGONET and EU ESPAS projects on development of the metadata database for upper atmospheric study

    NASA Astrophysics Data System (ADS)

    Yatagai, Akiyo; Ritschel, Bernd; Iyemori, Tomohiko; Koyama, Yukinobu; Hori, Tomoaki; Abe, Shuji; Tanaka, Yoshimasa; Shinbori, Atsuki; UeNo, Satoru; Sato, Yuka; Yagi, Manabu

    2013-04-01

    Upper atmospheric observational study is an area in which international collaboration is crucially important. The Japanese Inter-university Upper atmosphere Global Observation NETwork project (2009-2014), IUGONET, is an inter-university program by the National Institute of Polar Research (NIPR), Tohoku University, Nagoya University, Kyoto University, and Kyushu University to build a database of metadata for ground-based observations of the upper atmosphere. In order to investigate the mechanism of long-term variations in the upper atmosphere, we need to combine various types of in-situ observations and to accelerate data exchange. The IUGONET institutions have been archiving data observed by radars, magnetometers, photometers, radio telescopes, helioscopes, etc., covering altitude layers from the Earth's surface to the Sun. IUGONET has been developing systems for searching the metadata of these observational data, and the metadata database (MDB) has been operating since 2011. It adopts the DSpace system for registering metadata and uses an extension of the SPASE data model for describing metadata, a format widely used in the upper atmospheric community, including in the USA. The European Union project ESPAS (2011-2015) shares its scientific objectives with IUGONET: it aims to provide an e-science infrastructure for the retrieval of and access to space weather relevant data, information and value-added services. It integrates 22 partners in European countries. ESPAS also plans to adopt the SPASE model for defining its metadata, but the search systems differ: in spite of the similarity of the data model, the basic system ideas and the techniques behind the system and web portal differ between IUGONET and ESPAS. In order to connect the two systems/databases, we are planning to take an ontological approach. The SPASE keyword vocabulary, derived from the SPASE data model, shall be used as the standard for describing near-Earth and space data content and context. The vocabulary is modeled as a Simple Knowledge Organization System (SKOS) ontology and can be reused in related as well as cross-domain projects. Implementing the vocabulary as an ontology enables direct integration into semantic-web-based structures and applications, such as linked data and the new Information System and Data Center (ISDC) data management system.

  15. Log-less metadata management on metadata server for parallel file systems.

    PubMed

    Liao, Jianwei; Xiao, Guoqiang; Peng, Xiaoning

    2014-01-01

    This paper presents a novel metadata management mechanism on the metadata server (MDS) for parallel and distributed file systems. In this technique, the client file system backs up the metadata requests it has sent and that the metadata server has handled, so that the MDS does not need to log metadata changes to nonvolatile storage to achieve a highly available metadata service, while also improving metadata-processing performance. Because the client file system backs up the sent metadata requests in its own memory, the overhead of handling these backup requests is much smaller than the overhead a metadata server incurs when it adopts logging or journaling to provide a highly available metadata service. The experimental results show that this newly proposed mechanism can significantly improve the speed of metadata processing and deliver better I/O throughput than conventional metadata management schemes, i.e., logging or journaling on the MDS. Moreover, complete metadata recovery can be achieved by replaying the backup logs cached by all involved clients when the metadata server crashes or otherwise becomes non-operational.

  16. Log-Less Metadata Management on Metadata Server for Parallel File Systems

    PubMed Central

    Xiao, Guoqiang; Peng, Xiaoning

    2014-01-01

    This paper presents a novel metadata management mechanism on the metadata server (MDS) for parallel and distributed file systems. In this technique, the client file system backs up the metadata requests it has sent and that the metadata server has handled, so that the MDS does not need to log metadata changes to nonvolatile storage to achieve a highly available metadata service, while also improving metadata-processing performance. Because the client file system backs up the sent metadata requests in its own memory, the overhead of handling these backup requests is much smaller than the overhead a metadata server incurs when it adopts logging or journaling to provide a highly available metadata service. The experimental results show that this newly proposed mechanism can significantly improve the speed of metadata processing and deliver better I/O throughput than conventional metadata management schemes, i.e., logging or journaling on the MDS. Moreover, complete metadata recovery can be achieved by replaying the backup logs cached by all involved clients when the metadata server crashes or otherwise becomes non-operational. PMID:24892093
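
    A toy sketch of the log-less idea described in these two records: clients keep an in-memory backup of sent metadata requests so the MDS skips journaling; after a crash, cached requests are replayed in sequence order. The structures here are illustrative, not the paper's implementation.

    ```python
    # Hedged toy sketch of log-less metadata management with client-side
    # request backup and crash-recovery replay. All structures illustrative.
    class MDS:
        """Toy metadata server: applies requests in memory only, no journal."""
        def __init__(self):
            self.state, self.seqno = {}, 0
        def apply(self, request):
            path, operation = request
            self.seqno += 1
            self.state[path] = operation
            return self.seqno

    class ClientFS:
        """Client file system that backs up every request it sends."""
        def __init__(self, mds):
            self.mds, self.backup = mds, []
        def send_metadata_request(self, request):
            seqno = self.mds.apply(request)
            self.backup.append((seqno, request))  # cheap in-memory backup
            return seqno
        def replay_after_crash(self, recovered_mds):
            for _, request in sorted(self.backup):
                recovered_mds.apply(request)      # rebuild MDS state from backups

    client = ClientFS(MDS())
    client.send_metadata_request(("/home/a", "mkdir"))
    client.send_metadata_request(("/home/a/f", "create"))
    fresh = MDS()
    client.replay_after_crash(fresh)
    print(fresh.state)  # state recovered without any MDS-side log
    ```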

  17. GeoSearch: A lightweight broking middleware for geospatial resources discovery

    NASA Astrophysics Data System (ADS)

    Gui, Z.; Yang, C.; Liu, K.; Xia, J.

    2012-12-01

    With petabytes of geodata and thousands of geospatial web services available over the Internet, it is critical to support geoscience research and applications by finding the best-fit geospatial resources from the massive and heterogeneous resources. The past decades' developments have seen the operation of many service components to facilitate geospatial resource management and discovery. However, efficient and accurate geospatial resource discovery is still a big challenge, for the following reasons. 1) Entry barriers (steep "learning curves") hinder the usability of discovery services for end users: different portals and catalogues adopt different access protocols, metadata formats and GUI styles to organize, present and publish metadata, and it is hard for end users to learn all these technical details and differences. 2) The cost of federating heterogeneous services is high: to provide sufficient resources and facilitate data discovery, many registries adopt a periodic harvesting mechanism to retrieve metadata from other federated catalogues; these time-consuming processes lead to network and storage burdens, data redundancy, and the overhead of maintaining data consistency. 3) Semantics in data discovery are heterogeneous: since keyword matching is still the primary search method in many operational discovery services, search accuracy (precision and recall) is hard to guarantee; semantic technologies (such as semantic reasoning and similarity evaluation) offer a solution, but integrating them with existing services is challenging due to expandability limitations of the service frameworks and metadata templates. 4) The capabilities that help users make a final selection are inadequate: most existing search portals lack intuitive and diverse information visualization methods and functions (sort, filter) to present, explore and analyze search results, and the presentation of value-added additional information (such as service quality and user feedback), which conveys important decision-support information, is missing. To address these issues, we prototyped a distributed search engine, GeoSearch, based on a brokering middleware framework to search, integrate and visualize heterogeneous geospatial resources. Specifically: 1) a lightweight discovery broker conducts distributed search, retrieving metadata records for geospatial resources and additional information from dispersed services (portals and catalogues) and other systems on the fly; 2) a quality monitoring and evaluation broker (the QoS Checker) is developed and integrated to provide quality information for geospatial web services; 3) semantic-assisted search and relevance evaluation functions are implemented by loosely interoperating with the ESIP Testbed component; and 4) sophisticated information and data visualization functionalities and tools are assembled to improve the user experience and assist resource selection.

  18. MOLES Information Model

    NASA Astrophysics Data System (ADS)

    Ventouras, Spiros; Lawrence, Bryan; Woolf, Andrew; Cox, Simon

    2010-05-01

    The Metadata Objects for Linking Environmental Sciences (MOLES) model has been developed within the Natural Environment Research Council (NERC) DataGrid project [NERC DataGrid] to fill a missing part of the ‘metadata spectrum'. It is a framework within which to encode the relationships between the tools used to obtain data, the activities which organised their use, and the datasets produced. MOLES is primarily of use to consumers of data, especially in an interdisciplinary context, allowing them to establish details of provenance and to compare and contrast such information without recourse to discipline-specific metadata or private communications with the original investigators [Lawrence et al 2009]. MOLES is also of use to the custodians of data, providing an organising paradigm for the data and metadata. The work described in this paper is a high-level view of the structure and content of a recent major revision of MOLES (v3.3) carried out as part of a NERC DataGrid extension project. The concepts of MOLES v3.3 are rooted in the harmonised ISO model [Harmonised ISO model] - particularly in the metadata standards (ISO 19115, ISO 19115-2) and the ‘Observations and Measurements' conceptual model (ISO 19156). MOLES exploits existing concepts and relationships, and specialises information in these standards. A typical data-capture sequence involves one or more projects under which a number of activities are undertaken, using appropriate tools and methods to produce the datasets. Following this typical sequence, the relevant metadata can be partitioned into the following main sections - helpful in mapping onto the most suitable standards from the ISO 19100 series: • Project section • Activity section (including both observation acquisition and numerical computation) • Observation section (metadata regarding the methods used to obtain the data, the spatial and temporal sampling regime, quality, etc.) • Observation collection section. The key concepts in MOLES v3.3 are (see the sketch after this abstract): a) the result of an observation is defined uniquely by the property (of a feature-of-interest), the sampling feature (carrying the targeted property values), the procedure used to obtain the result, and the time (discrete instant or period) at which the observation takes place; b) an ‘Acquisition' and a ‘Computation' can serve as the basis for describing any observation process chain (procedure) - the ‘Acquisition' uses an instrument (sensor or human being) to produce the results and is associated with field trips, flights, cruises etc., whereas the ‘Computation' class involves specific processing steps, and a process chain may consist of any combination of ‘Acquisitions' and/or ‘Computations' occurring in parallel or in any order during the data-capture sequence; c) results can be organised in collections with significantly more flexibility than if one used the original project alone; d) the structure of individual observation collections may, in general, be domain-specific; however, we are investigating the use of CSML (Climate Science Modelling Language) for atmospheric data. The model has been tested as a desk exercise by constructing object models for scenarios from various disciplines. References: NERC DataGrid: http://ndg.nerc.ac.uk. Lawrence et al., Information in environmental data grids, Phil. Trans. R. Soc. A 367(1890), 1003-1014, March 2009. Harmonised ISO model: all relevant ISO TC211 geographic metadata standards (e.g. ISO 19xxx), harmonised within a formal UML description in the ‘HollowWorld' packages available at https://www.seegrid.csiro.au/twiki/bin/view/AppSchemas/HollowWorld
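
    The observation concept in key point (a) above could be sketched as plain data structures roughly as follows. The class and field names are illustrative, not the normative MOLES UML.

    ```python
    # Hedged sketch of the MOLES v3.3 observation concept: a result is
    # determined by the observed property, the sampling feature, the
    # procedure (a chain of Acquisitions and Computations), and the time.
    from dataclasses import dataclass, field
    from typing import List, Union

    @dataclass
    class Acquisition:
        instrument: str          # sensor or human observer

    @dataclass
    class Computation:
        processing_step: str

    @dataclass
    class Observation:
        observed_property: str               # property of the feature of interest
        sampling_feature: str                # carries the targeted property values
        procedure: List[Union[Acquisition, Computation]]  # process chain, any mix/order
        time: str                            # discrete instant or period
        result: object = None

    obs = Observation(
        observed_property="air_temperature",
        sampling_feature="radiosonde profile 2009-07-01",
        procedure=[Acquisition("radiosonde"), Computation("bias correction")],
        time="2009-07-01T12:00:00Z",
    )
    print(obs)
    ```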

  19. Information Architecture for Interactive Archives at the Community Coordinated Modeling Center

    NASA Astrophysics Data System (ADS)

    De Zeeuw, D.; Wiegand, C.; Kuznetsova, M.; Mullinix, R.; Boblitt, J. M.

    2017-12-01

    The Community Coordinated Modeling Center (CCMC) is upgrading its metadata system for model simulations to be compliant with the SPASE metadata standard. This work is helping to enhance the SPASE standards for simulations to better describe the wide variety of models and their output. It will enable more sophisticated and automated metrics and validation efforts at the CCMC, as well as more robust searches for specific types of output. The new metadata will also allow more tailored run submissions, as it will allow some code options to be selected for Run-On-Request models. We will also demonstrate data accessibility through an implementation of the Heliophysics Application Programmer's Interface (HAPI) protocol for data otherwise available through the integrated Space Weather Analysis system (iSWA).
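
    A HAPI data request follows a small, fixed set of endpoints (`/info`, `/data`) defined by the published specification. The sketch below uses that pattern; the server URL and dataset id are illustrative assumptions, not the CCMC's actual service.

    ```python
    # Hedged sketch: fetching time-series output through HAPI's /info and
    # /data endpoints. Server URL and dataset id are hypothetical.
    import requests

    server = "https://example.org/iswa/hapi"  # hypothetical HAPI server
    info = requests.get(f"{server}/info",
                        params={"id": "model_run_42"}, timeout=30).json()
    print([p["name"] for p in info["parameters"]])  # available parameters

    data = requests.get(f"{server}/data",
                        params={"id": "model_run_42",
                                "time.min": "2017-09-06T00:00:00Z",
                                "time.max": "2017-09-07T00:00:00Z",
                                "format": "csv"},
                        timeout=60)
    print(data.text[:200])
    ```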

  20. Doing One Thing Well: Leveraging Microservices for NASA Earth Science Discovery and Access Across Heterogeneous Data Sources

    NASA Astrophysics Data System (ADS)

    Baynes, K.; Gilman, J.; Pilone, D.; Mitchell, A. E.

    2015-12-01

    The NASA EOSDIS (Earth Observing System Data and Information System) Common Metadata Repository (CMR) is a continuously evolving metadata system that merges all existing capabilities and metadata from the EOS ClearingHOuse (ECHO) and Global Change Master Directory (GCMD) systems. This flagship catalog has been developed with several key requirements: fast search and ingest performance; the ability to integrate heterogeneous external inputs and outputs; high availability and resiliency; scalability; and evolvability and expandability. This talk will focus on the advantages and potential challenges of tackling these requirements using a microservices architecture, which decomposes system functionality into smaller, loosely coupled, individually scalable elements that communicate via well-defined APIs. In addition, time will be spent examining specific elements of the CMR architecture and identifying opportunities for future integrations.
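
    One of the well-defined APIs this architecture exposes is the CMR's public search interface. The sketch below queries it for collections by keyword; the endpoint is the operational one as of this writing, but the parameters shown are only a small illustrative subset of what the API accepts.

    ```python
    # Hedged sketch: keyword search against the CMR collections endpoint.
    import requests

    resp = requests.get(
        "https://cmr.earthdata.nasa.gov/search/collections.json",
        params={"keyword": "sea surface temperature", "page_size": 5},
        timeout=30,
    )
    resp.raise_for_status()
    for entry in resp.json()["feed"]["entry"]:
        print(entry["id"], "-", entry.get("title", ""))
    ```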

  1. Information Discovery and Retrieval Tools

    DTIC Science & Technology

    2004-12-01

    information. This session will focus on the various Internet search engines, directories, and how to improve the user experience through the use of ... such techniques as metadata, meta-search engines, subject-specific search tools, and other developing technologies.

  2. Information Discovery and Retrieval Tools

    DTIC Science & Technology

    2003-04-01

    information. This session will focus on the various Internet search engines, directories, and how to improve the user experience through the use of ... such techniques as metadata, meta-search engines, subject-specific search tools, and other developing technologies.

  3. The Arctic Cooperative Data and Information System: Data Management Support for the NSF Arctic Research Program (Invited)

    NASA Astrophysics Data System (ADS)

    Moore, J.; Serreze, M. C.; Middleton, D.; Ramamurthy, M. K.; Yarmey, L.

    2013-12-01

    The NSF funds the Advanced Cooperative Arctic Data and Information System (ACADIS) (http://www.aoncadis.org/). It serves the growing and increasingly diverse data management needs of NSF's arctic research community. The ACADIS investigator team combines experienced data managers, curators and software engineers from NSIDC, UCAR and NCAR. ACADIS fosters scientific synthesis and discovery by providing a secure long-term data archive to NSF investigators. The system provides discovery of and access to arctic-related data from this and other archives. This paper updates the technical components of ACADIS, the implementation of best practices, the value of ACADIS to the community, and the major challenges this archive faces in handling the diverse data coming from NSF Arctic investigators. ACADIS provides sustainable data management, data stewardship services and leadership for the NSF Arctic research community through open data sharing, adherence to best practices and standards, capitalizing on appropriate evolving technologies, community support and engagement. ACADIS leverages other pertinent projects, capitalizing on appropriate emerging technologies and participating in emerging cyberinfrastructure initiatives. The key elements of ACADIS user services to the NSF Arctic community include: data and metadata upload; support for datasets with special requirements; metadata and documentation generation; interoperability and initiatives with other archives; and science support to investigators and the community. Providing a self-service data publishing platform that requires minimal curation oversight while maintaining rich metadata for discovery, access and preservation is challenging. Implementing metadata standards is a first step towards consistent content. The ACADIS Gateway and ADE offer users choices for data discovery and access, with the clear objective of increasing discovery and use of all Arctic data, especially for analysis activities. Metadata is at the core of ACADIS activities, from capturing metadata at the point of data submission to ensuring interoperability, providing data citations, and supporting data discovery. ACADIS metadata efforts include: 1) evolution of the ACADIS metadata profile to increase flexibility in search; 2) documentation guidelines; and 3) metadata standardization efforts. A major activity is now underway to ensure consistency in the metadata profile across all archived datasets. ACADIS is also embarking on a critical activity to create Digital Object Identifiers (DOIs) for all its holdings. The data services offered by ACADIS focus on meeting the needs of the data providers, providing dynamic search capabilities to peruse the ACADIS and related cryospheric data repositories, efficient data download, and some special services including dataset reformatting and visualization. The service is built around the following key technical elements: the ACADIS Gateway, housed at NCAR, developed to support NSF Arctic data from AON and now more broadly across PLR/ARC and related archives; the Arctic Data Explorer (ADE), developed at NSIDC, an integral ACADIS service that brings the rich NSIDC archive together with catalogs from ACADIS and international partners in Arctic research; and Rosetta and the Digital Object Identifier (DOI) generation scheme, tools available to the community to help publish and utilize datasets in integration, synthesis and publication.

  4. ASIS '99 Knowledge: Creation, Organization and Use, Part II: SIG Sessions.

    ERIC Educational Resources Information Center

    Proceedings of the ASIS Annual Meeting, 1999

    1999-01-01

    Abstracts and descriptions of Special Interest Group (SIG) sessions include such topics as: knowledge management tools, knowledge organization, information retrieval, information seeking behavior, metadata, indexing, library service for distance education, electronic books, future information workforce needs, technological developments, and…

  5. Identifying and naming plant-pathogenic fungi: past, present, and future.

    PubMed

    Crous, Pedro W; Hawksworth, David L; Wingfield, Michael J

    2015-01-01

    Scientific names are crucial in communicating knowledge about fungi. In plant pathology, they link information regarding the biology, host range, distribution, and potential risk. Our understanding of fungal biodiversity and fungal systematics has undergone an exponential leap, incorporating genomics, web-based systems, and DNA data for rapid identification to link species to metadata. The impact of our ability to recognize hitherto unknown organisms on plant pathology and trade is enormous and continues to grow. Major challenges for phytomycology are intertwined with the Genera of Fungi project, which adds DNA barcodes to known biodiversity and corrects the application of old, established names via epi- or neotypification. Implementing the one fungus-one name system and linking names to validated type specimens, cultures, and reference sequences will provide the foundation on which the future of plant pathology and the communication of names of plant pathogens will rest.

  6. Preservation of Earth Science Data History with Digital Content Repository Technology

    NASA Astrophysics Data System (ADS)

    Wei, Y.; Pan, J.; Shrestha, B.; Cook, R. B.

    2011-12-01

    An increasing need for derived and on-demand data products in Earth science research makes digital content more difficult for providers to manage and preserve and for users to locate, understand, and consume. Specifically, this increasing need presents additional challenges in managing data-processing history information and delivering such information to end users. For example, the North American Carbon Program (NACP) Multi-scale Synthesis and Terrestrial Model Intercomparison Project (MsTMIP) chose a modified SYNMAP land cover dataset as one of the input driver data for participating terrestrial biospheric models. The global 1km resolution SYNMAP land cover data was created by harmonizing 3 remote sensing-based land cover products: GLCC, GLC2000, and the MODIS land cover product. The original SYNMAP land cover data was aggregated into half- and quarter-degree resolution. It was then enhanced with more detailed grassland and cropland types. Currently, there is no effective mechanism to convey this data-processing information to the different modeling teams so they can determine whether a data product meets their needs; the process still relies heavily on offline human interaction. The NASA-sponsored ORNL DAAC has leveraged contemporary digital object repository technology to promote the representation, management, and delivery of data-processing history and provenance information. Within a digital object repository, different data products are managed as objects, with metadata as attributes and content delivery and management services as dissemination methods. Derivation relationships among data products can be semantically referenced between digital objects. Within the repository, data users can easily track a derived data product back to its origin, explore metadata and documents about each intermediate data product, and discover the processing details involved in each derivation step (a minimal provenance sketch follows below). Coupled with the Drupal Web Content Management System, the digital repository interface was enhanced to provide an intuitive graphic representation of the data-processing history. Each data product is also associated with a formal metadata record in the FGDC standard, and the main fields of the FGDC record are indexed for search and displayed as attributes of the data product. These features enable data users to better understand and consume a data product. The representation of data-processing history in a digital repository can further promote long-term data preservation. Lineage information is a major factor in keeping digital data understandable and usable long into the future. Derivation references can be set up between digital objects not only within a single digital repository but also across multiple distributed digital repositories. Along with emerging identification mechanisms, such as the Digital Object Identifier (DOI), a flexible distributed digital repository network can be set up to better preserve digital content. In this presentation, we describe how digital content repository technology can be used to manage, preserve, and deliver digital data-processing history information in the Earth science research domain, with selected data archived at the ORNL DAAC and the Model and Synthesis Thematic Data Center (MAST-DC) as testing targets.
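
    The derivation chain described above (original SYNMAP, an aggregated version, then an enhanced version) could be recorded and walked back with W3C PROV-O terms, roughly as in this sketch. The resource URIs are illustrative assumptions, not the DAAC's actual identifiers.

    ```python
    # Hedged sketch: recording a product derivation chain with PROV-O and
    # walking a derived product back to its origin. URIs are hypothetical.
    from rdflib import Graph, Namespace

    PROV = Namespace("http://www.w3.org/ns/prov#")
    EX = Namespace("http://example.org/products/")
    g = Graph()

    g.add((EX.synmap_halfdeg, PROV.wasDerivedFrom, EX.synmap_1km))
    g.add((EX.synmap_enhanced, PROV.wasDerivedFrom, EX.synmap_halfdeg))

    # Follow wasDerivedFrom links from the derived product to the origin
    step = EX.synmap_enhanced
    while True:
        parents = list(g.objects(step, PROV.wasDerivedFrom))
        if not parents:
            break
        step = parents[0]
        print("derived from:", step)
    ```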

  7. Chemical markup, XML, and the World Wide Web. 5. Applications of chemical metadata in RSS aggregators.

    PubMed

    Murray-Rust, Peter; Rzepa, Henry S; Williamson, Mark J; Willighagen, Egon L

    2004-01-01

    Examples of the use of the RSS 1.0 (RDF Site Summary) specification together with CML (Chemical Markup Language) to create a metadata-based alerting service termed CMLRSS for molecular content are presented. CMLRSS can be viewed either using generic software or with modular open-source chemical viewers and editors enhanced with CMLRSS modules. We discuss the more automated use of CMLRSS as a component of a World Wide Molecular Matrix of semantically rich chemical information.

  8. Standard formatted data units-control authority operations

    NASA Technical Reports Server (NTRS)

    1991-01-01

    The purpose of this document is to illustrate a Control Authority's (CA) possible operation. The document is an interpretation and expansion of the concept found in the CA Procedures Recommendation. The CA is described in terms of the functions it performs for the management and control of data descriptions (metadata). Functions pertaining to the organization of Member Agency Control Authority Offices (MACAOs) (e.g., creating and disbanding) are not discussed. The document also provides an illustrative operational view of a CA through scenarios describing interaction between those roles involved in collecting, controlling, and accessing registered metadata. The roles interacting with the CA are identified by their actions in requesting and responding to requests for metadata, and by the type of information exchanged. The scenarios and examples presented in this document are illustrative only. They represent possible interactions supported by either a manual or automated system. These scenarios identify requirements for an automated system. These requirements are expressed by identifying the information to be exchanged and the services that may be provided by a CA for that exchange.

  9. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Reddy, Tatiparthi B. K.; Thomas, Alex D.; Stamatis, Dimitri

    The Genomes OnLine Database (GOLD; http://www.genomesonline.org) is a comprehensive online resource for cataloging and monitoring genetic studies worldwide. GOLD provides up-to-date status on complete and ongoing sequencing projects along with a broad array of curated metadata. In this paper, we report on version 5 (v.5) of the database. The newly designed database schema and web user interface support several new features, including a four-level (meta)genome project classification system and a simplified, intuitive web interface for accessing reports and launching search tools. The database currently hosts information for about 19,200 studies, 56,000 Biosamples, 56,000 sequencing projects, and 39,400 analysis projects. More than just a catalog of worldwide genome projects, GOLD is a manually curated, quality-controlled metadata warehouse. The problems encountered in integrating disparate data of varying quality into GOLD are briefly highlighted. Lastly, GOLD fully supports and follows the Genomic Standards Consortium (GSC) Minimum Information standards.

  10. Digital Libraries and the Problem of Purpose [and] On DigiPaper and the Dissemination of Electronic Documents [and] DFAS: The Distributed Finding Aid Search System [and] Best Practices for Digital Archiving: An Information Life Cycle Approach [and] Mapping and Converting Essential Federal Geographic Data Committee (FGDC) Metadata into MARC21 and Dublin Core: Towards an Alternative to the FGDC Clearinghouse [and] Evaluating Website Modifications at the National Library of Medicine through Search Log analysis.

    ERIC Educational Resources Information Center

    Levy, David M.; Huttenlocher, Dan; Moll, Angela; Smith, MacKenzie; Hodge, Gail M.; Chandler, Adam; Foley, Dan; Hafez, Alaaeldin M.; Redalen, Aaron; Miller, Naomi

    2000-01-01

    Includes six articles focusing on the purpose of digital public libraries; encoding electronic documents through compression techniques; a distributed finding aid server; digital archiving practices in the framework of information life cycle management; converting metadata into MARC format and Dublin Core formats; and evaluating Web sites through…

  11. FIR: An Effective Scheme for Extracting Useful Metadata from Social Media.

    PubMed

    Chen, Long-Sheng; Lin, Zue-Cheng; Chang, Jing-Rong

    2015-11-01

    Recently, the use of social media for health information exchange has been expanding among patients, physicians, and other health care professionals. In medical settings, social media allows non-experts to access, interpret, and generate medical information for their own care and the care of others. Researchers have paid much attention to social media in medical education, patient-pharmacist communication, adverse drug reaction detection, the impact of social media on medicine and healthcare, and so on. However, relatively few papers discuss how to effectively extract useful knowledge from the huge volume of textual comments in social media. Therefore, this study proposes a Fuzzy adaptive resonance theory network based Information Retrieval (FIR) scheme that combines a Fuzzy adaptive resonance theory (ART) network, Latent Semantic Indexing (LSI), and association rule (AR) discovery to extract knowledge from social media. In the FIR scheme, a Fuzzy ART network is first employed to segment comments. Next, for each customer segment, the LSI technique retrieves important keywords. Then, to make the extracted keywords understandable, association rule mining organizes them into metadata. The extracted voice-of-customer information is transformed into design needs using Quality Function Deployment (QFD) for further decision making. Unlike conventional information retrieval techniques, which acquire too many keywords to convey the key points, the FIR scheme can extract understandable metadata from social media.
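
    The segment-then-index portion of such a pipeline can be sketched as follows, with scikit-learn's KMeans standing in for the Fuzzy ART stage (scikit-learn ships no ART implementation) and TruncatedSVD providing the LSI step; the sample comments are invented and the association-rule and QFD stages are omitted.

        # Sketch of comment segmentation plus LSI keyword extraction.
        # KMeans is a stand-in for Fuzzy ART; data are illustrative only.
        import numpy as np
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.decomposition import TruncatedSVD
        from sklearn.cluster import KMeans

        comments = [
            "the new inhaler is hard to use with arthritis",
            "pharmacist explained the inhaler dosage clearly",
            "rash appeared two days after starting the drug",
            "mild rash and itching, stopped the medication",
        ]

        tfidf = TfidfVectorizer(stop_words="english")
        X = tfidf.fit_transform(comments)

        # LSI: project the comments into a low-rank latent semantic space
        lsi = TruncatedSVD(n_components=2, random_state=0)
        X_lsi = lsi.fit_transform(X)

        # Segment the comments in the latent space
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_lsi)

        # Report the strongest TF-IDF terms per segment as candidate keywords
        terms = np.array(tfidf.get_feature_names_out())
        for cluster in range(2):
            centroid = np.asarray(X[labels == cluster].mean(axis=0)).ravel()
            print(f"segment {cluster}:", list(terms[np.argsort(-centroid)[:3]]))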

  12. Automatic Conversion of Metadata from the Study of Health in Pomerania to ODM.

    PubMed

    Hegselmann, Stefan; Gessner, Sophia; Neuhaus, Philipp; Henke, Jörg; Schmidt, Carsten Oliver; Dugas, Martin

    2017-01-01

    Electronic collection and high-quality analysis of medical data have great potential to improve patient care and medical research. However, the integration of data from different stakeholders poses a crucial problem. The exchange and reuse of medical data models, as well as annotations with unique semantic identifiers, have been proposed as a solution. Our objective was to convert metadata from the Study of Health in Pomerania to the standardized CDISC ODM format. The structure of the two data formats was analyzed, and a mapping was designed and implemented. The metadata from the Study of Health in Pomerania was successfully converted to ODM, and all relevant information was included in the resulting forms. Three sample forms were evaluated in depth, demonstrating the feasibility of this conversion. Hundreds of data entry forms with more than 15,000 items can be converted into a standardized format, with some limitations, e.g. regarding logical constraints. This enables the integration of the Study of Health in Pomerania metadata into various systems, facilitating implementation and reuse at different study sites.
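
    A minimal sketch of the mapping idea, assuming the CDISC ODM 1.3 namespace: study items are emitted as ODM ItemDef metadata. The item list is invented, and mandatory ODM elements (e.g., GlobalVariables) are omitted for brevity, so this is not a complete, valid ODM document.

        # Sketch: emit a table of study items as ODM ItemDef metadata.
        import xml.etree.ElementTree as ET

        ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"
        ET.register_namespace("", ODM_NS)

        items = [
            {"oid": "I.SBP", "name": "Systolic blood pressure", "type": "integer"},
            {"oid": "I.SMOKER", "name": "Current smoker", "type": "boolean"},
        ]

        odm = ET.Element(f"{{{ODM_NS}}}ODM")
        study = ET.SubElement(odm, f"{{{ODM_NS}}}Study", OID="S.SHIP")
        mdv = ET.SubElement(study, f"{{{ODM_NS}}}MetaDataVersion",
                            OID="MDV.1", Name="SHIP metadata v1")
        for it in items:
            ET.SubElement(mdv, f"{{{ODM_NS}}}ItemDef",
                          OID=it["oid"], Name=it["name"], DataType=it["type"])

        print(ET.tostring(odm, encoding="unicode"))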

  13. Lightweight Advertising and Scalable Discovery of Services, Datasets, and Events Using Feedcasts

    NASA Astrophysics Data System (ADS)

    Wilson, B. D.; Ramachandran, R.; Movva, S.

    2010-12-01

    Broadcast feeds (Atom or RSS) are a mechanism for advertising the existence of new data objects on the web, with metadata and links to further information. Users then subscribe to the feed to receive updates. This concept has already been used to advertise new granules of science data as they are produced (datacasting), with browse images and metadata, and to advertise bundles of web services (service casting). Structured metadata is introduced into the XML feed format by embedding new XML tags (in defined namespaces), using typed links, and reusing built-in Atom feed elements. This “infocasting” concept can be extended to many other science artifacts, including data collections, workflow documents, topical geophysical events (hurricanes, forest fires, etc.), natural hazard warnings, and short articles describing a new science result. The common theme is that each infocast contains machine-readable, structured metadata describing the object and enabling further manipulation. For example, service casts contain typed links pointing to the service interface description (e.g., WSDL for SOAP services), the service endpoint, and human-readable documentation. Our Infocasting project has three main goals: (1) define and evangelize micro-formats (metadata standards) so that providers can easily advertise their web services, datasets, and topical geophysical events by adding structured information to broadcast feeds; (2) develop authoring tools so that anyone can easily author such service advertisements, data casts, and event descriptions; and (3) provide a one-stop, Google-like search box in the browser that allows discovery of service, data, and event casts visible on the web, as well as services and data registered in the GEOSS repository and other NASA repositories (GCMD & ECHO). To demonstrate the event casting idea, a series of micro-articles, with accompanying event casts containing links to relevant datasets, web services, and science analysis workflows, will be authored for several kinds of geophysical events, such as hurricanes, smoke plume events, and tsunamis. The talk will describe our progress so far and some of the issues with leveraging existing metadata standards to define lightweight micro-formats.
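
    A hedged illustration of the pattern: an Atom entry whose typed links point at machine-readable service artifacts. The rel values and URLs below are invented placeholders, not a published micro-format.

        # Sketch: an Atom entry with typed links to a service's WSDL and docs.
        import xml.etree.ElementTree as ET

        ATOM = "http://www.w3.org/2005/Atom"
        ET.register_namespace("", ATOM)

        entry = ET.Element(f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}title").text = "Hurricane wind-speed service"
        ET.SubElement(entry, f"{{{ATOM}}}link", rel="alternate",
                      href="http://example.org/services/wind/docs")   # human docs
        ET.SubElement(entry, f"{{{ATOM}}}link",
                      rel="http://example.org/rel/serviceDescription",
                      type="application/wsdl+xml",
                      href="http://example.org/services/wind?wsdl")   # typed link
        print(ET.tostring(entry, encoding="unicode"))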

  14. Hyper Text Mark-up Language and Dublin Core metadata element set usage in websites of Iranian State Universities' libraries.

    PubMed

    Zare-Farashbandi, Firoozeh; Ramezan-Shirazi, Mahtab; Ashrafi-Rizi, Hasan; Nouri, Rasool

    2014-01-01

    Recent progress in providing innovative solutions for the organization of electronic resources, and research in this area, shows a global trend toward new strategies such as metadata to facilitate the description, location, organization, and retrieval of resources in the web environment. In this context, library metadata standards have a special place; therefore, the purpose of the present study was a comparative study of the central libraries' websites of Iranian state universities regarding Hyper Text Mark-up Language (HTML) and Dublin Core metadata element usage in 2011. The method of this study is applied-descriptive, and the data collection tool is a checklist created by the researchers. The statistical population includes 98 websites of the Iranian state universities of the Ministry of Health and Medical Education and the Ministry of Science, Research and Technology, and the sampling method is a census. Information was collected through observation and direct visits to the websites, and data analysis was performed with Microsoft Excel 2011. The results indicate that none of the websites use Dublin Core (DC) metadata and that only a few of them use elements that overlap between HTML meta tags and Dublin Core (DC) elements. The percentage of overlapping DC elements for the Ministry of Health was 56% for both description and keywords; for the Ministry of Science, it was 45% for keywords and 39% for description. HTML meta tags have a moderate presence in both ministries: the most-used elements were keywords and description (56%) and the least-used were date and format (0%). It appears that the Ministry of Health and the Ministry of Science will follow the same path in using the Dublin Core standard on their websites in the future. Because central library websites are an example of scientific web pages, special attention to their design can help researchers find information resources faster and more accurately. Therefore, raising the awareness of web designers and developers through librarians' input will be important for using metadata elements in general, and for applying such standards in particular.

  15. Hyper Text Mark-up Language and Dublin Core metadata element set usage in websites of Iranian State Universities’ libraries

    PubMed Central

    Zare-Farashbandi, Firoozeh; Ramezan-Shirazi, Mahtab; Ashrafi-Rizi, Hasan; Nouri, Rasool

    2014-01-01

    Introduction: Recent progress in providing innovative solutions for the organization of electronic resources, and research in this area, shows a global trend toward new strategies such as metadata to facilitate the description, location, organization, and retrieval of resources in the web environment. In this context, library metadata standards have a special place; therefore, the purpose of the present study was a comparative study of the central libraries’ websites of Iranian state universities regarding Hyper Text Mark-up Language (HTML) and Dublin Core metadata element usage in 2011. Materials and Methods: The method of this study is applied-descriptive, and the data collection tool is a checklist created by the researchers. The statistical population includes 98 websites of the Iranian state universities of the Ministry of Health and Medical Education and the Ministry of Science, Research and Technology, and the sampling method is a census. Information was collected through observation and direct visits to the websites, and data analysis was performed with Microsoft Excel 2011. Results: The results indicate that none of the websites use Dublin Core (DC) metadata and that only a few of them use elements that overlap between HTML meta tags and Dublin Core (DC) elements. The percentage of overlapping DC elements for the Ministry of Health was 56% for both description and keywords; for the Ministry of Science, it was 45% for keywords and 39% for description. HTML meta tags have a moderate presence in both ministries: the most-used elements were keywords and description (56%) and the least-used were date and format (0%). Conclusion: It appears that the Ministry of Health and the Ministry of Science will follow the same path in using the Dublin Core standard on their websites in the future. Because central library websites are an example of scientific web pages, special attention to their design can help researchers find information resources faster and more accurately. Therefore, raising the awareness of web designers and developers through librarians’ input will be important for using metadata elements in general, and for applying such standards in particular. PMID:24741646

  16. The PDS4 Data Dictionary Tool - Metadata Design for Data Preparers

    NASA Astrophysics Data System (ADS)

    Raugh, A.; Hughes, J. S.

    2017-12-01

    One of the major design goals of the PDS4 development effort was to create an extendable Information Model (IM) for the archive and to allow mission data designers/preparers to create extensions for metadata definitions specific to their own contexts. This capability is critical for the Planetary Data System - an archive that deals with a data collection that is diverse along virtually every conceivable axis. Amid such diversity in the data itself, it is in the best interests of the PDS archive and its users that all extensions to the IM follow the same design techniques, conventions, and restrictions as the core implementation itself. But it is unrealistic to expect mission data designers to acquire expertise in information modeling, model-driven design, ontology, schema formulation, and PDS4 design conventions and philosophy in order to define their own metadata. To bridge that expertise gap and bring the power of information modeling to the data label designer, the PDS Engineering Node has developed the data dictionary creation tool known as "LDDTool". This tool incorporates the same software used to maintain and extend the core IM, packaged with an interface that enables developers to create their extensions to the IM using the same standards-based metadata framework PDS itself uses. Through this interface, the novice dictionary developer has immediate access to the common set of data types and unit classes for defining attributes, and a straightforward method for constructing classes. The more experienced developer, using the same tool, has access to more sophisticated modeling methods such as abstraction and extension, and can define context-specific validation rules. We present the key features of the PDS Local Data Dictionary Tool, which both supports the development of extensions to the PDS4 IM and ensures their compatibility with the IM.

  17. Extended cooperation in clinical studies through exchange of CDISC metadata between different study software solutions.

    PubMed

    Kuchinke, W; Wiegelmann, S; Verplancke, P; Ohmann, C

    2006-01-01

    Our objective was to analyze the possibility of exchanging an entire clinical study between two different and independent study software solutions. The question addressed was whether a software-independent transfer of study metadata can be performed without programming effort and with software routinely used for clinical research. Study metadata was transferred using the ODM standard (CDISC). The study software systems employed were MACRO (InferMed) and XTrial (XClinical). For the proof of concept, a test study was created with MACRO and exported as ODM. For modification and validation of the ODM export file, XML-Spy (Altova) and ODM-Checker (XML4Pharma) were used. Through the exchange of a complete clinical study between two different study software solutions, a proof of concept of the technical feasibility of system-independent metadata exchange was conducted successfully. The interchange of study metadata between two different systems at different centers was performed with minimal expenditure. A small number of mistakes had to be corrected to generate a syntactically correct ODM file, and a "vendor extension" had to be inserted. After these modifications, XTrial displayed the study, including all data fields, correctly. However, the visual appearance of the two CRFs (case report forms) differed. ODM can therefore be used as an exchange format for clinical studies between different study software. Thus, new forms of cooperation through the exchange of metadata seem possible, for example the joint creation of electronic study protocols or CRFs at different research centers. Although the ODM standard represents a clinical study completely, it contains no information about the representation of data fields in CRFs.
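
    On the receiving side, an import along these lines could enumerate the forms and items declared in another system's ODM export. This is a hedged sketch: the embedded fragment merely stands in for a real export file.

        # Sketch: enumerate FormDefs and ItemDefs from an ODM export.
        import xml.etree.ElementTree as ET

        NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

        # Fragment standing in for the file exported by the other system.
        export = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3">
          <Study OID="S.TEST"><MetaDataVersion OID="MDV.1" Name="Test study">
            <FormDef OID="F.VISIT1" Name="Visit 1" Repeating="No"/>
            <ItemDef OID="I.AGE" Name="Age" DataType="integer"/>
          </MetaDataVersion></Study>
        </ODM>"""

        root = ET.fromstring(export)
        for form in root.iterfind(".//odm:FormDef", NS):
            print("Form:", form.get("Name"))
        for item in root.iterfind(".//odm:ItemDef", NS):
            print("  Item:", item.get("OID"), item.get("DataType"))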

  18. In-field Access to Geoscientific Metadata through GPS-enabled Mobile Phones

    NASA Astrophysics Data System (ADS)

    Hobona, Gobe; Jackson, Mike; Jordan, Colm; Butchart, Ben

    2010-05-01

    Fieldwork is an integral part of much geosciences research. But whilst geoscientists have physical or online access to data collections in the laboratory or at base stations, equivalent in-field access is not standard or straightforward. The increasing availability of mobile internet and GPS-supported mobile phones, however, now provides the basis for addressing this issue. The SPACER project was commissioned by the Rapid Innovation initiative of the UK Joint Information Systems Committee (JISC) to explore the potential for GPS-enabled mobile phones to access geoscientific metadata collections. Metadata collections within the geosciences and the wider geospatial domain can be disseminated through web services based on the Catalogue Service for the Web (CSW) standard of the Open Geospatial Consortium (OGC) - a global grouping of over 380 private, public and academic organisations aiming to improve interoperability between geospatial technologies. CSW offers an XML-over-HTTP interface for querying and retrieving geospatial metadata. By default, the metadata returned by CSW is based on the ISO 19115 standard and encoded in XML conformant to ISO 19139. The SPACER project has created a prototype application that enables mobile phones to send CSW queries containing user-defined keywords and coordinates acquired from the GPS devices built into the phones. The prototype has been developed using the free and open-source Google Android platform. The mobile application offers views for listing titles, presenting multiple metadata elements, and a Google Map with an overlay of the bounding coordinates of datasets. The presentation will describe the architecture and approach applied in the development of the prototype.
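
    A desktop equivalent of such a query can be written with the open-source OWSLib library, sketched below under the assumption of a placeholder catalogue endpoint; on the phone, the bounding box would come from the GPS fix.

        # Sketch: keyword + bounding-box CSW query via OWSLib.
        from owslib.csw import CatalogueServiceWeb
        from owslib.fes import PropertyIsLike, BBox

        csw = CatalogueServiceWeb("http://example.org/csw")  # placeholder endpoint
        keyword = PropertyIsLike("csw:AnyText", "%granite%")
        area = BBox([-2.0, 54.0, -1.0, 55.0])                # minx, miny, maxx, maxy

        # A nested list combines the two constraints with logical AND.
        csw.getrecords2(constraints=[[keyword, area]], maxrecords=10)
        for rec in csw.records.values():
            print(rec.title, rec.bbox)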

  19. Current Development at the Southern California Earthquake Data Center (SCEDC)

    NASA Astrophysics Data System (ADS)

    Appel, V. L.; Clayton, R. W.

    2005-12-01

    Over the past year, the SCEDC has completed or is nearing completion of three featured projects. Station Information System (SIS) Development: The SIS will provide users with an interface to complete and accurate station metadata for all current and historic data at the SCEDC. The goal of this project is to develop a system that can interact with a single database source to enter, update, and retrieve station metadata easily and efficiently. The system will provide accurate station/channel information for active stations to the SCSN real-time processing system, as well as station/channel information for stations that have parametric data at the SCEDC, i.e., for users retrieving data via STP. Additionally, the SIS will supply the information required to generate dataless SEED and COSMOS V0 volumes and allow stations to be added to the system with a minimal but incomplete set of information, using predefined defaults that can be easily updated as more information becomes available. Finally, the system will facilitate statewide metadata exchange for real-time processing and provide a common approach to CISN historic station metadata. Moment Tensor Solutions: The SCEDC is currently archiving and delivering Moment Magnitudes and Moment Tensor Solutions (MTS) produced by the SCSN in real time, as well as post-processing solutions for events spanning back to 1999. The automatic MTS runs on all local events with magnitudes > 3.0 and all regional events > 3.5. The distributed solution automatically creates links from all USGS Simpson Maps to a text e-mail summary solution, creates a .gif image of the solution, and updates the moment tensor database tables at the SCEDC. Searchable Scanned Waveforms Site: The Caltech Seismological Lab has made available 12,223 scanned images of pre-digital analog recordings of major earthquakes recorded in Southern California between 1962 and 1992 at http://www.data.scec.org/research/scans/. The SCEDC has developed a searchable web interface that allows users to search the available files, select multiple files for download, and then retrieve a zipped file containing the results. Scanned images of paper records for M>3.5 southern California earthquakes and several significant teleseisms are available for download via the SCEDC through this search tool.

  20. Metadata for Web Resources: How Metadata Works on the Web.

    ERIC Educational Resources Information Center

    Dillon, Martin

    This paper discusses bibliographic control of knowledge resources on the World Wide Web. The first section sets the context of the inquiry. The second section covers the following topics related to metadata: (1) definitions of metadata, including metadata as tags and as descriptors; (2) metadata on the Web, including general metadata systems,…

  1. Metadata Dictionary Database: A Proposed Tool for Academic Library Metadata Management

    ERIC Educational Resources Information Center

    Southwick, Silvia B.; Lampert, Cory

    2011-01-01

    This article proposes a metadata dictionary (MDD) be used as a tool for metadata management. The MDD is a repository of critical data necessary for managing metadata to create "shareable" digital collections. An operational definition of metadata management is provided. The authors explore activities involved in metadata management in…

  2. A digital repository with an extensible data model for biobanking and genomic analysis management.

    PubMed

    Izzo, Massimiliano; Mortola, Francesco; Arnulfo, Gabriele; Fato, Marco M; Varesio, Luigi

    2014-01-01

    Molecular biology laboratories require extensive metadata to improve data collection and analysis. The heterogeneity of the collected metadata grows as research evolves into international multi-disciplinary collaborations with increasing data sharing among institutions. A single standardization is not feasible, and it becomes crucial to develop digital repositories with flexible and extensible data models, as in the case of modern integrated biobank management. We developed a novel data model in JSON format to describe heterogeneous data in a generic biomedical science scenario. The model is built on two hierarchical entities: processes and events, roughly corresponding to research studies and analysis steps within a single study. A number of sequential events can be grouped in a process, building up a hierarchical structure to track patient and sample history. Each event can produce new data. Data are described by a set of user-defined metadata and may have one or more associated files. We integrated the model into a web-based digital repository with data grid storage to manage large data sets located in geographically distinct areas. We built a graphical interface that allows authorized users to define new data types dynamically, according to their requirements. Operators compose queries on metadata fields using a flexible search interface and run them on the database and on the grid. We applied the digital repository to the integrated management of samples, patients, and medical history in the BIT-Gaslini biobank. The platform currently manages 1800 samples from over 900 patients. Microarray data from 150 analyses are stored on the grid storage and replicated on two physical resources for preservation. The system is equipped with data integration capabilities with other biobanks for worldwide information sharing. Our data model enables users to continuously define flexible, ad hoc, and loosely structured metadata for information sharing in specific research projects and purposes. This approach can significantly improve interdisciplinary research collaboration and allows tracking of patients' clinical records, sample management information, and genomic data. The web interface allows operators to easily manage, query, and annotate the files without dealing with the technicalities of the data grid.
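
    Based only on the description above, the process/event hierarchy might be serialized roughly as follows; all field names and values here are illustrative guesses, not the published schema.

        # Sketch of a process containing sequential events, each with
        # user-defined metadata and optional associated files.
        import json

        process = {
            "type": "process",
            "id": "study-042",
            "description": "Neuroblastoma biomarker study",
            "events": [
                {
                    "type": "event",
                    "id": "event-1",
                    "operation": "blood sample collection",
                    "metadata": {"tissue": "peripheral blood", "volume_ml": 5},
                    "files": [],
                },
                {
                    "type": "event",
                    "id": "event-2",
                    "operation": "microarray analysis",
                    "metadata": {"platform": "user-defined", "replicates": 2},
                    "files": ["grid://storage/array_042_run1.cel"],
                },
            ],
        }
        print(json.dumps(process, indent=2))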

  3. A digital repository with an extensible data model for biobanking and genomic analysis management

    PubMed Central

    2014-01-01

    Motivation: Molecular biology laboratories require extensive metadata to improve data collection and analysis. The heterogeneity of the collected metadata grows as research evolves into international multi-disciplinary collaborations with increasing data sharing among institutions. A single standardization is not feasible, and it becomes crucial to develop digital repositories with flexible and extensible data models, as in the case of modern integrated biobank management. Results: We developed a novel data model in JSON format to describe heterogeneous data in a generic biomedical science scenario. The model is built on two hierarchical entities: processes and events, roughly corresponding to research studies and analysis steps within a single study. A number of sequential events can be grouped in a process, building up a hierarchical structure to track patient and sample history. Each event can produce new data. Data are described by a set of user-defined metadata and may have one or more associated files. We integrated the model into a web-based digital repository with data grid storage to manage large data sets located in geographically distinct areas. We built a graphical interface that allows authorized users to define new data types dynamically, according to their requirements. Operators compose queries on metadata fields using a flexible search interface and run them on the database and on the grid. We applied the digital repository to the integrated management of samples, patients, and medical history in the BIT-Gaslini biobank. The platform currently manages 1800 samples from over 900 patients. Microarray data from 150 analyses are stored on the grid storage and replicated on two physical resources for preservation. The system is equipped with data integration capabilities with other biobanks for worldwide information sharing. Conclusions: Our data model enables users to continuously define flexible, ad hoc, and loosely structured metadata for information sharing in specific research projects and purposes. This approach can significantly improve interdisciplinary research collaboration and allows tracking of patients' clinical records, sample management information, and genomic data. The web interface allows operators to easily manage, query, and annotate the files without dealing with the technicalities of the data grid. PMID:25077808

  4. Fast and Accurate Metadata Authoring Using Ontology-Based Recommendations.

    PubMed

    Martínez-Romero, Marcos; O'Connor, Martin J; Shankar, Ravi D; Panahiazar, Maryam; Willrett, Debra; Egyedi, Attila L; Gevaert, Olivier; Graybeal, John; Musen, Mark A

    2017-01-01

    In biomedicine, high-quality metadata are crucial for finding experimental datasets, for understanding how experiments were performed, and for reproducing those experiments. Despite the recent focus on metadata, the quality of metadata available in public repositories continues to be extremely poor. A key difficulty is that the typical metadata acquisition process is time-consuming and error prone, with weak or nonexistent support for linking metadata to ontologies. There is a pressing need for methods and tools to speed up the metadata acquisition process and to increase the quality of metadata that are entered. In this paper, we describe a methodology and set of associated tools that we developed to address this challenge. A core component of this approach is a value recommendation framework that uses analysis of previously entered metadata and ontology-based metadata specifications to help users rapidly and accurately enter their metadata. We performed an initial evaluation of this approach using metadata from a public metadata repository.
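
    As a toy illustration of value recommendation (an illustration of the concept, not the implementation described in the paper), candidate values for a field can be ranked by how often they occur in previously entered records that match the fields the user has already filled in:

        # Sketch: frequency-based value recommendation from prior metadata.
        from collections import Counter

        previous_records = [
            {"organism": "Homo sapiens", "tissue": "liver", "assay": "RNA-Seq"},
            {"organism": "Homo sapiens", "tissue": "liver", "assay": "ChIP-Seq"},
            {"organism": "Mus musculus", "tissue": "brain", "assay": "RNA-Seq"},
        ]

        def recommend(field, partial_record, records):
            """Rank values of `field` among records matching the entries
            the user has already made."""
            matches = [r for r in records
                       if all(r.get(k) == v for k, v in partial_record.items())]
            return Counter(r[field] for r in matches if field in r).most_common()

        print(recommend("assay", {"organism": "Homo sapiens"}, previous_records))
        # [('RNA-Seq', 1), ('ChIP-Seq', 1)]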

  5. Fast and Accurate Metadata Authoring Using Ontology-Based Recommendations

    PubMed Central

    Martínez-Romero, Marcos; O’Connor, Martin J.; Shankar, Ravi D.; Panahiazar, Maryam; Willrett, Debra; Egyedi, Attila L.; Gevaert, Olivier; Graybeal, John; Musen, Mark A.

    2017-01-01

    In biomedicine, high-quality metadata are crucial for finding experimental datasets, for understanding how experiments were performed, and for reproducing those experiments. Despite the recent focus on metadata, the quality of metadata available in public repositories continues to be extremely poor. A key difficulty is that the typical metadata acquisition process is time-consuming and error prone, with weak or nonexistent support for linking metadata to ontologies. There is a pressing need for methods and tools to speed up the metadata acquisition process and to increase the quality of metadata that are entered. In this paper, we describe a methodology and set of associated tools that we developed to address this challenge. A core component of this approach is a value recommendation framework that uses analysis of previously entered metadata and ontology-based metadata specifications to help users rapidly and accurately enter their metadata. We performed an initial evaluation of this approach using metadata from a public metadata repository. PMID:29854196

  6. Exploiting Untapped Information Resources in Earth Science

    NASA Astrophysics Data System (ADS)

    Ramachandran, R.; Fox, P. A.; Kempler, S.; Maskey, M.

    2015-12-01

    One of the continuing challenges in any Earth science investigation is the amount of time and effort required for data preparation before analysis can begin. Current Earth science data and information systems have their own shortcomings. For example, current data search systems are designed with the assumption that researchers find data primarily through metadata searches on instrument or geophysical keywords, assuming that users have sufficient knowledge of the domain vocabulary to effectively utilize the search catalogs. These systems lack support for new or interdisciplinary researchers who may be unfamiliar with the domain vocabulary or the breadth of relevant data available. There is clearly a need to innovate and evolve current data and information systems in order to improve data discovery and exploration capabilities and to substantially reduce data preparation time and effort. We assert that Earth science metadata assets are dark resources: information resources that organizations collect, process, and store for regular business or operational activities but fail to utilize for other purposes. The challenge for any organization is to recognize, identify, and effectively utilize the dark data stores in their institutional repositories to better serve their stakeholders. NASA Earth science metadata catalogs contain dark resources consisting of structured information, free-form descriptions of data, and pre-generated images. With the addition of emerging semantic technologies, such catalogs can be fully utilized beyond their original design intent of supporting current search functionality. In this presentation, we describe our approach to exploiting these information resources to provide novel data discovery and exploration pathways to science and education communities.

  7. A conceptual model of the automated credibility assessment of the volunteered geographic information

    NASA Astrophysics Data System (ADS)

    Idris, N. H.; Jackson, M. J.; Ishak, M. H. I.

    2014-02-01

    The use of Volunteered Geographic Information (VGI) for collecting, sharing, and disseminating geospatially referenced information on the Web is increasingly common. The potential of this localized and collective information has been seen to complement the maintenance process of authoritative mapping data sources and to support the development of Digital Earth. The main barrier to using these data in such a bottom-up approach is the credibility (trust), completeness, accuracy, and quality of both the data inputs and the outputs generated. The only feasible approach to assessing these data at scale is an automated process. This paper describes a conceptual model of indicators (parameters) and practical approaches for automatically assessing the credibility of information contributed through VGI, including map mashups, Geo Web, and crowd-sourced applications. The conceptual model proposes two main components for assessment: metadata and data. The metadata component comprises indicators for the hosting websites and the sources of the data/information. The data component comprises indicators for assessing absolute and relative positioning, attribute, thematic, temporal, and geometric correctness and consistency. This paper suggests approaches to assess both components. To assess the metadata component, automated text categorization using supervised machine learning is proposed. To assess correctness and consistency in the data component, we suggest a matching validation approach using emerging technologies from Linked Data infrastructures, together with third-party review validation. This study contributes to the research domain focused on the credibility, trust, and quality of data contributed by web citizen providers.
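
    The proposed text-categorization step for the metadata component might be prototyped along these lines; this is a hedged sketch in which the site descriptions and labels are fabricated for illustration.

        # Sketch: supervised text categorization of hosting-site descriptions.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        site_descriptions = [
            "national mapping agency open data portal",
            "government geological survey download service",
            "anonymous blog with unsourced map mashups",
            "forum post sharing unverified coordinates",
        ]
        labels = ["credible", "credible", "not_credible", "not_credible"]

        # TF-IDF features feed a simple linear classifier.
        model = make_pipeline(TfidfVectorizer(), LogisticRegression())
        model.fit(site_descriptions, labels)
        print(model.predict(["county council survey data portal"]))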

  8. Standardized Metadata for Human Pathogen/Vector Genomic Sequences

    PubMed Central

    Dugan, Vivien G.; Emrich, Scott J.; Giraldo-Calderón, Gloria I.; Harb, Omar S.; Newman, Ruchi M.; Pickett, Brett E.; Schriml, Lynn M.; Stockwell, Timothy B.; Stoeckert, Christian J.; Sullivan, Dan E.; Singh, Indresh; Ward, Doyle V.; Yao, Alison; Zheng, Jie; Barrett, Tanya; Birren, Bruce; Brinkac, Lauren; Bruno, Vincent M.; Caler, Elizabet; Chapman, Sinéad; Collins, Frank H.; Cuomo, Christina A.; Di Francesco, Valentina; Durkin, Scott; Eppinger, Mark; Feldgarden, Michael; Fraser, Claire; Fricke, W. Florian; Giovanni, Maria; Henn, Matthew R.; Hine, Erin; Hotopp, Julie Dunning; Karsch-Mizrachi, Ilene; Kissinger, Jessica C.; Lee, Eun Mi; Mathur, Punam; Mongodin, Emmanuel F.; Murphy, Cheryl I.; Myers, Garry; Neafsey, Daniel E.; Nelson, Karen E.; Nierman, William C.; Puzak, Julia; Rasko, David; Roos, David S.; Sadzewicz, Lisa; Silva, Joana C.; Sobral, Bruno; Squires, R. Burke; Stevens, Rick L.; Tallon, Luke; Tettelin, Herve; Wentworth, David; White, Owen; Will, Rebecca; Wortman, Jennifer; Zhang, Yun; Scheuermann, Richard H.

    2014-01-01

    High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium’s minimal information (MIxS) and NCBI’s BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant. PMID:24936976

  9. Overview of long-term field experiments in Germany - metadata visualization

    NASA Astrophysics Data System (ADS)

    Muqit Zoarder, Md Abdul; Heinrich, Uwe; Svoboda, Nikolai; Grosse, Meike; Hierold, Wilfried

    2017-04-01

    BonaRes ("soil as a sustainable resource for the bioeconomy") is conducting to collect data and metadata of agricultural long-term field experiments (LTFE) of Germany. It is funded by the German Federal Ministry of Education and Research (BMBF) under the umbrella of the National Research Strategy BioEconomy 2030. BonaRes consists of ten interdisciplinary research project consortia and the 'BonaRes - Centre for Soil Research'. BonaRes Data Centre is responsible for collecting all LTFE data and regarding metadata into an enterprise database upon higher level of security and visualization of the data and metadata through data portal. In the frame of the BonaRes project, we are compiling an overview of long-term field experiments in Germany that is based on a literature review, the results of the online survey and direct contacts with LTFE operators. Information about research topic, contact person, website, experiment setup and analyzed parameters are collected. Based on the collected LTFE data, an enterprise geodatabase is developed and a GIS-based web-information system about LTFE in Germany is also settled. Various aspects of the LTFE, like experiment type, land-use type, agricultural category and duration of experiment, are presented in thematic maps. This information system is dynamically linked to the database, which means changes in the data directly affect the presentation. An easy data searching option using LTFE name, -location or -operators and the dynamic layer selection ensure a user-friendly web application. Dispersion and visualization of the overlapping LTFE points on the overview map are also challenging and we make it automatized at very zoom level which is also a consistent part of this application. The application provides both, spatial location and meta-information of LTFEs, which is backed-up by an enterprise geodatabase, GIS server for hosting map services and Java script API for web application development.

  10. Standardized metadata for human pathogen/vector genomic sequences.

    PubMed

    Dugan, Vivien G; Emrich, Scott J; Giraldo-Calderón, Gloria I; Harb, Omar S; Newman, Ruchi M; Pickett, Brett E; Schriml, Lynn M; Stockwell, Timothy B; Stoeckert, Christian J; Sullivan, Dan E; Singh, Indresh; Ward, Doyle V; Yao, Alison; Zheng, Jie; Barrett, Tanya; Birren, Bruce; Brinkac, Lauren; Bruno, Vincent M; Caler, Elizabet; Chapman, Sinéad; Collins, Frank H; Cuomo, Christina A; Di Francesco, Valentina; Durkin, Scott; Eppinger, Mark; Feldgarden, Michael; Fraser, Claire; Fricke, W Florian; Giovanni, Maria; Henn, Matthew R; Hine, Erin; Hotopp, Julie Dunning; Karsch-Mizrachi, Ilene; Kissinger, Jessica C; Lee, Eun Mi; Mathur, Punam; Mongodin, Emmanuel F; Murphy, Cheryl I; Myers, Garry; Neafsey, Daniel E; Nelson, Karen E; Nierman, William C; Puzak, Julia; Rasko, David; Roos, David S; Sadzewicz, Lisa; Silva, Joana C; Sobral, Bruno; Squires, R Burke; Stevens, Rick L; Tallon, Luke; Tettelin, Herve; Wentworth, David; White, Owen; Will, Rebecca; Wortman, Jennifer; Zhang, Yun; Scheuermann, Richard H

    2014-01-01

    High throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the development of the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), informed by interactions with numerous collaborating scientists. It includes mapping to terms from other data standards initiatives, including the Genomic Standards Consortium's minimal information (MIxS) and NCBI's BioSample/BioProjects checklists and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards. The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant.

  11. The impact of lidar elevation uncertainty on mapping intertidal habitats on barrier islands

    USGS Publications Warehouse

    Enwright, Nicholas M.; Wang, Lei; Borchert, Sinéad M.; Day, Richard H.; Feher, Laura C.; Osland, Michael J.

    2018-01-01

    While airborne lidar data have revolutionized the spatial resolution at which elevations can be realized, data limitations are often magnified in coastal settings. Researchers have found that airborne lidar can have a vertical error as high as 60 cm in densely vegetated intertidal areas. The uncertainty of digital elevation models is often left unaddressed; however, in low-relief environments such as barrier islands, centimeter differences in elevation can affect exposure to physically demanding abiotic conditions, which greatly influence ecosystem structure and function. In this study, we used airborne lidar elevation data, in situ elevation observations, lidar metadata, and tide gauge information to delineate low-lying lands and intertidal wetlands on Dauphin Island, a barrier island along the coast of Alabama, USA. We compared three elevation error treatments: leaving the error untreated, and two treatments that used Monte Carlo simulations to incorporate vertical uncertainty using, respectively, general information from the lidar metadata and site-specific Real-Time Kinematic Global Positioning System data. To aid researchers in instances where limited information is available for error propagation, we conducted a sensitivity test to assess the effect of minor changes to error and bias. Treatment of error with site-specific observations produced the fewest omission errors, although the treatment using the lidar metadata had the most well-balanced results. The percent coverage of intertidal wetlands increased by up to 80% when the vertical error of the digital elevation models was treated. Based on the results of the sensitivity analysis, it could be reasonable to use error and positive-bias values from the literature for similar environments, conditions, and lidar acquisition characteristics in the event that collecting site-specific data is not feasible and the information in the lidar metadata is insufficient. The methodology presented in this study should increase efficiency and enhance results for habitat mapping and analyses in dynamic, low-relief coastal environments.
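
    A toy numpy sketch of this Monte Carlo treatment (our illustration; the elevations, error terms, and tidal datums are invented) perturbs each DEM cell with Gaussian vertical error and estimates the per-cell probability of falling in the intertidal band:

        # Sketch: Monte Carlo propagation of DEM vertical uncertainty.
        import numpy as np

        rng = np.random.default_rng(0)
        dem = np.array([[0.15, 0.40], [0.65, 1.20]])  # lidar elevations, metres
        rmse, bias = 0.15, 0.10   # e.g. from lidar metadata or an RTK survey
        mllw, mhhw = 0.0, 0.45    # tidal datums bounding the intertidal band

        n = 10_000
        # Subtract the positive vegetation bias, then add random vertical error.
        sims = dem - bias + rng.normal(0.0, rmse, size=(n, *dem.shape))
        p_intertidal = ((sims >= mllw) & (sims <= mhhw)).mean(axis=0)
        print(p_intertidal)  # per-cell probability of being intertidal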

  12. Event selection services in ATLAS

    NASA Astrophysics Data System (ADS)

    Cranshaw, J.; Cuhadar-Donszelmann, T.; Gallas, E.; Hrivnac, J.; Kenyon, M.; McGlone, H.; Malon, D.; Mambelli, M.; Nowak, M.; Viegas, F.; Vinek, E.; Zhang, Q.

    2010-04-01

    ATLAS has developed and deployed event-level selection services based upon event metadata records ("TAGS") and supporting file and database technology. These services allow physicists to extract events that satisfy their selection predicates from any stage of data processing and use them as input to later analyses. One component of these services is a web-based Event-Level Selection Service Interface (ELSSI). ELSSI supports event selection by integrating run-level metadata, luminosity-block-level metadata (e.g., detector status and quality information), and event-by-event information (e.g., triggers passed and physics content). The list of events that survive after some selection criterion is returned in a form that can be used directly as input to local or distributed analysis; indeed, it is possible to submit a skimming job directly from the ELSSI interface using grid proxy credential delegation. ELSSI allows physicists to explore ATLAS event metadata as a means to understand, qualitatively and quantitatively, the distributional characteristics of ATLAS data. In fact, the ELSSI service provides an easy interface to see the highest missing ET events or the events with the most leptons, to count how many events passed a given set of triggers, or to find events that failed a given trigger but nonetheless look relevant to an analysis based upon the results of offline reconstruction, and more. This work provides an overview of ATLAS event-level selection services, with an emphasis upon the interactive Event-Level Selection Service Interface.

  13. Introducing a Web API for Dataset Submission into a NASA Earth Science Data Center

    NASA Astrophysics Data System (ADS)

    Moroni, D. F.; Quach, N.; Francis-Curley, W.

    2016-12-01

    As the landscape of data becomes increasingly diverse in the domain of Earth Science, the challenges of managing and preserving data become more onerous and complex, particularly for data centers on fixed budgets with limited staff. Many solutions already exist to ease the cost burden of the downstream component of the data lifecycle, yet most archive centers are still racing to keep up with the influx of new data that needs to find a quasi-permanent resting place. For instance, well-defined metadata that is consistent across the entire data landscape provides for well-managed and preserved datasets throughout the latter part of the data lifecycle. Translators between different metadata dialects are already in operational use and help keep older datasets relevant in today's world of rapidly evolving metadata standards. However, very little has been done to address the first phase of the lifecycle, which deals with the entry of both data and the corresponding metadata into a system that is traditionally opaque and closed off to external data producers, resulting in a significant bottleneck in the dataset submission process. The ATRAC system was NOAA NCEI's answer to this previously obfuscated barrier for scientists wishing to find a home for their climate data records, providing a web-based entry point to submit timely and accurate metadata and information about a specific dataset. A couple of NASA's Distributed Active Archive Centers (DAACs), including the ASDC and the ORNL DAAC, have implemented their own versions of a web-based dataset and metadata submission form. The Physical Oceanography DAAC is the most recent NASA-operated DAAC to offer its own web-based dataset and metadata submission service to data producers. What makes the PO.DAAC service stand out from these pre-existing services is the option of using both a web-browser GUI and a RESTful API to facilitate rapid and efficient updating of dataset metadata records by external data producers. Here we present this new service and demonstrate the variety of ways in which a multitude of Earth Science datasets may be submitted, significantly reducing the time needed to ensure that new, vital data reach the public domain.
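
    The API itself is not documented here, but a submission call could look roughly like the following sketch, in which the endpoint, authentication scheme, and field names are placeholders rather than the actual PO.DAAC interface:

        # Sketch: submitting a dataset metadata record over a REST API.
        import requests

        record = {
            "shortName": "EXAMPLE_SST_L3_V1",
            "title": "Example Level-3 Sea Surface Temperature",
            "processingLevel": "3",
            "spatialCoverage": {"west": -180, "east": 180, "south": -90, "north": 90},
        }

        resp = requests.post(
            "https://example.org/dataset-submission/api/records",  # placeholder
            json=record,
            headers={"Authorization": "Bearer <token>"},
            timeout=30,
        )
        resp.raise_for_status()
        print(resp.json())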

  14. OSCAR/Surface: Metadata for the WMO Integrated Observing System WIGOS

    NASA Astrophysics Data System (ADS)

    Klausen, Jörg; Pröscholdt, Timo; Mannes, Jürg; Cappelletti, Lucia; Grüter, Estelle; Calpini, Bertrand; Zhang, Wenjian

    2016-04-01

    The World Meteorological Organization (WMO) Integrated Global Observing System (WIGOS) is a key WMO priority underpinning all WMO Programs and new initiatives such as the Global Framework for Climate Services (GFCS). It does this by better integrating WMO and co-sponsored observing systems, as well as partner networks. An important aspect of this is the description of observational capabilities by way of structured metadata. The 17th Congress of the World Meteorological Organization (Cg-17) endorsed the semantic WIGOS metadata standard (WMDS) developed by the Task Team on WIGOS Metadata (TT-WMD). The standard comprises a set of metadata classes considered to be of critical importance for the interpretation of observations and the evolution of observing systems relevant to WIGOS. The WMDS serves all recognized WMO Application Areas, and its use for all internationally exchanged observational data generated by WMO Members is mandatory. The standard will be introduced in three phases between 2016 and 2020. The Observing Systems Capability Analysis and Review (OSCAR) platform operated by MeteoSwiss on behalf of WMO is the official repository of WIGOS metadata and an implementation of the WMDS. OSCAR/Surface deals with all surface-based observations from land, air, and oceans, combining metadata managed by a number of complementary, more domain-specific systems (e.g., GAWSIS for the Global Atmosphere Watch, JCOMMOPS for the marine domain, the WMO Radar database). It is a modern, web-based client-server application with extended information search, filtering, and mapping capabilities, including a fully developed management console to add and edit observational metadata. In addition, a powerful application programming interface (API) is being developed to allow machine-to-machine metadata exchange. The API is based on an ISO/OGC-compliant XML schema for the WMDS using the Observations and Measurements (ISO 19156) conceptual model. The purpose of the presentation is to acquaint the audience with OSCAR, the WMDS, and the current XML schema, and to explore the relationship to the INSPIRE XML schema. Feedback from experts in the various disciplines of meteorology, climatology, atmospheric chemistry, and hydrology on the utility of the new standard and the XML schema will be solicited and will guide WMO in further evolving the WMDS.

  15. Mining dark information resources to develop new informatics capabilities to support science

    NASA Astrophysics Data System (ADS)

    Ramachandran, Rahul; Maskey, Manil; Bugbee, Kaylin

    2016-04-01

    Dark information resources are digital resources that organizations collect, process, and store for regular business or operational activities but fail to exploit for other purposes. The challenge for any organization is to recognize, identify, and effectively exploit these dark information stores. Metadata catalogs at different data centers store dark information resources consisting of structured information, free-form descriptions of data, and browse images. These information resources are never fully exploited beyond a few fields used for search and discovery. For example, the NASA Earth science catalog holds more than 6,000 data collections, 127 million records for individual files, and 67 million browse images. We believe that the information contained in the metadata catalogs and the browse images can be utilized beyond its original design intent to provide new data discovery and exploration pathways that support the science and education communities. In this paper we present two research applications that use information stored in the metadata catalog in a completely novel way. The first application is a data curation service whose objective is to augment existing data search capabilities. Given a specific atmospheric phenomenon, the data curation service returns a ranked list of relevant data sets. Different fields in the metadata records, including textual descriptions, are mined. A specialized relevancy ranking algorithm has been developed that uses a "bag of words" to define phenomena, along with an ensemble of known approaches such as the Jaccard Coefficient, Cosine Similarity, and Zone ranking, to rank the data sets. This approach is also extended to map from the data set level to the data file variable level. The second application provides a service where a user can search for and discover browse images containing specific phenomena from the vast catalog. This service will aid researchers in uncovering interesting events in the data for case study analysis. The challenge of this second application is to bridge the semantic gap between low-level image pixel values and the semantic concept perceived by a user viewing an image. A deep learning algorithm, specifically a Convolutional Neural Network (CNN), has been trained and tested to identify three types of Earth science phenomena - Hurricanes, Dust, and Smoke/Haze - in MODIS imagery. The latest results from both applications will be presented in this paper.
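
    The ranking idea can be illustrated with a bare-bones sketch that scores dataset descriptions against a phenomenon "bag of words" using Jaccard overlap and TF-IDF cosine similarity, then averages the two; the equal weighting, the omission of Zone ranking, and all data here are our invention, not the paper's algorithm.

        # Sketch: rank dataset descriptions against a phenomenon vocabulary.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        phenomenon = "hurricane tropical cyclone wind precipitation"
        datasets = {
            "TRMM rainfall": "tropical precipitation radar rainfall estimates",
            "QuikSCAT winds": "ocean surface wind vectors from scatterometer",
            "MODIS land cover": "global land cover classification product",
        }

        def jaccard(a, b):
            a, b = set(a.split()), set(b.split())
            return len(a & b) / len(a | b)

        texts = [phenomenon] + list(datasets.values())
        tfidf = TfidfVectorizer().fit_transform(texts)
        cosines = cosine_similarity(tfidf[0], tfidf[1:]).ravel()

        scores = {name: 0.5 * (jaccard(phenomenon, desc) + cos)
                  for (name, desc), cos in zip(datasets.items(), cosines)}
        for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
            print(f"{s:.3f}  {name}")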

  16. DOIDB: Reusing DataCite's search software as metadata portal for GFZ Data Services

    NASA Astrophysics Data System (ADS)

    Elger, K.; Ulbricht, D.; Bertelmann, R.

    2016-12-01

    GFZ Data Services is the central service point for the publication of research data at the Helmholtz Centre Potsdam GFZ German Research Centre for Geosciences (GFZ). It provides data publishing services to scientists of GFZ, associated projects, and associated institutions. The publishing services aim to make research data and physical samples visible and citable by assigning persistent identifiers (DOI, IGSN) and by complementing existing IT infrastructure. To integrate several research domains, a modular software stack made of free software components has been created to manage data and metadata as well as to register persistent identifiers [1]. The pivotal component for the registration of DOIs is the DOIDB. It has been derived from three software components provided by DataCite [2] that moderate the registration of DOIs and the deposition of metadata, allow the dissemination of metadata, and provide a user interface to navigate and discover datasets. The DOIDB acts as a proxy to the DataCite infrastructure and, in addition to the DataCite metadata schema, allows the deposition and dissemination of metadata following the ISO 19139 and NASA GCMD DIF schemas. The search component has been modified to meet the requirements of a geosciences metadata portal. In particular, it has been altered to make use of Apache Solr's capability to index and query spatial coordinates. Furthermore, the user interface has been adjusted to provide a first impression of the data by showing a map, summary information, and subjects. DOIDB and its components are available on GitHub [3]. We present a software solution for the registration of DOIs that integrates with existing data systems, keeps track of registered DOIs, and provides a metadata portal to discover datasets [4]. [1] Ulbricht, D.; Elger, K.; Bertelmann, R.; Klump, J. panMetaDocs, eSciDoc, and DOIDB—An Infrastructure for the Curation and Publication of File-Based Datasets for GFZ Data Services. ISPRS Int. J. Geo-Inf. 2016, 5, 25. http://doi.org/10.3390/ijgi5030025 [2] https://github.com/datacite [3] https://github.com/ulbricht/search/tree/doidb , https://github.com/ulbricht/mds/tree/doidb , https://github.com/ulbricht/oaip/tree/doidb [4] http://doidb.wdc-terra.org

  17. Enriched Video Semantic Metadata: Authorization, Integration, and Presentation.

    ERIC Educational Resources Information Center

    Mu, Xiangming; Marchionini, Gary

    2003-01-01

    Presents an enriched video metadata framework, including video authorization using the Video Annotation and Summarization Tool (VAST), a video metadata authorization system that integrates both semantic and visual metadata; metadata integration; and user-level applications. Results demonstrated that the enriched metadata were seamlessly…

  18. Interoperability Across the Stewardship Spectrum in the DataONE Repository Federation

    NASA Astrophysics Data System (ADS)

    Jones, M. B.; Vieglais, D.; Wilson, B. E.

    2016-12-01

    Thousands of earth and environmental science repositories serve many researchers and communities, each with their own community and legal mandates, sustainability models, and historical infrastructure. These repositories span the stewardship spectrum, from highly curated collections that employ large numbers of staff members to review and improve data, to small, minimal-budget repositories that accept data caveat emptor and where all responsibility for quality lies with the submitter. Each repository fills a niche, providing services that meet the stewardship tradeoffs of one or more communities. We have reviewed these stewardship tradeoffs for several DataONE member repositories, ranging from minimally curated (KNB) to highly curated (Arctic Data Center), and from general purpose (Dryad) to highly discipline- or project-specific (NEON). The rationales behind different levels of stewardship reflect the resolution of these tradeoffs. Some repositories aim to encourage extensive uptake by keeping processes simple and minimizing the amount of information collected, but this limits the long-term utility of the data and the search, discovery, and integration systems that are possible. Other repositories require extensive metadata input, review, and assessment, allowing for excellent preservation, discovery, and integration, but at the cost of significant time for submitters and expense for curatorial staff. DataONE recognizes these different levels of curation and attempts to embrace them to create a federation that is useful across the stewardship spectrum. DataONE provides a tiered model for repositories, with growing utility of DataONE services at higher tiers of curation. The lowest tier supports read-only access to data and requires little more than title and contact metadata; repositories can gradually phase in support for higher levels of metadata and services as needed. These tiered capabilities are possible through flexible support for multiple metadata standards and services, where repositories can incrementally increase their requirements as they seek to satisfy more use cases. Within DataONE, metadata search services support minimal metadata models, but significantly greater precision and recall become possible when repositories provide more extensively curated metadata.
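
    The tiered model can be illustrated with a small validator that admits a record at the lowest tier once title and contact metadata are present and grants higher tiers as richer fields appear; the tier names and required fields here are invented for the sketch and are not DataONE's actual requirements:

      # Illustrative tier check; field lists per tier are invented.
      TIERS = [
          ("read-only",   {"title", "contact"}),
          ("discovery",   {"title", "contact", "abstract", "keywords"}),
          ("integration", {"title", "contact", "abstract", "keywords",
                           "spatial_coverage", "temporal_coverage", "variables"}),
      ]

      def highest_tier(record):
          # Return the highest tier whose required fields are all non-empty.
          qualified = "unsupported"
          for name, required in TIERS:
              if required <= {k for k, v in record.items() if v}:
                  qualified = name
          return qualified

      print(highest_tier({"title": "Soil moisture, AK", "contact": "a@b.org"}))
      # -> read-only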

  19. NCI's national environmental research data collection: metadata management built on standards and preparing for the semantic web

    NASA Astrophysics Data System (ADS)

    Wang, Jingbo; Bastrakova, Irina; Evans, Ben; Gohar, Kashif; Santana, Fabiana; Wyborn, Lesley

    2015-04-01

    National Computational Infrastructure (NCI) manages national environmental research data collections (10+ PB) as part of its specialized high-performance data node of the Research Data Storage Infrastructure (RDSI) program. We manage 40+ data collections using NCI's Data Management Plan (DMP), which is compatible with the ISO 19100 series of metadata standards. We utilize ISO standards to make sure our metadata are transferable and interoperable for sharing and harvesting. The DMP is used along with metadata from the data itself to create a hierarchy of data collection, dataset, and time series catalogues that is then exposed through GeoNetwork for standard discoverability. These hierarchical catalogues are linked using parent-child relationships. The hierarchical infrastructure of our GeoNetwork catalogue system aims to address both discoverability and in-house administrative use cases. At NCI, we are currently improving the metadata interoperability in our catalogue by linking with standardized community vocabulary services. These emerging vocabulary services are being established to help harmonise data from different national and international scientific communities; one such service is currently being established by the Australian National Data Service (ANDS). Data citation is another important aspect of the NCI data infrastructure: it allows tracking of data usage and infrastructure investment, encourages data sharing, and increases trust in research that relies on these data collections. We incorporate the standard vocabularies into the data citation metadata so that the data citation becomes machine-readable and semantically friendly for web-search purposes as well. By standardizing our metadata structure across our entire data corpus, we are laying the foundation for applying appropriate semantic mechanisms to enhance the discovery and analysis of NCI's national environmental research data. We expect that this will further increase data discoverability and encourage data sharing and reuse within the community, increasing the value of the data well beyond its current use.
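
    The parent-child linking can be pictured as a dataset-level ISO 19139 record carrying a gmd:parentIdentifier that points at its collection-level record; the fragment below, with an invented collection identifier, is a minimal illustration rather than NCI's production metadata:

      # Minimal ISO 19139 fragment with a parentIdentifier, built with the
      # standard library; the collection UUID is a placeholder.
      import xml.etree.ElementTree as ET

      GMD = "http://www.isotc211.org/2005/gmd"
      GCO = "http://www.isotc211.org/2005/gco"
      ET.register_namespace("gmd", GMD)
      ET.register_namespace("gco", GCO)

      md = ET.Element(f"{{{GMD}}}MD_Metadata")
      parent = ET.SubElement(md, f"{{{GMD}}}parentIdentifier")
      ET.SubElement(parent, f"{{{GCO}}}CharacterString").text = \
          "f1234567-example-collection-uuid"  # placeholder collection ID
      print(ET.tostring(md, encoding="unicode"))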

  20. New Tools to Document and Manage Data/Metadata: Example NGEE Arctic and UrbIS

    NASA Astrophysics Data System (ADS)

    Crow, M. C.; Devarakonda, R.; Hook, L.; Killeffer, T.; Krassovski, M.; Boden, T.; King, A. W.; Wullschleger, S. D.

    2016-12-01

    Tools used for documenting, archiving, cataloging, and searching data are critical pieces of informatics. This discussion describes tools being used in two different projects at Oak Ridge National Laboratory (ORNL), at different stages of the data lifecycle. The Metadata Entry and Data Search Tool is being used for the documentation, archival, and data discovery stages of the Next Generation Ecosystem Experiment - Arctic (NGEE Arctic) project, while the Urban Information Systems (UrbIS) Data Catalog is being used to support indexing, cataloging, and searching. The NGEE Arctic Online Metadata Entry Tool [1] provides a method by which researchers can upload their data and provide original metadata with each upload. The tool is built upon the Java Spring framework to parse user input into, and from, XML output. Many aspects of the tool require the use of a relational database, including encrypted user login, auto-fill functionality for predefined sites and plots, and file reference storage and sorting. The UrbIS Data Catalog is a data discovery tool supported by the Mercury cataloging framework [2] which aims to compile urban environmental data from around the world into one location and make it searchable via a user-friendly interface. Each data record conveniently displays its title, source, and date range, and features: (1) a button for a quick view of the metadata, (2) a direct link to the data and, for some data sets, (3) a button for visualizing the data. The search box incorporates autocomplete capabilities for search terms, and sorted keyword filters are available on the side of the page, including a map for searching by area. References: [1] Devarakonda, Ranjeet, et al. "Use of a metadata documentation and search tool for large data volumes: The NGEE Arctic example." Big Data (Big Data), 2015 IEEE International Conference on. IEEE, 2015. [2] Devarakonda, R., Palanisamy, G., Wilson, B. E., & Green, J. M. (2010). Mercury: reusable metadata management, data discovery and access system. Earth Science Informatics, 3(1-2), 87-94.

  1. Adapting the CUAHSI Hydrologic Information System to OGC standards

    NASA Astrophysics Data System (ADS)

    Valentine, D. W.; Whitenack, T.; Zaslavsky, I.

    2010-12-01

    The CUAHSI Hydrologic Information System (HIS) provides web and desktop client access to hydrologic observations via water data web services using an XML schema called "WaterML". The WaterML 1.x specification and the corresponding Water Data Services have been the backbone of the HIS service-oriented architecture (SOA) and have been adopted for serving hydrologic data by several federal agencies and many academic groups. The central discovery service, HIS Central, is based on a metadata catalog that references 4.7 billion observations, organized as 23 million data series from 1.5 million sites from 51 organizations. Observations data are published using HydroServer nodes that have been deployed at 18 organizations. Usage of HIS increased eightfold from 2008 to 2010, and daily usage doubled from 1,600 data series in 2009 to 3,600 data series in the first half of 2010. The HIS Central metadata catalog currently harvests information from 56 Water Data Services. We collaborate on catalog updates with two federal partners, USGS and US EPA: their data series are periodically reloaded into the HIS metadata catalog. We are pursuing two main development directions in the HIS project: cloud-based computing and further compliance with Open Geospatial Consortium (OGC) standards. The goal of moving to cloud computing is to provide a scalable collaborative system with simpler deployment and less dependence on hardware maintenance and staff. This move requires re-architecting the information models underlying the metadata catalog and Water Data Services to be independent of the underlying relational database model, allowing for implementation on both relational databases and cloud-based processing systems. Cloud-based HIS Central resources can be managed collaboratively; partners share responsibility for their metadata by publishing data series information into the centralized catalog. Publishing data series will use REST-based service interfaces, such as OData, as the basis for ingesting data series information into a cloud-hosted catalog. The future HIS services involve providing information via OGC standards that will allow observational data access from commercial GIS applications. Use of standards will allow tools to access observational data from other projects, such as the Ocean Observatories Initiative, and tools from such projects to be integrated into the HIS toolset. With international collaborators, we have been developing a water information exchange language called "WaterML 2.0" which will be used to deliver observations data over OGC Sensor Observation Services (SOS). A software stack of OGC standard services will provide access to HIS information. In addition to SOS, Web Mapping and Web Feature Services (WMS and WFS) will provide access to location information. Catalog Services for the Web (CSW) will provide a catalog for water information that is both centralized and distributed. We intend the OGC standards to supplement the existing HIS service interfaces, rather than replace them. The ultimate goal of this development is to expand access to hydrologic observations data and create an environment where these data can be seamlessly integrated with standards-compliant data resources.
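
    As a rough illustration of consuming such a water data service, the sketch below fetches a WaterML 1.x-style response and lists its time-value pairs; the service URL, parameter names, and response layout are assumptions for the example, not a documented HIS endpoint:

      # Hypothetical sketch: pull a WaterML 1.x-style response and print its
      # time-value pairs; endpoint and parameters are placeholders.
      import requests
      import xml.etree.ElementTree as ET

      url = "https://hydroserver.example.org/waterml"  # placeholder endpoint
      params = {"site": "NWIS:10109000", "variable": "NWIS:00060",
                "startDate": "2010-01-01", "endDate": "2010-01-07"}
      root = ET.fromstring(requests.get(url, params=params).text)

      # WaterML 1.x wraps observations as <value dateTime="...">number</value>.
      for v in root.iter():
          if v.tag.rsplit("}", 1)[-1] == "value" and "dateTime" in v.attrib:
              print(v.attrib["dateTime"], v.text)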

  2. Learning Objects Metadata and Tools in the Area of Operations Research.

    ERIC Educational Resources Information Center

    Kassanke, Stephan; El-Saddik, Abdulmotaleb; Steinacker, Achim

    Information technology and the Internet are making inroads into almost all areas of society. The requirements of students and professionals are fast changing, and the information society requires lifelong learning in practically all areas, especially those related to information technologies. The educational sector can profit in particular from…

  3. The geochemical landscape of northwestern Wisconsin and adjacent parts of northern Michigan and Minnesota (geochemical data files)

    USGS Publications Warehouse

    Cannon, William F.; Woodruff, Laurel G.

    2003-01-01

    This data set consists of nine files of geochemical information on various types of surficial deposits in northwestern Wisconsin and immediately adjacent parts of Michigan and Minnesota. The files are presented in two formats: dBASE IV files and Microsoft Excel files. The data present multi-element chemical analyses of soils, stream sediments, and lake sediments. Latitude and longitude values are provided in each file so that the dbf files can be readily imported into GIS applications. Metadata files are provided in outline form, question-and-answer form, and text form. The metadata include information on procedures for sample collection, sample preparation, and chemical analysis, including sensitivity and precision.

  4. User’s Guide and Metadata for the PICES Nonindigenous Species Information System

    EPA Science Inventory

    The database, the "PICES Nonindigenous Species Information System", was constructed to synthesize the global distributions, environmental tolerances, and natural history attributes of the nonindigenous species in the North Pacific and Hawaii. The User's Guide provides th...

  5. Geospatial resources for supporting data standards, guidance and best practice in health informatics

    PubMed Central

    2011-01-01

    Background The 1980s marked the occasion when Geographical Information System (GIS) technology was broadly introduced into the geo-spatial community through the establishment of a strong GIS industry. This technology quickly disseminated across many countries, and has now become established as an important research, planning and commercial tool for a wider community that includes organisations in the public and private health sectors. The broad acceptance of GIS technology and the nature of its functionality have meant that numerous datasets have been created over the past three decades. Most of these datasets have been created independently, and without any structured documentation systems in place. However, search and retrieval systems can only work if there is a mechanism for the existence of datasets to be discovered, and this is where proper metadata creation and management can greatly help. This situation must be addressed through support mechanisms such as Web-based portal technologies, metadata editor tools, automation, metadata standards and guidelines, and collaborative efforts with relevant individuals and organisations. Engagement with data developers or administrators should also include a strategy of identifying the benefits associated with metadata creation and publication. Findings The establishment of numerous Spatial Data Infrastructures (SDIs) and other Internet resources is a testament to the recognition of the importance of supporting good data management and sharing practices across the geographic information community. These resources extend to health informatics in support of research, public services, and teaching and learning. This paper identifies many of these resources available to the UK academic health informatics community. It also reveals the reluctance of many spatial data creators across the wider UK academic community to use these resources to create and publish metadata, or to deposit their data in repositories for sharing. The Go-Geo! service is introduced as an SDI developed to provide UK academia with the necessary resources to address the concerns surrounding metadata creation and data sharing. The Go-Geo! portal, Geodoc metadata editor tool, ShareGeo spatial data repository, and a range of other support resources are described in detail. Conclusions This paper describes a variety of resources available for the health research and public health sector to use for managing and sharing their data. The Go-Geo! service is one resource which offers an SDI for the eclectic range of disciplines using GIS in UK academia, including health informatics. The benefits of data management and sharing are immense, and in these times of cost constraints, these resources can be seen as solutions for finding cost savings which can be reinvested in further research. PMID:21269487

  6. Workflows for ingest of research data into digital archives - tests with Archivematica

    NASA Astrophysics Data System (ADS)

    Kirchner, I.; Bertelmann, R.; Gebauer, P.; Hasler, T.; Hirt, M.; Klump, J. F.; Peters-Kotting, W.; Rusch, B.; Ulbricht, D.

    2013-12-01

    Publication of research data and future re-use of measured data require the long-term preservation of digital objects. The ISO OAIS reference model defines responsibilities for the long-term preservation of digital objects, and although software is available to support the preservation of digital data, several problems remain to be solved. A key task in preservation is to make the datasets ready for ingest into the archive, called the creation of Submission Information Packages (SIPs) in the OAIS model. This includes the creation of appropriate preservation metadata. Scientists need to be trained to deal with different types of data and to raise their awareness of the need for quality metadata. Other problems arise during the assembly of SIPs and during ingest into the archive, because file format validators may produce conflicting output for identical data files, and these conflicts are difficult to resolve automatically. Validation and identification tools are also notorious for their poor performance. In the EWIG project, the Zuse Institute Berlin acts as an infrastructure facility, while the Institute for Meteorology at FU Berlin and the German Research Centre for Geosciences (GFZ) act as two different data producers. The aim of the project is to develop workflows for the transfer of research data into digital archives and the future re-use of data from long-term archives, with emphasis on data from the geosciences. The technical work is supplemented by interviews with data practitioners at several institutions to identify problems in digital preservation workflows, and by the development of university teaching materials to train students in the curation of research data and metadata. The free and open-source software Archivematica [1] is used as the digital preservation system. The creation and ingest of SIPs has to meet several archival standards and be compatible with the Metadata Encoding and Transmission Standard (METS). The two data producers use different software in their workflows to test the assembly of SIPs and their ingest into the archive. GFZ Potsdam uses a combination of eSciDoc [2], panMetaDocs [3], and bagit [4] to collect research data and assemble SIPs for ingest into Archivematica, while the Institute for Meteorology at FU Berlin evaluates a variety of software solutions to describe data and publications and to generate SIPs. [1] http://www.archivematica.org [2] http://www.escidoc.org [3] http://panmetadocs.sf.net [4] http://sourceforge.net/projects/loc-xferutils/
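
    For the packaging step, the bagit tool mentioned above can wrap a data directory in checksummed manifests before ingest; a minimal sketch with the Python bagit package, using an example directory name and bag-info fields:

      # Minimal sketch: bag a dataset directory before archive ingest.
      import bagit  # pip install bagit

      # Bags the directory in place: payload moves into data/, manifests
      # and bag-info.txt are written alongside it.
      bag = bagit.make_bag(
          "2013-06_campaign_data",  # example directory name
          {"Source-Organization": "GFZ German Research Centre for Geosciences",
           "External-Description": "Raw and derived data for one campaign"},
          checksums=["sha256"],
      )
      print(bag.is_valid())  # True if payload checksums match the manifests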

  7. Partnerships To Mine Unexploited Sources of Metadata.

    ERIC Educational Resources Information Center

    Reynolds, Regina Romano

    This paper discusses the metadata created for other purposes as a potential source of bibliographic data. The first section addresses collecting metadata by means of templates, including the Nordic Metadata Project's Dublin Core Metadata Template. The second section considers potential partnerships for re-purposing metadata for bibliographic use,…

  8. Web mapping system for complex processing and visualization of environmental geospatial datasets

    NASA Astrophysics Data System (ADS)

    Titov, Alexander; Gordov, Evgeny; Okladnikov, Igor

    2016-04-01

    Environmental geospatial datasets (meteorological observations, modeling and reanalysis results, etc.) are used in numerous research applications. Due to a number of objective reasons, such as the inherent heterogeneity of environmental datasets, large dataset volumes, the complexity of the data models used, and the syntactic and semantic differences that complicate the creation and use of unified terminology, the development of environmental geodata access, processing, and visualization services, as well as client applications, turns out to be quite a sophisticated task. According to general INSPIRE requirements for data visualization, geoportal web applications have to provide such standard functionality as data overview, image navigation, scrolling, scaling and graphical overlay, and the display of map legends and corresponding metadata information. It should be noted that modern web mapping systems, as integrated geoportal applications, are developed based on an SOA and might be considered complexes of interconnected software tools for working with geospatial data. In this report, a complex web mapping system is presented, comprising a GIS web client and corresponding OGC services for working with a geospatial (NetCDF, PostGIS) dataset archive. The GIS web client has three basic tiers: (1) a tier of geospatial metadata retrieved from a central MySQL repository and represented in JSON format; (2) a tier of JavaScript objects implementing methods for handling NetCDF metadata, the task XML object for configuring user calculations and input and output formats, and OGC WMS/WFS cartographical services; and (3) a graphical user interface (GUI) tier of JavaScript objects realizing the web application business logic. The metadata tier consists of a number of JSON objects containing technical information describing the geospatial datasets (such as spatio-temporal resolution, meteorological parameters, valid processing methods, etc.). The middleware tier of JavaScript objects, which implements the methods for handling geospatial metadata, the task XML object, and the WMS/WFS cartographical services, interconnects the metadata and GUI tiers. The methods include procedures such as JSON metadata downloading and update, launching and tracking of calculation tasks running on remote servers, and working with the WMS/WFS cartographical services, including obtaining the list of available layers, visualizing layers on the map, and exporting layers in graphical (PNG, JPG, GeoTIFF), vector (KML, GML, Shape) and digital (NetCDF) formats. The GUI tier is based on a bundle of JavaScript libraries (OpenLayers, GeoExt and ExtJS) and represents a set of software components implementing the web mapping application business logic (complex menus, toolbars, wizards, event handlers, etc.). The GUI provides two basic capabilities for the end user: configuring the task XML object and visualizing cartographical information. The web interface developed is similar to the interfaces of popular desktop GIS applications such as uDig and QuantumGIS. The web mapping system developed has shown its effectiveness in solving real climate change research problems and disseminating investigation results in cartographical form. The work is supported by SB RAS Basic Program Projects VIII.80.2.1 and IV.38.1.7.
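
    The layer-export step can be illustrated with a plain OGC WMS GetMap request, which is the kind of call such a system issues behind the scenes to render a layer as an image; the server URL and layer name below are placeholders:

      # Sketch of an OGC WMS 1.1.1 GetMap request exporting a layer as PNG;
      # the endpoint and layer name are placeholders.
      import requests

      wms = "https://geoportal.example.org/wms"  # placeholder endpoint
      params = {
          "SERVICE": "WMS", "VERSION": "1.1.1", "REQUEST": "GetMap",
          "LAYERS": "air_temperature_mean",      # placeholder layer name
          "SRS": "EPSG:4326", "BBOX": "60,50,100,70",  # lon/lat bounding box
          "WIDTH": 800, "HEIGHT": 400, "FORMAT": "image/png",
      }
      png = requests.get(wms, params=params).content
      with open("layer.png", "wb") as f:
          f.write(png)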

  9. ESDORA: A Data Archive Infrastructure Using Digital Object Model and Open Source Frameworks

    NASA Astrophysics Data System (ADS)

    Shrestha, Biva; Pan, Jerry; Green, Jim; Palanisamy, Giriprakash; Wei, Yaxing; Lenhardt, W.; Cook, R. Bob; Wilson, B. E.; Leggott, M.

    2011-12-01

    There is an array of challenges associated with preserving, managing, and using contemporary scientific data. Large volumes, multiple formats and data services, and the lack of a coherent mechanism for metadata/data management are some of the common issues across data centers. It is often difficult to preserve data history and lineage information, along with other descriptive metadata, hindering the true science value of archived data products. In this project, we use a digital object abstraction architecture as the information/knowledge framework to address these challenges. We have used the following open-source frameworks: the Fedora-Commons repository, the Drupal content management system, Islandora (a Drupal module) and the Apache Solr search engine. The system is an active archive infrastructure for Earth science data resources that includes ingestion, archiving, distribution, and discovery functionalities. We use an ingestion workflow to ingest the data and metadata, in which many different aspects of the data descriptions (including structured and non-structured metadata) are reviewed. The data and metadata are reviewed multiple times, staged during the reviewing phase, and then published. Each digital object is encoded in XML for long-term preservation of the content and the relations among the digital items. The software architecture provides a flexible, modularized framework for adding pluggable user-oriented functionality. Solr is used to enable word search as well as faceted search, and a home-grown spatial search module is plugged in to allow users to make a spatial selection in a map view. An RDF semantic store within the Fedora-Commons repository is used for storing information on data lineage, dissemination services, and text-based metadata. We use the semantic notion "isViewerFor" to register internally or externally referenced URLs, which are rendered within the same web browser when possible. With appropriate mapping of content into digital objects, many different data descriptions, including structured metadata, data history, and auditing trails, are captured and coupled with the data content. The semantic store provides a foundation for further uses, including providing a full-fledged Earth science ontology for data interpretation or lineage tracking. Datasets from the NASA-sponsored Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC) as well as from the Synthesis Thematic Data Center (MAST-DC) are used in a test deployment of the system, which allows us to validate the features and value of the integrated system described here. Overall, we believe that the integrated system is valid, reusable data archive software that provides digital stewardship for Earth science data content, now and in the future. References: [1] Devarakonda, Ranjeet, and Harold Shanafield. "Drupal: Collaborative framework for science research." Collaboration Technologies and Systems (CTS), 2011 International Conference on. IEEE, 2011. [2] Devarakonda, Ranjeet, et al. "Semantic search integration to climate data." Collaboration Technologies and Systems (CTS), 2014 International Conference on. IEEE, 2014.
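
    The "isViewerFor" relation can be sketched as a single RDF triple linking a visualization URL to a digital object; the namespace URI, object identifier, and viewer URL below are invented for illustration (using the rdflib package), not the system's actual vocabulary:

      # Sketch: record an "isViewerFor" relation for a digital object in RDF.
      from rdflib import Graph, Namespace, URIRef

      REL = Namespace("http://example.org/relations#")  # placeholder vocabulary
      g = Graph()
      obj = URIRef("info:fedora/esdora:1234")           # example object PID
      viewer = URIRef("https://daac.example.org/visualize?dataset=1234")

      g.add((viewer, REL.isViewerFor, obj))
      print(g.serialize(format="turtle"))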

  10. Computational approaches to define a human milk metaglycome

    PubMed Central

    Agravat, Sanjay B.; Song, Xuezheng; Rojsajjakul, Teerapat; Cummings, Richard D.; Smith, David F.

    2016-01-01

    Motivation: The goal of deciphering the human glycome has been hindered by the lack of high-throughput sequencing methods for glycans. Although mass spectrometry (MS) is a key technology in glycan sequencing, MS alone provides limited information about the identification of monosaccharide constituents, their anomericity and their linkages. These features of individual, purified glycans can be partly identified using well-defined glycan-binding proteins, such as lectins and antibodies that recognize specific determinants within glycan structures. Results: We present a novel computational approach to automate the sequencing of glycans using metadata-assisted glycan sequencing, which combines MS analyses with glycan structural information from glycan microarray technology. Success in this approach was aided by the generation of a 'virtual glycome' to represent all potential glycan structures that might exist within a metaglycome, based on a set of biosynthetic assumptions using known structural information. We exploited this approach to deduce the structures of soluble glycans within the human milk glycome by matching predicted structures based on experimental data against the virtual glycome. This represents the first metaglycome to be defined using this method, and we provide a publicly available web-based application to aid in sequencing milk glycans. Availability and implementation: http://glycomeseq.emory.edu Contact: sagravat@bidmc.harvard.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26803164

  11. Substance Identification Information from EPA's Substance Registry

    EPA Pesticide Factsheets

    The Substance Registry Services (SRS) is the authoritative resource for basic information about substances of interest to the U.S. EPA and its state and tribal partners. Substances, particularly chemicals, can have many valid synonyms. For example, toluene, methyl benzene, and phenyl methane are commonly used names for the same chemical, and EPA programs collect environmental data for this chemical under each of these names. This diversity leads to problems when a user is looking for programmatic data on toluene but is unaware that the data are stored under the synonym methyl benzene. For each substance, the SRS identifies the statutes, EPA programs, and organizations external to EPA that track or regulate that substance, along with the synonym used by each statute, EPA program, or external organization. Besides standardized information for each chemical, such as the Chemical Abstracts Service name, the Chemical Abstracts number, and the EPA Registry Name (the EPA standard name), the SRS also includes additional information such as molecular weight and molecular formula. Additionally, an SRS internal tracking number uniquely identifies each substance, enabling cross-walking between synonyms. EPA provides a large .ZIP file with the SRS data in CSV format, and a separate small metadata file in XML containing the field names and definitions.
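
    The internal-tracking-number cross-walk reduces, in miniature, to a synonym table keyed to one canonical record per substance; a toy sketch with invented tracking numbers (the CAS number shown for toluene is real):

      # Toy synonym cross-walk; tracking numbers are invented.
      SYNONYMS = {
          "toluene": "ITN-0001",
          "methyl benzene": "ITN-0001",
          "phenyl methane": "ITN-0001",
      }
      RECORDS = {"ITN-0001": {"registry_name": "Toluene", "cas": "108-88-3"}}

      def lookup(name):
          # Resolve any synonym to its canonical substance record.
          itn = SYNONYMS.get(name.lower())
          return RECORDS.get(itn) if itn else None

      print(lookup("Methyl benzene"))  # same substance as "toluene"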

  12. Data Model and Relational Database Design for Highway Runoff Water-Quality Metadata

    USGS Publications Warehouse

    Granato, Gregory E.; Tessler, Steven

    2001-01-01

    A national highway and urban runoff water-quality metadatabase was developed by the U.S. Geological Survey in cooperation with the Federal Highway Administration as part of the National Highway Runoff Water-Quality Data and Methodology Synthesis (NDAMS). The database was designed to catalog available literature and to document results of the synthesis in a format that would facilitate current and future research on highway and urban runoff. This report documents the design and implementation of the NDAMS relational database, which was designed to provide a catalog of available information and the results of an assessment of the available data. All the citations and the metadata collected during the review process are presented in a stratified metadatabase that contains citations for relevant publications, abstracts (or previa), and report-review metadata for a sample of selected reports that document results of runoff quality investigations. The database is referred to as a metadatabase because it contains information about available data sets rather than a record of the original data. The database contains the metadata needed to evaluate and characterize how valid, current, complete, comparable, and technically defensible published and available information may be when evaluated for application to the different data-quality objectives defined by decision makers. This is a relational database, in that all information is ultimately linked to a given citation in the catalog of available reports. The main database file contains 86 tables consisting of 29 data tables, 11 association tables, and 46 domain tables. The data tables all link to a particular citation, and each data table is focused on one aspect of the information collected in the literature search and the evaluation of available information. The database is implemented in the Microsoft (MS) Access database software because it is widely used within and outside of government and is familiar to many existing and potential customers. The stratified metadatabase design for the NDAMS program is presented in the MS Access file DBDESIGN.mdb and documented with a data dictionary in the NDAMS_DD.mdb file recorded on the CD-ROM. The data dictionary file includes complete documentation of the table names, table descriptions, and information about each of the 419 fields in the database.
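
    The citation-centric design can be miniaturized as follows: every data table carries a foreign key back to the citation catalog, so all information stays linked to a given report. The table and column names in this sketch are invented, not the actual NDAMS schema:

      # Toy sketch of the citation-centric pattern: data tables reference
      # the citation catalog. Names are invented for illustration.
      import sqlite3

      con = sqlite3.connect(":memory:")
      con.executescript("""
      CREATE TABLE citation (cite_id INTEGER PRIMARY KEY, title TEXT);
      CREATE TABLE site_info (              -- stand-in for one data table
          site_id INTEGER PRIMARY KEY,
          cite_id INTEGER NOT NULL REFERENCES citation(cite_id),
          highway_class TEXT);
      """)
      con.execute("INSERT INTO citation VALUES (1, 'Example runoff study')")
      con.execute("INSERT INTO site_info VALUES (1, 1, 'urban arterial')")
      for row in con.execute("""SELECT c.title, s.highway_class
                                FROM site_info s
                                JOIN citation c USING (cite_id)"""):
          print(row)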

  13. XML — an opportunity for data standards in the geosciences

    NASA Astrophysics Data System (ADS)

    Houlding, Simon W.

    2001-08-01

    Extensible markup language (XML) is a recently introduced meta-language standard on the Web. It provides the rules for the development of metadata (markup) standards for information transfer in specific fields. XML allows the development of markup languages that describe what information is, rather than how it should be presented, which allows computer applications to process the information in intelligent ways. In contrast, hypertext markup language (HTML), which fuelled the initial growth of the Web, is a metadata standard concerned exclusively with the presentation of information. Besides its potential for revolutionizing Web activities, XML provides an opportunity for the development of meaningful data standards in specific application fields. The rapid endorsement of XML by science, industry and e-commerce has already spawned new metadata standards in such fields as mathematics, chemistry, astronomy, multimedia and Web micro-payments. Development of XML-based data standards in the geosciences would significantly reduce the effort currently wasted on manipulating and reformatting data between different computer platforms and applications, and would ensure compatibility with the new generation of Web browsers. This paper explores the evolution, benefits and status of XML and related standards in the more general context of Web activities, and uses this as a platform for discussion of its potential for the development of data standards in the geosciences. Some of the advantages of XML are illustrated by a simple, browser-compatible demonstration of XML functionality applied to a borehole log dataset. The XML dataset and the associated stylesheet and schema declarations are available for FTP download.
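
    The borehole-log demonstration can be approximated with a few lines that emit self-describing markup of the kind the paper advocates, where element names say what the information is; the element names and values here are invented for illustration, not the paper's actual schema:

      # Emit a tiny self-describing borehole log as XML.
      import xml.etree.ElementTree as ET

      log = ET.Element("boreholeLog", id="BH-01")
      for top, base, lith in [(0.0, 3.2, "clay"), (3.2, 7.5, "sand")]:
          iv = ET.SubElement(log, "interval", top=str(top), base=str(base))
          ET.SubElement(iv, "lithology").text = lith

      ET.indent(log)  # pretty-print (Python 3.9+)
      print(ET.tostring(log, encoding="unicode"))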

  14. A Web-based open-source database for the distribution of hyperspectral signatures

    NASA Astrophysics Data System (ADS)

    Ferwerda, J. G.; Jones, S. D.; Du, Pei-Jun

    2006-10-01

    With the coming of age of field spectroscopy as a non-destructive means to collect information on the physiology of vegetation, there is a need for storage of signatures and, more importantly, their metadata. Without proper organisation of the metadata, the signatures themselves are of limited use. To facilitate the re-distribution of data, a database for the storage and distribution of hyperspectral signatures and their metadata was designed. The database was built using open-source software and can be used by the hyperspectral community to share their data. Data are uploaded through a simple web-based interface. The database recognizes major file formats by ASD, GER and International Spectronics. The database source code is available for download through the hyperspectral.info web domain, and we happily invite suggestions for additions and modifications to the database to be submitted through the online forums on the same website.

  15. Content-aware network storage system supporting metadata retrieval

    NASA Astrophysics Data System (ADS)

    Liu, Ke; Qin, Leihua; Zhou, Jingli; Nie, Xuejun

    2008-12-01

    Content-based network storage has become a topic of active research in academia and industry [1]. To address the decline in hit rate caused by data migration and to support content-based queries, we developed a content-aware storage system that supports metadata retrieval to improve query performance. First, we extend the SCSI command descriptor block so that the system can understand self-defined query requests. Second, the extracted metadata is encoded in extensible markup language (XML) to improve universality. Third, according to the demands of information lifecycle management (ILM), we store data at different storage levels and use corresponding query strategies to retrieve them. Fourth, as the file content identifier plays an important role in locating data and calculating block correlation, we use it to fetch files and sort query results through a user-friendly interface. Finally, experiments indicate that the retrieval strategy and sorting algorithm enhance retrieval efficiency and precision.

  16. OntoFire: an ontology-based geo-portal for wildfires

    NASA Astrophysics Data System (ADS)

    Kalabokidis, K.; Athanasis, N.; Vaitis, M.

    2011-12-01

    With the proliferation of geospatial technologies on the Internet, the role of geo-portals (i.e., gateways to Spatial Data Infrastructures) in wildfire management is emerging. However, keyword-based techniques often frustrate users looking for data of interest in geo-portal environments, and little attention has been paid to shifting from conventional keyword-based mechanisms to navigation-based ones. The OntoFire system presented here is an ontology-based geo-portal for wildfires. Through the proposed navigation mechanisms, relationships between the data can be discovered that would otherwise not be apparent using conventional querying techniques alone. End users can use the browsing interface to find resources of interest through the navigation mechanisms provided. Data providers can use the publishing interface to submit new metadata and to modify or remove metadata in the catalogue. The proposed approach can improve the discovery of valuable information that is necessary for setting priorities in disaster mitigation and prevention strategies. OntoFire aspires to be a focal point for the integration and management of a very large amount of information, contributing in this way to the dissemination of knowledge and to the preparedness of operational stakeholders.

  17. Exploring Cultural Heritage Resources in a 3d Collaborative Environment

    NASA Astrophysics Data System (ADS)

    Respaldiza, A.; Wachowicz, M.; Vázquez Hoehne, A.

    2012-06-01

    Cultural heritage is a complex and diverse concept, which brings together a wide domain of information. Resources linked to a cultural heritage site may consist of physical artefacts, books, works of art, pictures, historical maps, aerial photographs, archaeological surveys and 3D models. Moreover, all these resources are listed and described by a variety of metadata specifications that allow online search and consultation of their most basic characteristics. Some examples include ISO 19115, Dublin Core, AAT, CDWA, CCO, DACS, MARC, MoReq, MODS, MuseumDat, TGN, SPECTRUM, VRA Core and Z39.50. Gateways are in place to fit these metadata standards into those used in an SDI (ISO 19115 or INSPIRE), but substantial work remains to be done for the complete incorporation of cultural heritage information. The aim of this paper is therefore to demonstrate how the complexity of cultural heritage resources can be dealt with through visual exploration of their metadata within a 3D collaborative environment. 3D collaborative environments are promising tools that represent the new frontier of our capacity for learning, understanding, communicating and transmitting culture.

  18. Sharing our data—An overview of current (2016) USGS policies and practices for publishing data on ScienceBase and an example interactive mapping application

    USGS Publications Warehouse

    Chase, Katherine J.; Bock, Andrew R.; Sando, Roy

    2017-01-05

    This report provides an overview of current (2016) U.S. Geological Survey policies and practices related to publishing data on ScienceBase, and an example interactive mapping application to display those data. ScienceBase is an integrated data sharing platform managed by the U.S. Geological Survey. This report describes resources that U.S. Geological Survey scientists can use for writing data management plans, formatting data, and creating metadata, as well as for data and metadata review, uploading data and metadata to ScienceBase, and sharing metadata through the U.S. Geological Survey Science Data Catalog. Because data publishing policies and practices are evolving, scientists should consult the resources cited in this paper for definitive policy information. An example is provided in which, using the content of a published ScienceBase data release associated with an interpretive product, a simple user interface is constructed to demonstrate how the open-source capabilities of the R programming language and environment can interact with the properties and objects of a ScienceBase item and be used to generate interactive maps.
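
    The report's example interface is written in R; a rough Python analogue of the first step - reading a ScienceBase item's properties over the catalog's JSON interface - is sketched below, with a placeholder item identifier and no guarantee that the response fields shown are exhaustive:

      # Rough sketch: read a ScienceBase item as JSON; item ID is a placeholder.
      import requests

      item_id = "0123456789abcdef01234567"  # placeholder ScienceBase item ID
      url = f"https://www.sciencebase.gov/catalog/item/{item_id}"
      item = requests.get(url, params={"format": "json"}).json()

      print(item.get("title"))
      for f in item.get("files", []):  # attached data/metadata files, if any
          print(f.get("name"), f.get("url"))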

  19. Digital Curation of Earth Science Samples Starts in the Field

    NASA Astrophysics Data System (ADS)

    Lehnert, K. A.; Hsu, L.; Song, L.; Carter, M. R.

    2014-12-01

    Collection of physical samples in the field is an essential part of research in the Earth Sciences. Samples provide a basis for progress across many disciplines, from the study of global climate change now and over the Earth's history, to present and past biogeochemical cycles, to magmatic processes and mantle dynamics. The types of samples, methods of collection, and scope and scale of sampling campaigns are highly diverse, ranging from large-scale programs to drill rock and sediment cores on land, in lakes, and in the ocean, to environmental observation networks with continuous sampling, to single-investigator or small-team expeditions to remote areas around the globe or trips to local outcrops. Cyberinfrastructure for sample-related fieldwork must cater to the varied requirements of these diverse sampling activities, aligning with specific workflows, regional constraints such as connectivity or climate, and the processing of samples. In general, digital tools should assist with the capture and management of metadata about the sampling process (location, time, method) and the sample itself (type, dimension, context, images, etc.), the management of the physical objects (e.g., sample labels with QR codes), and the seamless transfer of sample metadata to data systems and software relevant to post-sampling data acquisition, data processing, and sample curation. In order to optimize CI capabilities for samples, tools and workflows need to adopt community-based standards and best practices for sample metadata, classification, identification and registration. This presentation will provide an overview of and updates on several ongoing efforts relevant to the development of standards for digital sample management: the ODM2 project, which has generated an information model for spatially-discrete, feature-based earth observations resulting from in-situ sensors and environmental samples, aligned with OGC's Observation & Measurements model (Horsburgh et al, AGU FM 2014); the implementation of the IGSN (International Geo Sample Number) as a globally unique sample identifier via a distributed system of allocating agents and a central registry; and the EarthCube Research Coordination Network iSamplES (Internet of Samples in the Earth Sciences), which aims to improve the sharing and curation of samples through the use of CI.

  20. Mining metadata from unidentified ITS sequences in GenBank: A case study in Inocybe (Basidiomycota)

    PubMed Central

    2008-01-01

    Background The lack of reference sequences from well-identified mycorrhizal fungi often poses a challenge to the inference of taxonomic affiliation of sequences from environmental samples, and many environmental sequences are thus left unidentified. Such unidentified sequences belonging to the widely distributed ectomycorrhizal fungal genus Inocybe (Basidiomycota) were retrieved from GenBank and divided into species that were identified in a phylogenetic context using a reference dataset from an ongoing study of the genus. The sequence metadata of the unidentified Inocybe sequences stored in GenBank, as well as data from the corresponding original papers, were compiled and used to explore the ecology and distribution of the genus. In addition, the relative occurrence of Inocybe was contrasted to that of other mycorrhizal genera. Results Most species of Inocybe were found to have less than 3% intraspecific variability in the ITS2 region of the nuclear ribosomal DNA. This cut-off value was used jointly with phylogenetic analysis to delimit and identify unidentified Inocybe sequences to species level. A total of 177 unidentified Inocybe ITS sequences corresponding to 98 species were recovered, 32% of which were successfully identified to species level in this study. These sequences account for an unexpectedly large proportion of the publicly available unidentified fungal ITS sequences when compared with other mycorrhizal genera. Eight Inocybe species were reported from multiple hosts and some even from hosts forming arbutoid or orchid mycorrhizae. Furthermore, Inocybe sequences have been reported from four continents and in climate zones ranging from cold temperate to equatorial climate. Out of the 19 species found in more than one study, six were found in both Europe and North America and one was found in both Europe and Japan, indicating that at least many north temperate species have a wide distribution. Conclusion Although DNA-based species identification and circumscription are associated with practical and conceptual difficulties, they also offer new possibilities and avenues for research. Metadata assembly holds great potential to synthesize valuable information from community studies for use in a species and taxonomy-oriented framework. PMID:18282272
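
    The 3% cut-off can be illustrated with a greedy clustering toy in which a sequence joins a cluster when its distance to the cluster seed is below the threshold; the per-site mismatch distance and the sequences are deliberately simplistic stand-ins for a real alignment-based analysis:

      # Toy 97%-identity clustering; not a real aligner or the study's method.
      def p_distance(a, b):
          # Crude per-site mismatch fraction on equal-length toy strings.
          return sum(x != y for x, y in zip(a, b)) / min(len(a), len(b))

      def cluster(seqs, cutoff=0.03):
          clusters = []  # each cluster is [seed, member, ...]
          for s in seqs:
              for c in clusters:
                  if p_distance(s, c[0]) < cutoff:
                      c.append(s)
                      break
              else:
                  clusters.append([s])
          return clusters

      s1 = "ACGT" * 25      # 100 bp toy sequence
      s2 = s1[:-1] + "A"    # one substitution: 1% distance, same "species"
      s3 = "TGCA" * 25      # far beyond the cut-off
      print([len(c) for c in cluster([s1, s2, s3])])  # -> [2, 1]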

  1. panMetaDocs and DataSync - providing a convenient way to share and publish research data

    NASA Astrophysics Data System (ADS)

    Ulbricht, D.; Klump, J. F.

    2013-12-01

    In recent years research institutions, geological surveys, and funding organizations have started to build infrastructures to facilitate the re-use of research data from previous work. At present, several intermeshed activities are coordinated to make data systems in the earth sciences interoperable and recorded data discoverable. Driven by governmental authorities, ISO 19115/19139 has emerged as the metadata standard for the discovery of data and services. Established metadata transport protocols like OAI-PMH and OGC CSW are used to disseminate metadata to data portals. With persistent identifiers like DOI and IGSN, research data and corresponding physical samples can be given unambiguous names and thus become citable. In summary, these activities focus primarily on 'ready to give away' data, already stored in an institutional repository and described with appropriate metadata. Many datasets are not 'born' in this state but are produced in small and federated research projects. To make access to and reuse of these 'small data' easier, the data should be centrally stored and version controlled from the very beginning of activities. We developed DataSync [1] as a supplemental application to the panMetaDocs [2] data exchange platform, serving as a data management tool for small science projects. DataSync is a Java application that runs on a local computer and synchronizes directory trees into an eSciDoc repository [3] by creating eSciDoc objects via eSciDoc's REST API. DataSync can be installed on multiple computers and is in this way able to synchronize the files of a research team over the internet. XML metadata can be added as separate files that are managed together with the data files as versioned eSciDoc objects. A project-customized instance of panMetaDocs is provided to show a web-based overview of the previously uploaded file collection and to allow further annotation with metadata inside the eSciDoc repository. panMetaDocs is a PHP-based web application to assist in the creation of metadata in any XML-based metadata schema. To reduce manual entry of metadata to a minimum and make use of contextual information in a project setting, metadata fields can be populated with static or dynamic content. Access rights can be defined to control visibility of and access to stored objects. Notifications about recently updated datasets are available by RSS and e-mail, and the entire inventory can be harvested via OAI-PMH. panMetaDocs is optimized to be harvested by panFMP [4]. panMetaDocs is able to mint dataset DOIs through DataCite and uses eSciDoc's REST API to transfer eSciDoc objects from a non-public 'pending' status to the published status 'released', which makes the data and metadata of the published object available worldwide through the internet. The application scenario presented here shows the adaptation of open source applications to data sharing and the publication of data. An eSciDoc repository is used as storage for data and metadata. DataSync serves as a file ingester and distributor, whereas panMetaDocs' main function is to annotate the dataset files with metadata to make them ready for publication and sharing with your own team or with the scientific community.

  2. Using phrases and document metadata to improve topic modeling of clinical reports.

    PubMed

    Speier, William; Ong, Michael K; Arnold, Corey W

    2016-06-01

    Probabilistic topic models provide an unsupervised method for analyzing unstructured text and have the potential to be integrated into clinical automatic summarization systems. Clinical documents are accompanied by metadata from a patient's medical history and frequently contain multiword concepts that can be valuable for accurately interpreting the included text. While existing methods have attempted to address these problems individually, we present a unified model for free-text clinical documents that integrates contextual patient- and document-level data and discovers multiword concepts. In the proposed model, phrases are represented by chained n-grams, and a Dirichlet hyperparameter is weighted by both document-level and patient-level context. This method and three other latent Dirichlet allocation models were fit to a large collection of clinical reports. Examples of resulting topics demonstrate the results of the new model, and the quality of the representations is evaluated using empirical log likelihood. The proposed model was able to create informative prior probabilities based on patient and document information, and captured phrases that represented various clinical concepts. The representation using the proposed model had a significantly higher empirical log likelihood than the compared methods. Integrating document metadata and capturing phrases in clinical text greatly improves the topic representation of clinical documents. The resulting clinically informative topics may effectively serve as the basis for an automatic summarization system for clinical reports. Copyright © 2016 Elsevier Inc. All rights reserved.
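
    The two ingredients the model unifies - phrase discovery and topic modeling - can each be approximated with off-the-shelf gensim components, as sketched below on invented toy documents; the paper's patient- and document-weighted Dirichlet prior is not reproduced here:

      # Sketch: phrase detection plus a plain LDA baseline with gensim.
      from gensim.corpora import Dictionary
      from gensim.models import LdaModel
      from gensim.models.phrases import Phrases, Phraser

      docs = [["chest", "pain", "radiating", "left", "arm"],
              ["chest", "pain", "shortness", "of", "breath"],
              ["fracture", "left", "femur", "after", "fall"]] * 10  # toy data

      bigrams = Phraser(Phrases(docs, min_count=2, threshold=0.1))
      docs = [bigrams[d] for d in docs]  # frequent pairs join, e.g. "chest_pain"

      dictionary = Dictionary(docs)
      corpus = [dictionary.doc2bow(d) for d in docs]
      lda = LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=0)
      for tid in range(2):
          print(lda.print_topic(tid, topn=4))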

  3. Mercury: Reusable software application for Metadata Management, Data Discovery and Access

    NASA Astrophysics Data System (ADS)

    Devarakonda, Ranjeet; Palanisamy, Giri; Green, James; Wilson, Bruce E.

    2009-12-01

    Mercury is a federated metadata harvesting, data discovery and access tool based on both open source packages and custom-developed software. It was originally developed for NASA, and the Mercury development consortium now includes funding from NASA, USGS, and DOE. Mercury is itself a reusable toolset for metadata, with current use in 12 different projects. Mercury also supports the reuse of metadata by enabling searching across a range of metadata specifications and standards including XML, Z39.50, FGDC, Dublin Core, Darwin Core, EML, and ISO 19115. Mercury provides a single portal to information contained in distributed data management systems. It collects metadata and key data from contributing project servers distributed around the world and builds a centralized index. The Mercury search interfaces then allow users to perform simple, fielded, spatial and temporal searches across these metadata sources. One of the major goals of the recent redesign of Mercury was to improve software reusability across the projects that currently fund its continuing development. These projects span a range of land, atmosphere, and ocean ecological communities and have a number of common needs for metadata searches, but they also have a number of needs specific to one or a few projects. To balance these common and project-specific needs, Mercury's architecture includes three major reusable components: a harvester engine, an indexing system, and a user interface component. The harvester engine is responsible for harvesting metadata records from various distributed servers around the USA and around the world. The harvester software is packaged in such a way that all Mercury projects use the same harvester scripts, with each project driven by a set of configuration files. The harvested files are then passed to the indexing system, where each of the fields in these structured metadata records is indexed properly, so that the query engine can perform simple, keyword, spatial and temporal searches across these metadata sources. The search user interface software has two API categories: a common core API, which is used by all the Mercury user interfaces for querying the index, and a customized API for project-specific user interfaces. For our work in producing a reusable, portable, robust, feature-rich application, Mercury received a 2008 NASA Earth Science Data Systems Software Reuse Working Group Peer-Recognition Software Reuse Award. The new Mercury system is based on a service-oriented architecture and effectively reuses components for various services such as a Thesaurus Service, Gazetteer Web Service and UDDI Directory Services. The software also provides various search services, including RSS, Geo-RSS, OpenSearch, Web Services and Portlets, an integrated shopping cart to order datasets from various data centers (ORNL DAAC, NSIDC), and integrated visualization tools. Other features include filtering and dynamic sorting of search results, book-markable search results, and the ability to save, retrieve, and modify search criteria.

  4. Identifying and acting on inappropriate metadata: a critique of the Grattan Institute Report on questionable care in Australian hospitals.

    PubMed

    Cooper, P David; Smart, David R

    2017-03-01

    In an era of ever-increasing medical costs, the identification and prohibition of ineffective medical therapies is of considerable economic interest to healthcare funding bodies. Likewise, the avoidance of interventions with an unduly elevated clinical risk/benefit ratio would be similarly advantageous for patients. Regrettably, the identification of such therapies has proven problematic. A recent paper from the Grattan Institute in Australia (identifying five hospital procedures as having the potential for disinvestment on these grounds) serves as a timely illustration of the difficulties inherent in non-clinicians attempting to accurately recognize such interventions using non-clinical, indirect or poorly validated datasets. To evaluate the Grattan Institute report and associated publications, and determine the validity of their assertions regarding hyperbaric oxygen treatment (HBOT) utilisation in Australia. Critical analysis of the HBOT metadata included in the Grattan Institute study was undertaken and compared against other publicly available Australian Government and independent data sources. The consistency, accuracy and reproducibility of data definitions and terminology across the various publications were appraised and the authors' methodology was reviewed. Reference sources were examined for relevance and temporal eligibility. Review of the Grattan publications demonstrated multiple problems, including (but not limited to): confusing patient-treatments with total patient numbers; incorrect identification of 'appropriate' vs. 'inappropriate' indications for HBOT; reliance upon a compromised primary dataset; lack of appropriate clinical input, muddled methodology and use of inapplicable references. These errors resulted in a more than seventy-fold over-estimation of the number of patients potentially treated inappropriately with HBOT in Australia that year. Numerous methodological flaws and factual errors have been identified in this Grattan Institute study. Its conclusions are not valid and a formal retraction is required.

  5. Managing Heterogeneous Information Systems through Discovery and Retrieval of Generic Concepts.

    ERIC Educational Resources Information Center

    Srinivasan, Uma; Ngu, Anne H. H.; Gedeon, Tom

    2000-01-01

    Introduces a conceptual integration approach to heterogeneous databases or information systems that exploits the similarity in metalevel information and performs metadata mining on database objects to discover a set of concepts that serve as a domain abstraction and provide a conceptual layer above existing legacy systems. Presents results of…

  6. The role of metadata in managing large environmental science datasets. Proceedings

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Melton, R.B.; DeVaney, D.M.; French, J. C.

    1995-06-01

    The purpose of this workshop was to bring together computer science researchers and environmental sciences data management practitioners to consider the role of metadata in managing large environmental sciences datasets. The objectives included: establishing a common definition of metadata; identifying categories of metadata; defining problems in managing metadata; and defining problems related to linking metadata with primary data.

  7. MetaSRA: normalized human sample-specific metadata for the Sequence Read Archive.

    PubMed

    Bernstein, Matthew N; Doan, AnHai; Dewey, Colin N

    2017-09-15

    The NCBI's Sequence Read Archive (SRA) promises great biological insight if one could analyze the data in the aggregate; however, the data remain largely underutilized, in part, due to the poor structure of the metadata associated with each sample. The rules governing submissions to the SRA do not dictate a standardized set of terms that should be used to describe the biological samples from which the sequencing data are derived. As a result, the metadata include many synonyms, spelling variants and references to outside sources of information. Furthermore, manual annotation of the data remains intractable due to the large number of samples in the archive. For these reasons, it has been difficult to perform large-scale analyses that study the relationships between biomolecular processes and phenotype across diverse diseases, tissues and cell types present in the SRA. We present MetaSRA, a database of normalized SRA human sample-specific metadata following a schema inspired by the metadata organization of the ENCODE project. This schema involves mapping samples to terms in biomedical ontologies, labeling each sample with a sample-type category, and extracting real-valued properties. We automated these tasks via a novel computational pipeline. The MetaSRA is available at metasra.biostat.wisc.edu via both a searchable web interface and bulk downloads. Software implementing our computational pipeline is available at http://github.com/deweylab/metasra-pipeline. cdewey@biostat.wisc.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
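
    The flavor of this normalization can be conveyed with a toy Python sketch; the synonym table, ontology IDs and regular expression below are invented for illustration and are not taken from the MetaSRA pipeline.

      # Toy illustration of sample-metadata normalization: map free-text
      # attribute values to ontology terms via a synonym table and extract
      # real-valued properties. All mappings and term IDs are hypothetical.
      import re

      SYNONYMS = {"hela": "EFO:XXXXXXX",      # hypothetical cell-line term
                  "liver": "UBERON:XXXXXXX"}  # hypothetical tissue term

      def normalize_sample(attrs):
          terms, properties = set(), {}
          for key, value in attrs.items():
              v = value.strip().lower()
              if v in SYNONYMS:
                  terms.add(SYNONYMS[v])
              m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(?:year|yr)s?", v)
              if m and key.lower() == "age":
                  properties["age_years"] = float(m.group(1))
          return {"mapped_terms": sorted(terms), "real_valued": properties}

      print(normalize_sample({"cell line": "HeLa", "age": "31 years"}))
      # {'mapped_terms': ['EFO:XXXXXXX'], 'real_valued': {'age_years': 31.0}}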

  8. mzML2ISA & nmrML2ISA: generating enriched ISA-Tab metadata files from metabolomics XML data.

    PubMed

    Larralde, Martin; Lawson, Thomas N; Weber, Ralf J M; Moreno, Pablo; Haug, Kenneth; Rocca-Serra, Philippe; Viant, Mark R; Steinbeck, Christoph; Salek, Reza M

    2017-08-15

    Submission to the MetaboLights repository for metabolomics data currently places the burden of reporting instrument and acquisition parameters in ISA-Tab format on users, who have to do it manually, a process that is time consuming and prone to user input error. Since the large majority of these parameters are embedded in instrument raw data files, an opportunity exists to capture this metadata more accurately. Here we report a set of Python packages that can automatically generate ISA-Tab metadata file stubs from raw XML metabolomics data files. The parsing packages are separated into mzML2ISA (encompassing mzML and imzML formats) and nmrML2ISA (nmrML format only). Overall, the use of mzML2ISA & nmrML2ISA reduces the time needed to capture metadata substantially (capturing 90% of metadata on assay and sample levels), is much less prone to user input errors, improves compliance with minimum information reporting guidelines and facilitates more finely grained data exploration and querying of datasets. mzML2ISA & nmrML2ISA are available under version 3 of the GNU General Public Licence at https://github.com/ISA-tools. Documentation is available from http://2isa.readthedocs.io/en/latest/. reza.salek@ebi.ac.uk or isatools@googlegroups.com. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
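
    As the tools are described, usage reduces to pointing the converter at a directory of raw files. A hedged sketch of such an invocation from Python follows; the command-line flags shown are assumptions, and the documentation linked above is authoritative.

      # Hedged sketch: driving the mzml2isa converter from Python. The package
      # and repository are real, but the flags shown are assumptions; consult
      # http://2isa.readthedocs.io/en/latest/ for the actual options.
      import subprocess

      subprocess.run(
          ["mzml2isa",
           "-i", "raw_mzml/",    # directory of mzML/imzML files (assumed flag)
           "-o", "isatab_out/",  # output directory for ISA-Tab stubs (assumed flag)
           "-s", "MTBLS-demo"],  # study identifier (assumed flag)
          check=True,
      )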

  9. Developing a Metadata Infrastructure to facilitate data driven science gateway and to provide Inspire/GEMINI compliance for CLIPC

    NASA Astrophysics Data System (ADS)

    Mihajlovski, Andrej; Plieger, Maarten; Som de Cerff, Wim; Page, Christian

    2016-04-01

    The CLIPC project is developing a portal to provide a single point of access for scientific information on climate change. This is made possible through the Copernicus Earth Observation Programme for Europe, which will deliver a new generation of environmental measurements of climate quality. The data about the physical environment which is used to inform climate change policy and adaptation measures come from several categories: satellite measurements, terrestrial observing systems, model projections and simulations, and re-analyses (syntheses of all available observations constrained with numerical weather prediction systems). These data categories are managed by different communities: CLIPC will provide a single point of access for the whole range of data. The CLIPC portal will provide a number of indicators showing impacts on specific sectors, generated using a range of factors selected through structured expert consultation. It will also, as part of the transformation services, allow users to explore the consequences of using different combinations of driving factors which they consider to be of particular relevance to their work or life. The portal will provide information on the scientific quality and pitfalls of such transformations to prevent misleading usage of the results. The CLIPC project will develop an end-to-end processing chain (indicator tool kit), from comprehensive information on the climate state through to highly aggregated decision-relevant products. Indicators of climate change and climate change impact will be provided, and a tool kit to update and post-process the collection of indicators will be integrated into the portal. The CLIPC portal has a distributed architecture, making use of OGC services provided by, e.g., climate4impact.eu and CEDA. CLIPC has two themes: 1. Harmonized access to climate datasets derived from models, observations and re-analyses; 2. A climate impact tool kit to evaluate, rank and aggregate indicators. Key is the availability of standardized metadata describing indicator data and services, which will enable standardization and interoperability between the different distributed services of CLIPC. A standardized metadata infrastructure is provided to disseminate CLIPC indicator data, transformed data products that enable impact assessments, and climate change impact indicators. The challenge is that compliance of existing metadata with INSPIRE ISO standards and GEMINI standards needs to be extended so that the web portal can be generated from the available metadata blueprint. The information provided in the headers of netCDF files available through multiple catalogues allows us to generate ISO-compliant metadata, which is in turn used to generate web-based interface content, as well as OGC-compliant web services such as WCS and WMS for the front end and WPS interactions for scientific users to combine and generate new datasets. The goal of the metadata infrastructure is to provide a blueprint for creating a data-driven science portal, generated from the underlying GIS data, web services and processing infrastructure. The presentation will cover the results and lessons learned.
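
    A minimal sketch of the netCDF-header-to-ISO step might look as follows in Python, assuming CF-style global attributes (title, summary, history); the element layout is deliberately simplified and is not a full INSPIRE/GEMINI-compliant record.

      # Simplified sketch: read global attributes from a netCDF header and
      # emit an ISO 19115-style metadata stub. Attribute names assume common
      # CF conventions; the XML layout is illustrative, not INSPIRE-complete.
      import xml.etree.ElementTree as ET
      from netCDF4 import Dataset  # third-party: pip install netCDF4

      def iso_stub(nc_path):
          ds = Dataset(nc_path)
          attrs = {name: getattr(ds, name) for name in ds.ncattrs()}
          ds.close()
          root = ET.Element("MD_Metadata")
          ET.SubElement(root, "title").text = str(attrs.get("title", "unknown"))
          ET.SubElement(root, "abstract").text = str(attrs.get("summary", ""))
          ET.SubElement(root, "lineage").text = str(attrs.get("history", ""))
          return ET.tostring(root, encoding="unicode")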

  10. Speech Recognition for A Digital Video Library.

    ERIC Educational Resources Information Center

    Witbrock, Michael J.; Hauptmann, Alexander G.

    1998-01-01

    Production of the meta-data supporting the Informedia Digital Video Library interface is automated using techniques derived from artificial intelligence research. Speech recognition and natural-language processing, information retrieval, and image analysis are applied to produce an interface that helps users locate information and navigate more…

  11. Analyzing existing conventional soil information sources to be incorporated in thematic Spatial Data Infrastructures

    NASA Astrophysics Data System (ADS)

    Pascual-Aguilar, J. A.; Rubio, J. L.; Domínguez, J.; Andreu, V.

    2012-04-01

    New information technologies make possible the widespread dissemination of spatial information at different geographical scales, from continental to local, by means of Spatial Data Infrastructures. Administrative awareness of the need for open-access information services has also given citizens access to this spatial information through legal instruments such as the INSPIRE Directive of the European Union, transposed into national laws as in the case of Spain. The translation of the general criteria of generic Spatial Data Infrastructures (SDI) to thematic ones is a crucial point for the progress of these instruments as large-scale tools for the dissemination of information. In such cases, the intrinsic criteria of digital information, such as harmonization of the information and disclosure of metadata, must be complemented by the characteristics of the environmental information itself and the techniques employed in obtaining it. In the case of soil inventories and mapping, existing information obtained by traditional means, prior to digital technologies, is considered a valid, indeed unique, source of information for the development of thematic SDI. In this work, an evaluation is undertaken of the existing and accessible information that constitutes the basis for building a thematic SDI for soils in Spain. This information framework has features in common with those of other European Union states. From a set of more than 1,500 publications covering the national territory of Spain, the study focused on the 94 documents found for five autonomous regions of the northern Iberian Peninsula (Asturias, Cantabria, Basque Country, Navarra and La Rioja). The analysis was performed taking into account the criteria of soil mapping and inventories. The results show wide variation in almost all criteria: geographic representation (projections, scales) and geo-referencing of profile locations, map location of profiles integrated with edaphic units, description and taxonomic classification systems of soils (FAO, Soil Taxonomy, etc.), the amount and type of soil analysis parameters, and the dates of the inventories. In conclusion, the construction of a thematic SDI on soils should, prior to the integration of all maps and inventories, include a series of harmonization processes that allow spatial continuity between existing information sources and temporal identification of the inventories and maps. This requires the development of at least two types of integration tools: (1) tools enabling spatial continuity without contradictions between maps made at different times and with different criteria, and (2) information systems for documentation (metadata) that highlight the characteristics of the information and its possibilities for connection with other sources in the Spatial Data Infrastructure. Acknowledgements: This research was financed by the European Union within the framework of the GS Soil project (eContentplus Programme ECP-2008-GEO-318004).

  12. Mercury: An Example of Effective Software Reuse for Metadata Management, Data Discovery and Access

    NASA Astrophysics Data System (ADS)

    Devarakonda, Ranjeet; Palanisamy, Giri; Green, James; Wilson, Bruce E.

    2008-12-01

    Mercury is a federated metadata harvesting, data discovery and access tool based on both open source packages and custom developed software. Though originally developed for NASA, the Mercury development consortium now includes funding from NASA, USGS, and DOE. Mercury supports the reuse of metadata by enabling searching across a range of metadata specifications and standards, including XML, Z39.50, FGDC, Dublin-Core, Darwin-Core, EML, and ISO-19115. Mercury provides a single portal to information contained in distributed data management systems. It collects metadata and key data from contributing project servers distributed around the world and builds a centralized index. The Mercury search interfaces then allow users to perform simple, fielded, spatial and temporal searches across these metadata sources. One of the major goals of the recent redesign of Mercury was to improve the software reusability across the 12 projects which currently fund the continuing development of Mercury. These projects span a range of land, atmosphere, and ocean ecological communities and have a number of common needs for metadata searches, but they also have a number of needs specific to one or a few projects. To balance these common and project-specific needs, Mercury's architecture has three major reusable components: a harvester engine, an indexing system and a user interface component. The harvester engine is responsible for harvesting metadata records from various distributed servers around the USA and around the world. The harvester software was packaged in such a way that all the Mercury projects use the same harvester scripts, with each project driven by a set of project-specific configuration files. The harvested files are structured metadata records that are indexed consistently against the search library API, so that it can support simple, fielded, spatial and temporal search capabilities. This backend component is supported by a flexible, easy-to-use graphical user interface driven by cascading style sheets, which further simplifies reusable design implementation. The new Mercury system is based on a Service Oriented Architecture and effectively reuses components for various services such as the Thesaurus Service, Gazetteer Web Service and UDDI Directory Services. The software also provides various search services, including RSS, Geo-RSS, OpenSearch, Web Services and Portlets, an integrated shopping cart to order datasets from various data centers (ORNL DAAC, NSIDC), and integrated visualization tools. Other features include filtering and dynamic sorting of search results, book-markable search results, and the ability to save, retrieve, and modify search criteria.

  13. Mercury: An Example of Effective Software Reuse for Metadata Management, Data Discovery and Access

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Devarakonda, Ranjeet

    2008-01-01

    Mercury is a federated metadata harvesting, data discovery and access tool based on both open source packages and custom developed software. Though originally developed for NASA, the Mercury development consortium now includes funding from NASA, USGS, and DOE. Mercury supports the reuse of metadata by enabling searching across a range of metadata specifications and standards, including XML, Z39.50, FGDC, Dublin-Core, Darwin-Core, EML, and ISO-19115. Mercury provides a single portal to information contained in distributed data management systems. It collects metadata and key data from contributing project servers distributed around the world and builds a centralized index. The Mercury search interfaces then allow users to perform simple, fielded, spatial and temporal searches across these metadata sources. One of the major goals of the recent redesign of Mercury was to improve the software reusability across the 12 projects which currently fund the continuing development of Mercury. These projects span a range of land, atmosphere, and ocean ecological communities and have a number of common needs for metadata searches, but they also have a number of needs specific to one or a few projects. To balance these common and project-specific needs, Mercury's architecture has three major reusable components: a harvester engine, an indexing system and a user interface component. The harvester engine is responsible for harvesting metadata records from various distributed servers around the USA and around the world. The harvester software was packaged in such a way that all the Mercury projects use the same harvester scripts, with each project driven by a set of project-specific configuration files. The harvested files are structured metadata records that are indexed consistently against the search library API, so that it can support simple, fielded, spatial and temporal search capabilities. This backend component is supported by a flexible, easy-to-use graphical user interface driven by cascading style sheets, which further simplifies reusable design implementation. The new Mercury system is based on a Service Oriented Architecture and effectively reuses components for various services such as the Thesaurus Service, Gazetteer Web Service and UDDI Directory Services. The software also provides various search services, including RSS, Geo-RSS, OpenSearch, Web Services and Portlets, an integrated shopping cart to order datasets from various data centers (ORNL DAAC, NSIDC), and integrated visualization tools. Other features include filtering and dynamic sorting of search results, book-markable search results, and the ability to save, retrieve, and modify search criteria.

  14. Building Format-Agnostic Metadata Repositories

    NASA Astrophysics Data System (ADS)

    Cechini, M.; Pilone, D.

    2010-12-01

    This presentation will discuss the problems that surround persisting and discovering metadata in multiple formats; a set of tenets that must be addressed in a solution; and NASA’s Earth Observing System (EOS) ClearingHOuse’s (ECHO) proposed approach. In order to facilitate cross-discipline data analysis, Earth Scientists will potentially interact with more than one data source. The most common data discovery paradigm relies on services and/or applications facilitating the discovery and presentation of metadata. What may not be common are the formats in which the metadata are formatted. As the number of sources and datasets utilized for research increases, it becomes more likely that a researcher will encounter conflicting metadata formats. Metadata repositories, such as the EOS ClearingHOuse (ECHO), along with data centers, must identify ways to address this issue. In order to define the solution to this problem, the following tenets are identified: - There exists a set of ‘core’ metadata fields recommended for data discovery. - There exists a set of users who will require the entire metadata record for advanced analysis. - There exists a set of users who will require a ‘core’ set of metadata fields for discovery only. - There will never be a cessation of new formats or a total retirement of all old formats. - Users should be presented metadata in a consistent format. ECHO has undertaken an effort to transform its metadata ingest and discovery services in order to support the growing set of metadata formats. In order to address the previously listed items, ECHO’s new metadata processing paradigm utilizes the following approach: - Identify a cross-format set of ‘core’ metadata fields necessary for discovery. - Implement format-specific indexers to extract the ‘core’ metadata fields into an optimized query capability. - Archive the original metadata in its entirety for presentation to users requiring the full record. - Provide on-demand translation of ‘core’ metadata to any supported result format. With this identified approach, the Earth Scientist is provided with a consistent data representation as they interact with a variety of datasets that utilize multiple metadata formats. They are then able to focus their efforts on the more critical research activities which they are undertaking.
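
    The indexer tenet lends itself to a small sketch: each format gets its own extractor for the shared 'core' fields, while the original record is archived whole. The Python below illustrates the pattern only and is not ECHO's implementation; the element names are the standard FGDC CSDGM ones.

      # Pattern sketch (not ECHO's code): a format-specific indexer extracts
      # the cross-format 'core' fields; the full record is kept for users who
      # need everything.
      import xml.etree.ElementTree as ET

      def index_fgdc(xml_text):
          """Pull core discovery fields out of an FGDC CSDGM record."""
          root = ET.fromstring(xml_text)
          return {
              "title": root.findtext(".//title"),
              "start_time": root.findtext(".//begdate"),
              "end_time": root.findtext(".//enddate"),
              "bounding_box": [root.findtext(".//" + tag) for tag in
                               ("westbc", "eastbc", "southbc", "northbc")],
              "_original": xml_text,  # archived in full, per the tenets above
          }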

  15. More comprehensive forensic genetic marker analyses for accurate human remains identification using massively parallel DNA sequencing.

    PubMed

    Ambers, Angie D; Churchill, Jennifer D; King, Jonathan L; Stoljarova, Monika; Gill-King, Harrell; Assidi, Mourad; Abu-Elmagd, Muhammad; Buhmeida, Abdelbaset; Al-Qahtani, Mohammed; Budowle, Bruce

    2016-10-17

    Although the primary objective of forensic DNA analyses of unidentified human remains is positive identification, cases involving historical or archaeological skeletal remains often lack reference samples for comparison. Massively parallel sequencing (MPS) offers an opportunity to provide biometric data in such cases, and these cases provide valuable data on the feasibility of applying MPS for characterization of modern forensic casework samples. In this study, MPS was used to characterize 140-year-old human skeletal remains discovered at a historical site in Deadwood, South Dakota, United States. The remains were in an unmarked grave and there were no records or other metadata available regarding the identity of the individual. Due to the high throughput of MPS, a variety of biometric markers could be typed using a single sample. Using MPS and suitable forensic genetic markers, more relevant information could be obtained from a limited quantity and quality sample. Results were obtained for 25/26 Y-STRs, 34/34 Y SNPs, 166/166 ancestry-informative SNPs, 24/24 phenotype-informative SNPs, 102/102 human identity SNPs, 27/29 autosomal STRs (plus amelogenin), and 4/8 X-STRs (as well as ten regions of mtDNA). The Y-chromosome (Y-STR, Y-SNP) and mtDNA profiles of the unidentified skeletal remains are consistent with the R1b and H1 haplogroups, respectively. Both of these haplogroups are the most common haplogroups in Western Europe. Ancestry-informative SNP analysis also supported European ancestry. The genetic results are consistent with anthropological findings that the remains belong to a male of European ancestry (Caucasian). Phenotype-informative SNP data provided strong support that the individual had light red hair and brown eyes. This study is among the first to genetically characterize historical human remains with forensic genetic marker kits specifically designed for MPS. The outcome demonstrates that substantially more genetic information can be obtained from the same initial quantities of DNA as that of current CE-based analyses.

  16. Metadata - National Hospital Ambulatory Medical Care Survey (NHAMCS)

    EPA Pesticide Factsheets

    The National Hospital Ambulatory Medical Care Survey (NHAMCS) is designed to collect information on the services provided in hospital emergency and outpatient departments and in ambulatory surgery centers.

  17. Metadata - National Hospital Discharge Survey (NHDS)

    EPA Pesticide Factsheets

    The National Hospital Discharge Survey (NHDS) is an annual probability survey that collects information on the characteristics of inpatients discharged from non-federal short-stay hospitals in the United States.

  18. Visualizing and Validating Metadata Traceability within the CDISC Standards.

    PubMed

    Hume, Sam; Sarnikar, Surendra; Becnel, Lauren; Bennett, Dorine

    2017-01-01

    The Food & Drug Administration has begun requiring that electronic submissions of regulated clinical studies utilize the Clinical Data Interchange Standards Consortium (CDISC) data standards. Within regulated clinical research, traceability is a requirement and indicates that the analysis results can be traced back to the original source data. Current solutions for clinical research data traceability are limited in terms of querying, validation and visualization capabilities. This paper describes (1) the development of metadata models to support computable traceability and traceability visualizations that are compatible with industry data standards for the regulated clinical research domain, (2) adaptation of graph traversal algorithms to make them capable of identifying traceability gaps and validating traceability across the clinical research data lifecycle, and (3) development of a traceability query capability for retrieval and visualization of traceability information.
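
    The gap-identification idea can be pictured as a reverse traversal over a directed metadata graph: a result variable is traceable if some chain of derivations reaches raw source data. The Python below illustrates that idea under invented variable names; it is not the authors' implementation.

      # Illustrative traceability-gap check: breadth-first search backwards
      # from each node; any node that never reaches a source is a gap.
      from collections import deque

      def find_traceability_gaps(edges, sources):
          """edges: {node: [predecessor, ...]}; sources: set of raw-data nodes."""
          gaps = []
          for node in edges:
              seen, queue = {node}, deque([node])
              reached_source = False
              while queue:
                  current = queue.popleft()
                  if current in sources:
                      reached_source = True
                      break
                  for pred in edges.get(current, []):
                      if pred not in seen:
                          seen.add(pred)
                          queue.append(pred)
              if not reached_source:
                  gaps.append(node)
          return gaps

      edges = {"AVAL": ["LBSTRESN"], "LBSTRESN": ["LBORRES"], "ORPHAN": []}
      print(find_traceability_gaps(edges, sources={"LBORRES"}))  # ['ORPHAN']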

  19. Design and implementation of a health data interoperability mediator.

    PubMed

    Kuo, Mu-Hsing; Kushniruk, Andre William; Borycki, Elizabeth Marie

    2010-01-01

    The objective of this study is to design and implement a common-gateway oriented mediator to solve the health data interoperability problems that exist among heterogeneous health information systems. The proposed mediator has three main components: (1) a Synonym Dictionary (SD) that stores a set of global metadata and terminologies to serve as the mapping intermediary, (2) a Semantic Mapping Engine (SME) that can be used to map metadata and instance semantics, and (3) a DB-to-XML module that translates source health data stored in a database into XML format and back. A routine admission notification data exchange scenario is used to test the efficiency and feasibility of the proposed mediator. The study results show that the proposed mediator can make health information exchange more efficient.
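
    The Synonym Dictionary component can be pictured as a pair of mappings through a shared global vocabulary, as in the toy Python sketch below; the systems and field names are invented for illustration.

      # Toy version of the mediator's Synonym Dictionary idea: route each
      # source system's local field names through a shared global vocabulary.
      GLOBAL_TERMS = {
          "hospital_a": {"pt_name": "patient_name", "adm_dt": "admission_date"},
          "hospital_b": {"name": "patient_name", "admitted": "admission_date"},
      }

      def to_global(system, record):
          return {GLOBAL_TERMS[system].get(k, k): v for k, v in record.items()}

      def to_local(system, record):
          reverse = {v: k for k, v in GLOBAL_TERMS[system].items()}
          return {reverse.get(k, k): v for k, v in record.items()}

      msg = to_global("hospital_a", {"pt_name": "Lee", "adm_dt": "2010-01-05"})
      print(to_local("hospital_b", msg))  # {'name': 'Lee', 'admitted': '2010-01-05'}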

  20. Visualizing and Validating Metadata Traceability within the CDISC Standards

    PubMed Central

    Hume, Sam; Sarnikar, Surendra; Becnel, Lauren; Bennett, Dorine

    2017-01-01

    The Food & Drug Administration has begun requiring that electronic submissions of regulated clinical studies utilize the Clinical Data Interchange Standards Consortium (CDISC) data standards. Within regulated clinical research, traceability is a requirement and indicates that the analysis results can be traced back to the original source data. Current solutions for clinical research data traceability are limited in terms of querying, validation and visualization capabilities. This paper describes (1) the development of metadata models to support computable traceability and traceability visualizations that are compatible with industry data standards for the regulated clinical research domain, (2) adaptation of graph traversal algorithms to make them capable of identifying traceability gaps and validating traceability across the clinical research data lifecycle, and (3) development of a traceability query capability for retrieval and visualization of traceability information. PMID:28815125

  1. Making Metadata Better with CMR and MMT

    NASA Technical Reports Server (NTRS)

    Gilman, Jason Arthur; Shum, Dana

    2016-01-01

    Ensuring complete, consistent and high-quality metadata is a challenge for metadata providers and curators. The CMR and MMT systems give providers and curators options to build metadata quality in from the start, and to assess and improve the quality of already existing metadata.

  2. Reflecting on the challenges of building a rich interconnected metadata database to describe the experiments of phase six of the coupled climate model intercomparison project (CMIP6) for the Earth System Documentation Project (ES-DOC) and anticipating the opportunities that tooling and services based on rich metadata can provide.

    NASA Astrophysics Data System (ADS)

    Pascoe, C. L.

    2017-12-01

    The Coupled Model Intercomparison Project (CMIP) has coordinated climate model experiments involving multiple international modelling teams since 1995. This has led to a better understanding of past, present, and future climate. The 2017 sixth phase of the CMIP process (CMIP6) consists of a suite of common experiments and 21 separate CMIP-Endorsed Model Intercomparison Projects (MIPs), making a total of 244 separate experiments. Precise descriptions of the suite of CMIP6 experiments have been captured in a Common Information Model (CIM) database by the Earth System Documentation Project (ES-DOC). The database contains descriptions of forcings, model configuration requirements, ensemble information and citation links, as well as text descriptions and information about the rationale for each experiment. The database was built from statements about the experiments found in the academic literature, the MIP submissions to the World Climate Research Programme (WCRP), WCRP summary tables and correspondence with the principal investigators for each MIP. The database was collated using spreadsheets which are archived in the ES-DOC Github repository and then rendered on the ES-DOC website. A diagrammatic view of the workflow of building the database of experiment metadata for CMIP6 is shown in the attached figure. The CIM provides the formalism to collect detailed information from diverse sources in a standard way across all the CMIP6 MIPs. The ES-DOC documentation acts as a unified reference for CMIP6 information to be used both by data producers and consumers. This is especially important given the federated nature of the CMIP6 project. Because the CIM allows forcing constraints and other experiment attributes to be referred to by more than one experiment, we can streamline the process of collecting information from modelling groups about how they set up their models for each experiment. End users of the climate model archive will be able to ask questions enabled by the interconnectedness of the metadata, such as "Which MIPs make use of experiment A?" and "Which experiments use forcing constraint B?".
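
    Once forcings and other attributes are shared objects rather than repeated text, such cross-referenced questions reduce to simple lookups, as in the deliberately simplified Python sketch below; the structure is illustrative, not the CIM itself.

      # Simplified illustration of a cross-referenced experiment query.
      # Experiment and forcing names are examples, not the CIM database.
      EXPERIMENTS = {
          "historical": {"mips": ["CMIP"], "forcings": ["volcanic", "solar"]},
          "ssp585": {"mips": ["ScenarioMIP"], "forcings": ["ghg-high", "solar"]},
      }

      def experiments_using_forcing(forcing):
          return [name for name, exp in EXPERIMENTS.items()
                  if forcing in exp["forcings"]]

      print(experiments_using_forcing("solar"))  # ['historical', 'ssp585']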

  3. Repository Profiles for Atmospheric and Climate Sciences: Capabilities and Trends in Data Services

    NASA Astrophysics Data System (ADS)

    Hou, C. Y.; Thompson, C. A.; Palmer, C. L.

    2014-12-01

    As digital research data proliferate and expectations for open access escalate, the landscape of data repositories is becoming more complex. For example, DataBib currently identifies 980 data repositories across the disciplines, with 117 categorized under Geosciences. In atmospheric and climate sciences, there are great expectations for the integration and reuse of data for advancing science. To realize this potential, resources are needed that explicate the range of repository options available for locating and depositing open data, their conditions of access and use, and the services and tools they provide. This study profiled 38 open digital repositories in the atmospheric and climate sciences, analyzing each on 55 criteria through content analysis of their websites. The results provide a systematic way to assess and compare capabilities, services, and institutional characteristics and identify trends across repositories. Selected results from the more detailed outcomes to be presented: Most repositories offer guidance on data format(s) for submission and dissemination. 42% offer authorization-free access. More than half use some type of data identification system such as DOIs. Nearly half offer some data processing, with a similar number providing software or tools. 78.9% request that users cite or acknowledge datasets used and the data center. Only 21.1% recommend specific metadata standards, such as ISO 19115 or Dublin Core, with more than half utilizing a customized metadata scheme. Information on repository certification and accreditation was rarely provided, and information on transfer of rights and data security was uneven. Few provided policy information on preservation, migration, reappraisal, disposal, or long-term sustainability. As repository use increases, it will be important for institutions to make their procedures and policies explicit, to build trust with user communities and improve efficiencies in data sharing. Resources such as repository profiles will be essential for scientists to weigh options and understand trends in data services across the evolving network of repositories.

  4. A Flexible Online Metadata Editing and Management System

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aguilar, Raul; Pan, Jerry Yun; Gries, Corinna

    2010-01-01

    A metadata editing and management system is being developed employing state-of-the-art XML technologies. A modular and distributed design was chosen for scalability, flexibility, options for customization, and the possibility to add more functionality at a later stage. The system consists of a desktop design tool or schema walker used to generate code for the actual online editor, a native XML database, and an online user access management application. The design tool is a Java Swing application that reads an XML schema and provides the designer with options to combine input fields into online forms and give the fields user-friendly tags. Based on design decisions, the tool generates code for the online metadata editor. The code generated is an implementation of the XForms standard using the Orbeon Framework. The design tool fulfills two requirements: first, data entry forms based on one schema may be customized at design time, and second, data entry applications may be generated for any valid XML schema without relying on custom information in the schema. However, the customized information generated at design time is saved in a configuration file which may be re-used and changed again in the design tool. Future developments will add functionality to the design tool to integrate help text, tool tips, project-specific keyword lists, and thesaurus services. Additional styling of the finished editor is accomplished via cascading style sheets, which may be further customized, and different look-and-feels may be accumulated through the community process. The customized editor produces XML files in compliance with the original schema; however, data from the current page is saved into a native XML database whenever the user moves to the next screen or pushes the save button, independently of validity. Currently the system uses the open source XML database eXist for storage and management, which comes with third-party online and desktop management tools. However, access to metadata files in the application introduced here is managed in a custom online module, using a MySQL backend accessed by a simple Java Server Faces front end. A flexible system with three grouping options (organization, group and single editing access) is provided. Three levels were chosen to distribute administrative responsibilities and handle the common situation of an information manager entering the bulk of the metadata but leaving specifics to the actual data provider.

  5. Using the Proteomics Identifications Database (PRIDE).

    PubMed

    Martens, Lennart; Jones, Phil; Côté, Richard

    2008-03-01

    The Proteomics Identifications Database (PRIDE) is a public data repository designed to store, disseminate, and analyze mass spectrometry based proteomics datasets. The PRIDE database can accommodate any level of detailed metadata about the submitted results, which can be queried, explored, viewed, or downloaded via the PRIDE Web interface. The PRIDE database also provides a simple, yet powerful, access control mechanism that fully supports confidential peer-reviewing of data related to a manuscript, ensuring that these results remain invisible to the general public while allowing referees and journal editors anonymized access to the data. This unit describes in detail the functionality that PRIDE provides with regard to searching, viewing, and comparing the available data, as well as different options for submitting data to PRIDE.

  6. The National Map seamless digital elevation model specifications

    USGS Publications Warehouse

    Archuleta, Christy-Ann M.; Constance, Eric W.; Arundel, Samantha T.; Lowe, Amanda J.; Mantey, Kimberly S.; Phillips, Lori A.

    2017-08-02

    This specification documents the requirements and standards used to produce the seamless elevation layers for The National Map of the United States. Seamless elevation data are available for the conterminous United States, Hawaii, Alaska, and the U.S. territories, in three different resolutions—1/3-arc-second, 1-arc-second, and 2-arc-second. These specifications include requirements and standards information about source data requirements, spatial reference system, distribution tiling schemes, horizontal resolution, vertical accuracy, digital elevation model surface treatment, georeferencing, data source and tile dates, distribution and supporting file formats, void areas, metadata, spatial metadata, and quality assurance and control.

  7. eScience for molecular-scale simulations and the eMinerals project.

    PubMed

    Salje, E K H; Artacho, E; Austen, K F; Bruin, R P; Calleja, M; Chappell, H F; Chiang, G-T; Dove, M T; Frame, I; Goodwin, A L; Kleese van Dam, K; Marmier, A; Parker, S C; Pruneda, J M; Todorov, I T; Trachenko, K; Tyer, R P; Walker, A M; White, T O H

    2009-03-13

    We review the work carried out within the eMinerals project to develop eScience solutions that facilitate a new generation of molecular-scale simulation work. Technological developments include integration of compute and data systems, development of collaborative frameworks, and new researcher-friendly tools for grid job submission, XML data representation, information delivery, metadata harvesting and metadata management. A number of diverse science applications will illustrate how these tools are being used for large parameter-sweep studies, an emerging type of study for which the integration of computing, data and collaboration is essential.

  8. Microsoft Repository Version 2 and the Open Information Model.

    ERIC Educational Resources Information Center

    Bernstein, Philip A.; Bergstraesser, Thomas; Carlson, Jason; Pal, Shankar; Sanders, Paul; Shutt, David

    1999-01-01

    Describes the programming interface and implementation of the repository engine and the Open Information Model for Microsoft Repository, an object-oriented meta-data management facility that ships in Microsoft Visual Studio and Microsoft SQL Server. Discusses Microsoft's component object model, object manipulation, queries, and information…

  9. Academic Research Library as Broker in Addressing Interoperability Challenges for the Geosciences

    NASA Astrophysics Data System (ADS)

    Smith, P., II

    2015-12-01

    Data capture is an important process in the research lifecycle. Complete descriptive and representative information about the data or database is necessary during data collection, whether in the field or in the research lab. The National Science Foundation's (NSF) Public Access Plan (2015) mandates the need for federally funded projects to make their research data more openly available. Developing, implementing, and integrating metadata workflows into the research process of the data lifecycle facilitates improved data access while also addressing interoperability challenges for the geosciences, such as data description and representation. Lack of metadata or data curation can contribute to (1) semantic, (2) ontology, and (3) data integration issues within and across disciplinary domains and projects. Some researchers of EarthCube-funded projects have identified these issues as gaps. These gaps can contribute to interoperability issues in data access, discovery, and integration between domain-specific and general data repositories. Academic research libraries have expertise in providing long-term discovery and access through the use of metadata standards and provision of access to research data, datasets, and publications via institutional repositories. Metadata crosswalks, open archival information systems (OAIS), trusted repositories, the Data Seal of Approval, persistent URLs, and the linking of data, objects, resources, and publications in institutional repositories and digital content management systems are common components in the library discipline. These components contribute to a library perspective on data access and discovery that can benefit the geosciences. The USGS Community for Data Integration (CDI) has developed the Science Support Framework (SSF) for data management and integration within its community of practice, contributing to improved understanding of the Earth's physical and biological systems. The USGS CDI SSF can be used as a reference model to map to EarthCube-funded projects, with academic research libraries facilitating the data and information assets components of the USGS CDI SSF via institutional repositories and/or digital content management. This session will explore the USGS CDI SSF for cross-discipline collaboration considerations from a library perspective.

  10. A New Look at Data Usage by Using Metadata Attributes as Indicators of Data Quality

    NASA Astrophysics Data System (ADS)

    Won, Y. I.; Wanchoo, L.; Behnke, J.

    2016-12-01

    NASA's Earth Observing System Data and Information System (EOSDIS) stores and distributes data from EOS satellites, as well as ancillary, airborne, in-situ, and socio-economic data. Twelve EOSDIS data centers support different scientific disciplines by providing products and services tailored to specific science communities. Although discipline oriented, these data centers provide common data management functions of ingest, archive and distribution, as well as documentation of their data and services on their websites. The Earth Science Data and Information System (ESDIS) Project collects these metrics from the EOSDIS data centers on a daily basis through a tool called the ESDIS Metrics System (EMS). These metrics are used in this study. The implementation of the Earthdata Login (formerly known as the User Registration System, URS) across the various NASA data centers provides the EMS additional information about users obtaining data products from EOSDIS data centers. These additional user attributes collected by the Earthdata Login, such as the user's primary area of study, can augment the understanding of data usage, which in turn can help the EOSDIS program better understand users' needs. This study will review the key metrics (users, distributed volume, and files) in multiple ways to gain an understanding of the significance of the metadata. Characterizing the usability of data by key metadata elements such as discipline and study area will assist in understanding how the users have evolved over time. The data usage pattern based on version numbers may also provide some insight into the level of data quality. In addition, the data metrics by various services such as the Open-source Project for a Network Data Access Protocol (OPeNDAP), Web Map Service (WMS), Web Coverage Service (WCS), and subsets will address how these services have extended the usage of data. Overall, this study will present the usage of data and metadata through metrics analyses and will assist data centers in better supporting the needs of the users.

  11. GraphMeta: Managing HPC Rich Metadata in Graphs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dai, Dong; Chen, Yong; Carns, Philip

    High-performance computing (HPC) systems face increasingly critical metadata management challenges, especially in the approaching exascale era. These challenges arise not only from exploding metadata volumes, but also from increasingly diverse metadata, which contains data provenance and arbitrary user-defined attributes in addition to traditional POSIX metadata. This 'rich' metadata is becoming critical to supporting advanced data management functionality such as data auditing and validation. In our prior work, we identified a graph-based model as a promising solution to uniformly manage HPC rich metadata due to its flexibility and generality. However, at the same time, graph-based HPC rich metadata management also introduces significant challenges to the underlying infrastructure. In this study, we first identify the challenges posed to the underlying infrastructure in supporting scalable, high-performance rich metadata management. Based on that, we introduce GraphMeta, a graph-based engine designed for this use case. It achieves performance scalability by introducing a new graph partitioning algorithm and a write-optimal storage engine. We evaluate GraphMeta under both synthetic and real HPC metadata workloads, compare it with other approaches, and demonstrate its advantages in terms of efficiency and usability for rich metadata management in HPC systems.
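
    The rich-metadata-as-graph model can be pictured as a minimal property graph in which files, jobs and users are vertices and provenance relations are attributed edges. The Python below illustrates the model only; it is not GraphMeta's engine, and all names are invented.

      # Minimal property-graph sketch of 'rich metadata as a graph'.
      class MetadataGraph:
          def __init__(self):
              self.vertices, self.edges = {}, []

          def add_vertex(self, vid, **attrs):
              self.vertices[vid] = attrs

          def add_edge(self, src, dst, relation, **attrs):
              self.edges.append((src, dst, relation, attrs))

          def provenance(self, vid):
              """Vertices that 'vid' was derived from (one hop)."""
              return [d for s, d, rel, _ in self.edges
                      if s == vid and rel == "derived_from"]

      g = MetadataGraph()
      g.add_vertex("job42", kind="job", user="alice")
      g.add_vertex("out.h5", kind="file", posix_mode="0644")
      g.add_edge("out.h5", "job42", "derived_from", timestamp="2016-01-01")
      print(g.provenance("out.h5"))  # ['job42']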

  12. US Geoscience Information Network, Web Services for Geoscience Information Discovery and Access

    NASA Astrophysics Data System (ADS)

    Richard, S.; Allison, L.; Clark, R.; Coleman, C.; Chen, G.

    2012-04-01

    The US Geoscience Information Network has developed metadata profiles for interoperable catalog services based on ISO 19139 and OGC CSW 2.0.2. Data services are currently being deployed for the US Dept. of Energy-funded National Geothermal Data System. These services utilize OGC Web Map Services, Web Feature Services, and THREDDS-served NetCDF for gridded datasets. Services and underlying datasets (along with a wide variety of other information and non-information resources) are registered in the catalog system. Metadata for registration is produced by various workflows, including harvest from OGC capabilities documents, Drupal-based web applications, and transformation from tabular compilations. Catalog search is implemented using the ESRI Geoportal open-source server. We are pursuing various client applications to demonstrate discovery and utilization of the data services. Currently operational applications include an ESRI ArcMap extension for catalog search and data acquisition from map services, and a catalog browse and search application built on OpenLayers and Django. We are developing use cases and requirements for other applications to utilize geothermal data services for resource exploration and evaluation.
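
    Querying such a CSW 2.0.2 catalog is straightforward from Python with the OWSLib library, as in the hedged sketch below; the endpoint URL is a placeholder, and the calls reflect OWSLib's documented interface rather than any USGIN-specific client.

      # Hedged example: search a CSW 2.0.2 catalogue with OWSLib.
      from owslib.csw import CatalogueServiceWeb  # pip install OWSLib
      from owslib.fes import PropertyIsLike

      csw = CatalogueServiceWeb("https://catalog.example.org/csw")  # placeholder
      query = PropertyIsLike("csw:AnyText", "%geothermal%")
      csw.getrecords2(constraints=[query], maxrecords=10)

      for record_id, record in csw.records.items():
          print(record_id, record.title)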

  13. Using Metadata To Improve Organization and Information Retrieval on the WWW.

    ERIC Educational Resources Information Center

    Doan, Bich-Lien; Beigbeder, Michel; Girardot, Jean-Jacques; Jaillon, Philippe

    The growing volume of heterogeneous and distributed information on the World Wide Web has made it increasingly difficult for existing tools to retrieve relevant information. To improve the performance of these tools, this paper suggests how to handle two aspects of the problem. The first aspect concerns a better representation and description of…

  14. Sorting Out the Web: Approaches to Subject Access. Contemporary Studies in Information Management, Policies, and Services.

    ERIC Educational Resources Information Center

    Schwartz, Candy

    This book examines what has been done in providing subject access to networked resources. The first chapter provides a historical overview of information services, developments in information technology, end users, and the Internet, as well as a discussion of the library response to these developments. The second chapter discusses metadata,…

  15. User needs analysis and usability assessment of DataMed - a biomedical data discovery index.

    PubMed

    Dixit, Ram; Rogith, Deevakar; Narayana, Vidya; Salimi, Mandana; Gururaj, Anupama; Ohno-Machado, Lucila; Xu, Hua; Johnson, Todd R

    2017-11-30

    The objective of this work is to present user needs and usability evaluations of DataMed, a Data Discovery Index (DDI) that allows searching for biomedical data from multiple sources. We conducted 2 phases of user studies. Phase 1 was a user needs analysis conducted before the development of DataMed, consisting of interviews with researchers. Phase 2 involved iterative usability evaluations of DataMed prototypes. We analyzed data qualitatively to document researchers' information and user interface needs. Biomedical researchers' information needs in data discovery are complex, multidimensional, and shaped by their context, domain knowledge, and technical experience. User needs analyses validate the need for a DDI, while usability evaluations of DataMed show that even though aggregating metadata into a common search engine and applying traditional information retrieval tools are promising first steps, there remain challenges for DataMed due to incomplete metadata and the complexity of data discovery. Biomedical data poses distinct problems for search when compared to websites or publications. Making data available is not enough to facilitate biomedical data discovery: new retrieval techniques and user interfaces are necessary for dataset exploration. Consistent, complete, and high-quality metadata are vital to enable this process. While available data and researchers' information needs are complex and heterogeneous, a successful DDI must meet those needs and fit into the processes of biomedical researchers. Research directions include formalizing researchers' information needs, standardizing overviews of data to facilitate relevance judgments, implementing user interfaces for concept-based searching, and developing evaluation methods for open-ended discovery systems such as DDIs. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  16. Application description and policy model in collaborative environment for sharing of information on epidemiological and clinical research data sets.

    PubMed

    de Carvalho, Elias César Araujo; Batilana, Adelia Portero; Simkins, Julie; Martins, Henrique; Shah, Jatin; Rajgor, Dimple; Shah, Anand; Rockart, Scott; Pietrobon, Ricardo

    2010-02-19

    Sharing of epidemiological and clinical data sets among researchers is poor at best, to the detriment of science and the community at large. The purpose of this paper is therefore to (1) describe a novel Web application designed to share information on study data sets focusing on epidemiological clinical research in a collaborative environment and (2) create a policy model placing this collaborative environment into the current scientific social context. The Database of Databases application was developed based on feedback from epidemiologists and clinical researchers requiring a Web-based platform that would allow for sharing of information about epidemiological and clinical study data sets in a collaborative environment. This platform should ensure that researchers can modify the information. Model-based predictions of the number of publications and funding resulting from combinations of different policy implementation strategies (for metadata and data sharing) were generated using System Dynamics modeling. The application allows researchers to easily upload information about clinical study data sets, which is searchable and modifiable by other users in a wiki environment. All modifications are filtered by the database principal investigator in order to maintain quality control. The application has been extensively tested and currently contains 130 clinical study data sets from the United States, Australia, China and Singapore. Model results indicated that any policy implementation would be better than the current strategy, that metadata sharing is better than data sharing, and that combined policies achieve the best results in terms of publications. Based on our empirical observations and the resulting model, the social network environment surrounding the application can help epidemiologists and clinical researchers contribute and search for metadata in a collaborative environment, thus potentially facilitating collaboration efforts among research communities distributed around the globe.

  17. Biomedical word sense disambiguation with ontologies and metadata: automation meets accuracy

    PubMed Central

    Alexopoulou, Dimitra; Andreopoulos, Bill; Dietze, Heiko; Doms, Andreas; Gandon, Fabien; Hakenberg, Jörg; Khelif, Khaled; Schroeder, Michael; Wächter, Thomas

    2009-01-01

    Background Ontology term labels can be ambiguous and have multiple senses. While this is no problem for human annotators, it is a challenge to automated methods, which identify ontology terms in text. Classical approaches to word sense disambiguation use co-occurring words or terms. However, most treat ontologies as simple terminologies, without making use of the ontology structure or the semantic similarity between terms. Another useful source of information for disambiguation is metadata. Here, we systematically compare three approaches to word sense disambiguation that use ontologies and metadata. Results The 'Closest Sense' method assumes that the ontology defines multiple senses of the term. It computes the shortest path of co-occurring terms in the document to one of these senses. The 'Term Cooc' method defines a log-odds ratio for co-occurring terms, including co-occurrences inferred from the ontology structure. The 'MetaData' approach trains a classifier on metadata. It does not require any ontology, but requires training data, which the other methods do not. To evaluate these approaches we defined a manually curated training corpus of 2600 documents for seven ambiguous terms from the Gene Ontology and MeSH. All approaches over all conditions achieve 80% success rate on average. The 'MetaData' approach performed best with 96%, when trained on high-quality data. Its performance deteriorates as the quality of the training data decreases. The 'Term Cooc' approach performs better on the Gene Ontology (92% success) than on MeSH (73% success), as MeSH is not a strict is-a/part-of, but rather a loose is-related-to hierarchy. The 'Closest Sense' approach achieves on average an 80% success rate. Conclusion Metadata is valuable for disambiguation, but requires high-quality training data. Closest Sense requires no training, but a large, consistently modelled ontology, which are two opposing conditions. Term Cooc achieves greater than 90% success given a consistently modelled ontology. Overall, the results show that well structured ontologies can play a very important role in improving disambiguation. Availability The three benchmark datasets created for the purpose of disambiguation are available in Additional file 1. PMID:19159460
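
    The 'Term Cooc' scoring can be conveyed with a back-of-envelope Python sketch; the probabilities below are invented, where a real system would estimate them from a corpus and the ontology structure.

      # Back-of-envelope log-odds disambiguation: score each sense of an
      # ambiguous label by the log-odds of its co-occurring context terms.
      # All counts/probabilities here are invented for illustration.
      import math

      COOC = {  # P(context term | sense), hypothetical values
          "biological_process": {"embryo": 0.30, "cell": 0.25, "software": 0.01},
          "software_process":   {"embryo": 0.01, "cell": 0.02, "software": 0.40},
      }
      PRIOR = {"biological_process": 0.5, "software_process": 0.5}

      def best_sense(context_terms):
          scores = {}
          for sense, probs in COOC.items():
              score = math.log(PRIOR[sense])
              for t in context_terms:
                  p = probs.get(t, 1e-4)            # smoothing for unseen terms
                  score += math.log(p / (1.0 - p))  # log-odds contribution
              scores[sense] = score
          return max(scores, key=scores.get)

      print(best_sense(["embryo", "cell"]))  # biological_process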

  18. Web Based Data Access to the World Data Center for Climate

    NASA Astrophysics Data System (ADS)

    Toussaint, F.; Lautenschlager, M.

    2006-12-01

    The World Data Center for Climate (WDC-Climate, www.wdc-climate.de) is hosted by the Model & Data Group (M&D) of the Max Planck Institute for Meteorology. The M&D department is financed by the German government and uses the computers and mass storage facilities of the German Climate Computing Centre (Deutsches Klimarechenzentrum, DKRZ). The WDC-Climate provides web access to 200 Terabytes of climate data; the total mass storage archive contains nearly 4 Petabytes. Although the majority of the datasets concern model output data, some satellite and observational data are accessible as well. The underlying relational database is distributed over five servers. The CERA relational data model is used to integrate catalogue data and mass data. The flexibility of the model allows very different types of data and metadata to be stored and accessed. The CERA metadata catalogue provides easy access to the content of the CERA database as well as to other data on the web. Visit ceramodel.wdc-climate.de for additional information on the CERA data model. The majority of users access data via the CERA metadata catalogue, which is open without registration. However, prior to retrieving data, users are required to check in and apply for a userid and password. The CERA metadata catalogue is servlet based, so it is accessible worldwide through any web browser at cera.wdc-climate.de. In addition to data and metadata access via the web catalogue, WDC-Climate offers a number of other forms of web based data access. All metadata are available via HTTP request as XML files in various metadata formats (ISO, DC, etc.; see wini.wdc-climate.de), which allows for easy data interchange with other catalogues. Model data can be retrieved in GRIB, ASCII, NetCDF, and binary (IEEE) format. WDC-Climate serves as the data centre for various projects. Since XML files are accessible by HTTP, the integration of data into applications of different projects is very easy. Projects supported by WDC-Climate include, e.g., CEOP, IPCC, and CARIBIC. A script tool for data download (jblob) is offered on the web page to make retrieval of large data quantities more convenient.
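
    Since records are plain XML over HTTP, harvesting one reduces to a request and a parse, as in the sketch below; the URL pattern is a placeholder, not the actual WDC-Climate request syntax.

      # Minimal sketch of fetching a catalogue record over plain HTTP.
      # The URL is a placeholder; see the WDC-Climate pages for real syntax.
      import urllib.request
      import xml.etree.ElementTree as ET

      url = "https://cera.example.org/metadata?id=12345&format=iso19139"
      with urllib.request.urlopen(url) as resp:
          record = ET.fromstring(resp.read())
      print(record.tag)  # root element of the returned metadata record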

  19. DialysisNet: Application for Integrating and Management Data Sources of Hemodialysis Information by Continuity of Care Record.

    PubMed

    Ku, Ho Suk; Kim, Sungho; Kim, HyeHyeon; Chung, Hee-Joon; Park, Yu Rang; Kim, Ju Han

    2014-04-01

    Health Avatar Beans was developed for the management of chronic kidney disease and end-stage renal disease (ESRD). This article describes the DialysisNet system in Health Avatar Beans for the seamless management of ESRD based on the personal health record. For hemodialysis data modeling, we identified common data elements for hemodialysis information (CDEHI). We used the ASTM continuity of care record (CCR) and ISO/IEC 11179 as the method for compliance with a standard model for the CDEHI. Following the contents of the ASTM CCR, we mapped the CDEHI to its contents and created the metadata from that. It was transformed and parsed into the database and verified according to the ASTM CCR/XML schema definition (XSD). DialysisNet was created as an iPad application. The contents of the CDEHI were categorized for effective management. For the evaluation of information transfer, we used CarePlatform, which was developed for data access. The metadata of the CDEHI in DialysisNet was exchanged by the CarePlatform with semantic interoperability. The CDEHI was separated into a content list for individual patient data, a contents list for hemodialysis center data, a consultation and transfer form, and clinical decision support data. After matching to the CCR, the CDEHI was transformed to metadata, which was then transformed to XML and validated against the ASTM CCR/XSD. DialysisNet gives specific consideration to visualization, graphics, images, statistics, and database management. We created the DialysisNet application, which can integrate and manage data sources for hemodialysis information based on CCR standards.
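
    The schema-validation step described here corresponds to an ordinary XSD check, sketched below with the lxml library; the file names are placeholders for whatever the project's schema and export are called.

      # Hedged sketch: validate a generated CCR XML document against an XSD
      # using lxml. Paths are placeholders, not the project's actual files.
      from lxml import etree  # pip install lxml

      schema = etree.XMLSchema(etree.parse("ASTM_CCR.xsd"))  # placeholder path
      doc = etree.parse("dialysis_record.xml")               # placeholder path

      if schema.validate(doc):
          print("CCR document is schema-valid")
      else:
          for error in schema.error_log:
              print(error.message)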

  20. Regulations in the field of Geo-Information

    NASA Astrophysics Data System (ADS)

    Felus, Y.; Keinan, E.; Regev, R.

    2013-10-01

    The geomatics profession has gone through a major revolution during the last two decades with the emergence of advanced GNSS, GIS and Remote Sensing technologies. These technologies have changed the core principles and working procedures of geomatics professionals. For this reason, surveying and mapping regulations, standards and specifications should be updated to reflect these changes. In Israel, the "Survey Regulations" is the principal document that regulates professional activities in four key areas: geodetic control, mapping, cadastre and geographic information systems. Licensed surveyors and mapping professionals in Israel are required to work according to those regulations. This year a new set of regulations has been published, including a few major amendments, as follows. In the Geodesy chapter, horizontal control is officially based on the Israeli network of Continuously Operating GNSS Reference Stations (CORS). The regulations were phrased in a manner that will allow minor datum changes to the CORS stations due to Earth crustal movements. Moreover, the regulations permit the use of GNSS for low-accuracy height measurements. In the Cadastre chapter, the most critical change is the move to Coordinate Based Cadastre (CBC). Each parcel corner point is ranked according to its quality (accuracy and clarity of definition). The highest ranking for a parcel corner is 1. A point with a rank of 1 is defined by its coordinates alone; any other contradicting evidence is inferior to the coordinate values. Cadastral information is stored and managed via the national cadastral databases. In the Mapping and GIS chapter, the traditional paper maps (ranked by scale) are replaced by digital maps or spatial databases. These spatial databases are ranked by their quality level. Quality level is determined (similar to the ISO 19157 standard) by logical consistency, completeness, positional accuracy, attribute accuracy, temporal accuracy and usability. Metadata is another critical component of any spatial database. Every component in a map should have a metadata identification, even if the map was compiled from multiple resources. The regulations permit the use of advanced sensors and mapping techniques, including LIDAR and digital cameras, that have been certified and meet the defined criteria. The article reviews these new regulations and the decisions that led to them.
