Sample records for replicated metadata catalogue

  1. The RBV metadata catalog

    NASA Astrophysics Data System (ADS)

    Andre, Francois; Fleury, Laurence; Gaillardet, Jerome; Nord, Guillaume

    2015-04-01

    RBV (Réseau des Bassins Versants) is a French initiative to consolidate the national efforts made by more than 15 elementary observatories funded by various research institutions (CNRS, INRA, IRD, IRSTEA, universities) that study river and drainage basins. The RBV Metadata Catalogue aims at giving a unified view of the work produced by every observatory to both the members of the RBV network and any external person interested in this domain of research. Another goal is to share this information with other existing metadata portals. Metadata management is heterogeneous among observatories, ranging from absence to mature harvestable catalogues. Here, we explain the strategy used to design a state-of-the-art catalogue in this situation. Its main features are as follows: - Multiple input methods: metadata records in the catalogue can either be entered with the graphical user interface, harvested from an existing catalogue or imported from an information system through simplified web services. - Hierarchical levels: metadata records may describe either an observatory, one of its experimental sites or a single dataset produced by one instrument. - Multilingualism: metadata can easily be entered in several configurable languages. - Compliance with standards: the back-office part of the catalogue is based on a CSW metadata server (Geosource), which ensures ISO 19115 compatibility and the ability to be harvested (globally or partially). Ongoing tasks focus on the use of SKOS thesauri and SensorML descriptions of the sensors. - Ergonomics: the user interface is built with the GWT framework to offer a rich client application with fully Ajax-based navigation. - Source code sharing: the work has led to the development of reusable components which can be used to quickly create new metadata forms in other GWT applications. You can visit the catalogue (http://portailrbv.sedoo.fr/) or contact us by email at rbv@sedoo.fr.

  2. The eGenVar data management system—cataloguing and sharing sensitive data and metadata for the life sciences

    PubMed Central

    Razick, Sabry; Močnik, Rok; Thomas, Laurent F.; Ryeng, Einar; Drabløs, Finn; Sætrom, Pål

    2014-01-01

    Systematic data management and controlled data sharing aim at increasing reproducibility, reducing redundancy in work, and providing a way to efficiently locate complementing or contradicting information. One method of achieving this is collecting data in a central repository or in a location that is part of a federated system and providing interfaces to the data. However, certain data, such as data from biobanks or clinical studies, may, for legal and privacy reasons, often not be stored in public repositories. Instead, we describe a metadata cataloguing system and a software suite for reporting the presence of data from the life sciences domain. The system stores three types of metadata: file information, file provenance and data lineage, and content descriptions. Our software suite includes both graphical and command line interfaces that allow users to report and tag files with these different metadata types. Importantly, the files remain in their original locations with their existing access-control mechanisms in place, while our system provides descriptions of their contents and relationships. Our system and software suite thereby provide a common framework for cataloguing and sharing both public and private data. Database URL: http://bigr.medisin.ntnu.no/data/eGenVar/ PMID:24682735

  3. Digital Initiatives and Metadata Use in Thailand

    ERIC Educational Resources Information Center

    SuKantarat, Wichada

    2008-01-01

    Purpose: This paper aims to provide information about various digital initiatives in libraries in Thailand and especially use of Dublin Core metadata in cataloguing digitized objects in academic and government digital databases. Design/methodology/approach: The author began researching metadata use in Thailand in 2003 and 2004 while on sabbatical…

  4. NetCDF4/HDF5 and Linked Data in the Real World - Enriching Geoscientific Metadata without Bloat

    NASA Astrophysics Data System (ADS)

    Ip, Alex; Car, Nicholas; Druken, Kelsey; Poudjom-Djomani, Yvette; Butcher, Stirling; Evans, Ben; Wyborn, Lesley

    2017-04-01

    NetCDF4 has become the dominant generic format for many forms of geoscientific data, leveraging (and constraining) the versatile HDF5 container format, while providing metadata conventions for interoperability. However, the encapsulation of detailed metadata within each file can lead to metadata "bloat", and difficulty in maintaining consistency where metadata is replicated to multiple locations. Complex conceptual relationships are also difficult to represent in simple key-value netCDF metadata. Linked Data provides a practical mechanism to address these issues by associating the netCDF files and their internal variables with complex metadata stored in Semantic Web vocabularies and ontologies, while complying with and complementing existing metadata conventions. One of the stated objectives of the netCDF4/HDF5 formats is that they should be self-describing: containing metadata sufficient for cataloguing and using the data. However, this objective can be regarded as only partially met where details of conventions and definitions are maintained externally to the data files. For example, one of the most widely used netCDF community standards, the Climate and Forecasting (CF) Metadata Convention, maintains standard vocabularies for a broad range of disciplines across the geosciences, but this metadata is currently neither readily discoverable nor machine-readable. We have previously implemented useful Linked Data and netCDF tooling (ncskos) that associates netCDF files, and individual variables within those files, with concepts in vocabularies formulated using the Simple Knowledge Organization System (SKOS) ontology. NetCDF files contain Uniform Resource Identifier (URI) links to terms represented as SKOS Concepts, rather than plain-text representations of those terms, so we can use simple, standardised web queries to collect and use rich metadata for the terms from any Linked Data-presented SKOS vocabulary. Geoscience Australia (GA) manages a large volume of diverse
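
    As a minimal sketch of the approach described above (not the actual ncskos tooling): a netCDF variable keeps its conventional plain-text attributes but also carries a resolvable URI pointing at a governed SKOS concept. The attribute name "skos__concept_uri" and the vocabulary URL are illustrative assumptions.

```python
# Sketch: link a netCDF variable to a SKOS concept by URI instead of relying
# only on free-text attributes. Requires the netCDF4 package.
from netCDF4 import Dataset

with Dataset("magnetics.nc", "w", format="NETCDF4") as nc:
    nc.createDimension("y", 2)
    nc.createDimension("x", 2)
    var = nc.createVariable("mag_tmi_anomaly", "f4", ("y", "x"))
    # Conventional plain-text metadata, kept for backward compatibility...
    var.long_name = "total magnetic intensity anomaly"
    # ...plus a Linked Data URI; a client can dereference it to pull rich,
    # machine-readable metadata from the vocabulary service.
    var.setncattr("skos__concept_uri",
                  "http://pid.example.org/def/geophysics/tmi_anomaly")

with Dataset("magnetics.nc") as nc:
    print(nc["mag_tmi_anomaly"].getncattr("skos__concept_uri"))
```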

  5. Grid computing enhances standards-compatible geospatial catalogue service

    NASA Astrophysics Data System (ADS)

    Chen, Aijun; Di, Liping; Bai, Yuqi; Wei, Yaxing; Liu, Yang

    2010-04-01

    A catalogue service facilitates sharing, discovery, retrieval, management of, and access to large volumes of distributed geospatial resources, for example, data, services, applications, and their replicas on the Internet. Grid computing provides an infrastructure for effective use of computing, storage, and other resources available online. The Open Geospatial Consortium has proposed a catalogue service specification and a series of profiles for promoting the interoperability of geospatial resources. By referring to the profile of the catalogue service for the Web, an innovative information model of a catalogue service is proposed to offer Grid-enabled registry, management, retrieval of and access to geospatial resources and their replicas. This information model extends the e-business registry information model by adopting several geospatial data and service metadata standards—the International Organization for Standardization (ISO)'s 19115/19119 standards and the US Federal Geographic Data Committee (FGDC) and US National Aeronautics and Space Administration (NASA) metadata standards for describing and indexing geospatial resources. In order to select the optimal geospatial resources and their replicas managed by the Grid, the Grid data management service and information service from the Globus Toolkit are closely integrated with the extended catalogue information model. Based on this new model, a catalogue service is implemented first as a Web service. Then, the catalogue service is further developed as a Grid service conforming to Grid service specifications. The catalogue service can be deployed in both the Web and Grid environments and accessed by standard Web services or authorized Grid services, respectively. The catalogue service has been implemented at the George Mason University/Center for Spatial Information Science and Systems (GMU/CSISS), managing more than 17 TB of geospatial data and geospatial Grid services. This service makes it easy to share and

  6. Grid Enabled Geospatial Catalogue Web Service

    NASA Technical Reports Server (NTRS)

    Chen, Ai-Jun; Di, Li-Ping; Wei, Ya-Xing; Liu, Yang; Bai, Yu-Qi; Hu, Chau-Min; Mehrotra, Piyush

    2004-01-01

    Geospatial Catalogue Web Service is a vital service for sharing and interoperating volumes of distributed heterogeneous geospatial resources, such as data, services, applications, and their replicas over the web. Based on Grid technology and the Open Geospatial Consortium (OGC) Catalogue Service - Web Information Model, this paper proposes a new information model for a Geospatial Catalogue Web Service, named GCWS, which can securely provide Grid-based publishing, managing and querying of geospatial data and services, and transparent access to the replica data and related services under the Grid environment. This information model integrates the information model of the Grid Replica Location Service (RLS)/Monitoring & Discovery Service (MDS) with the information model of the OGC Catalogue Service (CSW), and refers to the geospatial data metadata standards from ISO 19115, FGDC and the NASA EOS Core System, and service metadata standards from ISO 19119, to extend itself for expressing geospatial resources. Using GCWS, any valid geospatial user who belongs to an authorized Virtual Organization (VO) can securely publish and manage geospatial resources, especially query on-demand data in the virtual community and get it back through the data-related services which provide functions such as subsetting, reformatting, reprojection, etc. This work facilitates geospatial resource sharing and interoperation under the Grid environment, making geospatial resources Grid-enabled and Grid technologies geospatially enabled. It also lets researchers focus on science, and not on issues with computing capacity, data location, processing and management. GCWS is also a key component for workflow-based virtual geospatial data production.

  7. Document Classification in Support of Automated Metadata Extraction From Heterogeneous Collections

    ERIC Educational Resources Information Center

    Flynn, Paul K.

    2014-01-01

    A number of federal agencies, universities, laboratories, and companies are placing their documents online and making them searchable via metadata fields such as author, title, and publishing organization. To enable this, every document in the collection must be catalogued using the metadata fields. Though time consuming, the task of identifying…

  8. Interpreting the ASTM 'content standard for digital geospatial metadata'

    USGS Publications Warehouse

    Nebert, Douglas D.

    1996-01-01

    ASTM and the Federal Geographic Data Committee have developed a content standard for spatial metadata to facilitate documentation, discovery, and retrieval of digital spatial data using vendor-independent terminology. Spatial metadata elements are identifiable quality and content characteristics of a data set that can be tied to a geographic location or area. Several Office of Management and Budget Circulars and initiatives have been issued that specify improved cataloguing of and accessibility to federal data holdings. An Executive Order further requires the use of the metadata content standard to document digital spatial data sets. Collection and reporting of spatial metadata for field investigations performed for the federal government is an anticipated requirement. This paper provides an overview of the draft spatial metadata content standard and a description of how the standard could be applied to investigations collecting spatially-referenced field data.

  9. NCI's national environmental research data collection: metadata management built on standards and preparing for the semantic web

    NASA Astrophysics Data System (ADS)

    Wang, Jingbo; Bastrakova, Irina; Evans, Ben; Gohar, Kashif; Santana, Fabiana; Wyborn, Lesley

    2015-04-01

    National Computational Infrastructure (NCI) manages national environmental research data collections (10+ PB) as part of its specialized high-performance data node of the Research Data Storage Infrastructure (RDSI) program. We manage 40+ data collections using NCI's Data Management Plan (DMP), which is compatible with the ISO 19100 series of metadata standards. We utilize ISO standards to make sure our metadata is transferable and interoperable for sharing and harvesting. The DMP is used along with metadata from the data itself to create a hierarchy of data collection, dataset and time series catalogues that is then exposed through GeoNetwork for standard discoverability. These hierarchical catalogues are linked using parent-child relationships. The hierarchical infrastructure of our GeoNetwork catalogue system aims to address both discoverability and in-house administrative use cases. At NCI, we are currently improving the metadata interoperability in our catalogue by linking with standardized community vocabulary services. These emerging vocabulary services are being established to help harmonise data from different national and international scientific communities. One such vocabulary service is currently being established by the Australian National Data Service (ANDS). Data citation is another important aspect of the NCI data infrastructure; it allows tracking of data usage and infrastructure investment, encourages data sharing, and increases trust in research that is reliant on these data collections. We incorporate the standard vocabularies into the data citation metadata so that the citations become machine-readable and semantically friendly for web-search purposes as well. By standardizing our metadata structure across our entire data corpus, we are laying the foundation to enable the application of appropriate semantic mechanisms to enhance discovery and analysis of NCI's national environmental research data information. We expect that this will further

  10. EUDAT B2FIND : A Cross-Discipline Metadata Service and Discovery Portal

    NASA Astrophysics Data System (ADS)

    Widmann, Heinrich; Thiemann, Hannes

    2016-04-01

    The European Data Infrastructure (EUDAT) project aims at a pan-European environment that supports multiple research communities and individuals in managing the rising tide of scientific data through advanced data management technologies. This led to the establishment of the community-driven Collaborative Data Infrastructure, which implements common data services and storage resources to tackle the basic requirements and the specific challenges of international and interdisciplinary research data management. The metadata service B2FIND plays a central role in this context by providing a simple and user-friendly discovery portal to find research data collections stored in EUDAT data centers or in other repositories. For this we store the diverse metadata collected from heterogeneous sources in a comprehensive joint metadata catalogue and make them searchable in an open data portal. The implemented metadata ingestion workflow consists of three steps. First the metadata records - provided either by various research communities or via other EUDAT services - are harvested. Afterwards the raw metadata records are converted and mapped to unified key-value dictionaries as specified by the B2FIND schema. The semantic mapping of the non-uniform, community-specific metadata to homogeneously structured datasets is the most subtle and challenging task here. To assure and improve the quality of the metadata, this mapping process is accompanied by iterative and intense exchange with the community representatives, usage of controlled vocabularies and community-specific ontologies, and formal and semantic validation. Finally the mapped and checked records are uploaded as datasets to the catalogue, which is based on the open source data portal software CKAN. CKAN provides a rich RESTful JSON API and uses SOLR for dataset indexing, which enables users to query and search the catalogue. The homogenization of the community-specific data models and vocabularies enables not
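
    A hedged sketch of steps two and three of the ingestion workflow above: map one harvested community record to a unified key-value dictionary, then upload it through CKAN's action API. The schema fields, endpoint URL and API key shown here are illustrative assumptions, not B2FIND's actual configuration.

```python
# Sketch: convert/map a harvested record, then create a CKAN dataset from it.
import requests

def map_record(raw: dict) -> dict:
    """Map community-specific metadata to a unified (hypothetical) schema."""
    return {
        "name": raw["id"].lower().replace(":", "-"),
        "title": raw.get("dc:title", "Untitled"),
        "notes": raw.get("dc:description", ""),
        "tags": [{"name": kw} for kw in raw.get("dc:subject", [])],
    }

raw_record = {"id": "oai:repo:123", "dc:title": "Ocean pH time series",
              "dc:subject": ["oceanography"]}
dataset = map_record(raw_record)

# Upload ("package_create" is CKAN's standard action endpoint).
resp = requests.post("https://b2find.example.org/api/3/action/package_create",
                     json=dataset, headers={"Authorization": "MY-API-KEY"})
resp.raise_for_status()
```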

  11. Academic Libraries and the Semantic Web: What the Future May Hold for Research-Supporting Library Catalogues

    ERIC Educational Resources Information Center

    Campbell, D. Grant; Fast, Karl V.

    2004-01-01

    This paper examines how future metadata capabilities could enable academic libraries to exploit information on the emerging Semantic Web in their library catalogues. Whereas current metadata architectures treat the Web as a simple means of interchanging bibliographic data that have been created by libraries, this paper suggests that academic…

  12. Discovering Physical Samples Through Identifiers, Metadata, and Brokering

    NASA Astrophysics Data System (ADS)

    Arctur, D. K.; Hills, D. J.; Jenkyns, R.

    2015-12-01

    Physical samples, particularly in the geosciences, are key to understanding the Earth system, its history, and its evolution. Our record of the Earth as captured by physical samples is difficult to explain and mine for understanding, due to incomplete, disconnected, and evolving metadata content. This is further complicated by differing ways of classifying, cataloguing, publishing, and searching the metadata, especially when specimens do not fit neatly into a single domain—for example, fossils cross disciplinary boundaries (mineral and biological). Sometimes even the fundamental classification systems evolve, such as the geological time scale, triggering daunting processes to update existing specimen databases. Increasingly, we need to consider ways of leveraging permanent, unique identifiers, as well as advancements in metadata publishing that link digital records with physical samples in a robust, adaptive way. An NSF EarthCube Research Coordination Network (RCN) called the Internet of Samples (iSamples) is now working to bridge the metadata schemas for biological and geological domains. We are leveraging the International Geo Sample Number (IGSN) that provides a versatile system of registering physical samples, and working to harmonize this with the DataCite schema for Digital Object Identifiers (DOI). A brokering approach for linking disparate catalogues and classification systems could help scale discovery and access to the many large collections now being managed (sometimes millions of specimens per collection). This presentation is about our community building efforts, research directions, and insights to date.

  13. The Global Streamflow Indices and Metadata Archive (GSIM) - Part 1: The production of a daily streamflow archive and metadata

    NASA Astrophysics Data System (ADS)

    Do, Hong Xuan; Gudmundsson, Lukas; Leonard, Michael; Westra, Seth

    2018-04-01

    This is the first part of a two-paper series presenting the Global Streamflow Indices and Metadata archive (GSIM), a worldwide collection of metadata and indices derived from more than 35 000 daily streamflow time series. This paper focuses on the compilation of the daily streamflow time series based on 12 free-to-access streamflow databases (seven national databases and five international collections). It also describes the development of three metadata products (freely available at https://doi.pangaea.de/10.1594/PANGAEA.887477): (1) a GSIM catalogue collating basic metadata associated with each time series, (2) catchment boundaries for the contributing area of each gauge, and (3) catchment metadata extracted from 12 gridded global data products representing essential properties such as land cover type, soil type, and climate and topographic characteristics. The quality of the delineated catchment boundary is also made available and should be consulted in GSIM applications. The second paper in the series then explores production and analysis of streamflow indices. Having collated an unprecedented number of stations and associated metadata, GSIM can be used to advance large-scale hydrological research and improve understanding of the global water cycle.

  14. Distributed Multi-interface Catalogue for Geospatial Data

    NASA Astrophysics Data System (ADS)

    Nativi, S.; Bigagli, L.; Mazzetti, P.; Mattia, U.; Boldrini, E.

    2007-12-01

    Several geosciences communities (e.g. atmospheric science, oceanography, hydrology) have developed tailored data and metadata models and service protocol specifications for enabling online data discovery, inventory, evaluation, access and download. These specifications are conceived either by profiling geospatial information standards or by extending well-accepted geosciences data models and protocols in order to capture more semantics. These artifacts have generated a set of related catalogue and inventory services characterizing different communities, initiatives and projects. In fact, these geospatial data catalogues are discovery and access systems that use metadata as the target for queries on geospatial information. The indexed and searchable metadata provide a disciplined vocabulary against which intelligent geospatial search can be performed within or among communities. There exists a clear need to conceive and achieve solutions to implement interoperability among geosciences communities, in the context of the more general geospatial information interoperability framework. Such solutions should provide search and access capabilities across catalogues, inventory lists and their registered resources. Thus, the development of catalogue clearinghouse solutions is a near-term challenge in support of fully functional and useful infrastructures for spatial data (e.g. INSPIRE, GMES, NSDI, GEOSS). This implies the implementation of components for query distribution and virtual resource aggregation. These solutions must implement distributed discovery functionalities in a heterogeneous environment, requiring metadata profile harmonization as well as protocol adaptation and mediation. We present a catalogue clearinghouse solution for the interoperability of several well-known cataloguing systems (e.g. OGC CSW, THREDDS catalog and data services). The solution implements consistent resource discovery and evaluation over a dynamic federation of several well-known cataloguing and
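
    The clearinghouse pattern described above can be pictured as a broker that fans one query out to heterogeneous catalogue adapters and mediates the results into a common profile. This is a toy sketch; the two adapters and their record shapes are invented stand-ins for real CSW and THREDDS clients.

```python
# Sketch: query distribution plus result mediation across catalogue types.
from concurrent.futures import ThreadPoolExecutor

class CSWAdapter:
    def search(self, keyword):
        # A real adapter would issue a CSW GetRecords request here.
        return [{"dc:title": f"CSW record about {keyword}"}]

class THREDDSAdapter:
    def search(self, keyword):
        # A real adapter would crawl a THREDDS catalog.xml here.
        return [{"name": f"THREDDS dataset about {keyword}"}]

def mediate(record):
    """Harmonize heterogeneous records to one minimal common profile."""
    return {"title": record.get("dc:title") or record.get("name")}

def clearinghouse_search(keyword, adapters):
    with ThreadPoolExecutor() as pool:  # distribute the query in parallel
        hits = pool.map(lambda a: a.search(keyword), adapters)
    return [mediate(r) for batch in hits for r in batch]

print(clearinghouse_search("sea surface temperature",
                           [CSWAdapter(), THREDDSAdapter()]))
```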

  15. Deploying the ATLAS Metadata Interface (AMI) on the cloud with Jenkins

    NASA Astrophysics Data System (ADS)

    Lambert, F.; Odier, J.; Fulachier, J.; ATLAS Collaboration

    2017-10-01

    The ATLAS Metadata Interface (AMI) is a mature application with more than 15 years of existence. Mainly used by the ATLAS experiment at CERN, it consists of a very generic tool ecosystem for metadata aggregation and cataloguing. AMI is used by the ATLAS production system, therefore the service must guarantee a high level of availability. We describe our monitoring and administration systems, and the Jenkins-based strategy used to dynamically test and deploy cloud OpenStack nodes on demand.

  16. Map Metadata: Essential Elements for Search and Storage

    ERIC Educational Resources Information Center

    Beamer, Ashley

    2009-01-01

    Purpose: The purpose of this paper is to develop an understanding of the issues surrounding the cataloguing of maps in archives and libraries. An investigation into appropriate metadata formats, such as MARC21, EAD and Dublin Core with RDF, shows how particular map data can be stored. Mathematical map elements, specifically co-ordinates, are…

  17. The ATLAS EventIndex: data flow and inclusion of other metadata

    NASA Astrophysics Data System (ADS)

    Barberis, D.; Cárdenas Zárate, S. E.; Favareto, A.; Fernandez Casani, A.; Gallas, E. J.; Garcia Montoro, C.; Gonzalez de la Hoz, S.; Hrivnac, J.; Malon, D.; Prokoshin, F.; Salt, J.; Sanchez, J.; Toebbicke, R.; Yuan, R.; ATLAS Collaboration

    2016-10-01

    The ATLAS EventIndex is the catalogue of the event-related metadata for the information collected from the ATLAS detector. The basic unit of this information is the event record, containing the event identification parameters, pointers to the files containing this event as well as trigger decision information. The main use case for the EventIndex is event picking, as well as data consistency checks for large production campaigns. The EventIndex employs the Hadoop platform for data storage and handling, as well as a messaging system for the collection of information. The information for the EventIndex is collected both at Tier-0, when the data are first produced, and from the Grid, when various types of derived data are produced. The EventIndex uses various types of auxiliary information from other ATLAS sources for data collection and processing: trigger tables from the condition metadata database (COMA), dataset information from the data catalogue AMI and the Rucio data management system and information on production jobs from the ATLAS production system. The ATLAS production system is also used for the collection of event information from the Grid jobs. EventIndex developments started in 2012 and in the middle of 2015 the system was commissioned and started collecting event metadata, as a part of ATLAS Distributed Computing operations.
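
    To make the event-picking use case concrete, here is a schematic sketch of an EventIndex-style record and lookup. The field names are simplified assumptions, and the real system holds the index in Hadoop rather than a Python dictionary.

```python
# Sketch: an event record and event picking, i.e. resolving (run, event)
# to the file that contains that event.
from dataclasses import dataclass

@dataclass(frozen=True)
class EventRecord:
    run_number: int
    event_number: int
    file_guid: str          # pointer to the file holding this event
    triggers: tuple         # trigger decision summary

index = {
    (284500, 1887): EventRecord(284500, 1887, "GUID-A1", ("HLT_mu26",)),
    (284500, 1888): EventRecord(284500, 1888, "GUID-A1", ("HLT_e28",)),
}

def pick_event(run: int, event: int) -> str:
    return index[(run, event)].file_guid

print(pick_event(284500, 1887))  # -> GUID-A1
```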

  18. Revealing Library Collections: NSLA Re-Imagining Libraries Project 8--Description and Cataloguing

    ERIC Educational Resources Information Center

    Gatenby, Pam

    2010-01-01

    One of the most exciting developments to occur within the library profession in recent years has been the re-evaluation of the library's role in resource discovery in the web environment. This has involved recognition of the shortcomings of online library catalogues, a re-affirmation of the importance of metadata and a new focus on the wealth of…

  19. Scalable global grid catalogue for Run3 and beyond

    NASA Astrophysics Data System (ADS)

    Martinez Pedreira, M.; Grigoras, C.; ALICE Collaboration

    2017-10-01

    The AliEn (ALICE Environment) file catalogue is a global unique namespace providing mapping between a UNIX-like logical name structure and the corresponding physical files distributed over 80 storage elements worldwide. Powerful search tools and hierarchical metadata information are integral parts of the system and are used by the Grid jobs as well as local users to store and access all files on the Grid storage elements. The catalogue has been in production since 2005 and over the past 11 years has grown to more than 2 billion logical file names. The backend is a set of distributed relational databases, ensuring smooth growth and fast access. Due to the anticipated fast future growth, we are looking for ways to enhance the performance and scalability by simplifying the catalogue schema while keeping the functionality intact. We investigated different backend solutions, such as distributed key value stores, as replacement for the relational database. This contribution covers the architectural changes in the system, together with the technology evaluation, benchmark results and conclusions.
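
    The core catalogue idea, mapping a UNIX-like logical file name (LFN) to its physical replicas, can be sketched with a plain dictionary standing in for the distributed relational or key-value backend; the storage-element names and paths below are invented.

```python
# Sketch: logical-to-physical mapping with replica lookup.
catalogue = {}  # LFN -> list of (storage element, physical file name)

def register(lfn, storage_element, pfn):
    catalogue.setdefault(lfn, []).append((storage_element, pfn))

def locate(lfn):
    """Return all replicas so a job can pick a nearby storage element."""
    return catalogue.get(lfn, [])

register("/alice/data/2016/run265309/raw.root", "CERN::EOS",
         "root://eos.cern.ch//alice/raw.root")
register("/alice/data/2016/run265309/raw.root", "FZK::SE",
         "root://se.fzk.de//alice/raw.root")
print(locate("/alice/data/2016/run265309/raw.root"))
```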

  20. A standard for measuring metadata quality in spectral libraries

    NASA Astrophysics Data System (ADS)

    Rasaiah, B.; Jones, S. D.; Bellman, C.

    2013-12-01

    There is an urgent need within the international remote sensing community to establish a metadata standard for field spectroscopy that ensures high-quality, interoperable metadata sets that can be archived and shared efficiently within Earth observation data sharing systems. Metadata are an important component in the cataloguing and analysis of in situ spectroscopy datasets because of their central role in identifying and quantifying the quality and reliability of spectral data and the products derived from them. This paper presents approaches to measuring metadata completeness and quality in spectral libraries to determine the reliability, interoperability, and re-usability of a dataset. Explored are quality parameters that meet the unique requirements of in situ spectroscopy datasets across many campaigns. Examined are the challenges presented by ensuring that data creators, owners, and data users maintain a high level of data integrity throughout the lifecycle of a dataset. Issues such as field measurement methods, instrument calibration, and data representativeness are investigated. The proposed metadata standard incorporates expert recommendations that include metadata protocols critical to all campaigns, and those that are restricted to campaigns for specific target measurements. The implications of semantics and syntax for a robust and flexible metadata standard are also considered. Approaches towards an operational and logistically viable implementation of a quality standard are discussed. This paper also proposes a way forward for adapting and enhancing current geospatial metadata standards to the unique requirements of field spectroscopy metadata quality.
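
    One of the quality parameters discussed above, metadata completeness, can be scored simply as the fraction of required fields that are populated. The required-field list below is an illustrative assumption, not the proposed standard itself.

```python
# Sketch: completeness score for a field-spectroscopy metadata record.
REQUIRED = ["instrument", "calibration_date", "target", "illumination",
            "operator", "units"]

def completeness(record: dict) -> float:
    filled = sum(1 for field in REQUIRED if record.get(field) not in (None, ""))
    return filled / len(REQUIRED)

record = {"instrument": "ASD FieldSpec", "target": "leaf canopy",
          "units": "reflectance"}
print(f"completeness = {completeness(record):.0%}")  # -> 50%
```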

  1. Multi-facetted Metadata - Describing datasets with different metadata schemas at the same time

    NASA Astrophysics Data System (ADS)

    Ulbricht, Damian; Klump, Jens; Bertelmann, Roland

    2013-04-01

    Inspired by the wish to re-use research data, much work is being done to bring the data systems of the earth sciences together. Discovery metadata are disseminated to data portals to allow the building of customized indexes of catalogued dataset items. Data that were once acquired in the context of a scientific project are open for reappraisal and can now be used by scientists who were not part of the original research team. To make data re-use easier, measurement methods and measurement parameters must be documented in an application metadata schema and described in a written publication. Linking datasets to publications - as DataCite [1] does - again requires a specific metadata schema, and every new use context of the measured data may require yet another metadata schema, sharing only a subset of information with the meta-information already present. To cope with the problem of metadata schema diversity in our common data repository at GFZ Potsdam, we established a solution to store file-based research data and describe these with an arbitrary number of metadata schemas. The core component of the data repository is an eSciDoc infrastructure that provides versioned container objects, called eSciDoc [2] "items". The eSciDoc content model allows assigning files to "items" and adding any number of metadata records to these "items". The eSciDoc items can be submitted, revised, and finally published, which makes the data and metadata available through the internet worldwide. GFZ Potsdam uses eSciDoc to support its scientific publishing workflow, including mechanisms for data review in peer review processes by providing temporary web links for external reviewers who do not have credentials to access the data. Based on the eSciDoc API, panMetaDocs [3] provides a web portal for data management in research projects. PanMetaDocs, which is based on panMetaWorks [4], is a PHP-based web application that allows data to be described with any XML-based schema. It uses the eSciDoc infrastructures

  2. TOPCAT -- Tool for OPerations on Catalogues And Tables

    NASA Astrophysics Data System (ADS)

    Taylor, Mark

    TOPCAT is an interactive graphical viewer and editor for tabular data. It has been designed for use with astronomical tables such as object catalogues, but is not restricted to astronomical applications. It understands a number of different astronomically important formats, and more formats can be added. It is designed to cope well with large tables; a million rows by a hundred columns should not present a problem even with modest memory and CPU resources. It offers a variety of ways to view and analyse the data, including a browser for the cell data themselves, viewers for information about table and column metadata, tools for joining tables using flexible matching algorithms, and visualisation facilities including histograms, 2- and 3-dimensional scatter plots, and density maps. Using a powerful and extensible Java-based expression language new columns can be defined and row subsets selected for separate analysis. Selecting a row can be configured to trigger an action, for instance displaying an image of the catalogue object in an external viewer. Table data and metadata can be edited and the resulting modified table can be written out in a wide range of output formats. A number of options are provided for loading data from external sources, including Virtual Observatory (VO) services, thus providing a gateway to many remote archives of astronomical data. It can also interoperate with other desktop tools using the SAMP protocol. TOPCAT is written in pure Java and is available under the GNU General Public Licence. Its underlying table processing facilities are provided by STIL, the Starlink Tables Infrastructure Library.

  3. Transformation of HDF-EOS metadata from the ECS model to ISO 19115-based XML

    NASA Astrophysics Data System (ADS)

    Wei, Yaxing; Di, Liping; Zhao, Baohua; Liao, Guangxuan; Chen, Aijun

    2007-02-01

    Nowadays, geographic data, such as NASA's Earth Observation System (EOS) data, are playing an increasing role in many areas, including academic research, government decisions and even people's everyday lives. As the quantity of geographic data becomes increasingly large, a major problem is how to fully make use of such data in a distributed, heterogeneous network environment. In order for a user to effectively discover and retrieve the specific information that is useful, the geographic metadata should be described and managed properly. Fortunately, the emergence of XML and Web Services technologies greatly promotes information distribution across the Internet. The research effort discussed in this paper presents a method and its implementation for transforming Hierarchical Data Format (HDF)-EOS metadata from the NASA ECS model to ISO 19115-based XML, which will be managed by the Open Geospatial Consortium (OGC) Catalogue Services—Web Profile (CSW). Using XML and international standards rather than domain-specific models to describe the metadata of HDF-EOS data, and further using CSW to manage the metadata, allows metadata information to be searched and interchanged more widely and easily, thus promoting the sharing of HDF-EOS data.
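
    A hedged sketch of the transformation idea: read values out of an ECS-style key-value structure and emit the corresponding ISO 19139 XML elements. Only one element mapping is shown, and the choice of target element is illustrative; the full ECS-to-ISO crosswalk is far larger.

```python
# Sketch: emit a fragment of ISO 19139 XML from ECS-style metadata.
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"   # ISO 19139 namespaces
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)

ecs = {"ShortName": "MOD11A2", "LongName": "MODIS LST 8-Day L3 Global 1km"}

md = ET.Element(f"{{{GMD}}}MD_Metadata")
ident = ET.SubElement(md, f"{{{GMD}}}fileIdentifier")
ET.SubElement(ident, f"{{{GCO}}}CharacterString").text = ecs["ShortName"]

print(ET.tostring(md, encoding="unicode"))
```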

  4. Large-Scale Data Collection Metadata Management at the National Computation Infrastructure

    NASA Astrophysics Data System (ADS)

    Wang, J.; Evans, B. J. K.; Bastrakova, I.; Ryder, G.; Martin, J.; Duursma, D.; Gohar, K.; Mackey, T.; Paget, M.; Siddeswara, G.

    2014-12-01

    Data collection management has become an essential activity at the National Computational Infrastructure (NCI) in Australia. NCI's partners (CSIRO, Bureau of Meteorology, Australian National University, and Geoscience Australia), supported by the Australian Government and Research Data Storage Infrastructure (RDSI), have established a national data resource that is co-located with high-performance computing. This paper addresses the metadata management of these data assets over their lifetime. NCI manages 36 data collections (10+ PB) categorised as earth system sciences, climate and weather model data assets and products, earth and marine observations and products, geosciences, terrestrial ecosystem, water management and hydrology, astronomy, social science and biosciences. The data are largely sourced from NCI partners, the custodians of many of the national scientific records, and major research community organisations. The data are made available in an HPC and data-intensive environment - a ~56,000-core supercomputer, virtual labs on a 3,000-core cloud system, and data services. By assembling these large national assets, new opportunities have arisen to harmonise the data collections, making a powerful cross-disciplinary resource. To support the overall management, a Data Management Plan (DMP) has been developed to record the workflows, procedures, the key contacts and responsibilities. The DMP has fields that can be exported to the ISO 19115 schema and to the collection-level catalogue of GeoNetwork. The subset or file-level metadata catalogues are linked with the collection level through parent-child relationship definitions using UUIDs. A number of tools have been developed that support interactive metadata management, bulk loading of data, and support for computational workflows or data pipelines. NCI creates persistent identifiers for each of the assets. The data collection is tracked over its lifetime, and the recognition of the data providers, data owners, data

  5. Simplified Metadata Curation via the Metadata Management Tool

    NASA Astrophysics Data System (ADS)

    Shum, D.; Pilone, D.

    2015-12-01

    The Metadata Management Tool (MMT) is the newest capability developed as part of NASA Earth Observing System Data and Information System's (EOSDIS) efforts to simplify metadata creation and improve metadata quality. The MMT was developed via an agile methodology, taking into account inputs from GCMD's science coordinators and other end-users. In its initial release, the MMT uses the Unified Metadata Model for Collections (UMM-C) to allow metadata providers to easily create and update collection records in the ISO-19115 format. Through a simplified UI experience, metadata curators can create and edit collections without full knowledge of the NASA Best Practices implementation of ISO-19115 format, while still generating compliant metadata. More experienced users are also able to access raw metadata to build more complex records as needed. In future releases, the MMT will build upon recent work done in the community to assess metadata quality and compliance with a variety of standards through application of metadata rubrics. The tool will provide users with clear guidance as to how to easily change their metadata in order to improve their quality and compliance. Through these features, the MMT allows data providers to create and maintain compliant and high quality metadata in a short amount of time.

  6. Predicting structured metadata from unstructured metadata.

    PubMed

    Posch, Lisa; Panahiazar, Maryam; Dumontier, Michel; Gevaert, Olivier

    2016-01-01

    Enormous amounts of biomedical data have been and are being produced by investigators all over the world. However, one crucial and limiting factor in data reuse is accurate, structured and complete description of the data, or data about the data, defined as metadata. We propose a framework to predict structured metadata terms from unstructured metadata for improving the quality and quantity of metadata, using the Gene Expression Omnibus (GEO) microarray database. Our framework consists of classifiers trained using term frequency-inverse document frequency (TF-IDF) features and a second approach based on topics modeled using a Latent Dirichlet Allocation (LDA) model to reduce the dimensionality of the unstructured data. Our results on the GEO database show that structured metadata terms are most accurately predicted using the TF-IDF approach, followed by LDA, with both outperforming the majority-vote baseline. While some accuracy is lost by the dimensionality reduction of LDA, the difference is small for elements with few possible values, and there is a large improvement over the majority classifier baseline. Overall this is a promising approach for metadata prediction that is likely to be applicable to other datasets and has implications for researchers interested in biomedical metadata curation and metadata prediction. © The Author(s) 2016. Published by Oxford University Press.
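
    Both approaches are easy to demonstrate on toy data with scikit-learn: a TF-IDF classifier predicting one structured field, and LDA compressing the unstructured text into topic features first. The sample records and target labels below are invented; the paper's GEO experiments are of course far larger.

```python
# Sketch: TF-IDF classification and LDA dimensionality reduction.
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

free_text = ["total RNA from mouse liver, expression array",
             "ChIP-seq of human H3K4me3 in HeLa cells",
             "mouse liver expression profiling",
             "human HeLa ChIP experiment"]
organism = ["mouse", "human", "mouse", "human"]  # one structured field

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(free_text, organism)
print(clf.predict(["liver RNA profiling in mouse"]))  # expected: ['mouse']

# LDA alternative: reduce text to topic proportions before classifying.
counts = CountVectorizer().fit_transform(free_text)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
print(lda.fit_transform(counts).shape)  # (4 records, 2 topic features)
```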

  7. Predicting structured metadata from unstructured metadata

    PubMed Central

    Posch, Lisa; Panahiazar, Maryam; Dumontier, Michel; Gevaert, Olivier

    2016-01-01

    Enormous amounts of biomedical data have been and are being produced by investigators all over the world. However, one crucial and limiting factor in data reuse is accurate, structured and complete description of the data, or data about the data, defined as metadata. We propose a framework to predict structured metadata terms from unstructured metadata for improving the quality and quantity of metadata, using the Gene Expression Omnibus (GEO) microarray database. Our framework consists of classifiers trained using term frequency-inverse document frequency (TF-IDF) features and a second approach based on topics modeled using a Latent Dirichlet Allocation (LDA) model to reduce the dimensionality of the unstructured data. Our results on the GEO database show that structured metadata terms are most accurately predicted using the TF-IDF approach, followed by LDA, with both outperforming the majority-vote baseline. While some accuracy is lost by the dimensionality reduction of LDA, the difference is small for elements with few possible values, and there is a large improvement over the majority classifier baseline. Overall this is a promising approach for metadata prediction that is likely to be applicable to other datasets and has implications for researchers interested in biomedical metadata curation and metadata prediction. Database URL: http://www.yeastgenome.org/ PMID:28637268

  8. In-field Access to Geoscientific Metadata through GPS-enabled Mobile Phones

    NASA Astrophysics Data System (ADS)

    Hobona, Gobe; Jackson, Mike; Jordan, Colm; Butchart, Ben

    2010-05-01

    Fieldwork is an integral part of much geosciences research. But whilst geoscientists have physical or online access to data collections in the laboratory or at base stations, equivalent in-field access is not standard or straightforward. The increasing availability of mobile internet and GPS-supported mobile phones, however, now provides the basis for addressing this issue. The SPACER project was commissioned by the Rapid Innovation initiative of the UK Joint Information Systems Committee (JISC) to explore the potential for GPS-enabled mobile phones to access geoscientific metadata collections. Metadata collections within the geosciences and the wider geospatial domain can be disseminated through web services based on the Catalogue Service for the Web (CSW) standard of the Open Geospatial Consortium (OGC) - a global grouping of over 380 private, public and academic organisations aiming to improve interoperability between geospatial technologies. CSW offers an XML-over-HTTP interface for querying and retrieval of geospatial metadata. By default, the metadata returned by CSW is based on the ISO 19115 standard and encoded in XML conformant to ISO 19139. The SPACER project has created a prototype application that enables mobile phones to send queries to CSW containing user-defined keywords and coordinates acquired from the GPS devices built into the phones. The prototype has been developed using the free and open source Google Android platform. The mobile application offers views for listing titles, presenting multiple metadata elements and a Google Map with an overlay of the bounding coordinates of datasets. The presentation will describe the architecture and approach applied in the development of the prototype.
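
    The kind of query the prototype issues can be approximated with OWSLib: a keyword constraint combined with a bounding box derived from the phone's GPS fix. The endpoint URL is a placeholder and the bounding-box axis order follows OWSLib's [minx, miny, maxx, maxy] convention; treat the details as an assumption rather than the SPACER implementation.

```python
# Sketch: keyword + GPS bounding-box query against a CSW endpoint via OWSLib.
from owslib.csw import CatalogueServiceWeb
from owslib.fes import PropertyIsLike, BBox

csw = CatalogueServiceWeb("https://catalogue.example.org/csw")  # placeholder

lat, lon = 52.95, -1.15            # phone's GPS fix
box = BBox([lon - 0.1, lat - 0.1, lon + 0.1, lat + 0.1])
keyword = PropertyIsLike("csw:AnyText", "%borehole%")

# A nested list combines the two filters with a logical AND.
csw.getrecords2(constraints=[[keyword, box]], maxrecords=10)
for record in csw.records.values():
    print(record.title)
```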

  9. Towards Precise Metadata-set for Discovering 3D Geospatial Models in Geo-portals

    NASA Astrophysics Data System (ADS)

    Zamyadi, A.; Pouliot, J.; Bédard, Y.

    2013-09-01

    Accessing 3D geospatial models, eventually at no cost and for unrestricted use, is certainly an important issue as they become popular among participatory communities, consultants, and officials. Various geo-portals, mainly established for 2D resources, have tried to provide access to existing 3D resources such as digital elevation models, LIDAR or classic topographic data. Describing the content of data, metadata is a key component of data discovery in geo-portals. An inventory of seven online geo-portals and commercial catalogues shows that the metadata referring to 3D information differs greatly from one geo-portal to another, as well as for similar 3D resources within the same geo-portal. The inventory considered 971 data resources affiliated with elevation. 51% of them were from three geo-portals running at the Canadian federal and municipal levels whose metadata resources did not consider 3D models by any definition. Among the remaining 49%, which do refer to 3D models, different definitions of terms and metadata were found, resulting in confusion and misinterpretation. The overall assessment of these geo-portals clearly shows that the provided metadata do not integrate specific and common information about 3D geospatial models. Accordingly, the main objective of this research is to improve 3D geospatial model discovery in geo-portals by adding a specific metadata-set. Based on the knowledge and current practices in 3D modeling, and 3D data acquisition and management, a set of metadata is proposed to increase suitability for 3D geospatial models. This metadata-set enables the definition of genuine classes, fields, and code-lists for a 3D metadata profile. The main structure of the proposal contains 21 metadata classes. These classes are organized into three packages: General and Complementary, on contextual and structural information, and Availability, on the transition from storage to delivery format. The proposed metadata set is compared with Canadian Geospatial

  10. Metadata Means Communication: The Challenges of Producing Useful Metadata

    NASA Astrophysics Data System (ADS)

    Edwards, P. N.; Batcheller, A. L.

    2010-12-01

    Metadata are increasingly perceived as an important component of data sharing systems. For instance, metadata accompanying atmospheric model output may indicate the grid size, grid type, and parameter settings used in the model configuration. We conducted a case study of a data portal in the atmospheric sciences using in-depth interviews, document review, and observation. Our analysis revealed a number of challenges in producing useful metadata. First, creating and managing metadata required considerable effort and expertise, yet responsibility for these tasks was ill-defined and diffused among many individuals, leading to errors, failure to capture metadata, and uncertainty about the quality of the primary data. Second, metadata ended up stored in many different forms and software tools, making it hard to manage versions and transfer between formats. Third, the exact meanings of metadata categories remained unsettled and misunderstood even among a small community of domain experts -- an effect we expect to be exacerbated when scientists from other disciplines wish to use these data. In practice, we found that metadata problems due to these obstacles are often overcome through informal, personal communication, such as conversations or email. We conclude that metadata serve to communicate the context of data production from the people who produce data to those who wish to use it. Thus while formal metadata systems are often public, critical elements of metadata (those embodied in informal communication) may never be recorded. Therefore, efforts to increase data sharing should include ways to facilitate inter-investigator communication. Instead of tackling metadata challenges only on the formal level, we can improve data usability for broader communities by better supporting metadata communication.

  11. Log-less metadata management on metadata server for parallel file systems.

    PubMed

    Liao, Jianwei; Xiao, Guoqiang; Peng, Xiaoning

    2014-01-01

    This paper presents a novel metadata management mechanism on the metadata server (MDS) for parallel and distributed file systems. In this technique, the client file system backs up the metadata requests it has sent and that the metadata server has already handled, so that the MDS does not need to log metadata changes to nonvolatile storage to achieve a highly available metadata service, and metadata-processing performance improves as well. Because the client file system backs up certain sent metadata requests in its memory, the overhead of handling these backup requests is much smaller than the overhead the metadata server incurs when it adopts logging or journaling to yield a highly available metadata service. The experimental results show that this newly proposed mechanism can significantly improve the speed of metadata processing and deliver better I/O data throughput than conventional metadata management schemes, that is, logging or journaling on the MDS. Besides, a complete metadata recovery can be achieved by replaying the backup logs cached by all involved clients when the metadata server has crashed or otherwise entered a non-operational state.

  12. Log-Less Metadata Management on Metadata Server for Parallel File Systems

    PubMed Central

    Xiao, Guoqiang; Peng, Xiaoning

    2014-01-01

    This paper presents a novel metadata management mechanism on the metadata server (MDS) for parallel and distributed file systems. In this technique, the client file system backs up the metadata requests it has sent and that the metadata server has already handled, so that the MDS does not need to log metadata changes to nonvolatile storage to achieve a highly available metadata service, and metadata-processing performance improves as well. Because the client file system backs up certain sent metadata requests in its memory, the overhead of handling these backup requests is much smaller than the overhead the metadata server incurs when it adopts logging or journaling to yield a highly available metadata service. The experimental results show that this newly proposed mechanism can significantly improve the speed of metadata processing and deliver better I/O data throughput than conventional metadata management schemes, that is, logging or journaling on the MDS. Besides, a complete metadata recovery can be achieved by replaying the backup logs cached by all involved clients when the metadata server has crashed or otherwise entered a non-operational state. PMID:24892093
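
    The log-less scheme in the two records above can be miniaturized as follows: clients retain their acknowledged requests in memory, the MDS mutates only volatile state, and recovery replays the clients' backups. This single-process sketch ignores real concerns such as replay ordering across clients and checkpointing.

```python
# Sketch: client-side request backup replacing MDS journaling.
class MDS:
    def __init__(self):
        self.namespace = {}            # volatile metadata state, never logged

    def apply(self, request):
        op, path, meta = request
        if op == "create":
            self.namespace[path] = meta
        elif op == "unlink":
            self.namespace.pop(path, None)

    def recover(self, clients):
        """After a crash, rebuild state by replaying client backups."""
        self.namespace = {}
        for client in clients:
            for request in client.backup:
                self.apply(request)

class Client:
    def __init__(self):
        self.backup = []               # acknowledged requests kept in memory

    def send(self, mds, request):
        mds.apply(request)             # no log write on the server side
        self.backup.append(request)

client, mds = Client(), MDS()
client.send(mds, ("create", "/data/a.dat", {"size": 42}))
mds.namespace = {}                     # simulate an MDS crash
mds.recover([client])
print(mds.namespace)                   # {'/data/a.dat': {'size': 42}}
```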

  13. Metadata for Web Resources: How Metadata Works on the Web.

    ERIC Educational Resources Information Center

    Dillon, Martin

    This paper discusses bibliographic control of knowledge resources on the World Wide Web. The first section sets the context of the inquiry. The second section covers the following topics related to metadata: (1) definitions of metadata, including metadata as tags and as descriptors; (2) metadata on the Web, including general metadata systems,…

  14. Creating preservation metadata from XML-metadata profiles

    NASA Astrophysics Data System (ADS)

    Ulbricht, Damian; Bertelmann, Roland; Gebauer, Petra; Hasler, Tim; Klump, Jens; Kirchner, Ingo; Peters-Kottig, Wolfgang; Mettig, Nora; Rusch, Beate

    2014-05-01

    Registration of dataset DOIs at DataCite makes research data citable and comes with the obligation to keep the data accessible in the future. In addition, many universities and research institutions measure data that are unique and not repeatable, like the data produced by an observational network, and they want to keep these data for future generations. In consequence, such data should be ingested into preservation systems that automatically take care of file format changes. Open source preservation software developed along the definitions of the ISO OAIS reference model is available, but during ingest of data and metadata there are still problems to be solved. File format validation is difficult: format validators are not only remarkably slow, but due to the variety in file formats, different validators return conflicting identification profiles for identical data. These conflicts are hard to resolve. Preservation systems have a deficit in the support of custom metadata. Furthermore, data producers are sometimes not aware that quality metadata is a key issue for the re-use of data. In the project EWIG, a university institute and a research institute work together with Zuse-Institute Berlin, which is acting as an infrastructure facility, to develop exemplary workflows for research data into OAIS-compliant archives, with emphasis on the geosciences. The Institute for Meteorology provides time series data from an urban monitoring network, whereas GFZ Potsdam delivers file-based data from research projects. To identify problems in existing preservation workflows, the technical work is complemented by interviews with data practitioners. Policies for handling data and metadata are developed. Furthermore, university teaching material is created to raise future scientists' awareness of research data management. As a testbed for ingest workflows the digital preservation system Archivematica [1] is used. During the ingest process metadata is generated that is compliant to the

  15. Metadata (MD)

    Treesearch

    Robert E. Keane

    2006-01-01

    The Metadata (MD) table in the FIREMON database is used to record any information about the sampling strategy or data collected using the FIREMON sampling procedures. The MD method records metadata pertaining to a group of FIREMON plots, such as all plots in a specific FIREMON project. FIREMON plots are linked to metadata using a unique metadata identifier that is...

  16. Metadata Dictionary Database: A Proposed Tool for Academic Library Metadata Management

    ERIC Educational Resources Information Center

    Southwick, Silvia B.; Lampert, Cory

    2011-01-01

    This article proposes a metadata dictionary (MDD) be used as a tool for metadata management. The MDD is a repository of critical data necessary for managing metadata to create "shareable" digital collections. An operational definition of metadata management is provided. The authors explore activities involved in metadata management in…

  17. The New Online Metadata Editor for Generating Structured Metadata

    NASA Astrophysics Data System (ADS)

    Devarakonda, R.; Shrestha, B.; Palanisamy, G.; Hook, L.; Killeffer, T.; Boden, T.; Cook, R. B.; Zolly, L.; Hutchison, V.; Frame, M. T.; Cialella, A. T.; Lazer, K.

    2014-12-01

    Nobody is better suited to "describe" data than the scientist who created it. This "description" of a dataset is called metadata. In general terms, metadata represent the who, what, when, where, why and how of the dataset. eXtensible Markup Language (XML) is the preferred output format for metadata, as it makes the metadata portable and, more importantly, suitable for system discoverability. The newly developed ORNL Metadata Editor (OME) is a Web-based tool that allows users to create and maintain XML files containing key information, or metadata, about their research. Metadata include information about the specific projects, parameters, time periods, and locations associated with the data. Such information helps put the research findings in context. In addition, the metadata produced using OME will allow other researchers to find these data via metadata clearinghouses like Mercury [1] [2]. Researchers simply use the ORNL Metadata Editor to enter relevant metadata into a Web-based form. How is OME helping big data centers like the ORNL DAAC? The ORNL DAAC is one of NASA's Earth Observing System Data and Information System (EOSDIS) data centers managed by the ESDIS Project. The ORNL DAAC archives data produced by NASA's Terrestrial Ecology Program. The DAAC provides data and information relevant to biogeochemical dynamics, ecological data, and environmental processes, critical for understanding the dynamics relating to the biological components of the Earth's environment. The data produced, archived and analyzed are typically at a scale of multiple petabytes, which makes discoverability of the data very challenging. Without proper metadata associated with the data, it is difficult to find the data you are looking for and equally difficult to use and understand the data. OME will allow data centers like the ORNL DAAC to produce meaningful, high-quality, standards-based, descriptive information about their data products, in turn helping with the data discoverability and

  18. Master Metadata Repository and Metadata-Management System

    NASA Technical Reports Server (NTRS)

    Armstrong, Edward; Reed, Nate; Zhang, Wen

    2007-01-01

    A master metadata repository (MMR) software system manages the storage and searching of metadata pertaining to data from national and international satellite sources of the Global Ocean Data Assimilation Experiment (GODAE) High Resolution Sea Surface Temperature Pilot Project [GHRSSTPP]. These sources produce a total of hundreds of data files daily, each file classified as one of more than ten data products representing global sea-surface temperatures. The MMR is a relational database wherein the metadata are divided into granule-level records [denoted file records (FRs)] for individual satellite files and collection-level records [denoted data set descriptions (DSDs)] that describe metadata common to all the files from a specific data product. FRs and DSDs adhere to the NASA Directory Interchange Format (DIF). The FRs and DSDs are contained in separate subdatabases linked by a common field. The MMR is configured in MySQL database software with custom Practical Extraction and Reporting Language (PERL) programs to validate and ingest the metadata records. The database contents are converted into the Federal Geographic Data Committee (FGDC) standard format by use of the Extensible Markup Language (XML). A Web interface enables users to search for availability of data from all sources.

  19. Operational Support for Instrument Stability through ODI-PPA Metadata Visualization and Analysis

    NASA Astrophysics Data System (ADS)

    Young, M. D.; Hayashi, S.; Gopu, A.; Kotulla, R.; Harbeck, D.; Liu, W.

    2015-09-01

    Over long time scales, quality assurance metrics taken from calibration and calibrated data products can aid observatory operations in quantifying the performance and stability of the instrument, and identify potential areas of concern or guide troubleshooting and engineering efforts. Such methods traditionally require manual SQL entries, assuming the requisite metadata has even been ingested into a database. With the ODI-PPA system, QA metadata has been harvested and indexed for all data products produced over the life of the instrument. In this paper we will describe how, utilizing the industry standard Highcharts Javascript charting package with a customized AngularJS-driven user interface, we have made the process of visualizing the long-term behavior of these QA metadata simple and easily replicated. Operators can easily craft a custom query using the powerful and flexible ODI-PPA search interface and visualize the associated metadata in a variety of ways. These customized visualizations can be bookmarked, shared, or embedded externally, and will be dynamically updated as new data products enter the system, enabling operators to monitor the long-term health of their instrument with ease.

  20. THE NEW ONLINE METADATA EDITOR FOR GENERATING STRUCTURED METADATA

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Devarakonda, Ranjeet; Shrestha, Biva; Palanisamy, Giri

    Nobody is better suited to describe data than the scientist who created it. This description of the data is called Metadata. In general terms, Metadata represents the who, what, when, where, why and how of the dataset [1]. eXtensible Markup Language (XML) is the preferred output format for metadata, as it makes the metadata portable and, more importantly, suitable for system discoverability. The newly developed ORNL Metadata Editor (OME) is a Web-based tool that allows users to create and maintain XML files containing key information, or metadata, about their research. Metadata include information about the specific projects, parameters, time periods, and locations associated with the data. Such information helps put the research findings in context. In addition, the metadata produced using OME will allow other researchers to find these data via Metadata clearinghouses like Mercury [2][4]. OME is part of ORNL's Mercury software fleet [2][3]. It was jointly developed to support projects funded by the United States Geological Survey (USGS), U.S. Department of Energy (DOE), National Aeronautics and Space Administration (NASA) and National Oceanic and Atmospheric Administration (NOAA). OME's architecture provides a customizable interface to support project-specific requirements. Using this new architecture, the ORNL team developed OME instances for USGS's Core Science Analytics, Synthesis, and Libraries (CSAS&L), DOE's Next Generation Ecosystem Experiments (NGEE) and Atmospheric Radiation Measurement (ARM) Program, and the international Surface Ocean Carbon Dioxide ATlas (SOCAT). Researchers simply use the ORNL Metadata Editor to enter relevant metadata into a Web-based form. From the information on the form, the Metadata Editor can create an XML file on the server where the editor is installed or on the user's personal computer. Researchers can also use the ORNL Metadata Editor to modify existing XML metadata files. As an example, an NGEE Arctic scientist uses OME to

  1. Catalogues of planetary nebulae.

    NASA Astrophysics Data System (ADS)

    Acker, A.

    Firstly, the general requirements concerning catalogues are studied for planetary nebulae, in particular the objects to be included in a catalogue of PN and their denominations, followed by reflections on the afterlife and computerized versions of a catalogue. Then, the basic elements constituting a catalogue of PN are analyzed, and the available data are examined in each case.

  2. Evolutions in Metadata Quality

    NASA Astrophysics Data System (ADS)

    Gilman, J.

    2016-12-01

    Metadata Quality is one of the chief drivers of discovery and use of NASA EOSDIS (Earth Observing System Data and Information System) data. Issues with metadata such as lack of completeness, inconsistency, and use of legacy terms directly hinder data use. As the central metadata repository for NASA Earth Science data, the Common Metadata Repository (CMR) has a responsibility to its users to ensure the quality of CMR search results. This talk will cover how we encourage metadata authors to improve the metadata through the use of integrated rubrics of metadata quality and outreach efforts. In addition we'll demonstrate Humanizers, a technique for dealing with the symptoms of metadata issues. Humanizers allow CMR administrators to identify specific metadata issues that are fixed at runtime when the data is indexed. An example Humanizer is the aliasing of processing level "Level 1" to "1" to improve consistency across collections. The CMR currently indexes 35K collections and 300M granules.
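
    A minimal sketch of the humanizer idea, assuming a rule is just a (field, match, replacement) triple applied when records are indexed; the rule structure and field names are illustrative, not the CMR implementation.

        # Small, declarative fixes applied to metadata at index time,
        # leaving the provider's original record untouched.
        HUMANIZERS = [
            # (field, match, replacement)
            ("processing_level", "Level 1", "1"),
            ("processing_level", "L1", "1"),
        ]

        def humanize(record):
            indexed = dict(record)  # never mutate the stored original
            for field, match, replacement in HUMANIZERS:
                if indexed.get(field) == match:
                    indexed[field] = replacement
            return indexed

        print(humanize({"title": "MODIS radiances", "processing_level": "Level 1"}))
        # {'title': 'MODIS radiances', 'processing_level': '1'}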

  3. CMR Metadata Curation

    NASA Technical Reports Server (NTRS)

    Shum, Dana; Bugbee, Kaylin

    2017-01-01

    This talk explains the ongoing metadata curation activities in the Common Metadata Repository. It explores tools that exist today which are useful for building quality metadata and also opens up the floor for discussions on other potentially useful tools.

  4. Evolution in Metadata Quality: Common Metadata Repository's Role in NASA Curation Efforts

    NASA Technical Reports Server (NTRS)

    Gilman, Jason; Shum, Dana; Baynes, Katie

    2016-01-01

    Metadata Quality is one of the chief drivers of discovery and use of NASA EOSDIS (Earth Observing System Data and Information System) data. Issues with metadata such as lack of completeness, inconsistency, and use of legacy terms directly hinder data use. As the central metadata repository for NASA Earth Science data, the Common Metadata Repository (CMR) has a responsibility to its users to ensure the quality of CMR search results. This poster covers how we use humanizers, a technique for dealing with the symptoms of metadata issues, as well as our plans for future metadata validation enhancements. The CMR currently indexes 35K collections and 300M granules.

  5. Distributed metadata servers for cluster file systems using shared low latency persistent key-value metadata store

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bent, John M.; Faibish, Sorin; Pedone, Jr., James M.

    A cluster file system is provided having a plurality of distributed metadata servers with shared access to one or more shared low latency persistent key-value metadata stores. A metadata server comprises an abstract storage interface comprising a software interface module that communicates with at least one shared persistent key-value metadata store providing a key-value interface for persistent storage of key-value metadata. The software interface module provides the key-value metadata to the at least one shared persistent key-value metadata store in a key-value format. The shared persistent key-value metadata store is accessed by a plurality of metadata servers. A metadata request can be processed by a given metadata server independently of other metadata servers in the cluster file system. A distributed metadata storage environment is also disclosed that comprises a plurality of metadata servers having an abstract storage interface to at least one shared persistent key-value metadata store.
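
    A rough sketch of the abstract storage interface described above, with an in-memory stand-in for the shared persistent key-value store; the class and method names are invented for illustration.

        from abc import ABC, abstractmethod

        class KeyValueMetadataStore(ABC):
            """Abstract storage interface in the spirit of the disclosure above:
            metadata servers speak key-value, concrete backends vary."""

            @abstractmethod
            def put(self, key: bytes, value: bytes) -> None: ...

            @abstractmethod
            def get(self, key: bytes) -> bytes: ...

        class InMemoryStore(KeyValueMetadataStore):
            """Stand-in backend; a real deployment would target a shared
            low-latency persistent store instead."""
            def __init__(self):
                self._data = {}
            def put(self, key, value):
                self._data[key] = value
            def get(self, key):
                return self._data[key]

        store = InMemoryStore()
        store.put(b"/fs/home/alice/run1.nc", b'{"size": 1048576, "owner": "alice"}')
        print(store.get(b"/fs/home/alice/run1.nc"))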

  6. Harvesting NASA's Common Metadata Repository

    NASA Astrophysics Data System (ADS)

    Shum, D.; Mitchell, A. E.; Durbin, C.; Norton, J.

    2017-12-01

    As part of NASA's Earth Observing System Data and Information System (EOSDIS), the Common Metadata Repository (CMR) stores metadata for over 30,000 datasets from both NASA and international providers along with over 300M granules. This metadata enables sub-second discovery and facilitates data access. While the CMR offers a robust temporal, spatial and keyword search functionality to the general public and international community, it is sometimes more desirable for international partners to harvest the CMR metadata and merge the CMR metadata into a partner's existing metadata repository. This poster will focus on best practices to follow when harvesting CMR metadata to ensure that any changes made to the CMR can also be updated in a partner's own repository. Additionally, since each partner has distinct metadata formats they are able to consume, the best practices will also include guidance on retrieving the metadata in the desired metadata format using CMR's Unified Metadata Model translation software.
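
    A harvesting loop against CMR's public search endpoint might look like the sketch below. The endpoint and parameters follow the publicly documented CMR search API, but should be checked against current documentation; the provider ID is only an example.

        import requests

        # Page through collection records from the public CMR search endpoint.
        CMR = "https://cmr.earthdata.nasa.gov/search/collections.json"

        def harvest(provider, page_size=100):
            page = 1
            while True:
                resp = requests.get(CMR, params={
                    "provider": provider, "page_size": page_size, "page_num": page,
                })
                resp.raise_for_status()
                entries = resp.json()["feed"].get("entry", [])
                if not entries:
                    return
                yield from entries
                page += 1

        for collection in harvest("ORNL_DAAC"):
            print(collection["id"], collection.get("title"))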

  7. BASINS Metadata

    EPA Pesticide Factsheets

    Metadata, or data about data, describe the content, quality, condition, and other characteristics of data. Geospatial metadata are critical to data discovery and serve as the fuel for the Geospatial One-Stop data portal.

  8. A Common Metadata System for Marine Data Portals

    NASA Astrophysics Data System (ADS)

    Wosniok, C.; Breitbach, G.; Lehfeldt, R.

    2012-04-01

    ), Web Feature Service (WFS) and Sensor Observation Service (SOS), which ensures interoperability and extensibility. In addition, metadata, as a crucial component for searching and finding information in large data infrastructures, are provided via the Catalogue Web Service (CS-W). MDI-DE and COSYNA rely on the metadata information system for marine metadata NOKIS, which reflects a metadata profile tailored for marine data according to the specifications of German coastal authorities. In spite of this common software base, interoperability between the two data collections requires constant alignment of the diverse data processed by the two portals. While monitoring data in the MDI-DE are currently rather campaign-based, COSYNA has to fit constantly evolving time series into metadata sets. With all data following the same metadata profile, we now reach full interoperability between the different data collections. The distributed marine information system provides options to search, find and visualise the harmonised results from continuous monitoring, field campaigns, numerical modeling and other data in one web client.

  9. USGIN ISO metadata profile

    NASA Astrophysics Data System (ADS)

    Richard, S. M.

    2011-12-01

    The USGIN project has drafted and is using a specification for the use of ISO 19115/19/39 metadata, recommendations for simple metadata content, and a proposal for a URI scheme to identify resources using resolvable http URIs (see http://lab.usgin.org/usgin-profiles). The principal target use case is a catalog in which resources can be registered and described by data providers for discovery by users. We are currently using the ESRI Geoportal (Open Source), with configuration files for the USGIN profile. The metadata offered by the catalog must provide sufficient content to guide search engines to locate requested resources, to describe the resource content, provenance, and quality so users can determine if the resource will serve for the intended usage, and finally to enable human users and software clients to obtain or access the resource. In order to achieve an operational federated catalog system, provisions in the ISO specification must be restricted and usage clarified to reduce the heterogeneity of 'standard' metadata and service implementations, such that a single client can search against different catalogs and the metadata returned by catalogs can be parsed reliably to locate required information. Usage of the complex ISO 19139 XML schema allows for a great deal of structured metadata content, but the heterogeneity in approaches to content encoding has hampered development of sophisticated client software that can take advantage of the rich metadata; the lack of such clients in turn reduces the motivation for metadata producers to produce content-rich metadata. If the only significant use of the detailed, structured metadata is to format it into text for people to read, then the detailed information could be put in free-text elements and be just as useful. In order for complex metadata encoding and content to be useful, there must be clear and unambiguous conventions on the encoding that are utilized by the community that wishes to take advantage of advanced metadata

  10. Building Format-Agnostic Metadata Repositories

    NASA Astrophysics Data System (ADS)

    Cechini, M.; Pilone, D.

    2010-12-01

    This presentation will discuss the problems that surround persisting and discovering metadata in multiple formats; a set of tenets that must be addressed in a solution; and NASA’s Earth Observing System (EOS) ClearingHOuse’s (ECHO) proposed approach. In order to facilitate cross-discipline data analysis, Earth Scientists will potentially interact with more than one data source. The most common data discovery paradigm relies on services and/or applications facilitating the discovery and presentation of metadata. What may not be common are the formats in which the metadata are formatted. As the number of sources and datasets utilized for research increases, it becomes more likely that a researcher will encounter conflicting metadata formats. Metadata repositories, such as the EOS ClearingHOuse (ECHO), along with data centers, must identify ways to address this issue. In order to define the solution to this problem, the following tenets are identified: - There exists a set of ‘core’ metadata fields recommended for data discovery. - There exists a set of users who will require the entire metadata record for advanced analysis. - There exists a set of users who will require a ‘core’ set of metadata fields for discovery only. - There will never be a cessation of new formats or a total retirement of all old formats. - Users should be presented metadata in a consistent format. ECHO has undertaken an effort to transform its metadata ingest and discovery services in order to support the growing set of metadata formats. In order to address the previously listed items, ECHO’s new metadata processing paradigm utilizes the following approach: - Identify a cross-format set of ‘core’ metadata fields necessary for discovery. - Implement format-specific indexers to extract the ‘core’ metadata fields into an optimized query capability. - Archive the original metadata in its entirety for presentation to users requiring the full record. - Provide on
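
    One way to picture the proposed approach is a registry of format-specific indexers that extract the shared 'core' fields while the full record is archived untouched; the sketch below assumes invented field paths and is not ECHO's actual code.

        import xml.etree.ElementTree as ET

        INDEXERS = {}

        def indexer(fmt):
            def register(fn):
                INDEXERS[fmt] = fn
                return fn
            return register

        @indexer("dif")
        def index_dif(xml_text):
            root = ET.fromstring(xml_text)
            return {"title": root.findtext("Entry_Title")}

        @indexer("iso19115")
        def index_iso(xml_text):
            root = ET.fromstring(xml_text)
            # Real ISO records are namespaced; omitted here for brevity.
            return {"title": root.findtext(".//title")}

        def ingest(fmt, xml_text, archive, index):
            archive.append((fmt, xml_text))        # full record, for power users
            index.append(INDEXERS[fmt](xml_text))  # core fields, for discovery

        archive, index = [], []
        ingest("dif", "<DIF><Entry_Title>Sea ice extent</Entry_Title></DIF>", archive, index)
        print(index)  # [{'title': 'Sea ice extent'}]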

  11. Users and Union Catalogues

    ERIC Educational Resources Information Center

    Hartley, R. J.; Booth, Helen

    2006-01-01

    Union catalogues have had an important place in libraries for many years. Their use has been little investigated. Recent interest in the relative merits of physical and virtual union catalogues and a recent collaborative project between a physical and several virtual union catalogues in the United Kingdom led to the opportunity to study how users…

  12. MetaRNA-Seq: An Interactive Tool to Browse and Annotate Metadata from RNA-Seq Studies.

    PubMed

    Kumar, Pankaj; Halama, Anna; Hayat, Shahina; Billing, Anja M; Gupta, Manish; Yousri, Noha A; Smith, Gregory M; Suhre, Karsten

    2015-01-01

    The number of RNA-Seq studies has grown in recent years. The design of RNA-Seq studies varies from very simple (e.g., two-condition case-control) to very complicated (e.g., time series involving multiple samples at each time point with separate drug treatments). Most of these publicly available RNA-Seq studies are deposited in NCBI databases, but their metadata are scattered throughout four different databases: Sequence Read Archive (SRA), Biosample, Bioprojects, and Gene Expression Omnibus (GEO). Although the NCBI web interface is able to provide all of the metadata information, it often requires significant effort to retrieve study- or project-level information by traversing through multiple hyperlinks and going to another page. Moreover, project- and study-level metadata lack manual or automatic curation by categories, such as disease type, time series, case-control, or replicate type, which are vital to comprehending any RNA-Seq study. Here we describe "MetaRNA-Seq," a new tool for interactively browsing, searching, and annotating RNA-Seq metadata with the capability of semiautomatic curation at the study level.

  13. Harvesting NASA's Common Metadata Repository (CMR)

    NASA Technical Reports Server (NTRS)

    Shum, Dana; Durbin, Chris; Norton, James; Mitchell, Andrew

    2017-01-01

    As part of NASA's Earth Observing System Data and Information System (EOSDIS), the Common Metadata Repository (CMR) stores metadata for over 30,000 datasets from both NASA and international providers along with over 300M granules. This metadata enables sub-second discovery and facilitates data access. While the CMR offers a robust temporal, spatial and keyword search functionality to the general public and international community, it is sometimes more desirable for international partners to harvest the CMR metadata and merge the CMR metadata into a partner's existing metadata repository. This poster will focus on best practices to follow when harvesting CMR metadata to ensure that any changes made to the CMR can also be updated in a partner's own repository. Additionally, since each partner has distinct metadata formats they are able to consume, the best practices will also include guidance on retrieving the metadata in the desired metadata format using CMR's Unified Metadata Model translation software.

  14. Metadata Realities for Cyberinfrastructure: Data Authors as Metadata Creators

    ERIC Educational Resources Information Center

    Mayernik, Matthew Stephen

    2011-01-01

    As digital data creation technologies become more prevalent, data and metadata management are necessary to make data available, usable, sharable, and storable. Researchers in many scientific settings, however, have little experience or expertise in data and metadata management. In this dissertation, I explore the everyday data and metadata…

  15. Automated Metadata Extraction

    DTIC Science & Technology

    2008-06-01

    provides a means for file owners to add metadata which can then be used by iTunes for cataloging and searching [4]. Metadata can be stored in different...based and contain AAC data formats [3]. Specifically, Apple uses Protected AAC to encode copy-protected music titles purchased from the iTunes Music...Store [4]. The files purchased from the iTunes Music Store include the following metadata. • Name • Email address of purchaser • Year • Album

  16. TOPCAT: Tool for OPerations on Catalogues And Tables

    NASA Astrophysics Data System (ADS)

    Taylor, Mark

    2011-01-01

    TOPCAT is an interactive graphical viewer and editor for tabular data. Its aim is to provide most of the facilities that astronomers need for analysis and manipulation of source catalogues and other tables, though it can be used for non-astronomical data as well. It understands a number of different astronomically important formats (including FITS and VOTable) and more formats can be added. It offers a variety of ways to view and analyse tables, including a browser for the cell data themselves, viewers for information about table and column metadata, and facilities for 1-, 2-, 3- and higher-dimensional visualisation, calculating statistics and joining tables using flexible matching algorithms. Using a powerful and extensible Java-based expression language new columns can be defined and row subsets selected for separate analysis. Table data and metadata can be edited and the resulting modified table can be written out in a wide range of output formats. It is a stand-alone application which works quite happily with no network connection. However, because it uses Virtual Observatory (VO) standards, it can cooperate smoothly with other tools in the VO world and beyond, such as VODesktop, Aladin and ds9. Between 2006 and 2009 TOPCAT was developed within the AstroGrid project, and is offered as part of a standard suite of applications on the AstroGrid web site, where you can find information on several other VO tools. The program is written in pure Java and available under the GNU General Public Licence. It has been developed in the UK within the Starlink and AstroGrid projects, and under PPARC and STFC grants. Its underlying table processing facilities are provided by STIL.

  17. Metazen - metadata capture for metagenomes.

    PubMed

    Bischof, Jared; Harrison, Travis; Paczian, Tobias; Glass, Elizabeth; Wilke, Andreas; Meyer, Folker

    2014-01-01

    As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. Unfortunately, these tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility.

  18. EOS ODL Metadata On-line Viewer

    NASA Astrophysics Data System (ADS)

    Yang, J.; Rabi, M.; Bane, B.; Ullman, R.

    2002-12-01

    We have recently developed and deployed an EOS ODL metadata on-line viewer. The EOS ODL metadata viewer is a web server that takes: 1) an EOS metadata file in Object Description Language (ODL), 2) parameters, such as which metadata to view and what style of display to use, and returns an HTML or XML document displaying the requested metadata in the requested style. This tool was developed to address widespread complaints by the science community that the EOS Data and Information System (EOSDIS) metadata files in ODL are difficult to read, by allowing users to upload and view an ODL metadata file in different styles using a web browser. Users can choose to view all the metadata or part of the metadata, such as Collection metadata, Granule metadata, or Unsupported Metadata. Choices of display styles include 1) Web: a mouseable display with tabs and turn-down menus, 2) Outline: formatted and colored text, suitable for printing, 3) Generic: simple indented text, a direct representation of the underlying ODL metadata, and 4) None: no stylesheet is applied and the XML generated by the converter is returned directly. Not all display styles are implemented for all the metadata choices. For example, the Web style is only implemented for Collection and Granule metadata groups with known attribute fields, but not for Unsupported, Other, and All metadata. The overall strategy of the ODL viewer is to transform an ODL metadata file into viewable HTML in two steps. The first step is to convert the ODL metadata file to XML using a Java-based parser/translator called ODL2XML. The second step is to transform the XML to HTML using stylesheets. Both operations are done on the server side. This allows a lot of flexibility in the final result, and is very portable cross-platform. Perl CGI behind the Apache web server is used to run the Java ODL2XML, and then run the results through an XSLT processor. The EOS ODL viewer can be accessed from either a PC or a Mac using Internet
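
    A toy version of the first step of this pipeline (ODL to XML) is sketched below; real EOS ODL has OBJECT/GROUP nesting and typed values, so this handles only flat GROUP blocks with key = value pairs to show the shape of the conversion.

        import xml.etree.ElementTree as ET

        def odl_to_xml(odl_text):
            root = ET.Element("odl")
            current = root
            for line in odl_text.splitlines():
                line = line.strip()
                if not line or line == "END":
                    continue
                key, _, value = line.partition("=")
                key, value = key.strip(), value.strip().strip('"')
                if key == "GROUP":
                    current = ET.SubElement(root, value)
                elif key == "END_GROUP":
                    current = root
                else:
                    ET.SubElement(current, key).text = value
            return ET.tostring(root, encoding="unicode")

        sample = '''GROUP = CollectionMetadata
        ShortName = "MOD021KM"
        VersionID = 5
        END_GROUP = CollectionMetadata
        END'''
        print(odl_to_xml(sample))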

  19. Visualization of JPEG Metadata

    NASA Astrophysics Data System (ADS)

    Malik Mohamad, Kamaruddin; Deris, Mustafa Mat

    There is more information embedded in a JPEG image than just the graphics. Visualization of its metadata would benefit digital forensic investigators, allowing them to view embedded data, including data from corrupted images where no graphics can be displayed, in order to assist in evidence collection for cases such as child pornography or steganography. Tools such as metadata readers, editors and extraction tools are already available, but most focus on visualizing the attribute information of JPEG Exif. However, none visualize metadata by consolidating the marker summary, header structure, Huffman table and quantization table in a single program. In this paper, metadata visualization is done by developing a program able to summarize all existing markers, the header structure, the Huffman table and the quantization table in a JPEG. The results show that visualization of metadata makes viewing the hidden information within a JPEG easier.
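
    The marker summary described above can be approximated with a short scanner that walks the JPEG segment structure; the snippet below uses only the standard library, and the file path is illustrative.

        import struct

        NAMES = {0xC0: "SOF0", 0xC4: "DHT (Huffman table)",
                 0xDA: "SOS (scan start)", 0xDB: "DQT (quantization table)",
                 0xE0: "APP0/JFIF", 0xE1: "APP1/Exif"}

        def summarize_markers(path):
            with open(path, "rb") as f:
                data = f.read()
            i = 2  # skip SOI (FF D8)
            while i + 4 <= len(data) and data[i] == 0xFF:
                marker = data[i + 1]
                (length,) = struct.unpack(">H", data[i + 2:i + 4])
                print(f"0xFF{marker:02X} {NAMES.get(marker, '?'):24s} {length} bytes")
                if marker == 0xDA:  # entropy-coded data follows; stop the summary
                    break
                i += 2 + length

        summarize_markers("example.jpg")  # path is illustrative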

  20. Mercury Toolset for Spatiotemporal Metadata

    NASA Technical Reports Server (NTRS)

    Wilson, Bruce E.; Palanisamy, Giri; Devarakonda, Ranjeet; Rhyne, B. Timothy; Lindsley, Chris; Green, James

    2010-01-01

    Mercury (http://mercury.ornl.gov) is a set of tools for federated harvesting, searching, and retrieving metadata, particularly spatiotemporal metadata. Version 3.0 of the Mercury toolset provides orders of magnitude improvements in search speed, support for additional metadata formats, integration with Google Maps for spatial queries, facetted type search, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. It provides a single portal to very quickly search for data and information contained in disparate data management systems, each of which may use different metadata formats. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow the users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury periodically (typically daily) harvests metadata sources through a collection of interfaces and re-indexes these metadata to provide extremely rapid search capabilities, even over collections with tens of millions of metadata records. A number of both graphical and application interfaces have been constructed within Mercury, to enable both human users and other computer programs to perform queries. Mercury was also designed to support multiple different projects, so that the particular fields that can be queried and used with search filters are easy to configure for each different project.
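
    The harvest-then-index pattern can be pictured with a minimal inverted index: records pulled from providers are folded into one local index so searches never touch the remote sources. The data structures below are illustrative, not Mercury's implementation.

        from collections import defaultdict

        index = defaultdict(set)   # term -> record ids
        records = {}               # record id -> harvested metadata

        def ingest(record_id, metadata):
            records[record_id] = metadata
            for term in metadata["title"].lower().split():
                index[term].add(record_id)

        def search(term):
            return [records[r] for r in index.get(term.lower(), ())]

        ingest("ornl:001", {"title": "Soil moisture Walker Branch"})
        ingest("nasa:042", {"title": "Global soil temperature"})
        print([r["title"] for r in search("soil")])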

  1. Mercury Toolset for Spatiotemporal Metadata

    NASA Astrophysics Data System (ADS)

    Devarakonda, Ranjeet; Palanisamy, Giri; Green, James; Wilson, Bruce; Rhyne, B. Timothy; Lindsley, Chris

    2010-06-01

    Mercury (http://mercury.ornl.gov) is a set of tools for federated harvesting, searching, and retrieving metadata, particularly spatiotemporal metadata. Version 3.0 of the Mercury toolset provides orders of magnitude improvements in search speed, support for additional metadata formats, integration with Google Maps for spatial queries, facetted type search, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. It provides a single portal to very quickly search for data and information contained in disparate data management systems, each of which may use different metadata formats. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow the users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury periodically (typically daily) harvests metadata sources through a collection of interfaces and re-indexes these metadata to provide extremely rapid search capabilities, even over collections with tens of millions of metadata records. A number of both graphical and application interfaces have been constructed within Mercury, to enable both human users and other computer programs to perform queries. Mercury was also designed to support multiple different projects, so that the particular fields that can be queried and used with search filters are easy to configure for each different project.

  2. Evaluating non-relational storage technology for HEP metadata and meta-data catalog

    NASA Astrophysics Data System (ADS)

    Grigorieva, M. A.; Golosova, M. V.; Gubin, M. Y.; Klimentov, A. A.; Osipova, V. V.; Ryabinkin, E. A.

    2016-10-01

    Large-scale scientific experiments produce vast volumes of data. These data are stored, processed and analyzed in a distributed computing environment. The life cycle of an experiment is managed by specialized software like Distributed Data Management and Workload Management Systems. In order to be interpreted and mined, experimental data must be accompanied by auxiliary metadata, which are recorded at each data processing step. Metadata describe scientific data and represent scientific objects or results of scientific experiments, allowing them to be shared by various applications, to be recorded in databases or published via the Web. Processing and analysis of the constantly growing volume of auxiliary metadata is a challenging task, not simpler than the management and processing of the experimental data itself. Furthermore, metadata sources are often loosely coupled and may lead to end-user inconsistency in combined information queries. To aggregate and synthesize a range of primary metadata sources, and enhance them with flexible schema-less addition of aggregated data, we are developing the Data Knowledge Base architecture serving as the intelligence behind GUIs and APIs.

  3. MCM generator: a Java-based tool for generating medical metadata.

    PubMed

    Munoz, F; Hersh, W

    1998-01-01

    In a previous paper we introduced the need to implement a mechanism to facilitate the discovery of relevant Web medical documents. We maintained that the use of META tags, specifically ones that define the medical subject and resource type of a document, helps toward this goal. We have now developed a tool to facilitate the generation of these tags for the authors of medical documents. Written entirely in Java, this tool makes use of the SAPHIRE server, and helps the author identify the Medical Subject Heading terms that most appropriately describe the subject of the document. Furthermore, it allows the author to generate metadata tags for the 15 elements that the Dublin Core considers as core elements in the description of a document. This paper describes the use of this tool in the cataloguing of Web and non-Web medical documents, such as image, movie, and sound files.
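
    The tool's end product can be pictured as HTML META tags carrying Dublin Core elements; a minimal generator is sketched below, with example values (the MeSH heading and other content are illustrative).

        def dc_meta_tags(elements):
            """Render Dublin Core elements as HTML META tags."""
            return "\n".join(
                f'<meta name="DC.{name}" content="{value}">'
                for name, value in elements.items()
            )

        print(dc_meta_tags({
            "title": "Chest radiograph, posteroanterior view",
            "subject": "Radiography, Thoracic",  # example MeSH heading
            "type": "Image",
            "creator": "Example Clinic",
        }))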

  4. Department of the Interior metadata implementation guide—Framework for developing the metadata component for data resource management

    USGS Publications Warehouse

    Obuch, Raymond C.; Carlino, Jennifer; Zhang, Lin; Blythe, Jonathan; Dietrich, Christopher; Hawkinson, Christine

    2018-04-12

    The Department of the Interior (DOI) is a Federal agency with over 90,000 employees across 10 bureaus and 8 agency offices. Its primary mission is to protect and manage the Nation's natural resources and cultural heritage; provide scientific and other information about those resources; and honor its trust responsibilities or special commitments to American Indians, Alaska Natives, and affiliated island communities. Data and information are critical in day-to-day operational decision making and scientific research. DOI is committed to creating, documenting, managing, and sharing high-quality data and metadata in and across its various programs that support its mission. Documenting data through metadata is essential in realizing the value of data as an enterprise asset. The completeness, consistency, and timeliness of metadata affect users' ability to search for and discover the most relevant data for the intended purpose, and facilitate the interoperability and usability of these data among DOI bureaus and offices. Fully documented metadata describe data usability, quality, accuracy, provenance, and meaning. Across DOI, there are different maturity levels and phases of information and metadata management implementations. The Department has organized a committee consisting of bureau-level points-of-contact to collaborate on the development of more consistent, standardized, and more effective metadata management practices and guidance to support this shared mission and the information needs of the Department. DOI's metadata implementation plans establish key roles and responsibilities associated with metadata management processes, procedures, and a series of actions defined in three major metadata implementation phases, including: (1) Getting started—Planning Phase, (2) Implementing and Maintaining Operational Metadata Management Phase, and (3) the Next Steps towards Improving Metadata Management Phase. DOI's phased approach for metadata management addresses

  5. Creating context for the experiment record. User-defined metadata: investigations into metadata usage in the LabTrove ELN.

    PubMed

    Willoughby, Cerys; Bird, Colin L; Coles, Simon J; Frey, Jeremy G

    2014-12-22

    The drive toward more transparency in research, the growing willingness to make data openly available, and the reuse of data to maximize the return on research investment all increase the importance of being able to find information and make links to the underlying data. The use of metadata in Electronic Laboratory Notebooks (ELNs) to curate experiment data is an essential ingredient for facilitating discovery. The University of Southampton has developed a Web browser-based ELN that enables users to add their own metadata to notebook entries. A survey of these notebooks was completed to assess user behavior and patterns of metadata usage within ELNs, while user perceptions and expectations were gathered through interviews and user-testing activities within the community. The findings indicate that while some groups are comfortable with metadata and are able to design a metadata structure that works effectively, many users make little attempt to use it, thereby endangering their ability to recover data in the future. A survey of patterns of metadata use in these notebooks, together with feedback from the user community, indicated that while a few groups are comfortable with metadata and are able to design a metadata structure that works effectively, many users adopt a "minimum required" approach to metadata. To investigate whether the patterns of metadata use in LabTrove were unusual, a series of surveys was undertaken to investigate metadata usage in a variety of platforms supporting user-defined metadata. These surveys also provided the opportunity to investigate whether interface designs in these other environments might inform strategies for encouraging metadata creation and more effective use of metadata in LabTrove.

  6. Viewing and Editing Earth Science Metadata MOBE: Metadata Object Browser and Editor in Java

    NASA Astrophysics Data System (ADS)

    Chase, A.; Helly, J.

    2002-12-01

    Metadata is an important, yet often neglected, aspect of successful archival efforts. However, generating robust, useful metadata is often a time-consuming and tedious task. We have been approaching this problem from two directions: first, by automating metadata creation, pulling from known sources of data; and second, as detailed here, by developing friendly software for human interaction with the metadata. MOBE and COBE (Metadata Object Browser and Editor, and Canonical Object Browser and Editor, respectively) are Java applications for editing and viewing metadata and digital objects. MOBE has already been designed and deployed, and is currently being integrated into other areas of the SIOExplorer project. COBE is in the design and development stage, being created with the same considerations in mind as those for MOBE. Metadata creation, viewing, data object creation, and data object viewing, when taken on a small scale, are all relatively simple tasks. Computer science, however, has an infamous reputation for transforming the simple into the complex. As a system scales upwards to become more robust, new features arise and additional functionality is added to the software being written to manage the system. The software that emerges from such an evolution, though powerful, is often complex and difficult to use. With MOBE the focus is on a tool that does a small number of tasks very well. The result has been an application that enables users to manipulate metadata in an intuitive and effective way. This allows for a tool that serves its purpose without introducing additional cognitive load onto the user, an end goal we continue to pursue.

  7. Partnerships To Mine Unexploited Sources of Metadata.

    ERIC Educational Resources Information Center

    Reynolds, Regina Romano

    This paper discusses the metadata created for other purposes as a potential source of bibliographic data. The first section addresses collecting metadata by means of templates, including the Nordic Metadata Project's Dublin Core Metadata Template. The second section considers potential partnerships for re-purposing metadata for bibliographic use,…

  8. Cataloguing Standards; The Report of the Canadian Task Group on Cataloguing Standards.

    ERIC Educational Resources Information Center

    National Library of Canada, Ottawa (Ontario).

    Following the recommendations of the National Conference on Cataloguing Standards held at the National Library of Canada in May 1970, a Canadian Task Group on Cataloguing Standards was set up to study and identify present deficiencies in the organizing and processing of Canadian material, and the cataloging problems of Canadian libraries, and to…

  9. Evaluating and Evolving Metadata in Multiple Dialects

    NASA Astrophysics Data System (ADS)

    Kozimor, J.; Habermann, T.; Powers, L. A.; Gordon, S.

    2016-12-01

    Despite many long-term homogenization efforts, communities continue to develop focused metadata standards along with related recommendations and (typically) XML representations (aka dialects) for sharing metadata content. Different representations easily become obstacles to sharing information because each representation generally requires a set of tools and skills that are designed, built, and maintained specifically for that representation. In contrast, community recommendations are generally described, at least initially, at a more conceptual level and are more easily shared. For example, most communities agree that dataset titles should be included in metadata records although they write the titles in different ways. This situation has led to the development of metadata repositories that can ingest and output metadata in multiple dialects. As an operational example, the NASA Common Metadata Repository (CMR) includes three different metadata dialects (DIF, ECHO, and ISO 19115-2). These systems raise a new question for metadata providers: if I have a choice of metadata dialects, which should I use and how do I make that decision? We have developed a collection of metadata evaluation tools that can be used to evaluate metadata records in many dialects for completeness with respect to recommendations from many organizations and communities. We have applied these tools to over 8000 collection and granule metadata records in four different dialects. This large collection of identical content in multiple dialects enables us to address questions about metadata and dialect evolution and to answer those questions quantitatively. We will describe those tools and results from evaluating the NASA CMR metadata collection.
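
    A dialect-aware completeness check along these lines can be sketched by mapping one shared concept, the dataset title, to a per-dialect XPath; the paths below are simplified and partly hypothetical.

        import xml.etree.ElementTree as ET

        # Where each dialect keeps the shared concept "dataset title".
        TITLE_PATH = {
            "dif": "Entry_Title",
            "echo": "Collection/LongName",
            "iso": ".//{http://www.isotc211.org/2005/gmd}title",
        }

        def has_title(dialect, xml_text):
            """True if the record fills the 'title' concept in its dialect."""
            root = ET.fromstring(xml_text)
            return root.find(TITLE_PATH[dialect]) is not None

        print(has_title("dif", "<DIF><Entry_Title>Snow cover</Entry_Title></DIF>"))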

  10. Metazen – metadata capture for metagenomes

    DOE PAGES

    Bischof, Jared; Harrison, Travis; Paczian, Tobias; ...

    2014-12-08

    Background: As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. These tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Results: Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Conclusion: Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility.

  11. Metazen – metadata capture for metagenomes

    PubMed Central

    2014-01-01

    Background As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. Unfortunately, these tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Results Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Conclusions Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility. PMID:25780508

  12. Metazen – metadata capture for metagenomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bischof, Jared; Harrison, Travis; Paczian, Tobias

    Background: As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. These tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Results: Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Conclusion: Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility.

  13. The AC 2000.2 Catalogue

    NASA Astrophysics Data System (ADS)

    Urban, S. E.; Corbin, T. E.; Wycoff, G. L.; Makarov, V. V.; Høg, E.; Fabricius, C.

    2001-12-01

    For over 100 years, the international project known as the Astrographic Catalogue -- which involved 20 observatories tasked to photograph the sky -- has held an unfulfilled promise of yielding a wealth of astrometric information. This promise was not realized due to the inadequate reductions of the project's plates. However, in 1997 the U.S. Naval Observatory (USNO) completed the reductions of the 22,660 plates. That catalogue, named the AC 2000, contained positions and magnitudes for 4.6 million stars down to about v magnitude 12.5. Due to the early epochs of the data -- averaging 1907 -- and the positional accuracies -- between 150 and 400 milliarcseconds -- the data are extremely valuable in computing proper motions. In 1997, these positions were used to form the proper motions of the ACT Reference Catalogue. In 1999, USNO and Copenhagen University Observatory (CUO) partnered to create the Tycho-2 Catalogue. The CUO group re-analyzed the data from the Tycho experiment on the Hipparcos satellite. The USNO group re-analyzed over 140 positional catalogs, which were combined with the expanded Tycho positions from the CUO group to compute the Tycho-2 proper motions. The largest contributor to these proper motions was the re-analyzed Astrographic Catalogue, the latest version being known as the AC 2000.2 Catalogue. There are two major differences between the AC 2000 and the AC 2000.2. First, the reference catalog used in AC 2000.2 was an expanded version of the Astrographic Catalogue Reference Stars that was rigorously derived on the Hipparcos Celestial Reference Frame. The second is that AC 2000.2 contains photometry from Tycho-2, where available. A description of the AC 2000.2 Catalogue, the reduction techniques used, how it compares with the 1997 version, and information on obtaining the data will be presented.

  14. Catalogue of Icelandic Volcanoes

    NASA Astrophysics Data System (ADS)

    Ilyinskaya, Evgenia; Larsen, Gudrun; Gudmundsson, Magnus T.; Vogfjord, Kristin; Pagneux, Emmanuel; Oddsson, Bjorn; Barsotti, Sara; Karlsdottir, Sigrun

    2016-04-01

    The Catalogue of Icelandic Volcanoes is a newly developed open-access web resource in English intended to serve as an official source of information about active volcanoes in Iceland and their characteristics. The Catalogue forms a part of an integrated volcanic risk assessment project in Iceland, GOSVÁ (commenced in 2012), as well as being part of the effort of FUTUREVOLC (2012-2016) on establishing an Icelandic volcano supersite. Volcanic activity in Iceland occurs on volcanic systems that usually comprise a central volcano and a fissure swarm. Over 30 systems have been active during the Holocene (the time since the end of the last glaciation, approximately the last 11,500 years). In the last 50 years, over 20 eruptions have occurred in Iceland, displaying very varied activity in terms of eruption styles, eruptive environments, eruptive products and the distribution of lava and tephra. Although basaltic eruptions are most common, the majority of eruptions are explosive, not least due to magma-water interaction in ice-covered volcanoes. Extensive research has taken place on Icelandic volcanism, and the results are reported in numerous scientific papers and other publications. In 2010, the International Civil Aviation Organisation (ICAO) funded a 3 year project to collate the current state of knowledge and create a comprehensive catalogue readily available to decision makers, stakeholders and the general public. The work on the Catalogue began in 2011, and was then further supported by the Icelandic government and the EU through the FP7 project FUTUREVOLC. The Catalogue of Icelandic Volcanoes is a collaboration of the Icelandic Meteorological Office (the state volcano observatory), the Institute of Earth Sciences at the University of Iceland, and the Civil Protection Department of the National Commissioner of the Iceland Police, with contributions from a large number of specialists in Iceland and elsewhere. The Catalogue is built up of chapters with texts and various

  15. Astronomical Catalogues - Definition Elements and Afterlife

    NASA Astrophysics Data System (ADS)

    Jaschek, C.

    1984-09-01

    Based on a look at the different meanings of the term catalogue (or catalog), a definition is proposed. In an analysis of the main elements, a number of requirements that catalogues should satisfy are pointed out. A section is devoted to problems connected with computer-readable versions of printed catalogues.

  16. A Solution to Metadata: Using XML Transformations to Automate Metadata

    DTIC Science & Technology

    2010-06-01

    developed their own metadata standards—Directory Interchange Format (DIF), Ecological Metadata Language (EML), and International Organization for...mented all their data using the EML standard. However, when later attempting to publish to a data clearinghouse—such as the Geospatial One-Stop (GOS...construct calls to its transform(s) method by providing the type of the incoming content (e.g., eml), the type of the resulting content (e.g., fgdc) and
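
    The transform calls described in this snippet are conventionally implemented with a table of XSLT stylesheets keyed by source and target dialect, for example with lxml as below; the stylesheet file name is hypothetical.

        from lxml import etree

        # Stylesheet file names are hypothetical placeholders.
        STYLESHEETS = {("eml", "fgdc"): "eml2fgdc.xsl"}

        def transform(src_type, dst_type, xml_path):
            """Apply the XSLT registered for (src_type, dst_type)."""
            xslt = etree.XSLT(etree.parse(STYLESHEETS[(src_type, dst_type)]))
            return xslt(etree.parse(xml_path))

        # result = transform("eml", "fgdc", "dataset-eml.xml")
        # print(etree.tostring(result, pretty_print=True).decode())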

  17. Descriptive Metadata: Emerging Standards.

    ERIC Educational Resources Information Center

    Ahronheim, Judith R.

    1998-01-01

    Discusses metadata, digital resources, cross-disciplinary activity, and standards. Highlights include Standard Generalized Markup Language (SGML); Extensible Markup Language (XML); Dublin Core; Resource Description Framework (RDF); Text Encoding Initiative (TEI); Encoded Archival Description (EAD); art and cultural-heritage metadata initiatives;…

  18. Making Metadata Better with CMR and MMT

    NASA Technical Reports Server (NTRS)

    Gilman, Jason Arthur; Shum, Dana

    2016-01-01

    Ensuring complete, consistent and high quality metadata is a challenge for metadata providers and curators. The CMR and MMT systems provide providers and curators options to build in metadata quality from the start and also assess and improve the quality of already existing metadata.

  19. Enriched Video Semantic Metadata: Authorization, Integration, and Presentation.

    ERIC Educational Resources Information Center

    Mu, Xiangming; Marchionini, Gary

    2003-01-01

    Presents an enriched video metadata framework including video authorization using the Video Annotation and Summarization Tool (VAST), a video metadata authorization system that integrates both semantic and visual metadata; metadata integration; and user-level applications. Results demonstrated that the enriched metadata were seamlessly…

  20. Data, Metadata - Who Cares?

    NASA Astrophysics Data System (ADS)

    Baumann, Peter

    2013-04-01

    There is a traditional saying that metadata are understandable, semantic-rich, and searchable. Data, on the other hand, are big, with no accessible semantics, and just downloadable. Not only has this led to an imbalance of search support from a user perspective, but also, underneath, to a deep technology divide, often using relational databases for metadata and bespoke archive solutions for data. Our vision is that this barrier will be overcome, and data and metadata will become searchable alike, leveraging the potential of semantic technologies in combination with scalability technologies. Ultimately, in this vision, ad-hoc processing and filtering will no longer be distinguished, forming a uniformly accessible data universe. In the European EarthServer initiative, we work towards this vision by federating database-style raster query languages with metadata search and geo broker technology. We present the approach taken, how it can leverage OGC standards, the benefits envisaged, and first results.

  1. METADATA REGISTRY, ISO/IEC 11179

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pon, R K; Buttler, D J

    2008-01-03

    ISO/IEC-11179 is an international standard that documents the standardization and registration of metadata to make data understandable and shareable. This standardization and registration allows for easier locating, retrieving, and transmitting of data from disparate databases. The standard defines how metadata are conceptually modeled and how they are shared among parties, but does not define how data are physically represented as bits and bytes. The standard consists of six parts. Part 1 provides a high-level overview of the standard and defines the basic element of a metadata registry - a data element. Part 2 defines the procedures for registering classification schemes and classifying administered items in a metadata registry (MDR). Part 3 specifies the structure of an MDR. Part 4 specifies requirements and recommendations for constructing definitions for data and metadata. Part 5 defines how administered items are named and identified. Part 6 defines how administered items are registered and assigned an identifier.
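
    Part 1's basic unit, the data element, might be held in an MDR as a record like the following sketch; the attribute names paraphrase the standard's concepts rather than quoting its exact model.

        from dataclasses import dataclass

        @dataclass
        class DataElement:
            identifier: str      # registration identifier (Parts 5/6)
            name: str            # naming per Part 5
            definition: str      # definition rules per Part 4
            value_domain: str    # permissible values / datatype
            classification: str  # scheme membership per Part 2

        de = DataElement(
            identifier="urn:example:mdr:de:1042",
            name="Person Birth Date",
            definition="The date on which a person was born.",
            value_domain="ISO 8601 date",
            classification="Demographics",
        )
        print(de)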

  2. Do Community Recommendations Improve Metadata?

    NASA Astrophysics Data System (ADS)

    Gordon, S.; Habermann, T.; Jones, M. B.; Leinfelder, B.; Mecum, B.; Powers, L. A.; Slaughter, P.

    2016-12-01

    Complete documentation of scientific data is the surest way to facilitate discovery and reuse. What is complete metadata? There are many metadata recommendations from communities like the OGC, FGDC, NASA, and LTER that provide data documentation guidance for discovery, access, use, and understanding. Often, the recommendations that communities develop are for a particular metadata dialect. Two examples of this are the LTER Completeness recommendation for EML and the FGDC Data Discovery recommendation for CSDGM. Can community adoption of a recommendation ensure that what is included in the metadata is understandable to the scientific community and beyond? By applying quantitative analysis to different LTER and USGS metadata collections in DataONE and ScienceBase, we show that community recommendations can improve the completeness of collections over time. Additionally, by comparing communities in DataONE that use the EML and CSDGM dialects but have not adopted the recommendations with the communities that have, the positive effects of recommendation adoption on documentation completeness can be measured.
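
    The kind of quantitative completeness analysis described above can be sketched as follows; the required-field set is invented for illustration and stands in for a community recommendation.

    ```python
    # Invented required-field set standing in for a community recommendation.
    RECOMMENDED = {"title", "abstract", "creator", "spatial_coverage", "license"}

    def completeness(record: dict) -> float:
        """Fraction of recommended fields that are actually populated."""
        present = {k for k, v in record.items() if v not in (None, "", [])}
        return len(present & RECOMMENDED) / len(RECOMMENDED)

    collection = [
        {"title": "Soil moisture 2014", "creator": "LTER", "license": "CC0"},
        {"title": "Streamflow", "abstract": "Daily gauge data", "creator": "USGS",
         "spatial_coverage": "Colorado", "license": "public domain"},
    ]
    scores = [completeness(r) for r in collection]      # [0.6, 1.0]
    print(sum(scores) / len(scores))                    # collection-level score
    ```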

  3. Evolution of the architecture of the ATLAS Metadata Interface (AMI)

    NASA Astrophysics Data System (ADS)

    Odier, J.; Aidel, O.; Albrand, S.; Fulachier, J.; Lambert, F.

    2015-12-01

    The ATLAS Metadata Interface (AMI) is now a mature application. Over the years, the number of users and the number of provided functions has dramatically increased. It is necessary to adapt the hardware infrastructure in a seamless way so that the quality of service remains high. We describe the evolution of AMI from its beginnings, when it was served by a single MySQL backend database server, to its current state: a cluster of virtual machines at the French Tier1, an Oracle database at Lyon with complementary replication to the Oracle DB at CERN, and an AMI back-up server.

  4. Managing Complex Change in Clinical Study Metadata

    PubMed Central

    Brandt, Cynthia A.; Gadagkar, Rohit; Rodriguez, Cesar; Nadkarni, Prakash M.

    2004-01-01

    In highly functional metadata-driven software, the interrelationships within the metadata become complex, and maintenance becomes challenging. We describe an approach to metadata management that uses a knowledge-base subschema to store centralized information about metadata dependencies and use cases involving specific types of metadata modification. Our system borrows ideas from production-rule systems in that some of this information is a high-level specification that is interpreted and executed dynamically by a middleware engine. Our approach is implemented in TrialDB, a generic clinical study data management system. We review approaches that have been used for metadata management in other contexts and describe the features, capabilities, and limitations of our system. PMID:15187070

  5. Fast and Accurate Metadata Authoring Using Ontology-Based Recommendations.

    PubMed

    Martínez-Romero, Marcos; O'Connor, Martin J; Shankar, Ravi D; Panahiazar, Maryam; Willrett, Debra; Egyedi, Attila L; Gevaert, Olivier; Graybeal, John; Musen, Mark A

    2017-01-01

    In biomedicine, high-quality metadata are crucial for finding experimental datasets, for understanding how experiments were performed, and for reproducing those experiments. Despite the recent focus on metadata, the quality of metadata available in public repositories continues to be extremely poor. A key difficulty is that the typical metadata acquisition process is time-consuming and error prone, with weak or nonexistent support for linking metadata to ontologies. There is a pressing need for methods and tools to speed up the metadata acquisition process and to increase the quality of metadata that are entered. In this paper, we describe a methodology and set of associated tools that we developed to address this challenge. A core component of this approach is a value recommendation framework that uses analysis of previously entered metadata and ontology-based metadata specifications to help users rapidly and accurately enter their metadata. We performed an initial evaluation of this approach using metadata from a public metadata repository.

  6. Fast and Accurate Metadata Authoring Using Ontology-Based Recommendations

    PubMed Central

    Martínez-Romero, Marcos; O’Connor, Martin J.; Shankar, Ravi D.; Panahiazar, Maryam; Willrett, Debra; Egyedi, Attila L.; Gevaert, Olivier; Graybeal, John; Musen, Mark A.

    2017-01-01

    In biomedicine, high-quality metadata are crucial for finding experimental datasets, for understanding how experiments were performed, and for reproducing those experiments. Despite the recent focus on metadata, the quality of metadata available in public repositories continues to be extremely poor. A key difficulty is that the typical metadata acquisition process is time-consuming and error prone, with weak or nonexistent support for linking metadata to ontologies. There is a pressing need for methods and tools to speed up the metadata acquisition process and to increase the quality of metadata that are entered. In this paper, we describe a methodology and set of associated tools that we developed to address this challenge. A core component of this approach is a value recommendation framework that uses analysis of previously entered metadata and ontology-based metadata specifications to help users rapidly and accurately enter their metadata. We performed an initial evaluation of this approach using metadata from a public metadata repository. PMID:29854196
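
    One ingredient of such a value recommendation framework, sketched under simplifying assumptions (toy entries, a small set standing in for an ontology branch; this is not the authors' actual implementation): rank candidate values by how often they appear in previously entered metadata, keeping only ontology terms.

    ```python
    from collections import Counter

    # Previously entered metadata (toy data) and a stand-in ontology branch.
    previous_entries = [
        {"tissue": "liver"}, {"tissue": "Liver"}, {"tissue": "liver"},
        {"tissue": "lung"}, {"tissue": "hepatic tissue"},
    ]
    ontology_terms = {"liver", "lung", "kidney"}

    def recommend(field: str, top_n: int = 3) -> list:
        """Rank ontology terms by how often they were previously entered."""
        counts = Counter(e[field].lower() for e in previous_entries if field in e)
        return [(v, c) for v, c in counts.most_common() if v in ontology_terms][:top_n]

    print(recommend("tissue"))   # [('liver', 3), ('lung', 1)]
    ```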

  7. Metadata Effectiveness in Internet Discovery: An Analysis of Digital Collection Metadata Elements and Internet Search Engine Keywords

    ERIC Educational Resources Information Center

    Yang, Le

    2016-01-01

    This study analyzed digital item metadata and keywords from Internet search engines to learn what metadata elements actually facilitate discovery of digital collections through Internet keyword searching and how significantly each metadata element affects the discovery of items in a digital repository. The study found that keywords from Internet…

  8. QualityML: a dictionary for quality metadata encoding

    NASA Astrophysics Data System (ADS)

    Ninyerola, Miquel; Sevillano, Eva; Serral, Ivette; Pons, Xavier; Zabala, Alaitz; Bastin, Lucy; Masó, Joan

    2014-05-01

    The scenario of rapidly growing geodata catalogues requires tools that help users choose the right products. Having quality fields populated in metadata allows users to rank and then select the best fit-for-purpose products. In this direction, we have developed QualityML (http://qualityml.geoviqua.org), a dictionary that contains hierarchically structured concepts to precisely define and relate quality levels: from quality classes to quality measurements. Generically, a quality element is the path that goes from the highest level (quality class) to the lowest levels (statistics or quality metrics). This path is used to encode the quality of datasets in the corresponding metadata schemas. The benefits of having encoded quality, in the case of data producers, are improved product discovery and better transmission of product characteristics. In the case of data users, particularly decision-makers, they would find quality and uncertainty measures to make the best decisions as well as to perform dataset intercomparison. It also allows other components (such as visualization, discovery, or comparison tools) to be quality-aware and interoperable. On the one hand, QualityML is a profile of the ISO geospatial metadata standards providing a set of rules for precisely documenting quality indicator parameters, structured in 6 levels. On the other hand, QualityML includes semantics and vocabularies for the quality concepts. Whenever possible, it uses statistical expressions from the UncertML dictionary (http://www.uncertml.org) encoding. However, it also extends UncertML to provide a list of alternative metrics that are commonly used to quantify quality. A specific example, based on a temperature dataset, is shown below. The annual mean temperature map has been validated with independent in-situ measurements to obtain a global error of 0.5 °C. Level 0: Quality class (e.g., Thematic accuracy) Level 1: Quality indicator (e.g., Quantitative

  9. Extending the ISC-GEM Global Earthquake Instrumental Catalogue

    NASA Astrophysics Data System (ADS)

    Di Giacomo, Domenico; Engdahl, Bob; Storchak, Dmitry; Villaseñor, Antonio; Harris, James

    2015-04-01

    After a 27-month project funded by the GEM Foundation (www.globalquakemodel.org), in January 2013 we released the ISC-GEM Global Instrumental Earthquake Catalogue (1900-2009) (www.isc.ac.uk/iscgem/index.php) as a special product for use in seismic hazard studies. The new catalogue was necessary because improved seismic hazard studies require earthquake catalogues that are homogeneous (to the largest extent possible) over time in their fundamental parameters, such as location and magnitude. Due to time and resource limitations, the ISC-GEM catalogue (1900-2009) included earthquakes selected according to the following time-variable cut-off magnitudes: Ms=7.5 for earthquakes occurring before 1918; Ms=6.25 between 1918 and 1963; and Ms=5.5 from 1964 onwards. Because of the importance of having a reliable seismic input for seismic hazard studies, funding from GEM and two commercial companies in the US and UK allowed us to start working on the extension of the ISC-GEM catalogue, both for earthquakes that occurred after 2009 and for earthquakes listed in the International Seismological Summary (ISS) which fell below the cut-off magnitude of 6.25. This extension is part of a four-year program that aims to include in the ISC-GEM catalogue large global earthquakes that occurred before the beginning of the ISC Bulletin in 1964. In this contribution we present the updated ISC-GEM catalogue, which will include over 1000 more earthquakes that occurred in 2010-2011 and several hundred more between 1950 and 1959. The catalogue extension between 1935 and 1949 is currently underway. The extension of the ISC-GEM catalogue will also be helpful for regional cross-border seismic hazard studies, as the ISC-GEM catalogue should be used as a basis for cross-checking the consistency in location and magnitude of those earthquakes listed in both the ISC-GEM global catalogue and regional catalogues.

  10. Science friction: data, metadata, and collaboration.

    PubMed

    Edwards, Paul N; Mayernik, Matthew S; Batcheller, Archer L; Bowker, Geoffrey C; Borgman, Christine L

    2011-10-01

    When scientists from two or more disciplines work together on related problems, they often face what we call 'science friction'. As science becomes more data-driven, collaborative, and interdisciplinary, demand increases for interoperability among data, tools, and services. Metadata--usually viewed simply as 'data about data', describing objects such as books, journal articles, or datasets--serve key roles in interoperability. Yet we find that metadata may be a source of friction between scientific collaborators, impeding data sharing. We propose an alternative view of metadata, focusing on its role in an ephemeral process of scientific communication, rather than as an enduring outcome or product. We report examples of highly useful, yet ad hoc, incomplete, loosely structured, and mutable, descriptions of data found in our ethnographic studies of several large projects in the environmental sciences. Based on this evidence, we argue that while metadata products can be powerful resources, usually they must be supplemented with metadata processes. Metadata-as-process suggests the very large role of the ad hoc, the incomplete, and the unfinished in everyday scientific work.

  11. James Dunlop's historical catalogue of southern nebulae and clusters

    NASA Astrophysics Data System (ADS)

    Cozens, Glen; Walsh, Andrew; Orchiston, Wayne

    2010-03-01

    In 1826 James Dunlop compiled the second-ever catalogue of southern star clusters, nebulae and galaxies from Parramatta (NSW, Australia) using a 23-cm reflecting telescope. Initially acclaimed, the catalogue and its author were later criticised and condemned by others, including Sir John Herschel, and both the catalogue and its author are now largely unknown. The criticism of the catalogue centred on the large number of fictitious or ‘missing’ objects, yet detailed analysis reveals the remarkable completeness of the catalogue, despite its inherent errors. We believe that James Dunlop was an important early Australian astronomer, and that his catalogue should be esteemed as the southern equivalent of Messier's famous northern catalogue.

  12. CMO: Cruise Metadata Organizer for JAMSTEC Research Cruises

    NASA Astrophysics Data System (ADS)

    Fukuda, K.; Saito, H.; Hanafusa, Y.; Vanroosebeke, A.; Kitayama, T.

    2011-12-01

    JAMSTEC's Data Research Center for Marine-Earth Sciences manages and distributes a wide variety of observational data and samples obtained from JAMSTEC research vessels and deep sea submersibles. Generally, metadata are essential to identify how data and samples were obtained. In JAMSTEC, cruise metadata include cruise information, such as cruise ID, name of vessel and research theme, and diving information, such as dive number, name of submersible and position of the diving point. They are submitted by the chief scientists of research cruises in Microsoft Excel spreadsheet format, and registered into a data management database to confirm receipt of observational data files, cruise summaries, and cruise reports. The cruise metadata are also published via the "JAMSTEC Data Site for Research Cruises" within two months after the end of a cruise. Furthermore, these metadata are distributed with observational data, images and samples via several data and sample distribution websites after a publication moratorium period. However, there are two operational issues in the metadata publishing process. One is duplicated effort and asynchronous metadata across multiple distribution websites, due to manual metadata entry into individual websites by administrators. The other is differing data types and representations of metadata on each website. To solve these problems, we have developed a cruise metadata organizer (CMO) which allows cruise metadata to be connected from the data management database to several distribution websites. CMO is comprised of three components: an Extensible Markup Language (XML) database, Enterprise Application Integration (EAI) software, and a web-based interface. The XML database is used because of its flexibility with respect to any change of metadata. Daily differential uptake of metadata from the data management database into the XML database is automatically processed via the EAI software. Some metadata are entered into the XML database using the web
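
    The daily differential uptake step can be sketched as a query for records modified since the last run; the table and column names below are invented for illustration, not CMO's actual schema.

    ```python
    import sqlite3

    def differential_uptake(db_path: str, last_sync: str) -> list:
        """Fetch only records changed since the previous synchronisation run."""
        con = sqlite3.connect(db_path)
        rows = con.execute(
            "SELECT cruise_id, vessel, updated_at FROM cruise_metadata "
            "WHERE updated_at > ?",
            (last_sync,),
        ).fetchall()
        con.close()
        return rows   # hand these to the EAI layer for upsert into the XML store

    changed = differential_uptake("management.db", "2011-11-30T00:00:00")
    ```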

  13. Metadata in the Wild: An Empirical Survey of OPeNDAP-accessible Metadata and its Implications for Discovery

    NASA Astrophysics Data System (ADS)

    Hardy, D.; Janée, G.; Gallagher, J.; Frew, J.; Cornillon, P.

    2006-12-01

    The OPeNDAP Data Access Protocol (DAP) is a community standard for sharing scientific data across the Internet. Data providers using DAP have adopted a variety of metadata conventions to improve data utility, such as COARDS (1995) and CF (2003). Our results show, however, that metadata do not follow these conventions in practice. We collected metadata from over a hundred DAP servers, tens of thousands of data objects, and hundreds of collections. We found that a minority claim to adhere to a metadata convention, and a small percentage accurately adhere to their stated convention. We present descriptive statistics of our survey and highlight common traits such as well-populated attributes. Our empirical results indicate that unified search services cannot rely solely on metadata conventions. Although we encourage all providers to adopt a small subset of the CF convention for discovery purposes, we have no evidence to suggest that improved conventions would simplify the fundamental problem of heterogeneity. Large-scale discovery services must find methods for integrating incompatible metadata.
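
    A sketch of the survey's style of check, under the assumption of a simple attribute dictionary per dataset: does a dataset claim a convention, and does it satisfy a chosen required subset? The subset here is an illustrative pick of CF global attributes.

    ```python
    # An illustrative required subset of CF global attributes for discovery.
    CF_DISCOVERY_SUBSET = {"title", "institution", "source", "history"}

    def check_convention(attrs: dict) -> tuple:
        """Return (claims CF, actually satisfies the chosen subset)."""
        claims_cf = attrs.get("Conventions", "").startswith("CF")
        satisfies = CF_DISCOVERY_SUBSET <= attrs.keys()
        return claims_cf, satisfies

    attrs = {"Conventions": "CF-1.0", "title": "SST analysis", "source": "model"}
    print(check_convention(attrs))   # (True, False): claimed but not satisfied
    ```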

  14. CATPAC -- Catalogue Applications Package on UNIX

    NASA Astrophysics Data System (ADS)

    Wood, A. R.

    CATPAC is the STARLINK Catalogue and Table Package. This document describes the CATPAC applications available on UNIX. These include applications for inputting, processing and reporting tabular data, including astronomical catalogues.

  15. Metadata for WIS and WIGOS: GAW Profile of ISO19115 and Draft WIGOS Core Metadata Standard

    NASA Astrophysics Data System (ADS)

    Klausen, Jörg; Howe, Brian

    2014-05-01

    The World Meteorological Organization (WMO) Integrated Global Observing System (WIGOS) is a key WMO priority to underpin all WMO Programs and new initiatives such as the Global Framework for Climate Services (GFCS). The development of the WIGOS Operational Information Resource (WIR) is central to the WIGOS Framework Implementation Plan (WIGOS-IP). The WIR shall provide information on WIGOS and its observing components, as well as requirements of WMO application areas. An important aspect is the description of observational capabilities by way of structured metadata. The Global Atmosphere Watch (GAW) is the WMO program addressing the chemical composition and selected physical properties of the atmosphere. Observational data are collected and archived by GAW World Data Centres (WDCs) and related data centres. The Task Team on GAW WDCs (ET-WDC) has developed a profile of the ISO19115 metadata standard that is compliant with the WMO Information System (WIS) specification for the WMO Core Metadata Profile v1.3. This profile is intended to harmonize certain aspects of the documentation of observations as well as the interoperability of the WDCs. The Inter-Commission Group on WIGOS (ICG-WIGOS) has established the Task Team on WIGOS Metadata (TT-WMD), with representation from all WMO Technical Commissions and the objective of defining the WIGOS Core Metadata. The result of this effort is a draft semantic standard comprising a set of metadata classes that are considered to be of critical importance for the interpretation of observations relevant to WIGOS. The purpose of the presentation is to acquaint the audience with the standard and to solicit informal feedback from experts in the various disciplines of meteorology and climatology. This feedback will help ET-WDC and TT-WMD to refine the GAW metadata profile and the draft WIGOS metadata standard, thereby increasing their utility and acceptance.

  16. GraphMeta: Managing HPC Rich Metadata in Graphs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dai, Dong; Chen, Yong; Carns, Philip

    High-performance computing (HPC) systems face increasingly critical metadata management challenges, especially in the approaching exascale era. These challenges arise not only from exploding metadata volumes, but also from increasingly diverse metadata, which contains data provenance and arbitrary user-defined attributes in addition to traditional POSIX metadata. This ‘rich’ metadata is becoming critical to supporting advanced data management functionality such as data auditing and validation. In our prior work, we identified a graph-based model as a promising solution for uniformly managing HPC rich metadata due to its flexibility and generality. At the same time, however, graph-based HPC rich metadata management also introduces significant challenges for the underlying infrastructure. In this study, we first identify the challenges the underlying infrastructure must meet to support scalable, high-performance rich metadata management. Based on that, we introduce GraphMeta, a graph-based engine designed for this use case. It achieves performance scalability by introducing a new graph partitioning algorithm and a write-optimal storage engine. We evaluate GraphMeta under both synthetic and real HPC metadata workloads, compare it with other approaches, and demonstrate its advantages in terms of efficiency and usability for rich metadata management in HPC systems.
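
    The graph model itself is easy to picture; the sketch below uses networkx as a stand-in for a purpose-built engine such as GraphMeta, with invented node and edge names.

    ```python
    import networkx as nx

    g = nx.MultiDiGraph()
    g.add_node("file:/out/result.h5", type="file", size=2**30, tag="validated")
    g.add_node("job:42", type="job", submitted_by="alice")
    g.add_edge("file:/out/result.h5", "job:42", relation="generatedBy")

    # Data auditing becomes a traversal: which job produced this file?
    producers = [v for _, v, d in g.out_edges("file:/out/result.h5", data=True)
                 if d["relation"] == "generatedBy"]
    print(producers)   # ['job:42']
    ```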

  17. Mitogenome metadata: current trends and proposed standards.

    PubMed

    Strohm, Jeff H T; Gwiazdowski, Rodger A; Hanner, Robert

    2016-09-01

    Mitogenome metadata are descriptive terms about the sequence and its specimen description that allow both to be digitally discoverable and interoperable. Here, we review a sampling of mitogenome metadata published in the journal Mitochondrial DNA between 2005 and 2014. Specifically, we have focused on a subset of metadata fields that are available for GenBank records and specified by the Genomics Standards Consortium (GSC) and other biodiversity metadata standards, and we assessed their presence across three main categories: collection, biological and taxonomic information. To do this we reviewed 146 mitogenome manuscripts, and their associated GenBank records, and scored them for 13 metadata fields. We also explored the potential for mitogenome misidentification using their sequence diversity and taxonomic metadata on the Barcode of Life Data Systems (BOLD). For this, we focused on all Lepidoptera and Perciformes mitogenomes included in the review, along with additional mitogenome sequence data mined from GenBank. Overall, we found that none of the 146 mitogenome projects provided all the metadata we looked for, and only 17 projects provided at least one category of metadata across the three main categories. Comparisons using mtDNA sequences from BOLD suggest that some mitogenomes may be misidentified. Lastly, we appreciate the research potential of mitogenomes announced through this journal, and we conclude with a suggestion of 13 metadata fields, available on GenBank, that, if provided in a mitogenome's GenBank record, would increase its research value.

  18. Metadata Sets for e-Government Resources: The Extended e-Government Metadata Schema (eGMS+)

    NASA Astrophysics Data System (ADS)

    Charalabidis, Yannis; Lampathaki, Fenareti; Askounis, Dimitris

    In the dawn of the Semantic Web era, metadata appear as a key enabler that assists management of the e-Government resources related to the provision of personalized, efficient and proactive services oriented towards the real citizens’ needs. As different authorities typically use different terms to describe their resources and publish them in various e-Government Registries that may enhance the access to and delivery of governmental knowledge, but also need to communicate seamlessly at a national and pan-European level, the need for a unified e-Government metadata standard emerges. This paper presents the creation of an ontology-based extended metadata set for e-Government Resources that embraces services, documents, XML Schemas, code lists, public bodies and information systems. Such a metadata set formalizes the exchange of information between portals and registries and assists the service transformation and simplification efforts, while it can be further taken into consideration when applying Web 2.0 techniques in e-Government.

  19. MPEG-7: standard metadata for multimedia content

    NASA Astrophysics Data System (ADS)

    Chang, Wo

    2005-08-01

    The eXtensible Markup Language (XML) metadata technology for describing media contents has emerged as a dominant mode of making media searchable for both human and machine consumption. To realize this promise, many online Web applications are pushing this concept to its fullest potential. However, a good metadata model does require a robust standardization effort so that the metadata content and its structure can reach maximum usage between various applications. An effective media content description technology should also use standard metadata structures, especially when dealing with various multimedia contents. A new metadata technology called MPEG-7 content description has emerged from the ISO MPEG standards body with the charter of defining standard metadata to describe audiovisual content. This paper gives an overview of MPEG-7 technology and the impact it can bring to the next generation of multimedia indexing and retrieval applications.

  20. Guidelines for Cataloguing-in-Publication.

    ERIC Educational Resources Information Center

    Anderson, Dorothy, Comp.

    The guidelines provide the criteria for the design of a national cataloguing-in-publication (CIP) program which will both be a component part of the international CIP network and fit the requirements of a specific library and publishing environment. Cataloguing-in-publication is defined, its development is traced, and current CIP programs in nine…

  1. Towards Data Value-Level Metadata for Clinical Studies.

    PubMed

    Zozus, Meredith Nahm; Bonner, Joseph

    2017-01-01

    While several standards for metadata describing clinical studies exist, comprehensive metadata to support traceability of data from clinical studies has not been articulated. We examine uses of metadata in clinical studies. We examine and enumerate seven sources of data value-level metadata in clinical studies inclusive of research designs across the spectrum of the National Institutes of Health definition of clinical research. The sources of metadata inform categorization in terms of metadata describing the origin of a data value, the definition of a data value, and operations to which the data value was subjected. The latter is further categorized into information about changes to a data value, movement of a data value, retrieval of a data value, and data quality checks, constraints or assessments to which the data value was subjected. The implications of tracking and managing data value-level metadata are explored.
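
    The paper's categorization suggests a record structure along these lines; the field contents are invented examples.

    ```python
    from dataclasses import dataclass, field

    @dataclass
    class ValueMetadata:
        origin: str                       # where the value came from
        definition: str                   # what the value means
        operations: list = field(default_factory=list)   # audit trail

    v = ValueMetadata(origin="CRF page 3, site 017",
                      definition="systolic blood pressure, mmHg, seated")
    v.operations.append({"op": "range check", "rule": "60 <= x <= 250", "passed": True})
    v.operations.append({"op": "transfer", "target": "analysis datastore"})
    ```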

  2. A metadata template for ocean acidification data

    NASA Astrophysics Data System (ADS)

    Jiang, L.

    2014-12-01

    Metadata is structured information that describes, explains, and locates an information resource (e.g., data). It is often coarsely described as data about data, and documents information such as what was measured, by whom, when, where, and how it was sampled and analyzed, and with what instruments. Metadata is essential to ensuring the survivability and accessibility of the data into the future. With the rapid expansion of biological-response ocean acidification (OA) studies, the lack of a common metadata template to document this type of data has become a significant gap in ocean acidification data management efforts. In this paper, we present a metadata template that can be applied to a broad spectrum of OA studies, including those studying the biological responses of organisms to ocean acidification. The "variable metadata section", which includes the variable name, observation type, whether the variable is a manipulation condition or response variable, and the biological subject on which the variable is studied, forms the core of this metadata template. Additional metadata elements, such as principal investigators, temporal and spatial coverage, platforms for the sampling, and data citation, are essential components that complete the template. We explain the structure of the template, and define many metadata elements that may be unfamiliar to researchers. For that reason, this paper can serve as a user's manual for the template.
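
    Rendered as a plain dictionary, the core of such a template might look as follows; the keys paraphrase the elements named above and are not the template's literal field names.

    ```python
    oa_metadata = {
        "principal_investigators": ["J. Smith"],
        "temporal_coverage": {"start": "2012-06-01", "end": "2012-08-31"},
        "spatial_coverage": {"lat": [30.0, 35.0], "lon": [-80.0, -75.0]},
        "platform": "R/V Example",
        # The core "variable metadata section": one entry per variable.
        "variables": [
            {"name": "pCO2_water", "observation_type": "measured",
             "role": "manipulation condition", "biological_subject": None},
            {"name": "calcification_rate", "observation_type": "calculated",
             "role": "response variable", "biological_subject": "Crassostrea gigas"},
        ],
        "data_citation": "doi:10.xxxx/placeholder",
    }
    ```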

  3. Metadata Wizard: an easy-to-use tool for creating FGDC-CSDGM metadata for geospatial datasets in ESRI ArcGIS Desktop

    USGS Publications Warehouse

    Ignizio, Drew A.; O'Donnell, Michael S.; Talbert, Colin B.

    2014-01-01

    Creating compliant metadata for scientific data products is mandated for all federal Geographic Information Systems professionals and is a best practice for members of the geospatial data community. However, the complexity of the Federal Geographic Data Committee's Content Standards for Digital Geospatial Metadata, the limited availability of easy-to-use tools, and recent changes in the ESRI software environment continue to make metadata creation a challenge. Staff at the U.S. Geological Survey Fort Collins Science Center have developed a Python toolbox for ESRI ArcGIS Desktop to facilitate a semi-automated workflow for creating and updating metadata records in ESRI's 10.x software. The U.S. Geological Survey Metadata Wizard tool automatically populates several metadata elements: the spatial reference, spatial extent, geospatial presentation format, vector feature count or raster column/row count, native system/processing environment, and the metadata creation date. Once the software auto-populates these elements, users can easily add attribute definitions and other relevant information in a simple graphical user interface. The tool, which offers a simple design free of esoteric metadata language, has the potential to save many government and non-government organizations significant time and costs by facilitating the development of metadata compliant with the Federal Geographic Data Committee's Content Standards for Digital Geospatial Metadata for ESRI software users. A working version of the tool is now available for ESRI ArcGIS Desktop, versions 10.0, 10.1, and 10.2 (downloadable at http://www.sciencebase.gov/metadatawizard).
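
    The auto-population idea can be sketched outside the ESRI toolchain; the version below uses GDAL instead of the tool's actual arcpy-based implementation, so it is illustrative only.

    ```python
    from osgeo import gdal

    def auto_populate(raster_path: str) -> dict:
        """Derive spatial metadata elements directly from the raster itself."""
        ds = gdal.Open(raster_path)
        gt = ds.GetGeoTransform()           # origin and pixel size
        return {
            "spatial_reference": ds.GetProjection(),
            "raster_columns": ds.RasterXSize,
            "raster_rows": ds.RasterYSize,
            "west_bounding": gt[0],
            "north_bounding": gt[3],
            "presentation_form": "raster digital data",
        }
    ```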

  4. Inheritance rules for Hierarchical Metadata Based on ISO 19115

    NASA Astrophysics Data System (ADS)

    Zabala, A.; Masó, J.; Pons, X.

    2012-04-01

    ISO19115 has mainly been used to describe metadata for datasets and services. Furthermore, the ISO19115 standard (as well as the new draft ISO19115-1) includes a conceptual model that allows metadata to be described at different levels of granularity, structured in hierarchical levels, both for aggregated resources such as series and datasets, and for more disaggregated resources such as types of entities (feature type), types of attributes (attribute type), entities (feature instances) and attributes (attribute instances). In theory it is possible to apply a complete metadata structure at every hierarchical level, from a whole series down to an individual feature attribute, but storing all metadata at all levels is completely impractical. An inheritance mechanism is needed to store each piece of metadata and quality information at the optimum hierarchical level, and to allow easy and efficient documentation of metadata both in an Earth observation scenario, such as multi-satellite mission multiband imagery, and in a complex vector topographic map that includes several feature types separated in layers (e.g. administrative limits, contour lines, edification polygons, road lines, etc.). Moreover, owing to the traditional splitting of maps into tiles for handling at detailed scales, or owing to satellite characteristics, each of the previous thematic layers (e.g. 1:5000 roads for a country) or bands (the Landsat-5 TM cover of the Earth) is tiled into several parts (sheets or scenes, respectively). According to the hierarchy in ISO 19115, the definition of general metadata can be supplemented by spatially specific metadata that, when required, either inherits from or overrides the general case (G.1.3). Annex H of this standard states that only metadata exceptions are defined at lower levels, so it is not necessary to generate the full registry of metadata for each level but rather to link particular values to the general value that they inherit. Conceptually the metadata
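
    The inheritance rule (record only exceptions at lower levels, inherit the rest) maps naturally onto a chained lookup; a minimal sketch with invented metadata fields:

    ```python
    from collections import ChainMap

    series  = {"license": "CC-BY", "lineage": "Landsat-5 TM", "cloud_cover": None}
    dataset = {"cloud_cover": 0.12}          # exception: overrides the general case
    scene   = {"acquired": "1999-07-14"}     # spatially specific addition

    # Lookup walks up the hierarchy: scene, then dataset, then series.
    effective = ChainMap(scene, dataset, series)
    print(effective["license"], effective["cloud_cover"])   # CC-BY 0.12
    ```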

  5. Alcohol promotions in Australian supermarket catalogues.

    PubMed

    Johnston, Robyn; Stafford, Julia; Pierce, Hannah; Daube, Mike

    2017-07-01

    In Australia, most alcohol is sold as packaged liquor from off-premises retailers, a market increasingly dominated by supermarket chains. Competition between retailers may encourage marketing approaches, for example, discounting, that evidence indicates contribute to alcohol-related harms. This research documented the nature and variety of promotional methods used by two major supermarket retailers to promote alcohol products in their supermarket catalogues. Weekly catalogues from the two largest Australian supermarket chains were reviewed for alcohol-related content over 12 months. Alcohol promotions were assessed for promotion type, product type, number of standard drinks, purchase price and price/standard drink. Each store catalogue included, on average, 13 alcohol promotions/week, with price-based promotions most common. Forty-five percent of promotions required the purchase of multiple alcohol items. Wine was the most frequently promoted product (44%), followed by beer (24%) and spirits (18%). Most (99%) wine cask (2-5 L container) promotions required multiple (two to three) casks to be purchased. The average number of standard drinks required to be purchased to participate in catalogue promotions was 31.7 (SD = 24.9; median = 23.1). The median price per standard drink was $1.49 (range $0.19-$9.81). Cask wines had the lowest cost per standard drink across all product types. Supermarket catalogues' emphasis on low prices/high volumes of alcohol reflects that retailers are taking advantage of limited restrictions on off-premise sales and promotion, which allow them to approach market competition in ways that may increase alcohol-related harms in consumers. Regulation of alcohol marketing should address retailer catalogue promotions. [Johnston R, Stafford J, Pierce H, Daube M. Alcohol promotions in Australian supermarket catalogues. Drug Alcohol Rev 2017;36:456-463]. © 2016 Australasian Professional Society on Alcohol and other Drugs.

  6. FRBCAT: The Fast Radio Burst Catalogue

    NASA Astrophysics Data System (ADS)

    Petroff, E.; Barr, E. D.; Jameson, A.; Keane, E. F.; Bailes, M.; Kramer, M.; Morello, V.; Tabbara, D.; van Straten, W.

    2016-09-01

    Here, we present a catalogue of known Fast Radio Burst sources in the form of an online catalogue, FRBCAT. The catalogue includes information about the instrumentation used for the observations of each detected burst, the measured quantities from each observation, and model-dependent quantities derived from the observed quantities. To aid consistent comparisons of burst properties such as width and signal-to-noise ratio, we have re-processed all the bursts for which we have access to the raw data, with software which we make available. The originally derived properties are also listed for comparison. The catalogue is hosted online as a MySQL database which can also be downloaded in tabular or plain-text format for off-line use. This database will be maintained for use by the community for studies of the Fast Radio Burst population as it grows.

  7. Incorporating ISO Metadata Using HDF Product Designer

    NASA Technical Reports Server (NTRS)

    Jelenak, Aleksandar; Kozimor, John; Habermann, Ted

    2016-01-01

    The need to store increasing amounts of metadata of varying complexity in HDF5 files is rapidly outgrowing the capabilities of the Earth science metadata conventions currently in use. Until now, data producers have had little choice but to devise ad hoc solutions to this challenge. Such solutions, in turn, pose a wide range of issues for data managers, distributors, and, ultimately, data users. The HDF Group is experimenting with a novel approach of using ISO 19115 metadata objects as a catch-all container for all the metadata that cannot be fitted into the current Earth science data conventions. This presentation will showcase how the HDF Product Designer software can be utilized to help data producers include various ISO metadata objects in their products.
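
    The catch-all idea reduces to attaching an ISO 19115 XML fragment as an HDF5 attribute; a minimal sketch with h5py (HDF Product Designer itself generates such files through its own interface):

    ```python
    import h5py

    iso_fragment = ("<gmd:MD_Metadata xmlns:gmd='http://www.isotc211.org/2005/gmd'>"
                    "...</gmd:MD_Metadata>")   # placeholder ISO 19115 XML

    with h5py.File("product.h5", "w") as f:
        dset = f.create_dataset("radiance", shape=(10, 10), dtype="f4")
        # Attach the ISO object where no existing convention has a slot for it.
        dset.attrs["ISO19115_metadata"] = iso_fragment
    ```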

  8. GEO Label Web Services for Dynamic and Effective Communication of Geospatial Metadata Quality

    NASA Astrophysics Data System (ADS)

    Lush, Victoria; Nüst, Daniel; Bastin, Lucy; Masó, Joan; Lumsden, Jo

    2014-05-01

    -like label, which are coloured according to metadata availability and are clickable to allow a user to engage with the original metadata and explore specific aspects in more detail. To support this graphical representation and allow for wider deployment architectures we have implemented two Web services, a PHP and a Java implementation, that generate GEO label representations by combining producer metadata (from standard catalogues or other published locations) with structured user feedback. Both services accept encoded URLs of publicly available metadata documents or metadata XML files as HTTP POST and GET requests and apply XPath and XSLT mappings to transform producer and feedback XML documents into clickable SVG GEO label representations. The label and services are underpinned by two XML-based quality models. The first is a producer model that extends ISO 19115 and 19157 to allow fuller citation of reference data, presentation of pixel- and dataset- level statistical quality information, and encoding of 'traceability' information on the lineage of an actual quality assessment. The second is a user quality model (realised as a feedback server and client) which allows reporting and query of ratings, usage reports, citations, comments and other domain knowledge. Both services are Open Source and are available on GitHub at https://github.com/lushv/geolabel-service and https://github.com/52North/GEO-label-java. The functionality of these services can be tested using our GEO label generation demos, available online at http://www.geolabel.net/demo.html and http://geoviqua.dev.52north.org/glbservice/index.jsf.

  9. Handling Metadata in a Neurophysiology Laboratory

    PubMed Central

    Zehl, Lyuba; Jaillet, Florent; Stoewer, Adrian; Grewe, Jan; Sobolev, Andrey; Wachtler, Thomas; Brochier, Thomas G.; Riehle, Alexa; Denker, Michael; Grün, Sonja

    2016-01-01

    To date, non-reproducibility of neurophysiological research is a matter of intense discussion in the scientific community. A crucial component to enhance reproducibility is to comprehensively collect and store metadata, that is, all information about the experiment, the data, and the applied preprocessing steps on the data, such that they can be accessed and shared in a consistent and simple manner. However, the complexity of experiments, the highly specialized analysis workflows and a lack of knowledge on how to make use of supporting software tools often make it too burdensome for researchers to perform such detailed documentation. For this reason, the collected metadata are often incomplete, incomprehensible for outsiders or ambiguous. Based on our research experience in dealing with diverse datasets, we here provide conceptual and technical guidance to overcome the challenges associated with the collection, organization, and storage of metadata in a neurophysiology laboratory. Through the concrete example of managing the metadata of a complex experiment that yields multi-channel recordings from monkeys performing a behavioral motor task, we practically demonstrate the implementation of these approaches and solutions with the intention that they may be generalized to other projects. Moreover, we detail five use cases that demonstrate the resulting benefits of constructing a well-organized metadata collection when processing or analyzing the recorded data, in particular when these are shared between laboratories in a modern scientific collaboration. Finally, we suggest an adaptable workflow to accumulate, structure and store metadata from different sources using, by way of example, the odML metadata framework. PMID:27486397

  10. Handling Metadata in a Neurophysiology Laboratory.

    PubMed

    Zehl, Lyuba; Jaillet, Florent; Stoewer, Adrian; Grewe, Jan; Sobolev, Andrey; Wachtler, Thomas; Brochier, Thomas G; Riehle, Alexa; Denker, Michael; Grün, Sonja

    2016-01-01

    To date, non-reproducibility of neurophysiological research is a matter of intense discussion in the scientific community. A crucial component to enhance reproducibility is to comprehensively collect and store metadata, that is, all information about the experiment, the data, and the applied preprocessing steps on the data, such that they can be accessed and shared in a consistent and simple manner. However, the complexity of experiments, the highly specialized analysis workflows and a lack of knowledge on how to make use of supporting software tools often make it too burdensome for researchers to perform such detailed documentation. For this reason, the collected metadata are often incomplete, incomprehensible for outsiders or ambiguous. Based on our research experience in dealing with diverse datasets, we here provide conceptual and technical guidance to overcome the challenges associated with the collection, organization, and storage of metadata in a neurophysiology laboratory. Through the concrete example of managing the metadata of a complex experiment that yields multi-channel recordings from monkeys performing a behavioral motor task, we practically demonstrate the implementation of these approaches and solutions with the intention that they may be generalized to other projects. Moreover, we detail five use cases that demonstrate the resulting benefits of constructing a well-organized metadata collection when processing or analyzing the recorded data, in particular when these are shared between laboratories in a modern scientific collaboration. Finally, we suggest an adaptable workflow to accumulate, structure and store metadata from different sources using, by way of example, the odML metadata framework.
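
    The section/property tree at the heart of such a workflow can be mimicked in a few lines; this is a plain-Python stand-in for illustration, not the actual odML API.

    ```python
    class Section:
        """A named, typed container of properties and nested sections."""
        def __init__(self, name: str, type: str = ""):
            self.name, self.type = name, type
            self.properties: dict = {}
            self.sections: list = []

    experiment = Section("Experiment", type="electrophysiology")
    subject = Section("Subject", type="monkey")
    subject.properties["Species"] = "Macaca mulatta"
    subject.properties["Task"] = "reach-to-grasp"
    experiment.sections.append(subject)
    ```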

  11. BioCatalogue: a universal catalogue of web services for the life sciences.

    PubMed

    Bhagat, Jiten; Tanoh, Franck; Nzuobontane, Eric; Laurent, Thomas; Orlowski, Jerzy; Roos, Marco; Wolstencroft, Katy; Aleksejevs, Sergejs; Stevens, Robert; Pettifer, Steve; Lopez, Rodrigo; Goble, Carole A

    2010-07-01

    The use of Web Services to enable programmatic access to on-line bioinformatics is becoming increasingly important in the Life Sciences. However, their number, distribution and the variable quality of their documentation can make their discovery and subsequent use difficult. A Web Services registry with information on available services will help to bring together service providers and their users. The BioCatalogue (http://www.biocatalogue.org/) provides a common interface for registering, browsing and annotating Web Services to the Life Science community. Services in the BioCatalogue can be described and searched in multiple ways based upon their technical types, bioinformatics categories, user tags, service providers or data inputs and outputs. They are also subject to constant monitoring, allowing the identification of service problems and changes and the filtering-out of unavailable or unreliable resources. The system is accessible via a human-readable 'Web 2.0'-style interface and a programmatic Web Service interface. The BioCatalogue follows a community approach in which all services can be registered, browsed and incrementally documented with annotations by any member of the scientific community.

  12. Probabilistic multi-catalogue positional cross-match

    NASA Astrophysics Data System (ADS)

    Pineau, F.-X.; Derriere, S.; Motch, C.; Carrera, F. J.; Genova, F.; Michel, L.; Mingo, B.; Mints, A.; Nebot Gómez-Morán, A.; Rosen, S. R.; Ruiz Camuñas, A.

    2017-01-01

    Context. Catalogue cross-correlation is essential to building large sets of multi-wavelength data, whether to study the properties of populations of astrophysical objects or to build reference catalogues (or time series) from survey observations. Nevertheless, resorting to automated processes with limited sets of information available on large numbers of sources detected at different epochs with various filters and instruments inevitably leads to spurious associations. We need both statistical criteria to select detections to be merged as unique sources, and statistical indicators to help achieve compromises between the completeness and reliability of the selected associations. Aims: We lay the foundations of a statistical framework for multi-catalogue cross-correlation and cross-identification based on explicit simplified catalogue models. A proper identification process should rely on both astrometric and photometric data. Under some conditions, the astrometric part and the photometric part can be processed separately and merged a posteriori to provide a single global probability of identification. The present paper addresses almost exclusively the astrometric part and specifies the proper probabilities to be merged with photometric likelihoods. Methods: To select matching candidates in n catalogues, we used the Chi (or, indifferently, the Chi-square) test with 2(n-1) degrees of freedom. We thus call this cross-match a χ-match. In order to use Bayes' formula, we considered exhaustive sets of hypotheses based on combinatorial analysis. The volume of the χ-test domain of acceptance (a 2(n-1)-dimensional acceptance ellipsoid) is used to estimate the expected numbers of spurious associations. We derived priors for those numbers using a frequentist approach relying on simple geometrical considerations. Likelihoods are based on standard Rayleigh, χ and Poisson distributions that we normalized over the χ-test acceptance domain. We validated our theoretical
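
    The χ-match acceptance test can be written down directly from the abstract: n detections of a putative single source, each with positional uncertainty σ_i, are tested against a χ² threshold with 2(n-1) degrees of freedom. A sketch assuming circular errors in a local tangent plane:

    ```python
    import numpy as np
    from scipy.stats import chi2

    def chi_match(xy: np.ndarray, sigma: np.ndarray, alpha: float = 0.01) -> bool:
        """xy: (n, 2) tangent-plane offsets; sigma: (n,) circular 1-sigma errors."""
        w = 1.0 / sigma**2
        mean = (xy * w[:, None]).sum(axis=0) / w.sum()      # weighted mean position
        stat = (((xy - mean) ** 2).sum(axis=1) * w).sum()   # chi-square statistic
        dof = 2 * (len(xy) - 1)
        return stat <= chi2.ppf(1 - alpha, dof)             # inside the ellipsoid?

    detections = np.array([[0.00, 0.00], [0.05, -0.03], [0.02, 0.04]])  # arcsec
    print(chi_match(detections, sigma=np.array([0.05, 0.06, 0.05])))
    ```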

  13. Evaluating the privacy properties of telephone metadata.

    PubMed

    Mayer, Jonathan; Mutchler, Patrick; Mitchell, John C

    2016-05-17

    Since 2013, a stream of disclosures has prompted reconsideration of surveillance law and policy. One of the most controversial principles, both in the United States and abroad, is that communications metadata receives substantially less protection than communications content. Several nations currently collect telephone metadata in bulk, including on their own citizens. In this paper, we attempt to shed light on the privacy properties of telephone metadata. Using a crowdsourcing methodology, we demonstrate that telephone metadata is densely interconnected, can trivially be reidentified, and can be used to draw sensitive inferences.

  14. Evaluating the privacy properties of telephone metadata

    PubMed Central

    Mayer, Jonathan; Mutchler, Patrick; Mitchell, John C.

    2016-01-01

    Since 2013, a stream of disclosures has prompted reconsideration of surveillance law and policy. One of the most controversial principles, both in the United States and abroad, is that communications metadata receives substantially less protection than communications content. Several nations currently collect telephone metadata in bulk, including on their own citizens. In this paper, we attempt to shed light on the privacy properties of telephone metadata. Using a crowdsourcing methodology, we demonstrate that telephone metadata is densely interconnected, can trivially be reidentified, and can be used to draw sensitive inferences. PMID:27185922

  15. Submillimeter, millimeter, and microwave spectral line catalogue

    NASA Technical Reports Server (NTRS)

    Poynter, R. L.; Pickett, H. M.

    1984-01-01

    This report describes a computer-accessible catalogue of submillimeter, millimeter, and microwave spectral lines in the frequency range between 0 and 10000 GHz (i.e., wavelengths longer than 30 micrometers). The catalogue can be used as a planning guide or as an aid in the identification and analysis of observed spectral lines. The information listed for each spectral line includes the frequency and its estimated error, the intensity, the lower state energy, and the quantum number assignment. The catalogue has been constructed using theoretical least-squares fits of published spectral lines to accepted molecular models. The associated predictions and their estimated errors are based upon the resultant fitted parameters and their covariances. Future versions of this catalogue will add more atoms and molecules and update the present listings (151 species) as new data appear. The catalogue is available from the authors as a magnetic tape recorded in card images and as a set of microfiche records.

  16. Seeking the Path to Metadata Nirvana

    NASA Astrophysics Data System (ADS)

    Graybeal, J.

    2008-12-01

    Scientists have always found reusing other scientists' data challenging. Computers did not fundamentally change the problem, but enabled more and larger instances of it. In fact, by removing human mediation and time delays from the data sharing process, computers emphasize the contextual information that must be exchanged in order to exchange and reuse data. This requirement for contextual information has two faces: "interoperability" when talking about systems, and "the metadata problem" when talking about data. As much as any single organization, the Marine Metadata Interoperability (MMI) project has been tagged with the mission "Solve the metadata problem." Of course, if that goal is achieved, then sustained, interoperable data systems for interdisciplinary observing networks can be easily built -- pesky metadata differences, like which protocol to use for data exchange, or what the data actually measure, will be a thing of the past. Alas, as you might imagine, there will always be complexities and incompatibilities that are not addressed, and data systems that are not interoperable, even within a science discipline. So should we throw up our hands and surrender to the inevitable? Not at all. Rather, we try to minimize metadata problems as much as we can. In this we are making steady progress, despite natural forces that pull in the other direction. Computer systems let us work with more complexity, build community knowledge and collaborations, and preserve and publish our progress and (dis-)agreements. Funding organizations, science communities, and technologists see the importance of interoperable systems and metadata, and direct resources toward them. With the new approaches and resources, projects like IPY and MMI can simultaneously define, display, and promote effective strategies for sustainable, interoperable data systems. This presentation will outline the role metadata plays in durable interoperable data systems, for better or worse. It will describe times

  17. An All-Sky Portable (ASP) Optical Catalogue

    NASA Astrophysics Data System (ADS)

    Flesch, Eric Wim

    2017-06-01

    This optical catalogue combines the all-sky USNO-B1.0/A1.0 and most-sky APM catalogues, plus overlays of SDSS optical data, into a single all-sky map presented in a sparse binary format that is easily downloaded at 9 GB zipped. The total count is 1 163 237 190 sources, and each has J2000 astrometry, red and blue magnitudes with PSFs and a variability indicator, and flags for proper motion, epoch, and the source survey and catalogue for each of the photometry and astrometry. The catalogue is available on http://quasars.org/asp.html, and additional data for this paper is available at http://dx.doi.org/10.4225/50/5807fbc12595f.

  18. Making Interoperability Easier with the NASA Metadata Management Tool

    NASA Astrophysics Data System (ADS)

    Shum, D.; Reese, M.; Pilone, D.; Mitchell, A. E.

    2016-12-01

    ISO 19115 has enabled interoperability amongst tools, yet many users find it hard to build ISO metadata for their collections because it can be large and overly flexible for their needs. The Metadata Management Tool (MMT), part of NASA's Earth Observing System Data and Information System (EOSDIS), offers users a modern, easy-to-use, browser-based tool to develop ISO-compliant metadata. Through a simplified UI experience, metadata curators can create and edit collections without any understanding of the complex ISO 19115 format, while still generating compliant metadata. The MMT is also able to assess the completeness of collection-level metadata by evaluating it against a variety of metadata standards. The tool provides users with clear guidance on how to change their metadata in order to improve its quality and compliance. It is based on NASA's Unified Metadata Model for Collections (UMM-C), a simpler metadata model which can be cleanly mapped to ISO 19115. This allows metadata authors and curators to meet ISO compliance requirements faster and more accurately. The MMT and UMM-C have been developed in an agile fashion, with recurring end-user tests and reviews to continually refine the tool, the model and the ISO mappings. This process allows for continual improvement and evolution to meet the community's needs.

  19. Metadata and Service at the GFZ ISDC Portal

    NASA Astrophysics Data System (ADS)

    Ritschel, B.

    2008-05-01

    The online service portal of the GFZ Potsdam Information System and Data Center (ISDC) is an access point for all manner of geoscientific geodata, its corresponding metadata, scientific documentation and software tools. At present almost 2000 national and international users and user groups have the opportunity to request Earth science data from a portfolio of 275 different product types and more than 20 million single data files with a combined volume of approximately 12 TByte. The majority of the data and information the portal currently offers to the public are global geomonitoring products, such as satellite orbit and Earth gravity field data as well as geomagnetic and atmospheric data for exploration. These products for Earth's changing system are provided via state-of-the-art retrieval techniques. The data product catalog system behind these techniques is based on the extensive use of standardized metadata, which describe the different geoscientific product types and data products in a uniform way. Whereas all ISDC product types are specified by NASA's Directory Interchange Format (DIF), Version 9.0 parent XML DIF metadata files, the individual data files are described by extended DIF metadata documents. Depending on when the scientific project began, one part of the data files is described by extended DIF, Version 6 metadata documents and the other part is specified by child XML DIF metadata documents. Both the product-type-dependent parent DIF metadata documents and the data-file-dependent child DIF metadata documents are derived from a base-DIF.xsd XML schema file. The ISDC metadata philosophy defines a geoscientific product as a package consisting of mostly one, or sometimes more than one, data file plus one extended DIF metadata file. Because NASA's DIF metadata standard has been developed in order to specify a collection of data only, the extension of the DIF standard consists of new and specific attributes, which are necessary for

  20. The PDS4 Metadata Management System

    NASA Astrophysics Data System (ADS)

    Raugh, A. C.; Hughes, J. S.

    2018-04-01

    We present the key features of the Planetary Data System (PDS) PDS4 Information Model as an extendable metadata management system for planetary metadata related to data structure, analysis/interpretation, and provenance.

  1. Assessing Metadata Quality of a Federally Sponsored Health Data Repository.

    PubMed

    Marc, David T; Beattie, James; Herasevich, Vitaly; Gatewood, Laël; Zhang, Rui

    2016-01-01

    The U.S. Federal Government developed HealthData.gov to disseminate healthcare datasets to the public. Metadata is provided for each dataset and is the sole source of information for finding and retrieving the data. This study employed automated quality assessments of the HealthData.gov metadata published from 2012 to 2014 to measure completeness, accuracy, and consistency in applying standards. The results demonstrated that metadata published in earlier years had lower completeness, accuracy, and consistency. Also, metadata that underwent modifications following their original creation were of higher quality. HealthData.gov did not uniformly apply the Dublin Core Metadata Initiative, a widely accepted metadata standard, to the metadata. These findings suggested that the HealthData.gov metadata suffered from quality issues, particularly related to information that wasn't frequently updated. The results supported the need for policies to standardize metadata and contributed to the development of automated measures of metadata quality.

  2. Assessing Metadata Quality of a Federally Sponsored Health Data Repository

    PubMed Central

    Marc, David T.; Beattie, James; Herasevich, Vitaly; Gatewood, Laël; Zhang, Rui

    2016-01-01

    The U.S. Federal Government developed HealthData.gov to disseminate healthcare datasets to the public. Metadata is provided for each dataset and is the sole source of information for finding and retrieving the data. This study employed automated quality assessments of the HealthData.gov metadata published from 2012 to 2014 to measure completeness, accuracy, and consistency in applying standards. The results demonstrated that metadata published in earlier years had lower completeness, accuracy, and consistency. Also, metadata that underwent modifications following their original creation were of higher quality. HealthData.gov did not uniformly apply the Dublin Core Metadata Initiative, a widely accepted metadata standard, to the metadata. These findings suggested that the HealthData.gov metadata suffered from quality issues, particularly related to information that wasn’t frequently updated. The results supported the need for policies to standardize metadata and contributed to the development of automated measures of metadata quality. PMID:28269883

  3. Merged infrared catalogue

    NASA Technical Reports Server (NTRS)

    Schmitz, M.; Brown, L. W.; Mead, J. M.; Nagy, T. A.

    1978-01-01

    A compilation of equatorial coordinates, spectral types, magnitudes, and fluxes from five catalogues of infrared observations is presented. This first edition of the Merged Infrared Catalogue contains 11,201 observations from the Two-Micron Sky Survey, Observations of Infrared Radiation from Cool Stars, the Air Force Geophysics Laboratory Four Color Infrared Sky Survey and its Supplemental Catalog, and the Catalog of 10 micron Celestial Objects (HALL). This compilation is a by-product of a computerized infrared data base under development at Goddard Space Flight Center; the objective is to maintain a complete and current record of all infrared observations from 1 micron to 1000 microns of nonsolar system objects. These observations are being placed into a standardized system.

  4. New Generation of Catalogues for the New Generation of Users: A Comparison of Six Library Catalogues

    ERIC Educational Resources Information Center

    Mercun, Tanja; Zumer, Maja

    2008-01-01

    Purpose: The purpose of this paper is to describe some of the problems and issues faced by online library catalogues. It aims to establish how libraries have undertaken the mission of developing the next generation catalogues and how they compare to new tools such as Amazon. Design/methodology/approach: An expert study was carried out in January…

  5. Evolving Metadata in NASA Earth Science Data Systems

    NASA Astrophysics Data System (ADS)

    Mitchell, A.; Cechini, M. F.; Walter, J.

    2011-12-01

    NASA's Earth Observing System (EOS) is a coordinated series of satellites for long-term global observations. NASA's Earth Observing System Data and Information System (EOSDIS) is a petabyte-scale archive of environmental data that supports global climate change research by providing end-to-end services from EOS instrument data collection to science data processing to full access to EOS and other earth science data. On a daily basis, the EOSDIS ingests, processes, archives and distributes over 3 terabytes of data from NASA's Earth Science missions, representing over 3500 data products spanning a variety of science disciplines. EOSDIS is currently comprised of 12 discipline-specific data centers that are collocated with centers of science discipline expertise. Metadata is used in all aspects of NASA's Earth Science data lifecycle, from the initial measurement gathering to the accessing of data products. Missions use metadata in their science data products to describe information such as the instrument/sensor, operational plan, and geographic region. Acting as the curators of the data products, data centers employ metadata for preservation, access and manipulation of data. EOSDIS provides a centralized metadata repository called the Earth Observing System (EOS) ClearingHouse (ECHO) for data discovery and access via a service-oriented architecture (SOA) between data centers and science data users. ECHO receives inventory metadata from data centers, which generate metadata files that comply with the ECHO Metadata Model. NASA's Earth Science Data and Information System (ESDIS) Project established a Tiger Team to study and make recommendations regarding the adoption of the international metadata standard ISO 19115 in EOSDIS. The result was a technical report recommending an evolution of NASA data systems towards a consistent application of ISO 19115 and related standards, including the creation of a NASA-specific convention for core ISO 19115 elements. Part of

  6. BioCatalogue: a universal catalogue of web services for the life sciences

    PubMed Central

    Bhagat, Jiten; Tanoh, Franck; Nzuobontane, Eric; Laurent, Thomas; Orlowski, Jerzy; Roos, Marco; Wolstencroft, Katy; Aleksejevs, Sergejs; Stevens, Robert; Pettifer, Steve; Lopez, Rodrigo; Goble, Carole A.

    2010-01-01

    The use of Web Services to enable programmatic access to on-line bioinformatics is becoming increasingly important in the Life Sciences. However, their number, distribution and the variable quality of their documentation can make their discovery and subsequent use difficult. A Web Services registry with information on available services will help to bring together service providers and their users. The BioCatalogue (http://www.biocatalogue.org/) provides a common interface for registering, browsing and annotating Web Services to the Life Science community. Services in the BioCatalogue can be described and searched in multiple ways based upon their technical types, bioinformatics categories, user tags, service providers or data inputs and outputs. They are also subject to constant monitoring, allowing the identification of service problems and changes and the filtering-out of unavailable or unreliable resources. The system is accessible via a human-readable ‘Web 2.0’-style interface and a programmatic Web Service interface. The BioCatalogue follows a community approach in which all services can be registered, browsed and incrementally documented with annotations by any member of the scientific community. PMID:20484378

  7. The XML Metadata Editor of GFZ Data Services

    NASA Astrophysics Data System (ADS)

    Ulbricht, Damian; Elger, Kirsten; Tesei, Telemaco; Trippanera, Daniele

    2017-04-01

    Following the FAIR data principles, research data should be Findable, Accessible, Interoperable and Reusable. Publishing data under these principles requires assigning persistent identifiers to the data and generating rich machine-actionable metadata. To increase interoperability, metadata should include shared vocabularies and crosslink the newly published (meta)data and related material. However, structured metadata formats tend to be complex and are not intended to be generated by individual scientists. Software solutions are needed that support scientists in providing metadata describing their data. To facilitate the data publication activities of 'GFZ Data Services', we programmed an XML metadata editor that assists scientists in creating metadata in different schemata popular in the earth sciences (ISO19115, DIF, DataCite), while being at the same time usable by and understandable for scientists. Emphasis is placed on removing barriers: in particular, the editor is publicly available on the internet without registration [1] and scientists are not asked to provide information that can be generated automatically (e.g. the URL of a specific licence or the contact information of the metadata distributor). Metadata are stored in browser cookies and a copy can be saved to the local hard disk. To improve usability, form fields are translated into the scientific language, e.g. 'creators' of the DataCite schema are called 'authors'. To assist in filling in the form, we make use of drop-down menus for small vocabulary lists and offer a search facility for large thesauri. Explanations of form fields and definitions of vocabulary terms are provided in pop-up windows, and full documentation is available for download via the help menu. In addition, multiple geospatial references can be entered via an interactive mapping tool, which helps to minimize problems with different conventions for providing latitudes and longitudes. Currently, we are extending the metadata editor

  8. A Metadata Action Language

    NASA Technical Reports Server (NTRS)

    Golden, Keith; Clancy, Dan (Technical Monitor)

    2001-01-01

    The data management problem comprises data processing and data tracking. Data processing is the creation of new data based on existing data sources. Data tracking consists of storing metadata descriptions of available data. This paper addresses the data management problem by casting it as an AI planning problem. Actions are data-processing commands, plans are dataflow programs and goals are metadata descriptions of desired data products. Data manipulation is simply plan generation and execution, and a key component of data tracking is inferring the effects of an observed plan. We introduce a new action language for data management domains, called ADILM. We discuss the connection between data processing and information integration and show how a language for the latter must be modified to support the former. The paper also discusses information gathering within a data-processing framework, and shows how ADILM metadata expressions are a generalization of Local Completeness.

  9. Improving Metadata Compliance for Earth Science Data Records

    NASA Astrophysics Data System (ADS)

    Armstrong, E. M.; Chang, O.; Foster, D.

    2014-12-01

    One of the recurring challenges of creating earth science data records is to ensure a consistent level of metadata compliance at the granule level, where important details of contents, provenance, producer, and data references are necessary to obtain a sufficient level of understanding. These details are important not just for individual data consumers but also for autonomous software systems. Two of the most popular metadata standards at the granule level are the Climate and Forecast (CF) Metadata Conventions and the Attribute Conventions for Dataset Discovery (ACDD). Many data producers have implemented one or both of these models, including the Group for High Resolution Sea Surface Temperature (GHRSST) for their global SST products and the Ocean Biology Processing Group for NASA ocean color and SST products. While both the CF and ACDD models contain various levels of metadata richness, the actual "required" attributes are quite small in number. Metadata at the granule level become much more useful when recommended or optional attributes are implemented that document spatial and temporal ranges, lineage and provenance, sources, keywords, and references. In this presentation we report on a new open source tool to check the compliance of netCDF and HDF5 granules with the CF and ACDD metadata models. The tool, written in Python, was originally implemented to support metadata compliance for netCDF records as part of NOAA's Integrated Ocean Observing System. It outputs standardized scoring for metadata compliance for both CF and ACDD, produces an objective summary weight, and can be applied to remote records via OPeNDAP calls. Originally a command-line tool, we have extended it to provide a user-friendly web interface. Reports on metadata testing are grouped in hierarchies that make it easier to track flaws and inconsistencies in the record. We have also extended it to support explicit metadata structures and semantic syntax for the GHRSST project that can be
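
    In the same spirit (though not the tool's actual code), a much-simplified ACDD-style check can be written against a granule's global attributes using the netCDF4 library. The attribute lists below are a small illustrative subset of ACDD, the 2:1 weighting is invented, and the file name is a placeholder:

      from netCDF4 import Dataset

      ACDD_REQUIRED = ["title", "summary", "keywords"]
      ACDD_RECOMMENDED = ["time_coverage_start", "time_coverage_end",
                          "geospatial_lat_min", "geospatial_lat_max", "license"]

      def score_granule(path):
          ds = Dataset(path)
          attrs = set(ds.ncattrs())   # names of the granule's global attributes
          ds.close()
          req = sum(a in attrs for a in ACDD_REQUIRED)
          rec = sum(a in attrs for a in ACDD_RECOMMENDED)
          # Weight required attributes twice as heavily as recommended ones.
          return (2 * req + rec) / (2 * len(ACDD_REQUIRED) + len(ACDD_RECOMMENDED))

      print(score_granule("sst_granule.nc"))   # placeholder file name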

  10. Assessing Public Metabolomics Metadata, Towards Improving Quality.

    PubMed

    Ferreira, João D; Inácio, Bruno; Salek, Reza M; Couto, Francisco M

    2017-12-13

    Public resources need to be appropriately annotated with metadata in order to make them discoverable, reproducible and traceable, further enabling them to be interoperable or integrated with other datasets. While data-sharing policies exist to promote the annotation process by data owners, these guidelines are still largely ignored. In this manuscript, we analyse automatic measures of metadata quality and suggest their application as a means to encourage data owners to increase the metadata quality of their resources and submissions, thereby contributing to higher-quality data, improved data sharing, and the overall accountability of scientific publications. We analyse these metadata quality measures in the context of a real-world repository of metabolomics data (i.e. MetaboLights), including a manual validation of the measures and an analysis of their evolution over time. Our findings suggest that the proposed measures can be used to mimic a manual assessment of metadata quality.

  11. Updated earthquake catalogue for seismic hazard analysis in Pakistan

    NASA Astrophysics Data System (ADS)

    Khan, Sarfraz; Waseem, Muhammad; Khan, Muhammad Asif; Ahmed, Waqas

    2018-03-01

    A reliable and homogenized earthquake catalogue is essential for seismic hazard assessment in any area. This article describes the compilation and processing of an updated earthquake catalogue for Pakistan. The earthquake catalogue compiled in this study for the region (a quadrangle bounded by the geographical limits 40-83° E and 20-40° N) includes 36,563 earthquake events, reported at moment magnitudes (Mw) of 4.0-8.3 and spanning from 25 AD to 2016. Relationships are developed between the moment magnitude and the body-wave and surface-wave magnitude scales to unify the catalogue in terms of Mw. The catalogue includes earthquakes from Pakistan and neighbouring countries to minimize the effects of geopolitical boundaries in seismic hazard assessment studies. Earthquakes reported by local and international agencies as well as individual catalogues are included. The proposed catalogue is further used to obtain the magnitude of completeness after removal of dependent events using four different algorithms. Finally, seismicity parameters of the seismic sources are reported, and recommendations are made for seismic hazard assessment studies in Pakistan.
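
    Magnitude unification of this kind typically applies linear conversion relations of the form Mw = a*M + b per source scale. The sketch below shows the mechanics only; the coefficients are invented placeholders, not the relationships derived in the study:

      # Hypothetical conversion coefficients (a, b) per magnitude scale.
      CONVERSIONS = {
          "mb": (1.08, -0.40),   # body-wave magnitude -> Mw (illustrative)
          "Ms": (0.67, 2.10),    # surface-wave magnitude -> Mw (illustrative)
          "Mw": (1.00, 0.00),    # already moment magnitude
      }

      def to_mw(magnitude, scale):
          a, b = CONVERSIONS[scale]
          return a * magnitude + b

      catalogue = [(5.8, "mb"), (6.4, "Ms"), (7.6, "Mw")]
      print([round(to_mw(m, s), 2) for m, s in catalogue])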

  12. The Metadata Coverage Index (MCI): A standardized metric for quantifying database metadata richness.

    PubMed

    Liolios, Konstantinos; Schriml, Lynn; Hirschman, Lynette; Pagani, Ioanna; Nosrat, Bahador; Sterk, Peter; White, Owen; Rocca-Serra, Philippe; Sansone, Susanna-Assunta; Taylor, Chris; Kyrpides, Nikos C; Field, Dawn

    2012-07-30

    Variability in the extent of the descriptions of data ('metadata') held in public repositories forces users to assess the quality of records individually, which rapidly becomes impractical. Scoring records on the richness of their description provides a simple, objective proxy measure for quality that enables filtering in support of downstream analysis. Pivotally, such descriptions should spur improvements. Here, we introduce such a measure, the 'Metadata Coverage Index' (MCI): the percentage of available fields actually filled in a record or description. MCI scores can be calculated across a database, for individual records, or for their component parts (e.g., fields of interest). There are many potential uses for this simple metric: for example, to filter, rank or search for records; to assess the metadata availability of an ad hoc collection; to determine the frequency with which fields in a particular record type are filled, especially with respect to standards compliance; to assess the utility of specific tools and resources, and of data capture practice more generally; to prioritize records for further curation; to serve as performance metrics for funded projects; or to quantify the value added by curation. Here we demonstrate the utility of MCI scores using metadata from the Genomes Online Database (GOLD), including records compliant with the 'Minimum Information about a Genome Sequence' (MIGS) standard developed by the Genomic Standards Consortium. We discuss challenges and address further applications of MCI scores: to show improvements in annotation quality over time, to inform the work of standards bodies and repository providers on the usability and popularity of their products, and to assess and credit the work of curators. Such an index provides a step towards putting metadata capture practices and, in the future, standards compliance into a quantitative and objective framework.
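
    The definition translates directly into code. A minimal rendering, with illustrative field names in place of a real GOLD record:

      # MCI: the percentage of available fields actually filled in a record.
      def mci(record, available_fields):
          filled = sum(1 for f in available_fields
                       if record.get(f) not in (None, "", [], {}))
          return 100.0 * filled / len(available_fields)

      fields = ["organism", "habitat", "lat_lon", "collection_date", "seq_method"]
      record = {"organism": "E. coli", "lat_lon": "37.7 -122.4", "seq_method": "Illumina"}
      print("MCI = {:.0f}%".format(mci(record, fields)))   # 3 of 5 fields -> 60%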

  13. Developing Cyberinfrastructure Tools and Services for Metadata Quality Evaluation

    NASA Astrophysics Data System (ADS)

    Mecum, B.; Gordon, S.; Habermann, T.; Jones, M. B.; Leinfelder, B.; Powers, L. A.; Slaughter, P.

    2016-12-01

    Metadata and data quality are at the core of reusable and reproducible science. While great progress has been made over the years, much of the metadata collected only addresses data discovery, covering concepts such as titles and keywords. Improving metadata beyond the discoverability plateau means documenting detailed concepts within the data, such as sampling protocols, instrumentation used, and variables measured. Given that metadata commonly do not describe their data at this level, how might we improve the state of things? Giving scientists and data managers easy-to-use tools to evaluate metadata quality against community-driven recommendations is the key to producing high-quality metadata. To achieve this goal, we created a set of cyberinfrastructure tools and services that integrate with existing metadata and data curation workflows and can be used to improve metadata and data quality across the sciences. These tools work across metadata dialects (e.g., ISO19115, FGDC, EML, etc.) and can be used to assess aspects of quality beyond what is internal to the metadata, such as the congruence between the metadata and the data it describes. The system makes use of a user-friendly mechanism for expressing a suite of checks as code in popular data science programming languages such as Python and R. This reduces the burden on scientists and data managers to learn yet another language. We demonstrated these services and tools in three ways. First, we evaluated a large corpus of datasets in the DataONE federation of data repositories against a metadata recommendation modeled after existing recommendations such as the LTER best practices and the Attribute Convention for Dataset Discovery (ACDD). Second, we showed how this service can be used to display metadata and data quality information to data producers during the data submission and metadata creation process, and to data consumers through data catalog search and access tools. Third, we showed how the centrally
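
    A rough sketch of the checks-as-code idea follows: each quality check is an ordinary Python function registered in a suite and run against a metadata record (here a plain dict). The check names, record layout, and reporting format are our own assumptions, not the project's actual framework:

      CHECKS = []

      def check(fn):
          """Register a quality check in the suite."""
          CHECKS.append(fn)
          return fn

      @check
      def has_title(meta):
          return bool(meta.get("title")), "dataset title present"

      @check
      def temporal_coverage_ordered(meta):
          start, end = meta.get("time_start"), meta.get("time_end")
          ok = start is not None and end is not None and start <= end
          return ok, "temporal coverage present and ordered"

      def run_suite(meta):
          for fn in CHECKS:
              ok, description = fn(meta)
              print(("PASS" if ok else "FAIL") + " - " + description)

      run_suite({"title": "Soil moisture, plot 7",
                 "time_start": "2014-01-01", "time_end": "2014-12-31"})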

  14. Collection Metadata Solutions for Digital Library Applications

    NASA Technical Reports Server (NTRS)

    Hill, Linda L.; Janee, Greg; Dolin, Ron; Frew, James; Larsgaard, Mary

    1999-01-01

    Within a digital library, collections may range from an ad hoc set of objects that serve a temporary purpose to established library collections intended to persist through time. The objects in these collections vary widely, from library and data center holdings to pointers to real-world objects, such as geographic places, and the various metadata schemas that describe them. The key to integrated use of such a variety of collections in a digital library is collection metadata that represents the inherent and contextual characteristics of a collection. The Alexandria Digital Library (ADL) Project has designed and implemented collection metadata for several purposes: in XML form, the collection metadata "registers" the collection with the user interface client; in HTML form, it is used for user documentation; eventually, it will be used to describe the collection to network search agents; and it is used for internal collection management, including mapping the object metadata attributes to the common search parameters of the system.

  15. Compiling an earthquake catalogue for the Arabian Plate, Western Asia

    NASA Astrophysics Data System (ADS)

    Deif, Ahmed; Al-Shijbi, Yousuf; El-Hussain, Issa; Ezzelarab, Mohamed; Mohamed, Adel M. E.

    2017-10-01

    The Arabian Plate is surrounded by regions of relatively high seismicity. Accounting for this seismicity is of great importance for seismic hazard and risk assessments, seismic zoning, and land use. In this study, a homogeneous moment-magnitude (Mw) earthquake catalogue for the Arabian Plate is provided. The comprehensive and homogeneous earthquake catalogue provided in the current study spatially covers the entire Arabian Peninsula and neighboring areas, encompassing all earthquake sources that can generate substantial hazard for the Arabian Plate mainland. The catalogue extends in time from 19 AD to 2015, with a total number of 13,156 events, of which 497 are historical events. Four polygons covering the entire Arabian Plate were delineated, and different data sources, including special studies and local, regional and international catalogues, were used to prepare the earthquake catalogue. Moment magnitudes (Mw) provided by the original sources were given the highest priority among magnitude types and were entered into the catalogue with their references. Earthquakes with magnitudes on scales other than Mw were converted to this scale by applying empirical relationships derived in the current or in previous studies. The four polygon catalogues were then merged into two comprehensive earthquake catalogues covering the historical and instrumental periods. Duplicate events were identified and discarded from the current catalogue. The resulting earthquake catalogue was declustered so as to contain only independent events and was investigated for completeness with time over different magnitude ranges.
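
    Declustering such a catalogue is commonly done with magnitude-dependent space-time windows (Gardner-Knopff-style algorithms are one family such studies apply). The sketch below shows the windowing logic only; the window formulas are illustrative placeholders, not published coefficients:

      from math import asin, cos, radians, sin, sqrt

      def km_between(lat1, lon1, lat2, lon2):
          """Great-circle (haversine) distance in km."""
          dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
          a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
          return 2 * 6371 * asin(sqrt(a))

      def windows(mw):
          """Space (km) and time (days) windows growing with magnitude (illustrative)."""
          return 10 ** (0.1 * mw + 1.0), 10 ** (0.5 * mw - 1.0)

      def decluster(events):
          """events: (time_days, lat, lon, mw) tuples; returns mainshocks only."""
          mainshocks = []
          for t, lat, lon, mw in sorted(events, key=lambda e: -e[3]):
              for mt, mlat, mlon, mmw in mainshocks:
                  dist_w, time_w = windows(mmw)
                  if km_between(lat, lon, mlat, mlon) <= dist_w and abs(t - mt) <= time_w:
                      break   # falls inside a larger event's window: dependent
              else:
                  mainshocks.append((t, lat, lon, mw))
          return mainshocks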

  16. The CMIP5 Model Documentation Questionnaire: Development of a Metadata Retrieval System for the METAFOR Common Information Model

    NASA Astrophysics Data System (ADS)

    Pascoe, Charlotte; Lawrence, Bryan; Moine, Marie-Pierre; Ford, Rupert; Devine, Gerry

    2010-05-01

    The EU METAFOR Project (http://metaforclimate.eu) has created a web-based model documentation questionnaire to collect metadata from the modelling groups that are running simulations in support of the Coupled Model Intercomparison Project 5 (CMIP5). The CMIP5 model documentation questionnaire will retrieve information about the details of the models used, how the simulations were carried out, how the simulations conformed to the CMIP5 experiment requirements, and details of the hardware used to perform the simulations. The metadata collected by the CMIP5 questionnaire will allow CMIP5 data to be compared in a scientifically meaningful way. This paper describes the life cycle of the CMIP5 questionnaire development, which starts with relatively unstructured input from domain specialists and ends with formal XML documents that comply with the METAFOR Common Information Model (CIM). Each development step is associated with a specific tool. (1) Mind maps are used to capture information requirements from domain experts and build a controlled vocabulary, (2) a Python parser processes the XML files generated by the mind maps, (3) Django (Python) is used to generate the dynamic structure and content of the web-based questionnaire from the processed XML and the METAFOR CIM, (4) Python parsers ensure that information entered into the CMIP5 questionnaire is output as CIM-compliant XML, (5) CIM-compliant output allows automatic information capture tools to harvest questionnaire content into databases such as the Earth System Grid (ESG) metadata catalogue. This paper will focus on how Django (Python) and XML input files are used to generate the structure and content of the CMIP5 questionnaire. It will also address how the choice of development tools listed above provided a framework that enabled working scientists (whom we would never ordinarily expect to interact with UML and XML) to be part of the iterative development process and ensure that the CMIP5 model documentation questionnaire

  17. Hamilton Jeffers and the Double Star Catalogues

    NASA Astrophysics Data System (ADS)

    Tenn, Joseph S.

    2013-01-01

    Astronomers have long tracked double stars in efforts to find those that are gravitationally-bound binaries and then to determine their orbits. Court reporter and amateur astronomer Shelburne Wesley Burnham (1838-1921) published a massive double star catalogue containing more than 13,000 systems in 1906. The next keeper of the double stars was Lick Observatory astronomer Robert Grant Aitken (1864-1951), who produced a much larger catalogue in 1932. Aitken maintained and expanded Burnham’s records of observations on handwritten file cards, eventually turning them over to Lick Observatory astrometrist Hamilton Moore Jeffers (1893-1976). Jeffers further expanded the collection and put all the observations on punched cards. With the aid of Frances M. "Rete" Greeby (1921-2002), he made two catalogues: an Index Catalogue with basic data about each star, and a complete catalogue of observations, with one observation per punched card. He enlisted Willem van den Bos of Johannesburg to add southern stars, and they published the Index Catalogue of Visual Double Stars, 1961.0. As Jeffers approached retirement he became greatly concerned about the disposition of the catalogues. He wanted to be replaced by another "double star man," but Lick Director Albert E. Whitford (1905-2002) had the new 120-inch reflector, the world’s second largest telescope, and he wanted to pursue modern astrophysics instead. Jeffers was vociferously opposed to turning over the card files to another institution, and especially against their coming under the control of Kaj Strand of the U.S. Naval Observatory. In the end the USNO got the files and has maintained the records ever since, first under Charles Worley (1935-1997), and, since 1997, under Brian Mason. Now called the Washington Double Star Catalog (WDS), it is completely online and currently contains more than 1,000,000 measures of more than 100,000 pairs.

  18. Achieving interoperability for metadata registries using comparative object modeling.

    PubMed

    Park, Yu Rang; Kim, Ju Han

    2010-01-01

    Achieving data interoperability between organizations relies upon agreed meaning and representation (metadata) of data. For managing and registering metadata, many organizations have built metadata registries (MDRs) in various domains based on the international standard MDR framework, ISO/IEC 11179. Following this trend, two public MDRs in the biomedical domain have been created, the United States Health Information Knowledgebase (USHIK) and the cancer Data Standards Registry and Repository (caDSR), from the U.S. Department of Health & Human Services and the National Cancer Institute (NCI), respectively. Most MDRs have been extended in ad hoc, organization-specific ways to satisfy local needs and to work around semantic and structural limitations of ISO/IEC 11179. As a result, it is difficult to achieve interoperability among multiple MDRs. In this paper, we propose an integrated metadata object model for achieving interoperability among multiple MDRs. To evaluate this model, we developed an XML Schema Definition (XSD)-based metadata exchange format. We created an XSD-based metadata exporter, supporting both the integrated metadata object model and organization-specific MDR formats.

  19. Brady's Geothermal Field Nodal Seismometers Metadata

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lesley Parker

    Metadata for the nodal seismometer array deployed at the POROTOMO's Natural Laboratory in Brady Hot Spring, Nevada during the March 2016 testing. Metadata includes location and timing for each instrument as well as file lists of data to be uploaded in a separate submission.

  20. Metadata to Support Data Warehouse Evolution

    NASA Astrophysics Data System (ADS)

    Solodovnikova, Darja

    The focus of this chapter is the metadata necessary to support data warehouse evolution. We present a data warehouse framework that is able to track the evolution process and adapt data warehouse schemata and data extraction, transformation, and loading (ETL) processes. We discuss a significant part of the framework, the metadata repository, which stores information about the data warehouse, its logical and physical schemata, and their versions. We propose a physical implementation of a multiversion data warehouse in a relational DBMS. For each modification of a data warehouse schema, we outline the changes that need to be made to the repository metadata and in the database.
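
    A minimal relational sketch of such a repository might look as follows; the table and column names are our own illustration, not the chapter's actual design:

      import sqlite3

      con = sqlite3.connect(":memory:")
      con.executescript("""
      CREATE TABLE schema_version (
          version_id INTEGER PRIMARY KEY,
          valid_from TEXT NOT NULL,
          valid_to   TEXT                -- NULL for the current version
      );
      CREATE TABLE schema_element (
          element_id INTEGER PRIMARY KEY,
          version_id INTEGER REFERENCES schema_version(version_id),
          kind       TEXT CHECK (kind IN ('dimension', 'fact', 'attribute', 'measure')),
          name       TEXT NOT NULL
      );
      CREATE TABLE schema_change (
          change_id    INTEGER PRIMARY KEY,
          from_version INTEGER REFERENCES schema_version(version_id),
          to_version   INTEGER REFERENCES schema_version(version_id),
          operation    TEXT,             -- e.g. 'add attribute', 'rename measure'
          element_id   INTEGER REFERENCES schema_element(element_id)
      );
      """)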

  1. Streamlining geospatial metadata in the Semantic Web

    NASA Astrophysics Data System (ADS)

    Fugazza, Cristiano; Pepe, Monica; Oggioni, Alessandro; Tagliolato, Paolo; Carrara, Paola

    2016-04-01

    In the geospatial realm, data annotation and discovery rely on a number of ad hoc formats and protocols. These were created to enable domain-specific use cases for which generalized search is not feasible. Metadata are at the heart of the discovery process; nevertheless, they are often neglected or encoded in formats that either are not aimed at efficient retrieval of resources or are plainly outdated. In particular, the quantum leap represented by the Linked Open Data (LOD) movement has so far not induced a consistent, interlinked baseline in the geospatial domain. In a nutshell, datasets, the scientific literature related to them, and ultimately the researchers behind these products are only loosely connected; the corresponding metadata are intelligible only to humans, duplicated on different systems, and seldom consistent. Instead, our workflow for metadata management envisages i) editing via customizable web-based forms, ii) encoding of records in any XML application profile, iii) translation into RDF (involving the semantic lift of metadata records), and finally iv) storage of the metadata as RDF and back-translation into the original XML format with added semantics-aware features. Phase iii) hinges on relating resource metadata to RDF data structures that represent keywords from code lists and controlled vocabularies, toponyms, researchers, institutes, and virtually any description one can retrieve (or directly publish) in the LOD Cloud. In the context of a distributed Spatial Data Infrastructure (SDI) built on free and open-source software, we detail phases iii) and iv) of our workflow for the semantics-aware management of geospatial metadata.
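
    Phase iii), the semantic lift, essentially replaces free-text keyword strings with links to controlled-vocabulary concepts. A small sketch with the rdflib library; the dataset URI is hypothetical and the GEMET concept ID is illustrative:

      from rdflib import Graph, Literal, Namespace, URIRef
      from rdflib.namespace import DCTERMS, RDF

      DCAT = Namespace("http://www.w3.org/ns/dcat#")

      g = Graph()
      dataset = URIRef("http://example.org/sdi/dataset/lake-temp-2015")  # hypothetical
      g.add((dataset, RDF.type, DCAT.Dataset))
      g.add((dataset, DCTERMS.title, Literal("Lake surface temperature, 2015")))
      # Instead of the free-text keyword "lake", link a thesaurus concept URI:
      g.add((dataset, DCAT.theme,
             URIRef("http://www.eionet.europa.eu/gemet/concept/4612")))  # ID illustrative
      print(g.serialize(format="turtle"))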

  2. 32 CFR 575.6 - Catalogue, United States Military Academy.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... 32 National Defense 3 2013-07-01 2013-07-01 false Catalogue, United States Military Academy. 575.6... ADMISSION TO THE UNITED STATES MILITARY ACADEMY § 575.6 Catalogue, United States Military Academy. The latest edition of the catalogue, United States Military Academy, contains additional information...

  3. 32 CFR 575.6 - Catalogue, United States Military Academy.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 32 National Defense 3 2011-07-01 2009-07-01 true Catalogue, United States Military Academy. 575.6... ADMISSION TO THE UNITED STATES MILITARY ACADEMY § 575.6 Catalogue, United States Military Academy. The latest edition of the catalogue, United States Military Academy, contains additional information...

  4. 32 CFR 575.6 - Catalogue, United States Military Academy.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 32 National Defense 3 2010-07-01 2010-07-01 true Catalogue, United States Military Academy. 575.6... ADMISSION TO THE UNITED STATES MILITARY ACADEMY § 575.6 Catalogue, United States Military Academy. The latest edition of the catalogue, United States Military Academy, contains additional information...

  5. Making Interoperability Easier with NASA's Metadata Management Tool (MMT)

    NASA Technical Reports Server (NTRS)

    Shum, Dana; Reese, Mark; Pilone, Dan; Baynes, Katie

    2016-01-01

    While the ISO-19115 collection level metadata format meets many users' needs for interoperable metadata, it can be cumbersome to create it correctly. Through the MMT's simple UI experience, metadata curators can create and edit collections which are compliant with ISO-19115 without full knowledge of the NASA Best Practices implementation of ISO-19115 format. Users are guided through the metadata creation process through a forms-based editor, complete with field information, validation hints and picklists. Once a record is completed, users can download the metadata in any of the supported formats with just 2 clicks.

  6. Metadata, Identifiers, and Physical Samples

    NASA Astrophysics Data System (ADS)

    Arctur, D. K.; Lenhardt, W. C.; Hills, D. J.; Jenkyns, R.; Stroker, K. J.; Todd, N. S.; Dassie, E. P.; Bowring, J. F.

    2016-12-01

    Physical samples are integral to much of the research conducted by geoscientists. The samples used in this research are often obtained at significant cost and represent an important investment for future research. However, making information about samples - whether considered data or metadata - available for researchers to enable discovery is difficult: a number of key elements related to samples are difficult to characterize in common ways, such as classification, location, sample type, sampling method, repository information, subsample distribution, and instrumentation, because these differ from one domain to the next. Unifying these elements or developing metadata crosswalks is needed. The iSamples (Internet of Samples) NSF-funded Research Coordination Network (RCN) is investigating ways to develop these types of interoperability and crosswalks. Within the iSamples RCN, one of its working groups, WG1, has focused on the metadata related to physical samples. This includes identifying existing metadata standards and systems, and how they might interoperate with the International Geo Sample Number (IGSN) schema (schema.igsn.org) in order to help inform leading practices for metadata. For example, we are examining lifecycle metadata beyond the IGSN `birth certificate.' As a first step, this working group is developing a list of relevant standards and comparing their various attributes. In addition, the working group is looking toward technical solutions to facilitate developing a linked set of registries to build the web of samples. Finally, the group is also developing a comparison of sample identifiers and locators. This paper will provide an overview and comparison of the standards identified thus far, as well as an update on the technical solutions examined for integration. We will discuss how various sample identifiers might work in complementary fashion with the IGSN to more completely describe samples, facilitate retrieval of contextual information, and

  7. Transforming Dermatologic Imaging for the Digital Era: Metadata and Standards.

    PubMed

    Caffery, Liam J; Clunie, David; Curiel-Lewandrowski, Clara; Malvehy, Josep; Soyer, H Peter; Halpern, Allan C

    2018-01-17

    Imaging is increasingly being used in dermatology for documentation, diagnosis, and management of cutaneous disease. The lack of standards for dermatologic imaging is an impediment to clinical uptake. Standardization can occur in image acquisition, terminology, interoperability, and metadata. This paper presents the International Skin Imaging Collaboration position on standardization of metadata for dermatologic imaging. Metadata is essential to ensure that dermatologic images are properly managed and interpreted. There are two standards-based approaches to recording and storing metadata in dermatologic imaging. The first uses standard consumer image file formats, and the second is the file format and metadata model developed for the Digital Imaging and Communications in Medicine (DICOM) standard. DICOM would appear to provide an advantage over consumer image file formats for metadata, as it includes all the patient, study, and technical metadata necessary to use images clinically. In contrast, consumer image file formats include only technical metadata and need to be used in conjunction with another actor, for example an electronic medical record, to supply the patient and study metadata. The use of DICOM may have some ancillary benefits in dermatologic imaging, including leveraging DICOM network and workflow services, interoperability of images and metadata, leveraging existing enterprise imaging infrastructure, greater patient safety, and better compliance with legislative requirements for image retention.

  8. The Liverpool-Edinburgh high proper motion catalogue

    NASA Astrophysics Data System (ADS)

    Pokorny, R. S.; Jones, H. R. A.; Hambly, N. C.; Pinfield, D. J.

    2004-07-01

    We present a machine-selected catalogue of 11,289 objects with proper motions exceeding 0.18 arcsec yr^-1 and an R-band faint magnitude limit of 19.5 mag. The catalogue was produced using SuperCOSMOS digitized R-band ESO and UK Schmidt plates in 287 Schmidt fields covering almost 7000 square degrees (~17% of the whole sky) at the South Galactic Cap. The catalogue includes UK Schmidt BJ and I magnitudes for all of the stars as well as 2MASS magnitudes for 10,447 of the catalogue stars. We also show that the NLTT is ~95% complete for Dec > -32.5°. The full table is available only in electronic form at the CDS via anonymous ftp to cdsarc.u-strasbg.fr (130.79.128.5) or via http://cdsweb.u-strasbg.fr/cgi-bin/qcat?J/A+A/421/763

  9. Submillimeter, millimeter, and microwave spectral line catalogue

    NASA Technical Reports Server (NTRS)

    Poynter, R. L.; Pickett, H. M.

    1981-01-01

    A computer accessible catalogue of submillimeter, millimeter and microwave spectral lines in the frequency range between 0 and 3000 GHz (i.e., wavelengths longer than 100 μm) is presented which can be used as a planning guide or as an aid in the identification and analysis of observed spectral lines. The information listed for each spectral line includes the frequency and its estimated error, the intensity, lower state energy, and quantum number assignment. The catalogue was constructed by using theoretical least squares fits of published spectral lines to accepted molecular models. The associated predictions and their estimated errors are based upon the resultant fitted parameters and their covariances. Future versions of this catalogue will add more atoms and molecules and update the present listings (133 species) as new data appear. The catalogue is available as a magnetic tape recorded in card images and as a set of microfiche records.

  10. Catalogue of HI PArameters (CHIPA)

    NASA Astrophysics Data System (ADS)

    Saponara, J.; Benaglia, P.; Koribalski, B.; Andruchow, I.

    2015-08-01

    The Catalogue of HI PArameters (CHIPA) is the natural continuation of the compilation by M.C. Martin in 1998. CHIPA provides the most important parameters of nearby galaxies derived from observations of the neutral hydrogen line. The catalogue contains information on 1400 galaxies across the sky and of different morphological types. Parameters such as the optical diameter of the galaxy, the blue magnitude, the distance, the morphological type, and the HI extension are listed, among others. Maps of the HI distribution, velocity, and velocity dispersion can also be displayed in some cases. The main objective of this catalogue is to facilitate bibliographic queries through a database accessible from the internet that will be available in 2015 (the website is under construction). The database was built using the open-source MySQL relational database management system (SQL: Structured Query Language), while the website was built with HTML (Hypertext Markup Language) and PHP (Hypertext Preprocessor).

  11. Prediction of Solar Eruptions Using Filament Metadata

    NASA Astrophysics Data System (ADS)

    Aggarwal, Ashna; Schanche, Nicole; Reeves, Katharine K.; Kempton, Dustin; Angryk, Rafal

    2018-05-01

    We perform a statistical analysis of erupting and non-erupting solar filaments to determine the properties related to the eruption potential. In order to perform this study, we correlate filament eruptions documented in the Heliophysics Event Knowledgebase (HEK) with HEK filaments that have been grouped together using a spatiotemporal tracking algorithm. The HEK provides metadata about each filament instance, including values for length, area, tilt, and chirality. We add additional metadata properties such as the distance from the nearest active region and the magnetic field decay index. We compare trends in the metadata from erupting and non-erupting filament tracks to discover which properties present signs of an eruption. We find that a change in filament length over time is the most important factor in discriminating between erupting and non-erupting filament tracks, with erupting tracks being more likely to have decreasing length. We attempt to find an ensemble of predictive filament metadata using a Random Forest Classifier approach, but find the probability of correctly predicting an eruption with the current metadata is only slightly better than chance.
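
    The classification setup is conventional and easy to reproduce in outline. A sketch with scikit-learn, using synthetic stand-in features where the real inputs would be the HEK-derived track metadata:

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(0)
      n = 500
      # Stand-in columns: change in length, area, tilt, AR distance, decay index.
      X = rng.normal(size=(n, 5))
      y = (X[:, 0] < -0.3).astype(int)   # toy rule: shrinking filaments "erupt"

      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
      clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
      print("accuracy:", clf.score(X_te, y_te))
      print("feature importances:", clf.feature_importances_.round(2))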

  12. eXtended MetaData Registry

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2006-10-25

    The purpose of the eXtended MetaData Registry (XMDR) prototype is to demonstrate the feasibility and utility of constructing an extended metadata registry, i.e., one which encompasses richer classification support, facilities for including terminologies, and better support for formal specification of semantics. The prototype registry will also serve as a reference implementation for the revised versions of ISO 11179, Parts 2 and 3 to help guide production implementations.

  13. Catalogue of Exoplanets in Multiple-Star-Systems

    NASA Astrophysics Data System (ADS)

    Schwarz, Richard; Funk, Barbara; Bazsó, Ákos; Pilat-Lohinger, Elke

    2017-07-01

    Cataloguing the data of exoplanetary systems is becoming more and more important, because such catalogues consolidate the observations and support theoretical studies. Since 1995 there has been a database listing most of the known exoplanets (The Extrasolar Planets Encyclopaedia is available at http://exoplanet.eu/ and described in Schneider et al. 2011). With the growing number of exoplanets detected in binary and multiple star systems, it became important to flag these systems and collect them in a separate database. We therefore started to compile a catalogue for binary and multiple star systems. Since 2013 the catalogue can be found at http://www.univie.ac.at/adg/schwarz/multiple.html (a description can be found in Schwarz et al. 2016); it is updated regularly and is linked to the Extrasolar Planets Encyclopaedia. The data of the binary catalogue can be downloaded as a file (.csv) and used for statistical purposes. Our database is divided into two parts, the data of the stars and of the planets, given in separate lists. Every column of the list can be sorted in two directions: ascending (from the lowest value to the highest) or descending. In addition, an introduction and help are given in the menu bar of the catalogue, including an example list.

  14. Making metadata usable in a multi-national research setting.

    PubMed

    Ellul, Claire; Foord, Joanna; Mooney, John

    2013-11-01

    SECOA (Solutions for Environmental Contrasts in Coastal Areas) is a multi-national research project examining the effects of human mobility on urban settlements in fragile coastal environments. This paper describes the setting up of a SECOA metadata repository for non-specialist researchers such as environmental scientists and tourism experts. Conflicting usability requirements of two groups, metadata creators and metadata users, are identified along with associated limitations of current metadata standards. A description is given of a configurable metadata system designed to grow as the project evolves. This work is of relevance for similar projects such as INSPIRE. Copyright © 2012 Elsevier Ltd and The Ergonomics Society. All rights reserved.

  15. Technology Catalogue. First edition

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    1994-02-01

    The Department of Energy's Office of Environmental Restoration and Waste Management (EM) is responsible for remediating its contaminated sites and managing its waste inventory in a safe and efficient manner. EM's Office of Technology Development (OTD) supports applied research and demonstration efforts to develop and transfer innovative, cost-effective technologies to its site clean-up and waste management programs within EM's Office of Environmental Restoration and Office of Waste Management. The purpose of the Technology Catalogue is to provide performance data on OTD-developed technologies to scientists and engineers assessing and recommending technical solutions within the Department's clean-up and waste management programs, as well as to industry, other federal and state agencies, and the academic community. OTD's applied research and demonstration activities are conducted in programs referred to as Integrated Demonstrations (IDs) and Integrated Programs (IPs). The IDs test and evaluate systems, consisting of coupled technologies, at specific sites to address generic problems, such as the sensing, treatment, and disposal of buried waste containers. The IPs support applied research activities in specific application areas, such as in situ remediation, efficient separations processes, and site characterization. The Technology Catalogue is a means for communicating the status of the development of these innovative technologies. The FY93 Technology Catalogue features technologies successfully demonstrated in the field through IDs and sufficiently mature to be used in the near term. Technologies from the following IDs are featured in the FY93 Technology Catalogue: Buried Waste ID (Idaho National Engineering Laboratory, Idaho); Mixed Waste Landfill ID (Sandia National Laboratories, New Mexico); Underground Storage Tank ID (Hanford, Washington); Volatile organic compound (VOC) Arid ID (Richland, Washington); and VOC Non-Arid ID (Savannah River Site, South

  16. Web Based Data Access to the World Data Center for Climate

    NASA Astrophysics Data System (ADS)

    Toussaint, F.; Lautenschlager, M.

    2006-12-01

    The World Data Center for Climate (WDC-Climate, www.wdc-climate.de) is hosted by the Model & Data Group (M&D) of the Max Planck Institute for Meteorology. The M&D department is financed by the German government and uses the computers and mass storage facilities of the German Climate Computing Centre (Deutsches Klimarechenzentrum, DKRZ). The WDC-Climate provides web access to 200 terabytes of climate data; the total mass storage archive contains nearly 4 petabytes. Although the majority of the datasets concern model output data, some satellite and observational data are accessible as well. The underlying relational database is distributed over five servers. The CERA relational data model is used to integrate catalogue data and mass data; the flexibility of the model allows very different types of data and metadata to be stored and accessed. The CERA metadata catalogue provides easy access to the content of the CERA database as well as to other data on the web. Visit ceramodel.wdc-climate.de for additional information on the CERA data model. The majority of users access data via the CERA metadata catalogue, which is open without registration; however, prior to retrieving data, users are required to check in and apply for a userid and password. The CERA metadata catalogue is servlet-based, so it is accessible worldwide through any web browser at cera.wdc-climate.de. In addition to data and metadata access via the web catalogue, WDC-Climate offers a number of other forms of web-based data access. All metadata are available via HTTP request as XML files in various metadata formats (ISO, DC, etc.; see wini.wdc-climate.de), which allows for easy data interchange with other catalogues. Model data can be retrieved in GRIB, ASCII, NetCDF, and binary (IEEE) format. WDC-Climate serves as a data centre for various projects. Since the XML files are accessible via HTTP, the integration of data into the applications of different projects is very easy. Projects supported by WDC-Climate are e.g. CEOP
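
    Consuming such HTTP-accessible XML metadata from a script is straightforward. The endpoint and query parameters below are hypothetical, since the abstract does not spell out the request syntax:

      import requests
      import xml.etree.ElementTree as ET

      resp = requests.get("http://cera.wdc-climate.de/metadata",        # hypothetical
                          params={"id": "SOME_DATASET_ID", "format": "ISO"})
      resp.raise_for_status()
      root = ET.fromstring(resp.content)
      print(root.tag)   # inspect the root element of the returned ISO record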

  17. Progress in defining a standard for file-level metadata

    NASA Technical Reports Server (NTRS)

    Williams, Joel; Kobler, Ben

    1996-01-01

    In the following narrative, metadata required to locate a file on a tape or collection of tapes will be referred to as file-level metadata. This paper describes the rationale for, and the history of, the effort to define a standard for this metadata.

  18. The role of metadata in managing large environmental science datasets. Proceedings

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Melton, R.B.; DeVaney, D.M.; French, J. C.

    1995-06-01

    The purpose of this workshop was to bring together computer science researchers and environmental sciences data management practitioners to consider the role of metadata in managing large environmental sciences datasets. The objectives included: establishing a common definition of metadata; identifying categories of metadata; defining problems in managing metadata; and defining problems related to linking metadata with primary data.

  19. Second ROSAT all-sky survey (2RXS) source catalogue

    NASA Astrophysics Data System (ADS)

    Boller, Th.; Freyberg, M. J.; Trümper, J.; Haberl, F.; Voges, W.; Nandra, K.

    2016-04-01

    Aims: We present the second ROSAT all-sky survey source catalogue, hereafter referred to as the 2RXS catalogue. This is the second publicly released ROSAT catalogue of point-like sources obtained from the ROSAT all-sky survey (RASS) observations performed with the position-sensitive proportional counter (PSPC) between June 1990 and August 1991, and is an extended and revised version of the bright and faint source catalogues. Methods: We used the latest version of the RASS processing to produce overlapping X-ray images of 6.4° × 6.4° sky regions. To create a source catalogue, a likelihood-based detection algorithm was applied to these images, which accounts for the variable point-spread function (PSF) across the PSPC field of view. Improvements in the background determination compared to 1RXS were also implemented. X-ray control images showing the source and background extraction regions were generated and visually inspected. Simulations were performed to assess the spurious source content of the 2RXS catalogue. X-ray spectra and light curves were extracted for the 2RXS sources, with spectral and variability parameters derived from these products. Results: We obtained about 135,000 X-ray detections in the 0.1-2.4 keV energy band down to a likelihood threshold of 6.5, as adopted in the 1RXS faint source catalogue. Our simulations show that the expected spurious content of the catalogue is a strong function of detection likelihood, and the full catalogue is expected to contain about 30% spurious detections. A more conservative likelihood threshold of 9, on the other hand, yields about 71,000 detections with a 5% spurious fraction. We recommend thresholds appropriate to the scientific application. X-ray images and overlaid X-ray contour lines provide an additional user product for evaluating the detections visually, and we performed our own visual inspections to flag uncertain detections. Intra-day variability in the X-ray light curves was quantified based on the

  20. Submillimeter, millimeter, and microwave spectral line catalogue

    NASA Technical Reports Server (NTRS)

    Poynter, R. L.; Pickett, H. M.

    1980-01-01

    A computer accessible catalogue of submillimeter, millimeter, and microwave spectral lines in the frequency range between 0 and 3000 GHz (i.e., wavelengths longer than 100 μm) is discussed. The catalogue was used as a planning guide and as an aid in the identification and analysis of observed spectral lines. The information listed for each spectral line includes the frequency and its estimated error, the intensity, lower state energy, and quantum number assignment. The catalogue was constructed by using theoretical least squares fits of published spectral lines to accepted molecular models. The associated predictions and their estimated errors are based upon the resultant fitted parameters and their covariances.

  1. Efficient processing of MPEG-21 metadata in the binary domain

    NASA Astrophysics Data System (ADS)

    Timmerer, Christian; Frank, Thomas; Hellwagner, Hermann; Heuer, Jörg; Hutter, Andreas

    2005-10-01

    XML-based metadata is widely adopted across different communities, and plenty of commercial and open-source tools for processing and transforming it are available on the market. However, all of these tools have one thing in common: they operate on plain-text-encoded metadata, which may become a burden in constrained and streaming environments, i.e., when metadata needs to be processed together with multimedia content on the fly. In this paper we present an efficient approach for transforming such metadata encoded using MPEG's Binary Format for Metadata (BiM) without additional en-/decoding overhead, i.e., within the binary domain. To this end, we have developed an event-based push parser for BiM-encoded metadata which transforms the metadata by a limited set of processing instructions (based on traditional XML transformation techniques) operating on bit patterns instead of cost-intensive string comparisons.
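
    The binary-domain idea can be caricatured in a few lines: the parser walks a byte buffer, matches known bit patterns (here, one-byte event codes), and pushes events to a handler without ever decoding to text. The codes and payload layout are invented for illustration; BiM itself is far richer:

      import struct

      CODES = {0x01: "startElement", 0x02: "endElement", 0x03: "value"}

      def push_parse(buf, handler):
          i = 0
          while i < len(buf):
              code = buf[i]; i += 1
              if code == 0x03:                         # value events carry a payload
                  (length,) = struct.unpack_from(">H", buf, i); i += 2
                  handler(CODES[code], buf[i:i + length]); i += length
              else:
                  handler(CODES[code], None)

      push_parse(b"\x01\x03\x00\x02hi\x02",
                 lambda event, payload: print(event, payload))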

  2. Content Metadata Standards for Marine Science: A Case Study

    USGS Publications Warehouse

    Riall, Rebecca L.; Marincioni, Fausto; Lightsom, Frances L.

    2004-01-01

    The U.S. Geological Survey developed a content metadata standard to meet the demands of organizing electronic resources in the marine sciences for a broad, heterogeneous audience. These metadata standards are used by the Marine Realms Information Bank project, a Web-based public distributed library of marine science from academic institutions and government agencies. The development and deployment of this metadata standard serve as a model, complete with lessons about mistakes, for the creation of similarly specialized metadata standards for digital libraries.

  3. Improving Scientific Metadata Interoperability And Data Discoverability using OAI-PMH

    NASA Astrophysics Data System (ADS)

    Devarakonda, Ranjeet; Palanisamy, Giri; Green, James M.; Wilson, Bruce E.

    2010-12-01

    While general-purpose search engines (such as Google or Bing) are useful for finding many things on the Internet, they are often of limited usefulness for locating Earth Science data relevant (for example) to a specific spatiotemporal extent. By contrast, tools that search repositories of structured metadata can locate relevant datasets with fairly high precision, but the search is limited to that particular repository. Federated searches (such as Z39.50) have been used, but they can be slow and their comprehensiveness can be limited by downtime in any search partner. An alternative approach to improving comprehensiveness is for a repository to harvest metadata from other repositories, possibly with limits based on subject matter or access permissions. Searches through harvested metadata can be extremely responsive, and the search tool can be customized with semantic augmentation appropriate to the community of practice being served. However, there are a number of different protocols for harvesting metadata, with some challenges for ensuring that updates are propagated and for collaborations with repositories using differing metadata standards. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a standard that is seeing increased use as a means for exchanging structured metadata. OAI-PMH implementations must support Dublin Core as a metadata standard, with other metadata formats as optional. We have developed tools which enable our structured search tool (Mercury; http://mercury.ornl.gov) to consume metadata from OAI-PMH services in any of the metadata formats we support (Dublin Core, Darwin Core, FGDC CSDGM, GCMD DIF, EML, and ISO 19115/19137). We are also making ORNL DAAC metadata available through OAI-PMH for other metadata tools to utilize, such as the NASA Global Change Master Directory (GCMD). This paper describes Mercury's capabilities with multiple metadata formats in general and, more specifically, the results of our OAI-PMH implementations and
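
    A minimal OAI-PMH harvest uses only the protocol's standard verbs and the mandatory oai_dc format. The base URL below is a placeholder:

      import requests
      import xml.etree.ElementTree as ET

      OAI = "{http://www.openarchives.org/OAI/2.0/}"

      def harvest(base_url):
          params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
          while True:
              root = ET.fromstring(requests.get(base_url, params=params).content)
              for rec in root.iter(OAI + "record"):
                  ident = rec.find(OAI + "header/" + OAI + "identifier")
                  if ident is not None:
                      print(ident.text)
              token = root.find(OAI + "ListRecords/" + OAI + "resumptionToken")
              if token is None or not (token.text or "").strip():
                  break                     # no more pages
              params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

      harvest("https://example.org/oai")    # placeholder repository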

  4. openPDS: protecting the privacy of metadata through SafeAnswers.

    PubMed

    de Montjoye, Yves-Alexandre; Shmueli, Erez; Wang, Samuel S; Pentland, Alex Sandy

    2014-01-01

    The rise of smartphones and web services made possible the large-scale collection of personal metadata. Information about individuals' location, phone call logs, or web searches is collected and used intensively by organizations and big data researchers. Metadata has, however, yet to realize its full potential. Privacy and legal concerns, as well as the lack of technical solutions for personal metadata management, are preventing metadata from being shared and reconciled under the control of the individual. This lack of access and control is furthermore fueling growing concerns, as it prevents individuals from understanding and managing the risks associated with the collection and use of their data. Our contribution is two-fold: (1) we describe openPDS, a personal metadata management framework that allows individuals to collect, store, and give fine-grained access to their metadata to third parties. It has been implemented in two field studies; (2) we introduce and analyze SafeAnswers, a new and practical way of protecting the privacy of metadata at an individual level. SafeAnswers turns a hard anonymization problem into a more tractable security one. It allows services to ask questions whose answers are calculated against the metadata instead of trying to anonymize individuals' metadata. The dimensionality of the data shared with the services is reduced from high-dimensional metadata to low-dimensional answers that are less likely to be re-identifiable and to contain sensitive information. These answers can then be directly shared individually or in aggregate. openPDS and SafeAnswers provide a new way of dynamically protecting personal metadata, thereby supporting the creation of smart data-driven services and data science research.

  5. openPDS: Protecting the Privacy of Metadata through SafeAnswers

    PubMed Central

    de Montjoye, Yves-Alexandre; Shmueli, Erez; Wang, Samuel S.; Pentland, Alex Sandy

    2014-01-01

    The rise of smartphones and web services has made possible the large-scale collection of personal metadata. Information about individuals' locations, phone call logs, or web searches is collected and used intensively by organizations and big data researchers. Metadata has, however, yet to realize its full potential. Privacy and legal concerns, as well as the lack of technical solutions for personal metadata management, are preventing metadata from being shared and reconciled under the control of the individual. This lack of access and control is furthermore fueling growing concerns, as it prevents individuals from understanding and managing the risks associated with the collection and use of their data. Our contribution is two-fold: (1) we describe openPDS, a personal metadata management framework that allows individuals to collect, store, and give fine-grained access to their metadata to third parties; it has been implemented in two field studies. (2) We introduce and analyze SafeAnswers, a new and practical way of protecting the privacy of metadata at an individual level. SafeAnswers turns a hard anonymization problem into a more tractable security one: it allows services to ask questions whose answers are calculated against the metadata, instead of trying to anonymize individuals' metadata. The dimensionality of the data shared with the services is reduced from high-dimensional metadata to low-dimensional answers that are less likely to be re-identifiable and to contain sensitive information. These answers can then be shared directly, individually or in aggregate. openPDS and SafeAnswers provide a new way of dynamically protecting personal metadata, thereby supporting the creation of smart data-driven services and data science research. PMID:25007320

  6. A Revised Earthquake Catalogue for South Iceland

    NASA Astrophysics Data System (ADS)

    Panzera, Francesco; Zechar, J. Douglas; Vogfjörd, Kristín S.; Eberhard, David A. J.

    2016-01-01

    In 1991, a new seismic monitoring network named SIL was started in Iceland, with a digital seismic system and automatic operation. The system is equipped with software that reports the automatic location and magnitude of earthquakes, usually within 1-2 min of their occurrence. Normally, automatic locations are manually checked and re-estimated with corrected phase picks, but locations remain subject to random errors and systematic biases. In this article, we consider the quality of the catalogue and produce a revised catalogue for South Iceland, the area with the highest seismic risk in Iceland. We explore the effects of filtering events using some common recommendations based on network geometry and station spacing and, as an alternative, filtering based on a multivariate analysis that identifies outliers in the hypocentre error distribution. We identify and remove quarry blasts, and we re-estimate the magnitudes of many events. This revised catalogue, which we consider to be filtered, cleaned, and corrected, should be valuable for building future seismicity models and for assessing seismic hazard and risk. We present a comparative seismicity analysis using the original and revised catalogues: we report characteristics of South Iceland seismicity in terms of b-value and magnitude of completeness. Our work demonstrates the importance of carefully checking an earthquake catalogue before proceeding with seismicity analysis.
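
    For readers unfamiliar with the b-value mentioned above, a standard way to estimate it from a catalogue is the Aki maximum-likelihood estimator with Utsu's binning correction; the snippet below is a generic sketch, not the paper's exact procedure:

        import numpy as np

        def b_value(mags, mc, dm=0.1):
            """Aki maximum-likelihood b-value, with Utsu's correction for
            magnitudes binned at width dm; uses only events with M >= mc."""
            m = np.asarray(mags, dtype=float)
            m = m[m >= mc]
            return np.log10(np.e) / (m.mean() - (mc - dm / 2.0))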

  7. A Metadata Element Set for Project Documentation

    NASA Technical Reports Server (NTRS)

    Hodge, Gail; Templeton, Clay; Allen, Robert B.

    2003-01-01

    Abstract NASA Goddard Space Flight Center is a large engineering enterprise with many projects. We describe our efforts to develop standard metadata sets across project documentation which we term the "Goddard Core". We also address broader issues for project management metadata.

  8. International Metadata Standards and Enterprise Data Quality Metadata Systems

    NASA Astrophysics Data System (ADS)

    Habermann, T.

    2016-12-01

    Well-documented data quality is critical in situations where scientists and decision-makers need to combine multiple datasets from different disciplines and collection systems to address scientific questions or difficult decisions. Standardized data quality metadata could be very helpful in these situations. Many efforts at developing data quality standards falter because of the diversity of approaches to measuring and reporting data quality. The "one size fits all" paradigm does not generally work well in this situation. The ISO data quality standard (ISO 19157) takes a different approach with the goal of systematically describing how data quality is measured rather than how it should be measured. It introduces the idea of standard data quality measures that can be well documented in a measure repository and used for consistently describing how data quality is measured across an enterprise. The standard includes recommendations for properties of these measures that include unique identifiers, references, illustrations and examples. Metadata records can reference these measures using the unique identifier and reuse them along with details (and references) that describe how the measure was applied to a particular dataset. A second important feature of ISO 19157 is the inclusion of citations to existing papers or reports that describe quality of a dataset. This capability allows users to find this information in a single location, i.e. the dataset metadata, rather than searching the web or other catalogs. I will describe these and other capabilities of ISO 19157 with examples of how they are being used to describe data quality across the NASA EOS Enterprise and also compare these approaches with other standards.
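
    The reuse pattern ISO 19157 encourages - register a measure once, then reference it by identifier from dataset metadata - can be illustrated with a toy registry; the field names and values below are illustrative, not the standard's exact schema:

        # Illustrative measure registry and a dataset record that references
        # a measure by identifier instead of redefining it (not ISO 19157 XML).
        MEASURE_REGISTRY = {
            "measure-28": {
                "name": "misclassification rate",
                "definition": "proportion of items classified incorrectly",
                "reference": "example citation to the measure's source",
            },
        }

        dataset_quality = {
            "dataset": "example-landcover-v2",
            "measureIdentification": "measure-28",   # reuse, don't redefine
            "result": {"value": 0.04, "unit": "ratio"},
            "evaluationMethodDescription": "stratified sample of 1000 cells",
        }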

  9. Dark Energy Survey Year 1 Results: Weak Lensing Shape Catalogues

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zuntz, J.; et al.

    We present two galaxy shape catalogues from the Dark Energy Survey Year 1 data set, covering 1500 square degrees with a median redshift of 0.59. The catalogues cover two main fields: Stripe 82, and an area overlapping the South Pole Telescope survey region. We describe our data analysis process and in particular our shape measurement using two independent shear measurement pipelines, METACALIBRATION and IM3SHAPE. The METACALIBRATION catalogue uses a Gaussian model with an innovative internal calibration scheme, and was applied to riz-bands, yielding 34.8M objects. The IM3SHAPE catalogue uses a maximum-likelihood bulge/disc model calibrated using simulations, and was applied to r-band data, yielding 21.9M objects. Both catalogues pass a suite of null tests that demonstrate their fitness for use in weak lensing science. We estimate the 1σ uncertainties in multiplicative shear calibration to be 0.013 and 0.025 for the METACALIBRATION and IM3SHAPE catalogues, respectively.

  10. Design and Implementation of a Metadata-rich File System

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ames, S; Gokhale, M B; Maltzahn, C

    2010-01-19

    Despite continual improvements in the performance and reliability of large scale file systems, the management of user-defined file system metadata has changed little in the past decade. The mismatch between the size and complexity of large scale data stores and their ability to organize and query their metadata has led to a de facto standard in which raw data is stored in traditional file systems, while related, application-specific metadata is stored in relational databases. This separation of data and semantic metadata requires considerable effort to maintain consistency and can result in complex, slow, and inflexible system operation. To address these problems, we have developed the Quasar File System (QFS), a metadata-rich file system in which files, user-defined attributes, and file relationships are all first class objects. In contrast to hierarchical file systems and relational databases, QFS defines a graph data model composed of files and their relationships. QFS incorporates Quasar, an XPATH-extended query language for searching the file system. Results from our QFS prototype show the effectiveness of this approach. Compared to the de facto standard, the QFS prototype shows superior ingest performance and comparable query performance on user metadata-intensive operations and superior performance on normal file metadata operations.
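
    The graph model - files as nodes, user-defined attributes on nodes, named relationships as edges - can be sketched minimally; the names below are illustrative and not the QFS/Quasar syntax:

        # Toy metadata graph: files carry attributes; edges carry a relation name.
        files = {
            "run42.dat": {"experiment": "A", "sensor": "ccd1"},
            "run42.cal": {"experiment": "A"},
        }
        edges = [("run42.dat", "calibratedBy", "run42.cal")]

        def related(name, relation):
            """Follow edges with the given relation name from one file."""
            return [dst for src, rel, dst in edges
                    if src == name and rel == relation]

        print(related("run42.dat", "calibratedBy"))  # -> ['run42.cal']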

  11. SCOPE in Cataloguing.

    ERIC Educational Resources Information Center

    Tom, Ellen; Reed, Sue

    This report describes the Systematic Computerized Processing in Cataloguing system (SCOPE), an automated system for the catalog department of a university library. The system produces spine labels, pocket labels, book cards for the circulation system, catalog cards including shelf list, main entry, subject and added entry cards, statistics, an…

  12. Evaluating and Evolving Metadata in Multiple Dialects

    NASA Technical Reports Server (NTRS)

    Kozimore, John; Habermann, Ted; Gordon, Sean; Powers, Lindsay

    2016-01-01

    Despite many long-term homogenization efforts, communities continue to develop focused metadata standards along with related recommendations and (typically) XML representations (aka dialects) for sharing metadata content. Different representations easily become obstacles to sharing information because each representation generally requires a set of tools and skills that are designed, built, and maintained specifically for that representation. In contrast, community recommendations are generally described, at least initially, at a more conceptual level and are more easily shared. For example, most communities agree that dataset titles should be included in metadata records although they write the titles in different ways.
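
    As a concrete illustration of the same concept expressed in different dialects, the sketch below reads a dataset title from an ISO 19115/19139 record and from a GCMD DIF record; the XPaths are abbreviated (real ISO records nest the title inside citation elements), and the un-namespaced DIF element is an assumption:

        import xml.etree.ElementTree as ET

        ISO_NS = {"gmd": "http://www.isotc211.org/2005/gmd",
                  "gco": "http://www.isotc211.org/2005/gco"}

        def title_from_iso(tree):
            # ISO dialect: title is a nested CharacterString element
            return tree.findtext(".//gmd:title/gco:CharacterString",
                                 namespaces=ISO_NS)

        def title_from_dif(tree):
            # DIF dialect: title is assumed to be an Entry_Title element
            return tree.findtext(".//Entry_Title")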

  13. Enhancing SCORM Metadata for Assessment Authoring in E-Learning

    ERIC Educational Resources Information Center

    Chang, Wen-Chih; Hsu, Hui-Huang; Smith, Timothy K.; Wang, Chun-Chia

    2004-01-01

    With the rapid development of distance learning and the XML technology, metadata play an important role in e-Learning. Nowadays, many distance learning standards, such as SCORM, AICC CMI, IEEE LTSC LOM and IMS, use metadata to tag learning materials. However, most metadata models are used to define learning materials and test problems. Few…

  14. Improving Access to NASA Earth Science Data through Collaborative Metadata Curation

    NASA Astrophysics Data System (ADS)

    Sisco, A. W.; Bugbee, K.; Shum, D.; Baynes, K.; Dixon, V.; Ramachandran, R.

    2017-12-01

    The NASA-developed Common Metadata Repository (CMR) is a high-performance metadata system that currently catalogs over 375 million Earth science metadata records. It serves as the authoritative metadata management system of NASA's Earth Observing System Data and Information System (EOSDIS), enabling NASA Earth science data to be discovered and accessed by a worldwide user community. The size of the EOSDIS data archive is steadily increasing, and the ability to manage and query this archive depends on the input of high quality metadata to the CMR. Metadata that does not provide adequate descriptive information diminishes the CMR's ability to effectively find and serve data to users. To address this issue, an innovative and collaborative review process is underway to systematically improve the completeness, consistency, and accuracy of metadata for approximately 7,000 data sets archived by NASA's twelve EOSDIS data centers, or Distributed Active Archive Centers (DAACs). The process involves automated and manual metadata assessment of both collection and granule records by a team of Earth science data specialists at NASA Marshall Space Flight Center. The team communicates results to DAAC personnel, who then make revisions and reingest improved metadata into the CMR. Implementation of this process relies on a network of interdisciplinary collaborators leveraging a variety of communication platforms and long-range planning strategies. Curating metadata at this scale and resolving metadata issues through community consensus improves the CMR's ability to serve current and future users and also introduces best practices for stewarding the next generation of Earth Observing System data. This presentation will detail the metadata curation process, its outcomes thus far, and also share the status of ongoing curation activities.

  15. GT-57633 catalogue of Martian impact craters developed for evaluation of crater detection algorithms

    NASA Astrophysics Data System (ADS)

    Salamunićcar, Goran; Lončarić, Sven

    2008-12-01

    Crater detection algorithms (CDAs) are an important subject of recent scientific research. A ground truth (GT) catalogue, which contains the locations and sizes of known craters, is important for the evaluation of CDAs in a wide range of CDA applications. Unfortunately, previous catalogues of craters by other authors cannot be easily used as GT. In this paper, we propose a method for the integration of several existing catalogues to obtain a new crater catalogue. The methods developed and used during this work on the GT catalogue are: (1) initial screening of used catalogues; (2) evaluation of self-consistency of used catalogues; (3) initial registration from three different catalogues; (4) cross-evaluation of used catalogues; (5) additional registrations and registrations from additional catalogues; and (6) fine-tuning and registration with additional data-sets. During this process, all craters from all major currently available manually assembled catalogues were processed, including catalogues by Barlow, Rodionova, Boyce, Kuzmin, and our previous work. Each crater from the GT catalogue contains references to the crater(s) used for its registration. This provides direct access to all properties assigned to craters from the used catalogues, which can be of interest even to those scientists who are not directly interested in CDAs. Having all these craters in a single catalogue also provides a good starting point for searching for craters still not catalogued manually, which is also expected to be one of the challenges of CDAs. The resulting new GT catalogue contains 57,633 craters, significantly more than any previous catalogue. From this point of view, the GT-57633 catalogue is currently the most complete catalogue of large Martian impact craters. Additionally, each crater from the resulting GT-57633 catalogue is aligned with MOLA topography and, during the final review phase, additionally registered/aligned with 1/256° THEMIS-DIR, 1/256° MDIM and 1/256° MOC

  16. A metadata-driven approach to data repository design.

    PubMed

    Harvey, Matthew J; McLean, Andrew; Rzepa, Henry S

    2017-01-01

    The design and use of a metadata-driven data repository for research data management is described. Metadata is collected automatically during the submission process whenever possible and is registered with DataCite in accordance with their current metadata schema, in exchange for a persistent digital object identifier. Two examples of data preview are illustrated, including the demonstration of a method for integration with commercial software that confers rich domain-specific data analytics without introducing customisation into the repository itself.
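
    A registration step like the one described - metadata pushed to DataCite in exchange for a DOI - looks roughly like the following against DataCite's REST API; the prefix, field values, and credentials are placeholders, and the current DataCite metadata schema should be consulted for required properties:

        # Sketch of DOI registration via the DataCite REST API (JSON:API
        # payload at /dois). Values are placeholders, not a working account.
        import json
        import urllib.request

        payload = {
            "data": {
                "type": "dois",
                "attributes": {
                    "prefix": "10.5072",   # reserved test prefix
                    "titles": [{"title": "Example research dataset"}],
                    "creators": [{"name": "Doe, Jane"}],
                    "publisher": "Example Repository",
                    "publicationYear": 2017,
                    "types": {"resourceTypeGeneral": "Dataset"},
                },
            }
        }

        req = urllib.request.Request(
            "https://api.test.datacite.org/dois",
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/vnd.api+json"},
            method="POST",
        )
        # urllib.request.urlopen(req)  # requires HTTP basic auth in practice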

  17. ASDC Collaborations and Processes to Ensure Quality Metadata and Consistent Data Availability

    NASA Astrophysics Data System (ADS)

    Trapasso, T. J.

    2017-12-01

    With the introduction of new tools, faster computing, and less expensive storage, increased volumes of data are expected to be managed with existing or fewer resources. Metadata management is becoming a heightened challenge as data volumes increase, with more metadata records needing to be curated for each product. To address metadata availability and completeness, NASA ESDIS has taken significant strides with the creation of the Unified Metadata Model (UMM) and the Common Metadata Repository (CMR). The UMM helps address hurdles posed by the increasing number of metadata dialects, and the CMR provides a primary repository for metadata so that required metadata fields can be served through a growing number of tools and services. However, metadata quality remains an issue, as metadata is not always intuitive to the end user. In response to these challenges, the NASA Atmospheric Science Data Center (ASDC) created the Collaboratory for quAlity Metadata Preservation (CAMP) and defined the Product Lifecycle Process (PLP) to work congruently with it. CAMP is unique in that it provides science team members a UI to directly supply metadata that is complete, compliant, and accurate for their data products. This replaces back-and-forth communication that often results in misinterpreted metadata. Upon review by ASDC staff, metadata is submitted to the CMR for broader distribution through Earthdata. Further, approval of science team metadata in CAMP automatically triggers the ASDC PLP workflow to ensure appropriate services are applied throughout the product lifecycle. This presentation will review the design elements of CAMP and PLP as well as demonstrate interfaces to each. It will show the benefits that CAMP and PLP provide to the ASDC that could potentially benefit additional NASA Earth Science Data and Information System (ESDIS) Distributed Active Archive Centers (DAACs).

  18. Metadata Authoring with Versatility and Extensibility

    NASA Technical Reports Server (NTRS)

    Pollack, Janine; Olsen, Lola

    2004-01-01

    NASA's Global Change Master Directory (GCMD) assists the scientific community in the discovery of and linkage to Earth science data sets and related services. The GCMD holds over 13,800 data set descriptions in Directory Interchange Format (DIF) and 700 data service descriptions in Service Entry Resource Format (SERF), encompassing the disciplines of geology, hydrology, oceanography, meteorology, and ecology. Data descriptions also contain geographic coverage information and direct links to the data, thus allowing researchers to discover data pertaining to a geographic location of interest, then quickly acquire those data. The GCMD strives to be the preferred data locator for world-wide directory-level metadata. In this vein, scientists and data providers must have access to intuitive and efficient metadata authoring tools. Existing GCMD tools are attracting widespread usage; however, a need for tools that are portable, customizable and versatile still exists. With tool usage directly influencing metadata population, it has become apparent that new tools are needed to fill these voids. As a result, the GCMD has released a new authoring tool allowing for both web-based and stand-alone authoring of descriptions. Furthermore, this tool incorporates the ability to plug-and-play the metadata format of choice, offering users options of DIF, SERF, FGDC, ISO or any other defined standard. Allowing data holders to work with their preferred format, as well as an option of a stand-alone application or web-based environment, docBUILDER will assist the scientific community in efficiently creating quality data and services metadata.

  19. Publishing NASA Metadata as Linked Open Data for Semantic Mashups

    NASA Astrophysics Data System (ADS)

    Wilson, Brian; Manipon, Gerald; Hua, Hook

    2014-05-01

    Data providers are now publishing more metadata in more interoperable forms, e.g. Atom or RSS 'casts', as Linked Open Data (LOD), or as ISO Metadata records. A major effort on the part of NASA's Earth Science Data and Information System (ESDIS) project is the aggregation of metadata that enables greater data interoperability among scientific data sets regardless of source or application. Both the Earth Observing System (EOS) ClearingHOuse (ECHO) and the Global Change Master Directory (GCMD) repositories contain metadata records for NASA (and other) datasets and provided services. These records contain typical fields for each dataset (or software service) such as the source, creation date, cognizant institution, related access URLs, and domain and variable keywords to enable discovery. Under a NASA ACCESS grant, we demonstrated how to publish the ECHO and GCMD dataset and services metadata as LOD in the RDF format. Both sets of metadata are now queryable at SPARQL endpoints and available for integration into "semantic mashups" in the browser. It is straightforward to reformat sets of XML metadata, including ISO, into simple RDF and then later refine and improve the RDF predicates by reusing known namespaces such as Dublin Core, georss, etc. All scientific metadata should be part of the LOD world. In addition, we developed an "instant" drill-down and browse interface that provides faceted navigation so that the user can discover and explore the 25,000 datasets and 3000 services. The available facets and the free-text search box appear in the left panel, and the instantly updated results for the dataset search appear in the right panel. The user can constrain the value of a metadata facet simply by clicking on a word (or phrase) in the "word cloud" of values for each facet. The display section for each dataset includes the important metadata fields, a full description of the dataset, potentially some related URLs, and a "search" button that points to an Open…
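
    The reformatting step - XML metadata fields recast as simple RDF triples reusing known namespaces - can be sketched with rdflib; the dataset URI and literal values are placeholders:

        # Minimal rdflib sketch: one dataset record as Dublin Core triples.
        from rdflib import Graph, Literal, URIRef
        from rdflib.namespace import DC

        g = Graph()
        dataset = URIRef("http://example.org/dataset/42")  # placeholder URI
        g.add((dataset, DC.title, Literal("Example land cover dataset")))
        g.add((dataset, DC.creator, Literal("Example DAAC")))
        g.add((dataset, DC.date, Literal("2014-05-01")))
        print(g.serialize(format="turtle"))  # ready for a SPARQL-queryable store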

  20. Omics Metadata Management Software (OMMS).

    PubMed

    Perez-Arriaga, Martha O; Wilson, Susan; Williams, Kelly P; Schoeniger, Joseph; Waymire, Russel L; Powell, Amy Jo

    2015-01-01

    Next-generation sequencing projects have underappreciated information management tasks requiring detailed attention to specimen curation, nucleic acid sample preparation and sequence production methods required for downstream data processing, comparison, interpretation, sharing and reuse. The few existing metadata management tools for genome-based studies provide weak curatorial frameworks for experimentalists to store and manage idiosyncratic, project-specific information, typically offering no automation supporting unified naming and numbering conventions for sequencing production environments that routinely deal with hundreds, if not thousands, of samples at a time. Moreover, existing tools are not readily interfaced with bioinformatics executables (e.g., BLAST, Bowtie2, custom pipelines). Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and to perform analyses and information management tasks via an intuitive web-based interface. Several use cases with short-read sequence datasets are provided to validate installation and integrated function, and suggest possible methodological road maps for prospective users. Provided examples highlight possible OMMS workflows for metadata curation, multistep analyses, and results management and downloading. The OMMS can be implemented as a stand-alone package for individual laboratories, or can be configured for web-based deployment supporting geographically-dispersed projects. The OMMS was developed using an open-source software base, is flexible, extensible and easily installed and executed. The OMMS can be obtained at http://omms.sandia.gov.

  1. Omics Metadata Management Software (OMMS)

    PubMed Central

    Perez-Arriaga, Martha O; Wilson, Susan; Williams, Kelly P; Schoeniger, Joseph; Waymire, Russel L; Powell, Amy Jo

    2015-01-01

    Next-generation sequencing projects have underappreciated information management tasks requiring detailed attention to specimen curation, nucleic acid sample preparation and sequence production methods required for downstream data processing, comparison, interpretation, sharing and reuse. The few existing metadata management tools for genome-based studies provide weak curatorial frameworks for experimentalists to store and manage idiosyncratic, project-specific information, typically offering no automation supporting unified naming and numbering conventions for sequencing production environments that routinely deal with hundreds, if not thousands, of samples at a time. Moreover, existing tools are not readily interfaced with bioinformatics executables (e.g., BLAST, Bowtie2, custom pipelines). Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and to perform analyses and information management tasks via an intuitive web-based interface. Several use cases with short-read sequence datasets are provided to validate installation and integrated function, and suggest possible methodological road maps for prospective users. Provided examples highlight possible OMMS workflows for metadata curation, multistep analyses, and results management and downloading. The OMMS can be implemented as a stand-alone package for individual laboratories, or can be configured for web-based deployment supporting geographically-dispersed projects. The OMMS was developed using an open-source software base, is flexible, extensible and easily installed and executed. Availability: The OMMS can be obtained at http://omms.sandia.gov. PMID:26124554

  2. IMCAT: Image and Catalogue Manipulation Software

    NASA Astrophysics Data System (ADS)

    Kaiser, Nick

    2011-08-01

    The IMCAT software was developed initially to do faint galaxy photometry for weak lensing studies, and provides a fairly complete set of tools for this kind of work. Unlike most packages for doing data analysis, the tools are standalone unix commands which you can invoke from the shell, via shell scripts or from perl scripts. The tools are arranged in a tree of directories. One main branch is the 'imtools'. These deal only with fits files. The most important imtool is the 'image calculator' 'ic', which allows one to do rather general operations on fits images. A second branch is the 'catools', which operate only on catalogues. The key catool is 'lc'; this effectively defines the format of IMCAT catalogues, and allows one to do very general operations on and filtering of such catalogues. A third branch is the 'imcattools'. These tend to be much more specialised than the imtools and catools and are focussed on faint galaxy photometry.

  3. Applications of the LBA-ECO Metadata Warehouse

    NASA Astrophysics Data System (ADS)

    Wilcox, L.; Morrell, A.; Griffith, P. C.

    2006-05-01

    The LBA-ECO Project Office has developed a system to harvest and warehouse metadata resulting from the Large-Scale Biosphere Atmosphere Experiment in Amazonia. The harvested metadata is used to create dynamically generated reports, available at www.lbaeco.org, which facilitate access to LBA-ECO datasets. The reports are generated for specific controlled vocabulary terms (such as an investigation team or a geospatial region), and are cross-linked with one another via these terms. This approach creates a rich contextual framework enabling researchers to find datasets relevant to their research. It maximizes data discovery by association and provides a greater understanding of the scientific and social context of each dataset. For example, our website provides a profile (e.g. participants, abstract(s), study sites, and publications) for each LBA-ECO investigation. Linked from each profile is a list of associated registered dataset titles, each of which link to a dataset profile that describes the metadata in a user-friendly way. The dataset profiles are generated from the harvested metadata, and are cross-linked with associated reports via controlled vocabulary terms such as geospatial region. The region name appears on the dataset profile as a hyperlinked term. When researchers click on this link, they find a list of reports relevant to that region, including a list of dataset titles associated with that region. Each dataset title in this list is hyperlinked to its corresponding dataset profile. Moreover, each dataset profile contains hyperlinks to each associated data file at its home data repository and to publications that have used the dataset. We also use the harvested metadata in administrative applications to assist quality assurance efforts. These include processes to check for broken hyperlinks to data files, automated emails that inform our administrators when critical metadata fields are updated, dynamically generated reports of metadata records that link

  4. Leveraging Metadata to Create Better Web Services

    ERIC Educational Resources Information Center

    Mitchell, Erik

    2012-01-01

    Libraries have been increasingly concerned with data creation, management, and publication. This increase is partly driven by shifting metadata standards in libraries and partly by the growth of data and metadata repositories being managed by libraries. In order to manage these data sets, libraries are looking for new preservation and discovery…

  5. Broad Absorption Line Quasar catalogues with Supervised Neural Networks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Scaringi, Simone; Knigge, Christian; Cottis, Christopher E.

    2008-12-05

    We have applied a Learning Vector Quantization (LVQ) algorithm to SDSS DR5 quasar spectra in order to create a large catalogue of broad absorption line quasars (BALQSOs). We first discuss the problems with BALQSO catalogues constructed using the conventional balnicity and/or absorption indices (BI and AI), and then describe the supervised LVQ network we have trained to recognise BALQSOs. The resulting BALQSO catalogue should be substantially more robust and complete than BI-or AI-based ones.
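
    LVQ itself is a simple prototype-based classifier; the sketch below implements the standard LVQ1 update rule (the nearest prototype is pulled toward a correctly labelled spectrum and pushed away otherwise), rather than the authors' exact network:

        import numpy as np

        def lvq1(X, y, prototypes, proto_labels, lr=0.05, epochs=20):
            """Minimal LVQ1: adjust the nearest prototype per sample."""
            P = prototypes.copy()
            for _ in range(epochs):
                for x, label in zip(X, y):
                    k = np.argmin(((P - x) ** 2).sum(axis=1))  # nearest prototype
                    sign = 1.0 if proto_labels[k] == label else -1.0
                    P[k] += sign * lr * (x - P[k])
            return P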

  6. Automated Test Methods for XML Metadata

    DTIC Science & Technology

    2017-12-28

    Prepared under RCC Task TG-147, this document (Volume VI of the RCC Document 118 series) describes procedures used for evaluating XML metadata documents, including TMATS, MDL, IHAL, and DDML documents. These documents contain specifications or descriptions of artifacts and systems of importance to the collection and management of telemetry data. The methods defined in this report provide a means of evaluating the suitability of such metadata
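
    One basic automated test of this kind is schema validation; below is a minimal sketch with lxml, where the schema and instance file names are placeholders:

        # Validate a metadata instance document against an XML schema.
        from lxml import etree

        schema = etree.XMLSchema(etree.parse("metadata_schema.xsd"))
        doc = etree.parse("test_document.xml")
        if not schema.validate(doc):
            for error in schema.error_log:
                print(f"line {error.line}: {error.message}")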

  7. Tsunami Catalogues for the Eastern Mediterranean - Revisited.

    NASA Astrophysics Data System (ADS)

    Ambraseys, N.; Synolakis, C. E.

    2008-12-01

    We critically examine catalogues of tsunamis in the Eastern Mediterranean published in the last decade, by reference to the original sources (see Ambraseys 2008). Such catalogues have been widely used in the aftermath of the 2004 Boxing Day tsunami for probabilistic hazard analysis, even to make projections for a ten-year time frame. On occasion, such predictions have caused panic and have reduced the credibility of the scientific community in making hazard assessments. We correct classification and other spurious errors in earlier catalogues and posit a new list. We conclude that for some historic events, any assignment of magnitude, even on a six-point intensity scale, is inappropriate due to lack of information. Further, we assert that any tsunami catalogue, including ours, can only be used in conjunction with sedimentologic evidence to quantitatively infer the return period of larger events. Statistical analyses correlating numbers of tsunami events derived solely from catalogues with their inferred or imagined intensities are meaningless, at least when focusing on specific locales where only a handful of tsunamis are known to have been historically reported. Quantitative hazard assessments based on scenario events of historic tsunamis for which - at best - only the size and approximate location of the parent earthquake is known should be undertaken with extreme caution and only with the benefit of geologic studies to enhance the understanding of the local tectonics. Ambraseys N. (2008) Earthquakes in the Eastern Mediterranean and the Middle East: multidisciplinary study of 2000 years of seismicity, Cambridge Univ. Press, Cambridge (ISBN 9780521872928).

  8. A Window to the World: Lessons Learned from NASA's Collaborative Metadata Curation Effort

    NASA Astrophysics Data System (ADS)

    Bugbee, K.; Dixon, V.; Baynes, K.; Shum, D.; le Roux, J.; Ramachandran, R.

    2017-12-01

    Well-written descriptive metadata adds value to data by making the data easier to discover and by providing context for appropriate use. While many data centers acknowledge the importance of correct, consistent and complete metadata, allocating resources to curate existing metadata is often difficult. To lower resource costs, many data centers seek guidance on best practices for curating metadata but struggle to identify those recommendations. In order to assist data centers in curating metadata and to develop best practices for creating and maintaining metadata, NASA has formed a collaborative effort to improve the Earth Observing System Data and Information System (EOSDIS) metadata in the Common Metadata Repository (CMR). This effort has taken significant steps in building consensus around metadata curation best practices. However, it has also revealed gaps in EOSDIS enterprise policies and procedures within the core metadata curation task. This presentation will explore the mechanisms used for building consensus on metadata curation, the gaps identified in policies and procedures, the lessons learned from collaborating with both the data centers and metadata curation teams, and the proposed next steps for the future.

  9. Syntactic and Semantic Validation without a Metadata Management System

    NASA Technical Reports Server (NTRS)

    Pollack, Janine; Gokey, Christopher D.; Kendig, David; Olsen, Lola; Wharton, Stephen W. (Technical Monitor)

    2001-01-01

    The ability to maintain quality information is essential to securing confidence in any system for which the information serves as a data source. NASA's Global Change Master Directory (GCMD), an online Earth science data locator, holds over 9000 data set descriptions and is in a constant state of flux as metadata are created and updated on a daily basis. In such a system, maintaining the consistency and integrity of these metadata is crucial. The GCMD has developed a metadata management system utilizing XML, controlled vocabulary, and Java technologies to ensure the metadata not only adhere to valid syntax, but also exhibit proper semantics.
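
    The semantic half of such validation - values that are syntactically legal XML but fall outside the controlled vocabulary - reduces to a membership check; the vocabulary terms below are invented examples, not the GCMD keyword lists:

        # Toy semantic check: flag field values outside a controlled vocabulary.
        CONTROLLED_VOCAB = {
            "discipline": {"geology", "hydrology", "oceanography",
                           "meteorology", "ecology"},
        }

        def semantic_errors(record):
            errors = []
            for field, allowed in CONTROLLED_VOCAB.items():
                value = record.get(field)
                if value is not None and value.lower() not in allowed:
                    errors.append(f"{field}: '{value}' not in controlled vocabulary")
            return errors

        print(semantic_errors({"discipline": "Geologgy"}))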

  10. International Metadata Initiatives: Lessons in Bibliographic Control.

    ERIC Educational Resources Information Center

    Caplan, Priscilla

    This paper looks at a subset of metadata schemes, including the Text Encoding Initiative (TEI) header, the Encoded Archival Description (EAD), the Dublin Core Metadata Element Set (DCMES), and the Visual Resources Association (VRA) Core Categories for visual resources. It examines why they developed as they did, major point of difference from…

  11. A Generic Metadata Editor Supporting System Using Drupal CMS

    NASA Astrophysics Data System (ADS)

    Pan, J.; Banks, N. G.; Leggott, M.

    2011-12-01

    Metadata handling is a key factor in preserving and reusing scientific data. In recent years, standardized structural metadata has become widely used in Geoscience communities. However, many different standards exist in the Geosciences, such as the current version of the Federal Geographic Data Committee's Content Standard for Digital Geospatial Metadata (FGDC CSDGM), the Ecological Markup Language (EML), the Geography Markup Language (GML), and the emerging ISO 19115 and related standards. In addition, there are many different subsets within the Geoscience subdomain, such as the Biological Profile of the FGDC CSDGM, or profiles for geopolitical regions, such as the European Profile or the North American Profile in the ISO standards. It is therefore desirable to have a software foundation that supports metadata creation and editing for multiple standards and profiles, without reinventing the wheel. We have developed a software module as a generic, flexible software system to do just that: to facilitate support for multiple metadata standards and profiles. The software consists of a set of modules for the Drupal Content Management System (CMS), with minimal dependencies on other Drupal modules. There are two steps in using the system's metadata functions. First, an administrator can use the system to design a user form, based on an XML schema and its instances. The form definition is named and stored in the Drupal database as XML blob content. Second, users in an editor role can then use the persisted XML definition to render an actual metadata entry form, for creating or editing a metadata record. Behind the scenes, the form definition XML is transformed into a PHP array, which is then rendered via the Drupal Form API. When the form is submitted, the posted values are used to modify a metadata record. Drupal hooks can be used to perform custom processing on the metadata record before and after submission. It is trivial to store the metadata record as an actual XML file

  12. Metadata management and semantics in microarray repositories.

    PubMed

    Kocabaş, F; Can, T; Baykal, N

    2011-12-01

    The number of microarray and other high-throughput experiments on primary repositories keeps increasing, as do the size and complexity of the results in response to biomedical investigations. Initiatives have been started on standardization of content, object model, exchange format, and ontology. However, there are backlogs and an inability to exchange data between microarray repositories, which indicate that there is a great need for a standard format and data management. We have introduced a metadata framework that includes a metadata card and semantic nets that make experimental results visible, understandable and usable. These are encoded in syntax encoding schemes and represented in RDF (Resource Description Framework), can be integrated with other metadata cards and semantic nets, and can be exchanged, shared and queried. We demonstrated the performance and potential benefits through a case study on a selected microarray repository. We concluded that the backlogs can be reduced and that exchange of information and asking of knowledge discovery questions can become possible with the use of this metadata framework.

  13. Metadata Design in the New PDS4 Standards - Something for Everybody

    NASA Astrophysics Data System (ADS)

    Raugh, Anne C.; Hughes, John S.

    2015-11-01

    The Planetary Data System (PDS) archives, supports, and distributes data of diverse targets, from diverse sources, to diverse users. One of the core problems addressed by the PDS4 data standard redesign was that of metadata - how to accommodate the increasingly sophisticated demands of search interfaces, analytical software, and observational documentation into label standards without imposing limits and constraints that would impinge on the quality or quantity of metadata that any particular observer or team could supply. And yet, as an archive, PDS must have detailed documentation for the metadata in the labels it supports, or the institutional knowledge encoded into those attributes will be lost - putting the data at risk. The PDS4 metadata solution is based on a three-step approach. First, it is built on two key ISO standards: ISO 11179 "Information Technology - Metadata Registries", which provides a common framework and vocabulary for defining metadata attributes; and ISO 14721 "Space Data and Information Transfer Systems - Open Archival Information System (OAIS) Reference Model", which provides the framework for the information architecture that enforces the object-oriented paradigm for metadata modeling. Second, PDS has defined a hierarchical system that allows it to divide its metadata universe into namespaces ("data dictionaries", conceptually), and more importantly to delegate stewardship for a single namespace to a local authority. This means that a mission can develop its own data model with a high degree of autonomy and effectively extend the PDS model to accommodate its own metadata needs within the common ISO 11179 framework. Finally, within a single namespace - even the core PDS namespace - existing metadata structures can be extended and new structures added to the model as new needs are identified. This poster illustrates the PDS4 approach to metadata management and highlights the expected return on the development investment for PDS, users and data

  14. Metadata squared: enhancing its usability for volunteered geographic information and the GeoWeb

    USGS Publications Warehouse

    Poore, Barbara S.; Wolf, Eric B.; Sui, Daniel Z.; Elwood, Sarah; Goodchild, Michael F.

    2013-01-01

    The Internet has brought many changes to the way geographic information is created and shared. One aspect that has not changed is metadata. Static spatial data quality descriptions were standardized in the mid-1990s and cannot accommodate the current climate of data creation where nonexperts are using mobile phones and other location-based devices on a continuous basis to contribute data to Internet mapping platforms. The usability of standard geospatial metadata is being questioned by academics and neogeographers alike. This chapter analyzes current discussions of metadata to demonstrate how the media shift that is occurring has affected requirements for metadata. Two case studies of metadata use are presented—online sharing of environmental information through a regional spatial data infrastructure in the early 2000s, and new types of metadata that are being used today in OpenStreetMap, a map of the world created entirely by volunteers. Changes in metadata requirements are examined for usability, the ease with which metadata supports coproduction of data by communities of users, how metadata enhances findability, and how the relationship between metadata and data has changed. We argue that traditional metadata associated with spatial data infrastructures is inadequate and suggest several research avenues to make this type of metadata more interactive and effective in the GeoWeb.

  15. The Importance of Metadata in System Development and IKM

    DTIC Science & Technology

    2003-02-01

    Defence R&D Canada - Atlantic. The Importance of Metadata in System Development and IKM, Anthony W. Isenor, Technical Memorandum DRDC Atlantic TM 2003-011, February 2003. Metadata is important for searches and for providing relevant information to the client. A comparison of metadata standards was conducted with emphasis on

  16. Metabolonote: A Wiki-Based Database for Managing Hierarchical Metadata of Metabolome Analyses

    PubMed Central

    Ara, Takeshi; Enomoto, Mitsuo; Arita, Masanori; Ikeda, Chiaki; Kera, Kota; Yamada, Manabu; Nishioka, Takaaki; Ikeda, Tasuku; Nihei, Yoshito; Shibata, Daisuke; Kanaya, Shigehiko; Sakurai, Nozomu

    2015-01-01

    Metabolomics – technology for comprehensive detection of small molecules in an organism – lags behind the other “omics” in terms of publication and dissemination of experimental data. Among the reasons for this are difficulty precisely recording information about complicated analytical experiments (metadata), existence of various databases with their own metadata descriptions, and low reusability of the published data, resulting in submitters (the researchers who generate the data) being insufficiently motivated. To tackle these issues, we developed Metabolonote, a Semantic MediaWiki-based database designed specifically for managing metabolomic metadata. We also defined a metadata and data description format, called “Togo Metabolome Data” (TogoMD), with an ID system that is required for unique access to each level of the tree-structured metadata such as study purpose, sample, analytical method, and data analysis. Separation of the management of metadata from that of data and permission to attach related information to the metadata provide advantages for submitters, readers, and database developers. The metadata are enriched with information such as links to comparable data, thereby functioning as a hub of related data resources. They also enhance not only readers’ understanding and use of data but also submitters’ motivation to publish the data. The metadata are computationally shared among other systems via APIs, which facilitate the construction of novel databases by database developers. A permission system that allows publication of immature metadata and feedback from readers also helps submitters to improve their metadata. Hence, this aspect of Metabolonote, as a metadata preparation tool, is complementary to high-quality and persistent data repositories such as MetaboLights. A total of 808 metadata for analyzed data obtained from 35 biological species are published currently. Metabolonote and related tools are available free of cost at http

  17. Metabolonote: a wiki-based database for managing hierarchical metadata of metabolome analyses.

    PubMed

    Ara, Takeshi; Enomoto, Mitsuo; Arita, Masanori; Ikeda, Chiaki; Kera, Kota; Yamada, Manabu; Nishioka, Takaaki; Ikeda, Tasuku; Nihei, Yoshito; Shibata, Daisuke; Kanaya, Shigehiko; Sakurai, Nozomu

    2015-01-01

    Metabolomics - technology for comprehensive detection of small molecules in an organism - lags behind the other "omics" in terms of publication and dissemination of experimental data. Among the reasons for this are difficulty precisely recording information about complicated analytical experiments (metadata), existence of various databases with their own metadata descriptions, and low reusability of the published data, resulting in submitters (the researchers who generate the data) being insufficiently motivated. To tackle these issues, we developed Metabolonote, a Semantic MediaWiki-based database designed specifically for managing metabolomic metadata. We also defined a metadata and data description format, called "Togo Metabolome Data" (TogoMD), with an ID system that is required for unique access to each level of the tree-structured metadata such as study purpose, sample, analytical method, and data analysis. Separation of the management of metadata from that of data and permission to attach related information to the metadata provide advantages for submitters, readers, and database developers. The metadata are enriched with information such as links to comparable data, thereby functioning as a hub of related data resources. They also enhance not only readers' understanding and use of data but also submitters' motivation to publish the data. The metadata are computationally shared among other systems via APIs, which facilitate the construction of novel databases by database developers. A permission system that allows publication of immature metadata and feedback from readers also helps submitters to improve their metadata. Hence, this aspect of Metabolonote, as a metadata preparation tool, is complementary to high-quality and persistent data repositories such as MetaboLights. A total of 808 metadata for analyzed data obtained from 35 biological species are published currently. Metabolonote and related tools are available free of cost at http://metabolonote.kazusa.or.jp/.

  18. The second Quito astrolabe catalogue

    NASA Astrophysics Data System (ADS)

    Kolesnik, Y. B.; Davila, H.

    1994-03-01

    The paper contains 515 individual corrections Δα and 235 corrections Δδ to FK5 and FK5 Supp. stars, and 50 corrections to their proper motions, computed from observations made with the classical Danjon astrolabe OPL-13 at the Quito Astronomical Observatory of the Ecuador National Polytechnical School during the period from 1964 to 1983. These corrections cover the declination zone from -30° to +30°. Mean probable errors of catalogue positions are 0.047" in α cos δ and 0.054" in δ. The systematic trends of the catalogue, Δα_α cos δ, Δα_δ cos δ, Δδ_α, and Δδ_δ, are presented for the observed zone.

  19. Metadata, PICS and Quality.

    ERIC Educational Resources Information Center

    Armstrong, C. J.

    1997-01-01

    Discusses PICS (Platform for Internet Content Selection), the Centre for Information Quality Management (CIQM), and metadata. Highlights include filtering networked information; the quality of information; and standardizing search engines. (LRW)

  20. The Role of Metadata Standards in EOSDIS Search and Retrieval Applications

    NASA Technical Reports Server (NTRS)

    Pfister, Robin

    1999-01-01

    Metadata standards play a critical role in data search and retrieval systems. Metadata tie software to data so the data can be processed, stored, searched, retrieved and distributed. Without metadata these actions are not possible. The process of populating metadata to describe science data is an important service to the end-user community, so that a user who is unfamiliar with the data can easily find and learn about a particular dataset before an order decision is made. Once a good set of standards is in place, the accuracy with which data search can be performed depends on the degree to which metadata standards are adhered to during product definition. NASA's Earth Observing System Data and Information System (EOSDIS) provides examples of how metadata standards are used in data search and retrieval.

  1. The European-Mediterranean Earthquake Catalogue (EMEC) for the last millennium

    NASA Astrophysics Data System (ADS)

    Grünthal, Gottfried; Wahlström, Rutger

    2012-07-01

    The catalogue by Grünthal et al. (J Seismol 13:517-541, 2009a) of earthquakes in central, northern, and north-western Europe with Mw ≥ 3.5 (CENEC) has been expanded to cover also southern Europe and the Mediterranean area. It has also been extended in time (1000-2006). Due to the strongly increased seismicity in the new area, the threshold for events south of latitude 44°N has here been set at Mw ≥ 4.0, keeping the lower threshold in the northern catalogue part. This part has been updated with data from new and revised national and regional catalogues. The new Euro-Mediterranean Earthquake Catalogue (EMEC) is based on data from some 80 domestic catalogues and data files and over 100 special studies. Available original Mw and M0 data have been introduced. The analysis largely followed the lines of the Grünthal et al. (J Seismol 13:517-541, 2009a) study, i.e., fake and duplicate events were identified and removed, polygons were specified within each of which one or more of the catalogues or data files have validity, and existing magnitudes and intensities were converted to Mw. Algorithms to compute Mw are based on relations provided locally, or more commonly on those derived by Grünthal et al. (J Seismol 13:517-541, 2009a) or in the present study. The homogeneity of EMEC with respect to Mw for the different constituents was investigated and improved where feasible. EMEC contains entries for some 45,000 earthquakes. For each event, the date, time, location (including focal depth if available), intensity I0 (if given in the original catalogue), magnitude Mw (with uncertainty when given), and source (catalogue or special study) are presented. Besides the main EMEC catalogue, large events before year 1000 in the SE part of the investigated area and fake events are given in separate lists.

  2. Metadata Creation, Management and Search System for your Scientific Data

    NASA Astrophysics Data System (ADS)

    Devarakonda, R.; Palanisamy, G.

    2012-12-01

    Mercury Search Systems is a set of tools for creating, searching, and retrieving biogeochemical metadata. The Mercury toolset provides orders-of-magnitude improvements in search speed, support for any metadata format, integration with Google Maps for spatial queries, multi-faceted search, search suggestions, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. Mercury's metadata editor provides an easy way to create metadata, and Mercury's search interface provides a single portal to search for data and information contained in disparate data management systems, each of which may use any metadata format, including FGDC, ISO-19115, Dublin-Core, Darwin-Core, DIF, ECHO, and EML. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury is being used by more than 14 different projects across 4 federal agencies. It was originally developed for NASA, with continuing development funded by NASA, USGS, and DOE for a consortium of projects. Mercury search won NASA's Earth Science Data Systems Software Reuse Award in 2008. References: R. Devarakonda, G. Palanisamy, B.E. Wilson, and J.M. Green, "Mercury: reusable metadata management, data discovery and access system", Earth Science Informatics, vol. 3, no. 1, pp. 87-94, May 2010. R. Devarakonda, G. Palanisamy, J.M. Green, B.E. Wilson, "Data sharing and retrieval using OAI-PMH", Earth Science Informatics, DOI: 10.1007/s12145-010-0073-0, (2010).

  3. Distributed metadata in a high performance computing environment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bent, John M.; Faibish, Sorin; Zhang, Zhenhua

    A computer-executable method, system, and computer program product for managing metadata in a distributed storage system, wherein the distributed storage system includes one or more burst buffers enabled to operate with a distributed key-value store. The computer-executable method, system, and computer program product comprise receiving a request for metadata associated with a block of data stored in a first burst buffer of the one or more burst buffers in the distributed storage system, wherein the metadata is associated with a key-value; determining which of the one or more burst buffers stores the requested metadata; and, upon determination that a first burst buffer of the one or more burst buffers stores the requested metadata, locating the key-value in a portion of the distributed key-value store accessible from the first burst buffer.
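
    The determining step - which burst buffer's portion of the key-value store owns a given block's metadata - can be as simple as hashing the key over the buffer list; a toy sketch, not the patented method itself:

        # Toy placement logic: hash the metadata key to a burst buffer.
        import hashlib

        def owner(key, buffers):
            """Deterministically map a metadata key to one burst buffer."""
            h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
            return buffers[h % len(buffers)]

        buffers = ["bb0", "bb1", "bb2"]
        print(owner("file.dat/block/17", buffers))  # same key -> same owner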

  4. Fast and accurate mock catalogue generation for low-mass galaxies

    NASA Astrophysics Data System (ADS)

    Koda, Jun; Blake, Chris; Beutler, Florian; Kazin, Eyal; Marin, Felipe

    2016-06-01

    We present an accurate and fast framework for generating mock catalogues including low-mass haloes, based on an implementation of the COmoving Lagrangian Acceleration (COLA) technique. Multiple realisations of mock catalogues are crucial for analyses of large-scale structure, but conventional N-body simulations are too computationally expensive for the production of thousands of realisations. We show that COLA simulations can produce accurate mock catalogues with moderate computational resources for low- to intermediate-mass galaxies in 10^12 M⊙ haloes, both in real and redshift space. COLA simulations have accurate peculiar velocities, without systematic errors in the velocity power spectra for k ≤ 0.15 h Mpc^-1, and with only 3 per cent error for k ≤ 0.2 h Mpc^-1. We use COLA with 10 time steps and a Halo Occupation Distribution to produce 600 mock galaxy catalogues of the WiggleZ Dark Energy Survey. Our parallelized code for efficient generation of accurate halo catalogues is publicly available at github.com/junkoda/cola_halo.

  5. Survey data and metadata modelling using document-oriented NoSQL

    NASA Astrophysics Data System (ADS)

    Rahmatuti Maghfiroh, Lutfi; Gusti Bagus Baskara Nugraha, I.

    2018-03-01

    Survey data collected from year to year undergo metadata changes, yet the data need to be stored in an integrated way so that statistical data can be retrieved faster and more easily. A data warehouse (DW) can be used for this, but a traditional DW cannot accommodate the change of variables in every period via Slowly Changing Dimensions (SCD). Previous research handled variable changes in a DW by managing metadata with a multiversion DW (MVDW), designed using the relational model. Other research has found that non-relational models in NoSQL databases can have faster read times than relational models. We therefore propose managing metadata changes using NoSQL. This study proposes a DW model to manage change and algorithms to retrieve data with metadata changes. Evaluation of the proposed model and algorithms shows that a database with the proposed design can retrieve data with metadata changes properly. This paper contributes to comprehensive data analysis with metadata changes (especially for survey data) in integrated storage.
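
    In a document-oriented store, each survey wave can carry its own metadata document, so a variable added in a later year requires no schema migration; below is a minimal sketch with MongoDB-style documents as plain Python dicts (field names illustrative):

        # One metadata document per survey wave; later waves may add variables.
        metadata_versions = [
            {"wave": 2016, "variables": {"income": "monthly income (USD)"}},
            {"wave": 2017, "variables": {"income": "monthly income (USD)",
                                         "internet_use": "hours per week"}},
        ]

        def variables_for(wave):
            """Return the variable dictionary valid for a given wave."""
            doc = next(d for d in metadata_versions if d["wave"] == wave)
            return doc["variables"]

        # A query for 2016 data simply ignores variables introduced later.
        print(variables_for(2016))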

  6. Development of health information search engine based on metadata and ontology.

    PubMed

    Song, Tae-Min; Park, Hyeoun-Ae; Jin, Dal-Lae

    2014-04-01

    The aim of the study was to develop a metadata and ontology-based health information search engine ensuring semantic interoperability to collect and provide health information using different application programs. Health information metadata ontology was developed using a distributed semantic Web content publishing model based on vocabularies used to index the contents generated by the information producers as well as those used to search the contents by the users. Vocabulary for health information ontology was mapped to the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), and a list of about 1,500 terms was proposed. The metadata schema used in this study was developed by adding an element describing the target audience to the Dublin Core Metadata Element Set. A metadata schema and an ontology ensuring interoperability of health information available on the internet were developed. The metadata and ontology-based health information search engine developed in this study produced better search results compared to existing search engines. A health information search engine based on metadata and ontology will provide reliable health information to both information producers and information consumers.
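
    As a rough illustration of the metadata schema described above, the sketch below models a Dublin Core-style record extended with a target-audience element and a simple filter over it. The element names follow the Dublin Core Metadata Element Set, but the audience vocabulary and the SNOMED CT-style subject code are illustrative assumptions.

      # A Dublin Core-style record with an added "audience" element.
      health_doc = {
          "title": "Managing type 2 diabetes with diet",
          "creator": "Example Health Institute",
          "subject": ["44054006"],          # SNOMED CT-style concept code (illustrative)
          "language": "en",
          "audience": "patient",            # added element: e.g. patient | professional
      }

      def search(docs, subject, audience):
          # Filter on a mapped vocabulary term and the target audience.
          return [d for d in docs if subject in d["subject"]
                  and d["audience"] == audience]

      print(search([health_doc], "44054006", "patient"))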

  7. An Interactive, Web-Based Approach to Metadata Authoring

    NASA Technical Reports Server (NTRS)

    Pollack, Janine; Wharton, Stephen W. (Technical Monitor)

    2001-01-01

    NASA's Global Change Master Directory (GCMD) serves a growing number of users by assisting the scientific community in the discovery of and linkage to Earth science data sets and related services. The GCMD holds over 8000 data set descriptions in Directory Interchange Format (DIF) and 200 data service descriptions in Service Entry Resource Format (SERF), encompassing the disciplines of geology, hydrology, oceanography, meteorology, and ecology. Data descriptions also contain geographic coverage information, thus allowing researchers to discover data pertaining to a particular geographic location as well as subject of interest. The GCMD strives to be the preeminent data locator for world-wide directory-level metadata. In this vein, scientists and data providers must have access to intuitive and efficient metadata authoring tools. Existing GCMD tools are not currently attracting widespread usage. With usage being the prime indicator of utility, it has become apparent that current tools must be improved. As a result, the GCMD has released a new suite of web-based authoring tools that enable a user to create new data and service entries, as well as modify existing data entries. With these tools, a more interactive approach to metadata authoring is taken, as they feature a visual "checklist" of data/service fields that automatically updates when a field is completed. In this way, the user can quickly gauge which of the required and optional fields have not been populated. With the release of these tools, the Earth science community will be further assisted in efficiently creating quality data and services metadata. Keywords: metadata, Earth science, metadata authoring tools

  8. Collaborative Metadata Curation in Support of NASA Earth Science Data Stewardship

    NASA Technical Reports Server (NTRS)

    Sisco, Adam W.; Bugbee, Kaylin; le Roux, Jeanne; Staton, Patrick; Freitag, Brian; Dixon, Valerie

    2018-01-01

    A growing collection of NASA Earth science data is archived and distributed by EOSDIS’s 12 Distributed Active Archive Centers (DAACs). Each collection and granule is described by a metadata record housed in the Common Metadata Repository (CMR). Multiple metadata standards are in use, and core elements of each are mapped to and from a common model – the Unified Metadata Model (UMM). This work was done by the Analysis and Review of CMR (ARC) Team.

  9. A model for enhancing Internet medical document retrieval with "medical core metadata".

    PubMed

    Malet, G; Munoz, F; Appleyard, R; Hersh, W

    1999-01-01

    Finding documents on the World Wide Web relevant to a specific medical information need can be difficult. The goal of this work is to define a set of document content description tags, or metadata encodings, that can be used to promote disciplined search access to Internet medical documents. The authors based their approach on a proposed metadata standard, the Dublin Core Metadata Element Set, which has recently been submitted to the Internet Engineering Task Force. Their model also incorporates the National Library of Medicine's Medical Subject Headings (MeSH) vocabulary and MEDLINE-type content descriptions. The model defines a medical core metadata set that can be used to describe the metadata for a wide variety of Internet documents. The authors propose that their medical core metadata set be used to assign metadata to medical documents to facilitate document retrieval by Internet search engines.

  10. Metadata: Standards for Retrieving WWW Documents (and Other Digitized and Non-Digitized Resources)

    NASA Astrophysics Data System (ADS)

    Rusch-Feja, Diann

    The use of metadata for indexing digitized and non-digitized resources for resource discovery in a networked environment is being increasingly implemented all over the world. Greater precision is achieved using metadata than by relying on universal search engines, and furthermore, metadata can be used as a filtering mechanism for search results. An overview of various metadata sets is given, followed by a more focussed presentation of Dublin Core Metadata, including examples of sub-elements and qualifiers. In particular, the use of the Dublin Core Relation element provides connections between the metadata of various related electronic resources, as well as the metadata for physical, non-digitized resources. This facilitates more comprehensive search results without losing precision and brings together different genres of information which would otherwise be searchable only in separate databases. Furthermore, the advantages of Dublin Core Metadata in comparison with library cataloging and the use of universal search engines are discussed briefly, followed by a listing of types of implementation of Dublin Core Metadata.

  11. DIRAC File Replica and Metadata Catalog

    NASA Astrophysics Data System (ADS)

    Tsaregorodtsev, A.; Poss, S.

    2012-12-01

    File replica and metadata catalogs are essential parts of any distributed data management system, and largely determine its functionality and performance. A new File Catalog (DFC) was developed in the framework of the DIRAC Project that combines both replica and metadata catalog functionality. The DFC design is based on practical experience with the data management system of the LHCb Collaboration. It is optimized for the most common patterns of catalog usage in order to achieve maximum performance from the user perspective. The DFC supports bulk operations for replica queries and allows quick analysis of the storage usage globally and for each Storage Element separately. It supports flexible ACL rules with plug-ins for various policies that can be adopted by a particular community. The DFC allows various types of metadata associated with files and directories to be stored, and supports efficient queries for data based on complex metadata combinations. Definition of file ancestor-descendant relation chains is also possible. The DFC is implemented in the general DIRAC distributed computing framework following the standard grid security architecture. In this paper we describe the design of the DFC and its implementation details. The performance measurements are compared with other grid file catalog implementations. The experience of DFC usage in the CLIC detector project is discussed.

  12. Metadata Evaluation and Improvement: Evolving Analysis and Reporting

    NASA Technical Reports Server (NTRS)

    Habermann, Ted; Kozimor, John; Gordon, Sean

    2017-01-01

    ESIP Community members create and manage a large collection of environmental datasets that span multiple decades, the entire globe, and many parts of the solar system. Metadata are critical for discovering, accessing, using and understanding these data effectively and ESIP community members have successfully created large collections of metadata describing these data. As part of the White House Big Earth Data Initiative (BEDI), ESDIS has developed a suite of tools for evaluating these metadata in native dialects with respect to recommendations from many organizations. We will describe those tools and demonstrate evolving techniques for sharing results with data providers.

  13. Omics Metadata Management Software v. 1 (OMMS)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and to perform bioinformatics analyses and information management tasks via a simple and intuitive web-based interface. Several use cases with short-read sequence datasets are provided to showcase the full functionality of the OMMS, from metadata curation tasks to bioinformatics analyses and results management and downloading. The OMMS can be implemented as a stand-alone package for individual laboratories, or can be configured for web-based deployment supporting geographically dispersed research teams. Our software was developed with open-source bundles, is flexible, extensible and easily installed and run by operators with general system administration and scripting language literacy.

  14. VIRAC: the VVV Infrared Astrometric Catalogue

    NASA Astrophysics Data System (ADS)

    Smith, L. C.; Lucas, P. W.; Kurtev, R.; Smart, R.; Minniti, D.; Borissova, J.; Jones, H. R. A.; Zhang, Z. H.; Marocco, F.; Contreras Peña, C.; Gromadzki, M.; Kuhn, M. A.; Drew, J. E.; Pinfield, D. J.; Bedin, L. R.

    2018-02-01

    We present VIRAC version 1, a near-infrared proper motion and parallax catalogue of the VISTA Variables in the Via Lactea (VVV) survey for 312 587 642 unique sources averaged across all overlapping pawprint and tile images covering 560 deg² of the bulge of the Milky Way and southern disc. The catalogue includes 119 million high-quality proper motion measurements, of which 47 million have statistical uncertainties below 1 mas yr⁻¹. In the 11 < Ks < 14 magnitude range, the high-quality motions have a median uncertainty of 0.67 mas yr⁻¹. The catalogue also includes 6935 sources with quality-controlled 5σ parallaxes with a median uncertainty of 1.1 mas. The parallaxes show reasonable agreement with the Tycho-Gaia Astrometric Solution, though caution is advised for data with modest significance. The SQL database housing the data is made available via the web. We give example applications for studies of Galactic structure, nearby objects (low-mass stars and brown dwarfs, subdwarfs, white dwarfs) and kinematic distance measurements of young stellar objects. Nearby objects discovered include LTT 7251 B, an L7 benchmark companion to a G dwarf with over 20 published elemental abundances, a bright L subdwarf, VVV 1256-6202, with extremely blue colours and nine new members of the 25 pc sample. We also demonstrate why this catalogue remains useful in the era of Gaia. Future versions will be based on profile fitting photometry, use the Gaia absolute reference frame and incorporate the longer time baseline of the VVV extended survey.

  15. Planck 2015 results. XXVIII. The Planck Catalogue of Galactic cold clumps

    NASA Astrophysics Data System (ADS)

    Planck Collaboration; Ade, P. A. R.; Aghanim, N.; Arnaud, M.; Ashdown, M.; Aumont, J.; Baccigalupi, C.; Banday, A. J.; Barreiro, R. B.; Bartolo, N.; Battaner, E.; Benabed, K.; Benoît, A.; Benoit-Lévy, A.; Bernard, J.-P.; Bersanelli, M.; Bielewicz, P.; Bonaldi, A.; Bonavera, L.; Bond, J. R.; Borrill, J.; Bouchet, F. R.; Boulanger, F.; Bucher, M.; Burigana, C.; Butler, R. C.; Calabrese, E.; Catalano, A.; Chamballu, A.; Chiang, H. C.; Christensen, P. R.; Clements, D. L.; Colombi, S.; Colombo, L. P. L.; Combet, C.; Couchot, F.; Coulais, A.; Crill, B. P.; Curto, A.; Cuttaia, F.; Danese, L.; Davies, R. D.; Davis, R. J.; de Bernardis, P.; de Rosa, A.; de Zotti, G.; Delabrouille, J.; Désert, F.-X.; Dickinson, C.; Diego, J. M.; Dole, H.; Donzelli, S.; Doré, O.; Douspis, M.; Ducout, A.; Dupac, X.; Efstathiou, G.; Elsner, F.; Enßlin, T. A.; Eriksen, H. K.; Falgarone, E.; Fergusson, J.; Finelli, F.; Forni, O.; Frailis, M.; Fraisse, A. A.; Franceschi, E.; Frejsel, A.; Galeotta, S.; Galli, S.; Ganga, K.; Giard, M.; Giraud-Héraud, Y.; Gjerløw, E.; González-Nuevo, J.; Górski, K. M.; Gratton, S.; Gregorio, A.; Gruppuso, A.; Gudmundsson, J. E.; Hansen, F. K.; Hanson, D.; Harrison, D. L.; Helou, G.; Henrot-Versillé, S.; Hernández-Monteagudo, C.; Herranz, D.; Hildebrandt, S. R.; Hivon, E.; Hobson, M.; Holmes, W. A.; Hornstrup, A.; Hovest, W.; Huffenberger, K. M.; Hurier, G.; Jaffe, A. H.; Jaffe, T. R.; Jones, W. C.; Juvela, M.; Keihänen, E.; Keskitalo, R.; Kisner, T. S.; Knoche, J.; Kunz, M.; Kurki-Suonio, H.; Lagache, G.; Lamarre, J.-M.; Lasenby, A.; Lattanzi, M.; Lawrence, C. R.; Leonardi, R.; Lesgourgues, J.; Levrier, F.; Liguori, M.; Lilje, P. B.; Linden-Vørnle, M.; López-Caniego, M.; Lubin, P. M.; Macías-Pérez, J. F.; Maggio, G.; Maino, D.; Mandolesi, N.; Mangilli, A.; Marshall, D. J.; Martin, P. G.; Martínez-González, E.; Masi, S.; Matarrese, S.; Mazzotta, P.; McGehee, P.; Melchiorri, A.; Mendes, L.; Mennella, A.; Migliaccio, M.; Mitra, S.; Miville-Deschênes, M.-A.; Moneti, A.; Montier, L.; Morgante, G.; Mortlock, D.; Moss, A.; Munshi, D.; Murphy, J. A.; Naselsky, P.; Nati, F.; Natoli, P.; Netterfield, C. B.; Nørgaard-Nielsen, H. U.; Noviello, F.; Novikov, D.; Novikov, I.; Oxborrow, C. A.; Paci, F.; Pagano, L.; Pajot, F.; Paladini, R.; Paoletti, D.; Pasian, F.; Patanchon, G.; Pearson, T. J.; Pelkonen, V.-M.; Perdereau, O.; Perotto, L.; Perrotta, F.; Pettorino, V.; Piacentini, F.; Piat, M.; Pierpaoli, E.; Pietrobon, D.; Plaszczynski, S.; Pointecouteau, E.; Polenta, G.; Pratt, G. W.; Prézeau, G.; Prunet, S.; Puget, J.-L.; Rachen, J. P.; Reach, W. T.; Rebolo, R.; Reinecke, M.; Remazeilles, M.; Renault, C.; Renzi, A.; Ristorcelli, I.; Rocha, G.; Rosset, C.; Rossetti, M.; Roudier, G.; Rubiño-Martín, J. A.; Rusholme, B.; Sandri, M.; Santos, D.; Savelainen, M.; Savini, G.; Scott, D.; Seiffert, M. D.; Shellard, E. P. S.; Spencer, L. D.; Stolyarov, V.; Sudiwala, R.; Sunyaev, R.; Sutton, D.; Suur-Uski, A.-S.; Sygnet, J.-F.; Tauber, J. A.; Terenzi, L.; Toffolatti, L.; Tomasi, M.; Tristram, M.; Tucci, M.; Tuovinen, J.; Umana, G.; Valenziano, L.; Valiviita, J.; Van Tent, B.; Vielva, P.; Villa, F.; Wade, L. A.; Wandelt, B. D.; Wehus, I. K.; Yvon, D.; Zacchei, A.; Zonca, A.

    2016-09-01

    We present the Planck Catalogue of Galactic Cold Clumps (PGCC), an all-sky catalogue of Galactic cold clump candidates detected by Planck. This catalogue is the full version of the Early Cold Core (ECC) catalogue, which was made available in 2011 with the Early Release Compact Source Catalogue (ERCSC) and which contained 915 high signal-to-noise sources. It is based on the Planck 48-month mission data that are currently being released to the astronomical community. The PGCC catalogue is an observational catalogue consisting exclusively of Galactic cold sources. The three highest Planck bands (857, 545, and 353 GHz) have been combined with IRAS data at 3 THz to perform a multi-frequency detection of sources colder than their local environment. After rejection of possible extragalactic contaminants, the PGCC catalogue contains 13188 Galactic sources spread across the whole sky, i.e., from the Galactic plane to high latitudes, following the spatial distribution of the main molecular cloud complexes. The median temperature of PGCC sources lies between 13 and 14.5 K, depending on the quality of the flux density measurements, with a temperature ranging from 5.8 to 20 K after removing the sources with the top 1% highest temperature estimates. Using seven independent methods, reliable distance estimates have been obtained for 5574 sources, which allows us to derive their physical properties such as their mass, physical size, mean density, and luminosity. The PGCC sources are located mainly in the solar neighbourhood, but also up to a distance of 10.5 kpc in the direction of the Galactic centre, and range from low-mass cores to large molecular clouds. Because of this diversity and because the PGCC catalogue contains sources in very different environments, the catalogue is useful for investigating the evolution from molecular clouds to cores. Finally, it also includes 54 additional sources located in the Small and Large Magellanic Clouds.

  16. Planck 2015 results: XXVIII. The Planck Catalogue of Galactic cold clumps

    DOE PAGES

    Ade, P. A. R.; Aghanim, N.; Arnaud, M.; ...

    2016-09-20

    Here, we present the Planck Catalogue of Galactic Cold Clumps (PGCC), an all-sky catalogue of Galactic cold clump candidates detected by Planck. This catalogue is the full version of the Early Cold Core (ECC) catalogue, which was made available in 2011 with the Early Release Compact Source Catalogue (ERCSC) and which contained 915 high signal-to-noise sources. It is based on the Planck 48-month mission data that are currently being released to the astronomical community. The PGCC catalogue is an observational catalogue consisting exclusively of Galactic cold sources. The three highest Planck bands (857, 545, and 353 GHz) have been combined with IRAS data at 3 THz to perform a multi-frequency detection of sources colder than their local environment. After rejection of possible extragalactic contaminants, the PGCC catalogue contains 13188 Galactic sources spread across the whole sky, i.e., from the Galactic plane to high latitudes, following the spatial distribution of the main molecular cloud complexes. The median temperature of PGCC sources lies between 13 and 14.5 K, depending on the quality of the flux density measurements, with a temperature ranging from 5.8 to 20 K after removing the sources with the top 1% highest temperature estimates. Using seven independent methods, reliable distance estimates have been obtained for 5574 sources, which allows us to derive their physical properties such as their mass, physical size, mean density, and luminosity. The PGCC sources are located mainly in the solar neighbourhood, but also up to a distance of 10.5 kpc in the direction of the Galactic centre, and range from low-mass cores to large molecular clouds. Because of this diversity and because the PGCC catalogue contains sources in very different environments, the catalogue is useful for investigating the evolution from molecular clouds to cores. Finally, it also includes 54 additional sources located in the Small and Large Magellanic Clouds.

  17. Forum Guide to Metadata: The Meaning behind Education Data. NFES 2009-805

    ERIC Educational Resources Information Center

    National Forum on Education Statistics, 2009

    2009-01-01

    The purpose of this guide is to empower people to more effectively use data as information. To accomplish this, the publication explains what metadata are; why metadata are critical to the development of sound education data systems; what components comprise a metadata system; what value metadata bring to data management and use; and how to…

  18. Study of the star catalogue (epoch AD 1396.0) recorded in ancient Korean astronomical almanac

    NASA Astrophysics Data System (ADS)

    Jeon, Junhyeok; Lee, Yong Bok; Lee, Yong-Sam

    2015-11-01

    The study of old star catalogues provides important astrometric data. Most research based on old star catalogues has concerned manuscripts published in Europe and the Arabic/Islamic world, whereas the old star catalogues published in East Asia have received little attention. Therefore, among the East Asian star catalogues we focus on a particular catalogue recorded in a Korean almanac. The catalogue contains 277 stars that are positioned in a region within 10° of the ecliptic plane. The stars in the catalogue were identified using the modern Hipparcos catalogue. We identified 274 of the 277 stars, a rate of 98.9 per cent. The catalogue records the epoch of the stars' positions as AD 1396.0. However, using all of the identified stars we found that the initial epoch of the catalogue is AD 1363.1 ± 3.2. In conclusion, the star catalogue was compiled and edited from various older star catalogues. We assume a correlation with the Almagest by Ptolemaios. This study presents newly analysed results from the historically important astronomical data discovered in East Asia. This star catalogue will therefore become important data for comparison with the star catalogues published in Europe and the Arabic/Islamic world.

  19. EPA Metadata Style Guide Keywords and EPA Organization Names

    EPA Pesticide Factsheets

    The keywords and EPA organization names listed below, along with EPA’s Metadata Style Guide, are intended to provide suggestions and guidance to assist with the standardization of metadata records.

  20. A catalogue of AKARI FIS BSC extragalactic objects

    NASA Astrophysics Data System (ADS)

    Marton, Gabor; Toth, L. Viktor; Gyorgy Balazs, Lajos

    2015-08-01

    We combined photometric data of about 70 thousand point sources from the AKARI Far-Infrared Surveyor Bright Source Catalogue with AllWISE catalogue data to identify galaxies. We used Quadratic Discriminant Analysis (QDA) to classify our sources. The classification was based on a 6D parameter space that contained AKARI [F65/F90], [F90/F140] and [F140/F160] colours and WISE W1-W2 colours, along with WISE W1 magnitudes and AKARI [F140] flux values. Sources were classified into three main object types: YSO candidates, evolved stars and galaxies. The training samples were SIMBAD entries of the input point sources wherever an associated SIMBAD object was found within a 30 arcsecond search radius. The QDA yielded more than 5000 AKARI galaxy candidate sources. The selection was tested by cross-correlating our AKARI extragalactic catalogue with the Revised IRAS-FSC Redshift Catalogue (RIFSCz), and a very good match was found. A further classification attempt was also made to differentiate between extragalactic subtypes using Support Vector Machines (SVMs). The results of the various methods showed that we can confidently separate cirrus-dominated objects (type 1 of RIFSCz). Some of our “galaxy candidate” sources are associated with 2MASS extended objects and are listed in the NASA Extragalactic Database, so far without clear proof of their extragalactic nature. Examples will be presented in our poster. Finally, other AKARI extragalactic catalogues will also be compared to our statistical selection.

  1. Catalogue of knowledge and skills for sleep medicine.

    PubMed

    Penzel, Thomas; Pevernagie, Dirk; Dogas, Zoran; Grote, Ludger; de Lacy, Simone; Rodenbeck, Andrea; Bassetti, Claudio; Berg, Søren; Cirignotta, Fabio; d'Ortho, Marie-Pia; Garcia-Borreguero, Diego; Levy, Patrick; Nobili, Lino; Paiva, Teresa; Peigneux, Philippe; Pollmächer, Thomas; Riemann, Dieter; Skene, Debra J; Zucconi, Marco; Espie, Colin

    2014-04-01

    Sleep medicine is evolving globally into a medical subspeciality in its own right, and in parallel, behavioural sleep medicine and sleep technology are expanding rapidly. Educational programmes are being implemented at different levels in many European countries. However, these programmes would benefit from a common, interdisciplinary curriculum. This 'catalogue of knowledge and skills' for sleep medicine is proposed, therefore, as a template for developing more standardized curricula across Europe. The Board and the Sleep Medicine Committee of the European Sleep Research Society (ESRS) have compiled the catalogue based on textbooks, standard-of-practice publications, systematic reviews and professional experience, validated subsequently by an online survey completed by 110 delegates specialized in sleep medicine from different European countries. The catalogue comprises 10 chapters covering topics ranging from physiology, pathology, and diagnostic and treatment procedures to societal and organizational aspects of sleep medicine. Required levels of knowledge and skills are defined, as is a proposed workload of 60 points according to the European Credit Transfer System (ECTS). The catalogue is intended to be a basis for sleep medicine education, for sleep medicine courses and for sleep medicine examinations, serving not only physicians with a medical speciality degree, but also PhD and MSc health professionals such as clinical psychologists and scientists, technologists and nurses, all of whom may be involved professionally in sleep medicine. In the future, the catalogue will be revised in accordance with advances in the field of sleep medicine. © 2013 European Sleep Research Society.

  2. Evaluating and Improving Metadata for Data Use and Understanding

    NASA Astrophysics Data System (ADS)

    Habermann, T.

    2013-12-01

    The last several decades have seen an extraordinary increase in the number and breadth of environmental data available to the scientific community and the general public. These increases have focused the environmental data community on creating metadata for discovering data and on the creation and population of catalogs and portals for facilitating discovery. This focus is reflected in the fields required by commonly used metadata standards and has resulted in collections populated with metadata that meet, but don't go far beyond, minimal discovery requirements. Discovery is the first step towards addressing scientific questions using data. As more data are discovered and accessed, users need metadata that 1) automates use and integration of these data in tools and 2) facilitates understanding the data when it is compared to similar datasets or as internal variations are observed. When data discovery is the primary goal, it is important to create records for as many datasets as possible. The content of these records is controlled by minimum requirements, and evaluation is generally limited to testing for required fields and counting records. As the use and understanding needs become more important, more comprehensive evaluation tools are needed. An approach is described for evaluating existing metadata in the light of these new requirements and for improving the metadata to meet them.

  3. The SuperCOSMOS all-sky galaxy catalogue

    NASA Astrophysics Data System (ADS)

    Peacock, J. A.; Hambly, N. C.; Bilicki, M.; MacGillivray, H. T.; Miller, L.; Read, M. A.; Tritton, S. B.

    2016-10-01

    We describe the construction of an all-sky galaxy catalogue, using SuperCOSMOS scans of Schmidt photographic plates from the UK Schmidt Telescope and Second Palomar Observatory Sky Survey. The photographic photometry is calibrated using Sloan Digital Sky Survey data, with results that are linear to 2 per cent or better. All-sky photometric uniformity is achieved by matching plate overlaps and also by requiring homogeneity in optical-to-2MASS colours, yielding zero-points that are uniform to 0.03 mag or better. The typical AB depths achieved are BJ < 21, RF < 19.5 and IN < 18.5, with little difference between hemispheres. In practice, the IN plates are shallower than the BJ and RF plates, so for most purposes we advocate the use of a catalogue selected in these two latter bands. At high Galactic latitudes, this catalogue is approximately 90 per cent complete with 5 per cent stellar contamination; we quantify how the quality degrades towards the Galactic plane. At low latitudes, there are many spurious galaxy candidates resulting from stellar blends: these approximately match the surface density of true galaxies at |b| = 30°. Above this latitude, the catalogue limited in BJ and RF contains in total about 20 million galaxy candidates, of which 75 per cent are real. This contamination can be removed, and the sky coverage extended, by matching with additional data sets. This SuperCOSMOS catalogue has been matched with 2MASS and with WISE, yielding quasi-all-sky samples of respectively 1.5 million and 18.5 million galaxies, to median redshifts of 0.08 and 0.20. This legacy data set thus continues to offer a valuable resource for large-angle cosmological investigations.

  4. Managing biomedical image metadata for search and retrieval of similar images.

    PubMed

    Korenblum, Daniel; Rubin, Daniel; Napel, Sandy; Rodriguez, Cesar; Beaulieu, Chris

    2011-08-01

    Radiology images are generally disconnected from the metadata describing their contents, such as imaging observations ("semantic" metadata), which are usually described in text reports that are not directly linked to the images. We developed a system, the Biomedical Image Metadata Manager (BIMM), to (1) address the problem of managing biomedical image metadata and (2) facilitate the retrieval of similar images using semantic feature metadata. Our approach allows radiologists, researchers, and students to take advantage of the vast and growing repositories of medical image data by explicitly linking images to their associated metadata in a relational database that is globally accessible through a Web application. BIMM receives input in the form of standards-based metadata files via a Web service, and parses and stores the metadata in a relational database, allowing efficient data query and maintenance capabilities. Upon querying BIMM for images, 2D regions of interest (ROIs) stored as metadata are automatically rendered onto preview images included in search results. The system's "match observations" function retrieves images with similar ROIs based on specific semantic features describing imaging observation characteristics (IOCs). We demonstrate that the system, using IOCs alone, can accurately retrieve images with diagnoses matching the query images, and we evaluate its performance on a set of annotated liver lesion images. BIMM has several potential applications, e.g., computer-aided detection and diagnosis, content-based image retrieval, automating medical analysis protocols, and gathering population statistics such as disease prevalence. The system provides a framework for decision support systems, potentially improving their diagnostic accuracy and selection of appropriate therapies.
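
    A toy sketch of the "match observations" idea: rank stored images by how strongly their imaging observation characteristics (IOCs) overlap the query's. The abstract does not specify BIMM's scoring function, so Jaccard similarity over IOC term sets is used here purely as an assumed stand-in, with an invented vocabulary.

      def jaccard(a, b):
          # Overlap of two IOC term sets: |A ∩ B| / |A ∪ B|.
          a, b = set(a), set(b)
          return len(a & b) / len(a | b) if (a | b) else 0.0

      database = {  # image id -> IOC terms; vocabulary is invented
          "img_001": {"hyperenhancing", "round", "well-circumscribed"},
          "img_002": {"hypoenhancing", "irregular-margin"},
      }
      query = {"hyperenhancing", "round"}
      ranked = sorted(database, key=lambda k: jaccard(database[k], query), reverse=True)
      print(ranked)   # img_001 ranks first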

  5. A Shared Infrastructure for Federated Search Across Distributed Scientific Metadata Catalogs

    NASA Astrophysics Data System (ADS)

    Reed, S. A.; Truslove, I.; Billingsley, B. W.; Grauch, A.; Harper, D.; Kovarik, J.; Lopez, L.; Liu, M.; Brandt, M.

    2013-12-01

    The vast amount of science metadata can be overwhelming and highly complex. Comprehensive analysis and sharing of metadata is difficult since institutions often publish to their own repositories. There are many disjoint standards used for publishing scientific data, making it difficult to discover and share information from different sources. Services that publish metadata catalogs often have different protocols, formats, and semantics. The research community is limited by the exclusivity of separate metadata catalogs, and thus it is desirable to have federated search interfaces capable of unified search queries across multiple sources. Aggregation of metadata catalogs also enables users to critique metadata more rigorously. With these motivations in mind, the National Snow and Ice Data Center (NSIDC) and the Advanced Cooperative Arctic Data and Information Service (ACADIS) implemented two search interfaces for the community. Both the NSIDC Search and the ACADIS Arctic Data Explorer (ADE) use a common infrastructure, which keeps maintenance costs low. The search clients are designed to make OpenSearch requests against Solr, an open-source search platform. Solr applies indexes to specific fields of the metadata, which in this instance optimizes queries containing keywords, spatial bounds and temporal ranges. NSIDC metadata is reused by both search interfaces, but the ADE also brokers additional sources. Users can quickly find relevant metadata with minimal effort, which ultimately lowers costs for research. This presentation will highlight the reuse of data and code between NSIDC and ACADIS, discuss challenges and milestones for each project, and identify the creation and use of open-source libraries.
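
    The sketch below shows the kind of request such a search client might issue: an OpenSearch-style query translated into Solr parameters combining a keyword query with a temporal filter. The endpoint URL and field names are placeholders, not NSIDC's actual schema, and the requests package is assumed to be installed.

      import requests   # assumed available; any HTTP client would do

      SOLR_URL = "http://localhost:8983/solr/metadata/select"   # placeholder endpoint

      params = {
          "q": "sea ice concentration",                          # keyword query
          "fq": "temporal_start:[2010-01-01T00:00:00Z TO *]",    # temporal filter (placeholder field)
          "rows": 10,
          "wt": "json",
      }
      response = requests.get(SOLR_URL, params=params)
      for doc in response.json()["response"]["docs"]:
          print(doc.get("title"))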

  6. Mining and Utilizing Dataset Relevancy from Oceanographic Dataset (MUDROD) Metadata, Usage Metrics, and User Feedback to Improve Data Discovery and Access

    NASA Astrophysics Data System (ADS)

    Jiang, Y.

    2015-12-01

    Oceanographic resource discovery is a critical step for developing ocean science applications. With the increasing number of resources available online, many Spatial Data Infrastructure (SDI) components (e.g. catalogues and portals) have been developed to help manage and discover oceanographic resources. However, efficient and accurate resource discovery is still a big challenge because of the lack of data relevancy information. In this article, we propose a search engine framework for mining and utilizing dataset relevancy from oceanographic dataset metadata, usage metrics, and user feedback. The objective is to improve the discovery accuracy of oceanographic data and reduce the time scientists spend discovering, downloading and reformatting data for their projects. Experiments and a search example show that the proposed engine helps both scientists and general users find more accurate results, with enhanced performance and user experience through a user-friendly interface.

  7. 106-17 Telemetry Standards Metadata Configuration Chapter 23

    DTIC Science & Technology

    2017-07-01

    Chapter 23 of the 106-17 Telemetry Standards covers metadata configuration and defines the Metadata Description Language (MDL). Acronyms used in the chapter include: HTML (Hypertext Markup Language), MDL (Metadata Description Language), PCM (pulse code modulation), TMATS (Telemetry Attributes Transfer Standard), W3C (World Wide Web Consortium), XML (eXtensible Markup Language) and XSD (XML schema document), in the context of the Telemetry Network Standard.

  8. Development of Health Information Search Engine Based on Metadata and Ontology

    PubMed Central

    Song, Tae-Min; Jin, Dal-Lae

    2014-01-01

    Objectives The aim of the study was to develop a metadata and ontology-based health information search engine ensuring semantic interoperability to collect and provide health information using different application programs. Methods Health information metadata ontology was developed using a distributed semantic Web content publishing model based on vocabularies used to index the contents generated by the information producers as well as those used to search the contents by the users. Vocabulary for health information ontology was mapped to the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), and a list of about 1,500 terms was proposed. The metadata schema used in this study was developed by adding an element describing the target audience to the Dublin Core Metadata Element Set. Results A metadata schema and an ontology ensuring interoperability of health information available on the internet were developed. The metadata and ontology-based health information search engine developed in this study produced better search results compared to existing search engines. Conclusions A health information search engine based on metadata and ontology will provide reliable health information to both information producers and information consumers. PMID:24872907

  9. FRAMES Metadata Reporting Templates for Ecohydrological Observations, version 1.1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Christianson, Danielle; Varadharajan, Charuleka; Christoffersen, Brad

    FRAMES is a set of Excel metadata files and package-level descriptive metadata designed to facilitate and improve the capture of desired metadata for ecohydrological observations. The metadata are bundled with data files into a data package and submitted to a data repository (e.g. the NGEE Tropics Data Repository) via a web form. FRAMES standardizes the reporting of diverse ecohydrological and biogeochemical data for synthesis across a range of spatiotemporal scales and incorporates many best data science practices. This version of FRAMES supports observations for primarily automated measurements collected by permanently located sensors, including sap flow (tree water use), leaf surface temperature, soil water content, dendrometry (stem diameter growth increment), and solar radiation. Version 1.1 extends the controlled vocabulary and incorporates functionality to facilitate programmatic use of data and FRAMES metadata (R code available at the NGEE Tropics Data Repository).

  10. EXIF Custom: Automatic image metadata extraction for Scratchpads and Drupal.

    PubMed

    Baker, Ed

    2013-01-01

    Many institutions and individuals use embedded metadata to aid in the management of their image collections. Many desktop image management solutions such as Adobe Bridge and online tools such as Flickr also make use of embedded metadata to describe, categorise and license images. Until now Scratchpads (a data management system and virtual research environment for biodiversity) have not made use of these metadata, and users have had to manually re-enter this information if they have wanted to display it on their Scratchpad site. The Drupal module described here allows users to map metadata embedded in their images to the associated fields in the Scratchpads image form using one or more customised mappings. The module works seamlessly with the bulk image uploader used on Scratchpads, and it is therefore possible to upload hundreds of images easily with automatic metadata (EXIF, XMP and IPTC) extraction and mapping.
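
    For flavour, here is a minimal Python sketch of the extraction-and-mapping step (the module itself is Drupal/PHP, so this is only an analogy): read EXIF tags from an image with Pillow and project selected tags onto form fields via a configurable mapping. The mapping, field names, and file name are hypothetical.

      from PIL import Image, ExifTags   # requires the Pillow package

      # Example user-configured mapping from EXIF tag names to form fields.
      FIELD_MAP = {"DateTime": "field_date", "Artist": "field_photographer"}

      def extract_mapped_metadata(path):
          exif = Image.open(path).getexif()
          # Translate numeric EXIF tag IDs into human-readable names.
          named = {ExifTags.TAGS.get(tag_id, str(tag_id)): value
                   for tag_id, value in exif.items()}
          # Project the tags onto Scratchpads-style form fields.
          return {field: named[tag] for tag, field in FIELD_MAP.items() if tag in named}

      print(extract_mapped_metadata("specimen_0001.jpg"))   # hypothetical image file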

  11. EXIF Custom: Automatic image metadata extraction for Scratchpads and Drupal

    PubMed Central

    2013-01-01

    Abstract Many institutions and individuals use embedded metadata to aid in the management of their image collections. Many desktop image management solutions such as Adobe Bridge and online tools such as Flickr also make use of embedded metadata to describe, categorise and license images. Until now Scratchpads (a data management system and virtual research environment for biodiversity) have not made use of these metadata, and users have had to manually re-enter this information if they have wanted to display it on their Scratchpad site. The Drupal module described here allows users to map metadata embedded in their images to the associated fields in the Scratchpads image form using one or more customised mappings. The module works seamlessly with the bulk image uploader used on Scratchpads, and it is therefore possible to upload hundreds of images easily with automatic metadata (EXIF, XMP and IPTC) extraction and mapping. PMID:24723768

  12. Catalogue of Palaearctic Coleoptera

    USDA-ARS?s Scientific Manuscript database

    The palaearctic weevils of the subfamily Baridinae are catalogued. Two subgenera are raised to full generic rank, 12 genera are transferred from incertae sedis to tribes and subtribes, 25 species names are transferred to other genera and nine are synonymized. A total of 215 primary references were c...

  13. Remapping dark matter halo catalogues between cosmological simulations

    NASA Astrophysics Data System (ADS)

    Mead, A. J.; Peacock, J. A.

    2014-05-01

    We present and test a method for modifying the catalogue of dark matter haloes produced from a given cosmological simulation, so that it resembles the result of a simulation with an entirely different set of parameters. This extends the method of Angulo & White, which rescales the full particle distribution from a simulation. Working directly with the halo catalogue offers an advantage in speed, and also allows modifications of the internal structure of the haloes to account for non-linear differences between cosmologies. Our method can be used directly on a halo catalogue in a self-contained manner without any additional information about the overall density field; although the large-scale displacement field is required by the method, this can be inferred from the halo catalogue alone. We show proof of concept of our method by rescaling a matter-only simulation with no baryon acoustic oscillation (BAO) features to a more standard Λ cold dark matter model containing a cosmological constant and a BAO signal. In conjunction with the halo occupation approach, this method provides a basis for the rapid generation of mock galaxy samples spanning a wide range of cosmological parameters.

  14. Predicting biomedical metadata in CEDAR: A study of Gene Expression Omnibus (GEO).

    PubMed

    Panahiazar, Maryam; Dumontier, Michel; Gevaert, Olivier

    2017-08-01

    A crucial and limiting factor in data reuse is the lack of accurate, structured, and complete descriptions of data, known as metadata. Towards improving the quantity and quality of metadata, we propose a novel metadata prediction framework to learn associations from existing metadata that can be used to predict metadata values. We evaluate our framework in the context of experimental metadata from the Gene Expression Omnibus (GEO). We applied four rule mining algorithms to the most common structured metadata elements (sample type, molecular type, platform, label type, and organism) from over 1.3 million GEO records. We examined the quality of well-supported rules from each algorithm and visualized the dependencies among metadata elements. Finally, we evaluated the performance of the algorithms in terms of accuracy, precision, recall, and F-measure. We found that PART is the best algorithm, outperforming Apriori, Predictive Apriori, and Decision Table. All algorithms perform significantly better in predicting class values than the majority vote classifier. We found that the performance of the algorithms is related to the dimensionality of the GEO elements: the average performance of all algorithms increases as the number of unique values of these elements decreases (2697 platforms, 537 organisms, 454 labels, 9 molecules, and 5 types). Our work suggests that experimental metadata such as those present in GEO can be accurately predicted using rule mining algorithms. Our work has implications for both prospective and retrospective augmentation of metadata quality, geared towards making data easier to find and reuse. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
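
    A toy stand-in for the rule-mining idea: learn conditional value frequencies from existing records and use them to suggest a missing element's value. This is not PART or Apriori, just the simplest possible association-based predictor; the platform identifiers are GEO accession-style values used for illustration.

      from collections import Counter

      records = [  # GEO-like records; values are illustrative
          {"organism": "Homo sapiens", "molecule": "total RNA", "platform": "GPL570"},
          {"organism": "Homo sapiens", "molecule": "total RNA", "platform": "GPL570"},
          {"organism": "Mus musculus", "molecule": "total RNA", "platform": "GPL1261"},
      ]

      def predict(records, given, target):
          # Most frequent value of `target` among records matching `given`:
          # a one-rule stand-in for learned association rules.
          counts = Counter(r[target] for r in records
                           if all(r.get(k) == v for k, v in given.items()))
          return counts.most_common(1)[0][0] if counts else None

      print(predict(records, {"organism": "Homo sapiens"}, "platform"))   # GPL570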

  15. An Approach to Information Management for AIR7000 with Metadata and Ontologies

    DTIC Science & Technology

    2009-10-01

    metadata. We then propose an approach based on Semantic Technologies, including the Resource Description Framework (RDF) and Upper Ontologies, for the... mandating specific metadata schemas can result in interoperability problems. For example, many standards within the ADO mandate the use of XML for metadata... such problems, we propose an architecture in which different metadata schemes can interoperate. By using RDF (Resource Description Framework) as a

  16. Metadata Exporter for Scientific Photography Management

    NASA Astrophysics Data System (ADS)

    Staudigel, D.; English, B.; Delaney, R.; Staudigel, H.; Koppers, A.; Hart, S.

    2005-12-01

    Photographs have become an increasingly important medium, especially with the advent of digital cameras. It has become inexpensive to take photographs and quickly post them on a website. However informative photos may be, they still need to be displayed in a convenient way and be cataloged in such a manner that makes them easily locatable. Managing the great number of photographs that digital cameras allow, and creating a format for efficient dissemination of the information related to the photos, is a tedious task. Products such as Apple's iPhoto have greatly eased the task of managing photographs; however, they often have limitations. Un-customizable metadata fields and poor metadata extraction tools limit their scientific usefulness. A solution to this persistent problem is a customizable metadata exporter. On the ALIA expedition, we successfully managed the thousands of digital photos we took. We did this with iPhoto and a version of the exporter that is now available to the public under the name "CustomHTMLExport" (http://www.versiontracker.com/dyn/moreinfo/macosx/27777), currently undergoing formal beta testing. This software allows the use of customized metadata fields (including description, time, date, GPS data, etc.), which are exported along with the photo. It can also produce webpages with these data straight from iPhoto, in a much more flexible way than is already allowed. With this tool it becomes very easy to manage and distribute scientific photos.

  17. Merge of Five Previous Catalogues Into the Ground Truth Catalogue and Registration Based on MOLA Data with THEMIS-DIR, MDIM and MOC Data-Sets

    NASA Astrophysics Data System (ADS)

    Salamuniccar, G.; Loncaric, S.

    2008-03-01

    The Catalogue from our previous work was merged with the data of Barlow, Rodionova, Boyce, and Kuzmin. The resulting ground truth catalogue with 57,633 craters was registered, using MOLA data, with the THEMIS-DIR, MDIM, and MOC data-sets.

  18. Creating FGDC and NBII metadata with Metavist 2005.

    Treesearch

    David J. Rugg

    2004-01-01

    This report documents a computer program for creating metadata compliant with the Federal Geographic Data Committee (FGDC) 1998 metadata standard or the National Biological Information Infrastructure (NBII) 1999 Biological Data Profile for the FGDC standard. The software runs under the Microsoft Windows 2000 and XP operating systems, and requires the presence of...

  19. A Model for the Creation of Human-Generated Metadata within Communities

    ERIC Educational Resources Information Center

    Brasher, Andrew; McAndrew, Patrick

    2005-01-01

    This paper considers situations for which detailed metadata descriptions of learning resources are necessary, and focuses on human generation of such metadata. It describes a model which facilitates human production of good quality metadata by the development and use of structured vocabularies. Using examples, this model is applied to single and…

  20. Towards a Next-Generation Catalogue Cross-Match Service

    NASA Astrophysics Data System (ADS)

    Pineau, F.; Boch, T.; Derriere, S.; Arches Consortium

    2015-09-01

    We have been developing several catalogue cross-match tools in the past. On one hand, the CDS XMatch service (Pineau et al. 2011) is able to perform basic but very efficient cross-matches, scalable to the largest catalogues on a single regular server. On the other hand, as part of the European project ARCHES, we have been developing a generic and flexible tool which performs potentially complex multi-catalogue cross-matches and which computes probabilities of association based on a novel statistical framework. Although the two approaches have been managed so far as different tracks, the need for next-generation cross-match services dealing with both efficiency and complexity is becoming pressing with forthcoming projects which will produce huge high-quality catalogues. We are addressing this challenge, which is both theoretical and technical. In ARCHES we generalize to N catalogues the candidate selection criteria - based on the chi-square distribution - described in Pineau et al. (2011). We formulate and test a number of Bayesian hypotheses, which necessarily increases dramatically with the number of catalogues. To assign a probability to each hypothesis, we rely on estimated priors which account for local densities of sources. We validated our developments by comparing the theoretical curves we derived with the results of Monte-Carlo simulations. The current prototype is able to take into account heterogeneous positional errors, object extension and proper motion. The technical complexity is managed by OO programming design patterns and SQL-like functionalities. Large tasks are split into smaller independent pieces for scalability. Performance is achieved by resorting to multi-threading, sequential reads and several tree data structures. In addition to kd-trees, we account for heterogeneous positional errors and object extension using M-trees. Proper motions are supported using a modified M-tree we developed, inspired by Time Parametrized R-trees (TPR-trees).
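
    A minimal sketch of the chi-square candidate selection criterion for the two-catalogue case with heterogeneous positional errors: under the "same source" hypothesis the normalized squared separation follows a chi-square law with 2 degrees of freedom, so candidates are kept below a completeness-controlled critical value. The threshold value chosen here is an arbitrary example, and scipy is assumed to be available.

      from scipy.stats import chi2

      def is_candidate(sep, sigma1, sigma2, completeness=0.997):
          # Keep the pair if the normalized squared separation is below the
          # chi-square critical value for 2 degrees of freedom (positions
          # are 2-dimensional); all angles in the same unit, e.g. arcsec.
          threshold = chi2.ppf(completeness, df=2)
          return sep**2 / (sigma1**2 + sigma2**2) <= threshold

      print(is_candidate(0.8, 0.3, 0.4))   # True: separation consistent with errors
      print(is_candidate(5.0, 0.3, 0.4))   # False: too far apart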

  1. Pragmatic Metadata Management for Integration into Multiple Spatial Data Infrastructure Systems and Platforms

    NASA Astrophysics Data System (ADS)

    Benedict, K. K.; Scott, S.

    2013-12-01

    While there has been a convergence towards a limited number of standards for representing knowledge (metadata) about geospatial (and other) data objects and collections, there exist a variety of community conventions around the specific use of those standards and within specific data discovery and access systems. This combination of limited (but multiple) standards and conventions creates a challenge for system developers who aspire to participate in multiple data infrastructures, each of which may use a different combination of standards and conventions. While Extensible Markup Language (XML) is a shared standard for encoding most metadata, traditional direct XML transformations (XSLT) from one standard to another often result in an imperfect transfer of information due to incomplete mapping from one standard's content model to another. This paper presents the work at the University of New Mexico's Earth Data Analysis Center (EDAC) in which a unified data and metadata management system has been developed in support of the storage, discovery and access of heterogeneous data products. This system, the Geographic Storage, Transformation and Retrieval Engine (GSTORE) platform, has adopted a polyglot database model in which a combination of relational and document-based databases is used to store both data and metadata, with some metadata stored in a custom XML schema designed as a superset of the requirements for multiple target metadata standards: ISO 19115-2/19139/19110/19119, FGDC CSDGM (both with and without remote sensing extensions) and Dublin Core. Metadata stored within this schema is complemented by additional service, format and publisher information that is dynamically "injected" into produced metadata documents when they are requested from the system. While mapping from the underlying common metadata schema is relatively straightforward, the generation of valid metadata within each target standard is necessary but not sufficient for integration into
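
    A simplified sketch of the "superset schema plus crosswalks" idea: one internal record stores the union of fields needed by several target standards, and per-standard mappings project it down on request. All field names and mappings here are invented; real crosswalks to ISO 19115 or FGDC CSDGM are far richer, and element mapping is often not one-to-one, which is precisely the incompleteness the paragraph describes.

      # One internal "superset" record; field names are invented.
      INTERNAL = {
          "title": "Landsat mosaics of New Mexico",
          "abstract": "Statewide imagery mosaics and derived products.",
          "west": -109.05, "east": -103.00, "south": 31.33, "north": 37.00,
          "publisher": "EDAC",
      }

      # Per-standard crosswalks: target field -> internal field.
      CROSSWALKS = {
          "dublin_core": {"title": "title", "description": "abstract",
                          "publisher": "publisher"},
          "fgdc": {"title": "title", "abstract": "abstract",
                   "westbc": "west", "eastbc": "east",
                   "southbc": "south", "northbc": "north"},
      }

      def project(record, target):
          # Project the internal record down to one target standard.
          return {out: record[src] for out, src in CROSSWALKS[target].items()}

      print(project(INTERNAL, "dublin_core"))
      print(project(INTERNAL, "fgdc"))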

  2. Metadata and annotations for multi-scale electrophysiological data.

    PubMed

    Bower, Mark R; Stead, Matt; Brinkmann, Benjamin H; Dufendach, Kevin; Worrell, Gregory A

    2009-01-01

    The increasing use of high-frequency (kHz), long-duration (days) intracranial monitoring from multiple electrodes during pre-surgical evaluation for epilepsy produces large amounts of data that are challenging to store and maintain. Descriptive metadata and clinical annotations of these large data sets also pose challenges to simple, often manual, methods of data analysis. The problems of reliable communication of metadata and annotations between programs, the maintenance of the meanings within that information over long time periods, and the flexibility to re-sort data for analysis place differing demands on data structures and algorithms. Solutions to these individual problem domains (communication, storage and analysis) can be configured to provide easy translation and clarity across the domains. The Multi-scale Annotation Format (MAF) provides an integrated metadata and annotation environment that maximizes code reuse, minimizes error probability and encourages future changes by reducing the tendency to over-fit information technology solutions to current problems. An example of a graphical utility for generating and evaluating metadata and annotations for "big data" files is presented.

  3. X-ray selected stars in HRC and BHRC catalogues

    NASA Astrophysics Data System (ADS)

    Mickaelian, A. M.; Paronyan, G. M.

    2014-12-01

    A joint HRC/BHRC Catalogue has been created by merging the Hamburg ROSAT Catalogue (HRC) and the Byurakan Hamburg ROSAT Catalogue (BHRC). Both were made by optical identifications of X-ray sources based on low-dispersion spectra of the Hamburg Quasar Survey (HQS) using ROSAT Catalogues. As a result, the largest sample of 8132 (5341+2791) optically identified X-ray sources was created, having a count rate (CR) of photons ≥ 0.04 ct/s in the area with galactic latitudes |b| ≥ 20° and declinations δ ≥ 0°. There are 4253 AGN, 492 galaxies, 1800 stars and 1587 unknown objects in the sample. All stars have been found in GSC 2.3.2, and most of them are also in the GALEX, USNO-B1.0, 2MASS and WISE catalogues. In addition, 1429 are in SDSS DR9 and 204 have SDSS spectra. For these stars we have carried out spectral classification, and along with the bright stars, many new cataclysmic variables (CV), white dwarfs (WD) and late-type stars (K-M and C) have been revealed. For all stars, statistical studies of their multiwavelength properties have been made. An attempt was also made to find connections between the radiation fluxes in different bands for different types of sources and to identify their characteristics.

  4. Catalogue of Tephritidae of Colombia

    USDA-ARS?s Scientific Manuscript database

    The present Catalogue includes 93 species and 23 genera of Tephritidae that have been recorded in Colombia. Four subfamilies (Blepharoneurinae, Dacinae, Trypetinae and Tephritinae), and eight tribes (Acrotaeniini, Carpomyini, Dacini, Eutretini, Myopitini, Noeetini, Tephritini, and Toxotrypanini) are...

  5. A metadata reporting framework (FRAMES) for synthesis of ecohydrological observations

    DOE PAGES

    Christianson, Danielle S.; Varadharajan, Charuleka; Christoffersen, Bradley; ...

    2017-06-20

    Metadata describe the ancillary information needed for data interpretation, comparison across heterogeneous datasets, and quality control and quality assessment (QA/QC). Metadata enable the synthesis of diverse ecohydrological and biogeochemical observations, an essential step in advancing a predictive understanding of earth systems. Environmental observations can be taken across a wide range of spatiotemporal scales in a variety of measurement settings and approaches, and saved in multiple formats. Thus, well-organized, consistent metadata are required to produce usable data products from diverse observations collected in disparate field sites. However, existing metadata reporting protocols do not support the complex data synthesis needs of interdisciplinary earth system research. We developed a metadata reporting framework (FRAMES) to enable predictive understanding of carbon cycling in tropical forests under global change. FRAMES adheres to best practices for data and metadata organization, enabling consistent data reporting and thus compatibility with a variety of standardized data protocols. We used an iterative scientist-centered design process to develop FRAMES. The resulting modular organization streamlines metadata reporting and can be expanded to incorporate additional data types. The flexible data reporting format incorporates existing field practices to maximize data-entry efficiency. With FRAMES’s multi-scale measurement position hierarchy, data can be reported at observed spatial resolutions and then easily aggregated and linked across measurement types to support model-data integration. FRAMES is in early use by both data providers and users. In this article, we describe FRAMES, identify lessons learned, and discuss areas of future development.

  6. A metadata reporting framework (FRAMES) for synthesis of ecohydrological observations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Christianson, Danielle S.; Varadharajan, Charuleka; Christoffersen, Bradley

    Metadata describe the ancillary information needed for data interpretation, comparison across heterogeneous datasets, and quality control and quality assessment (QA/QC). Metadata enable the synthesis of diverse ecohydrological and biogeochemical observations, an essential step in advancing a predictive understanding of earth systems. Environmental observations can be taken across a wide range of spatiotemporal scales in a variety of measurement settings and approaches, and saved in multiple formats. Thus, well-organized, consistent metadata are required to produce usable data products from diverse observations collected in disparate field sites. However, existing metadata reporting protocols do not support the complex data synthesis needs of interdisciplinary earth system research. We developed a metadata reporting framework (FRAMES) to enable predictive understanding of carbon cycling in tropical forests under global change. FRAMES adheres to best practices for data and metadata organization, enabling consistent data reporting and thus compatibility with a variety of standardized data protocols. We used an iterative scientist-centered design process to develop FRAMES. The resulting modular organization streamlines metadata reporting and can be expanded to incorporate additional data types. The flexible data reporting format incorporates existing field practices to maximize data-entry efficiency. With FRAMES’s multi-scale measurement position hierarchy, data can be reported at observed spatial resolutions and then easily aggregated and linked across measurement types to support model-data integration. FRAMES is in early use by both data providers and users. In this article, we describe FRAMES, identify lessons learned, and discuss areas of future development.

  7. Content standards for medical image metadata

    NASA Astrophysics Data System (ADS)

    d'Ornellas, Marcos C.; da Rocha, Rafael P.

    2003-12-01

    Medical images are at the heart of healthcare diagnostic procedures. They have provided not only a noninvasive means to view anatomical cross-sections of internal organs but also a means for physicians to evaluate the patient's diagnosis and monitor the effects of treatment. For a medical center, the emphasis may shift from the generation of images to post-processing and data management, since the medical staff may generate even more processed images and other data from the original image after various analyses and post-processing. A medical image data repository for health care information systems is becoming a critical need. This data repository would contain comprehensive patient records, including information such as clinical data, related diagnostic images, and post-processed images. Due to the large volume and complexity of the data, as well as the diversified user access requirements, the implementation of the medical image archive system will be a complex and challenging task. This paper discusses content standards for medical image metadata. In addition, it focuses on the evaluation of image metadata content and on metadata quality management.

  8. Identification of stars in a J1744.0 star catalogue Yixiangkaocheng

    NASA Astrophysics Data System (ADS)

    Ahn, S.-H.

    2012-05-01

    The stars in the Chinese star catalogue Yixiangkaocheng, which was edited by the Jesuit astronomer Kögler in AD 1744 and published in AD 1756, are identified with their counterparts in the Hipparcos catalogue. The equinox of the catalogue is confirmed to be J1744.0. By taking into account the precession of the equinoxes, proper motions and nutation, the star closest to the location of each Yixiangkaocheng star, and of suitable magnitude, is selected as its identified counterpart. I identified 2848 stars and 13 nebulosities out of 3083 objects in Yixiangkaocheng, so the identification rate reaches 92.80 per cent. I find that the magnitude classification system in Yixiangkaocheng agrees with the modern magnitude system. The catalogue includes dim stars, whose visual magnitudes are larger than 7, but most of these stars have Flamsteed designations. I find that the stars whose declination is lower than -30° have relatively larger offsets and systematic behaviour different from that of the other stars. This indicates that there might be two different sources of stars in Yixiangkaocheng. In particular, I find that μ1 Sco and γ1 Sgr approximately mark the boundary between the two source catalogues. The observer's location, as estimated from these facts, agrees with the latitude of Greenwich, where Flamsteed made his observations. The positional offsets between the Yixiangkaocheng stars and the Hipparcos stars are 0.6 arcmin, which implies that the source catalogue of stars with δ > -30° must have come from telescopic observations. Nebulosities in Yixiangkaocheng are identified with a few double stars, o Cet (the variable star Mira), the Andromeda galaxy, ω Cen and NGC 6231. These entities are associated with listings in Halley's Catalogue of the Southern Stars of AD 1679 as well as Flamsteed's catalogue of AD 1690.
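
    At its core, the matching procedure described above is a nearest-neighbour search on the celestial sphere once the modern positions have been propagated to the J1744.0 equinox. A minimal sketch of that final step (the precession, proper-motion and nutation corrections are assumed to have been applied already, and the magnitude screening is omitted):

```python
import numpy as np

def angular_separation(ra1, dec1, ra2, dec2):
    """Great-circle separation in radians (inputs in radians), using the
    Vincenty formula for numerical stability at small separations."""
    dra = ra2 - ra1
    num = np.hypot(np.cos(dec2) * np.sin(dra),
                   np.cos(dec1) * np.sin(dec2)
                   - np.sin(dec1) * np.cos(dec2) * np.cos(dra))
    den = (np.sin(dec1) * np.sin(dec2)
           + np.cos(dec1) * np.cos(dec2) * np.cos(dra))
    return np.arctan2(num, den)

def match_star(ra, dec, cat_ra, cat_dec, max_sep_arcmin=3.0):
    """Index of the closest catalogue star within the tolerance, else None.
    cat_ra/cat_dec are arrays of candidate positions in radians."""
    sep = angular_separation(ra, dec, cat_ra, cat_dec)
    best = int(np.argmin(sep))
    return best if np.degrees(sep[best]) * 60.0 <= max_sep_arcmin else None
```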

  9. PIMMS tools for capturing metadata about simulations

    NASA Astrophysics Data System (ADS)

    Pascoe, Charlotte; Devine, Gerard; Tourte, Gregory; Pascoe, Stephen; Lawrence, Bryan; Barjat, Hannah

    2013-04-01

    PIMMS (Portable Infrastructure for the Metafor Metadata System) provides a method for consistent and comprehensive documentation of modelling activities that enables the sharing of simulation data and model configuration information. The aim of PIMMS is to package the metadata infrastructure developed by Metafor for CMIP5 so that it can be used by climate modelling groups in UK universities. PIMMS tools capture information about simulations from the design of experiments to the implementation of experiments via simulations that run models. PIMMS uses the Metafor methodology, which consists of a Common Information Model (CIM), Controlled Vocabularies (CV) and software tools. PIMMS software tools provide for the creation and consumption of CIM content via a web services infrastructure and portal developed by the ES-DOC community. PIMMS metadata integrates with the ESGF data infrastructure via the mapping of vocabularies onto ESGF facets. There are three paradigms of PIMMS metadata collection: Model Intercomparison Projects (MIPs), where a standard set of questions is asked of all models, which perform standard sets of experiments; disciplinary-level metadata collection, where a standard set of questions is asked of all models but experiments are specified by users; and bespoke metadata creation, where the users define questions about both models and experiments. Examples will be shown of how PIMMS has been configured to suit each of these three paradigms. In each case PIMMS allows users to provide additional metadata beyond that which is asked for in an initial deployment. The primary target for PIMMS is the UK climate modelling community, where it is common practice to reuse model configurations from other researchers. This culture of collaboration exists in part because climate models are very complex, with many variables that can be modified. Therefore it has become common practice to begin a series of experiments by using another climate model configuration as a starting

  10. Federating Metadata Catalogs

    NASA Astrophysics Data System (ADS)

    Baru, C.; Lin, K.

    2009-04-01

    The Geosciences Network project (www.geongrid.org) has been developing cyberinfrastructure for data sharing in the Earth Science community based on a service-oriented architecture. The project defines a standard "software stack", which includes a standardized set of software modules and corresponding service interfaces. The system employs Grid certificates for distributed user authentication. The GEON Portal provides online access to these services via a set of portlets. This service-oriented approach has enabled the GEON network to easily expand to new sites and deploy the same infrastructure in new projects. To facilitate interoperation with other distributed geoinformatics environments, service standards are being defined and implemented for catalog services and federated search across distributed catalogs. The need arises because there may be multiple metadata catalogs in a distributed system, for example, for each institution, agency, geographic region, and/or country. Ideally, a geoinformatics user should be able to search across all such catalogs by making a single search request. In this paper, we describe our implementation of such a search capability across federated metadata catalogs in the GEON service-oriented architecture. The GEON catalog can be searched using spatial, temporal, and other metadata-based search criteria. The search can be invoked as a Web service and, thus, can be embedded in any software application. The need for federated catalogs in GEON arises because (i) GEON collaborators at the University of Hyderabad, India have deployed their own catalog, as part of the iGEON-India effort, to register information about local resources for broader access across the network, (ii) GEON collaborators in the GEO Grid (Global Earth Observations Grid) project at AIST, Japan have implemented a catalog for their ASTER data products, and (iii) we have recently deployed a search service to access all data products from the EarthScope project in the US

  11. Reviving legacy clay mineralogy data and metadata through the IEDA-CCNY Data Internship Program

    NASA Astrophysics Data System (ADS)

    Palumbo, R. V.; Randel, C.; Ismail, A.; Block, K. A.; Cai, Y.; Carter, M.; Hemming, S. R.; Lehnert, K.

    2016-12-01

    Reconstruction of past climate and ocean circulation using ocean sediment cores relies on the use of multiple climate proxies measured on well-studied cores. Preserving all the information collected on a sediment core is crucial for the success of future studies using these unique and important samples. Clay mineralogy is a powerful tool to study weathering processes and sedimentary provenance. In his pioneering dissertation, Pierre Biscaye (1964, Yale University) established the X-ray Diffraction (XRD) method for quantitative clay mineralogy analyses in ocean sediments and presented data for 500 core-top samples throughout the Atlantic Ocean and its neighboring seas. Unfortunately, the data exist only in analog format, which has discouraged scientists from reusing them, apart from replication of the published maps. Archiving and preserving this dataset and making it publicly available in a digital format, linked with the metadata from the core repository, will allow the scientific community to use these data to generate new findings. Under the supervision of Sidney Hemming and members of the Interdisciplinary Earth Data Alliance (IEDA) team, IEDA-CCNY interns digitized the data and metadata from Biscaye's dissertation and linked them with additional sample metadata using IGSN (International Geo-Sample Number). After compilation and proper documentation, the dataset was published in the EarthChem Library, where it will be openly accessible and citable with a persistent DOI (Digital Object Identifier). During this internship, the students read peer-reviewed articles, interacted with active scientists in the field and acquired knowledge about XRD methods and the data generated, as well as their applications. They also learned about existing and emerging best practices in data publication and preservation. Data rescue projects are a fun and interactive way for students to become engaged in the field.

  12. Derivation of photometric redshifts for the 3XMM catalogue

    NASA Astrophysics Data System (ADS)

    Georgantopoulos, I.; Corral, A.; Mountrichas, G.; Ruiz, A.; Masoura, V.; Fotopoulou, S.; Watson, M.

    2017-10-01

    We present the results from our ESA Prodex project that aims to derive photometric redshifts for the 3XMM catalogue. The 3XMM-DR6 release offers the largest X-ray survey, containing 470,000 unique sources over 1000 sq. degrees. We cross-correlate the X-ray positions with optical and near-IR catalogues using Bayesian statistics. The optical catalogue used so far is the SDSS, while we are currently incorporating the recently released Pan-STARRS catalogue. In the near-IR we use the VIKING, VHS and UKIDSS surveys, as well as the WISE W1 and W2 filters. The estimation of photometric redshifts is based on the TPZ software. The training sample is based on X-ray selected samples with available SDSS spectroscopy. We present here the results for the 40,000 3XMM sources with available SDSS counterparts. Our analysis provides very reliable photometric redshifts, with sigma(mad)=0.05 and an outlier fraction of 8% for the optically extended sources. We discuss the wide range of applications that are feasible using this unprecedented resource.
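
    The quality figures quoted above, sigma(mad) and the outlier fraction, are standard photometric-redshift statistics. A short sketch assuming the conventional normalized-residual definitions (the paper's exact conventions may differ):

```python
import numpy as np

def photoz_quality(z_spec, z_phot, outlier_threshold=0.15):
    """Normalized median absolute deviation (sigma_NMAD) and the
    catastrophic-outlier fraction of photometric redshifts, following
    common convention."""
    z_spec, z_phot = np.asarray(z_spec), np.asarray(z_phot)
    dz = (z_phot - z_spec) / (1.0 + z_spec)
    sigma_nmad = 1.48 * np.median(np.abs(dz - np.median(dz)))
    outlier_fraction = float(np.mean(np.abs(dz) > outlier_threshold))
    return sigma_nmad, outlier_fraction
```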

  13. [National Conference on Cataloguing Standards (Ottawa, May 19-20, 1970].

    ERIC Educational Resources Information Center

    National Library of Canada, Ottawa (Ontario).

    The following papers were presented at an invitational conference on cataloging standards: (1) "Canadiana Meets Automation;" (2) "The Union Catalogues in the National Library - The Present Condition;" (3) "A Centralized Bibliographic Data Bank;" (4) "The Standardization of Cataloguing;" (5) "The…

  14. Streamlining Metadata and Data Management for Evolving Digital Libraries

    NASA Astrophysics Data System (ADS)

    Clark, D.; Miller, S. P.; Peckman, U.; Smith, J.; Aerni, S.; Helly, J.; Sutton, D.; Chase, A.

    2003-12-01

    What began two years ago as an effort to stabilize the Scripps Institution of Oceanography (SIO) data archives from more than 700 cruises going back 50 years has now become the operational, fully searchable "SIOExplorer" digital library, complete with thousands of historic photographs, images, maps, full-text documents, binary data files, and 3D visualization experiences, totaling nearly 2 terabytes of digital content. Coping with data diversity and complexity has proven to be more challenging than dealing with large volumes of digital data. SIOExplorer has been built with scalability in mind, so that the addition of new data types and entire new collections may be accomplished with ease. It is a federated system, currently interoperating with three independent data-publishing authorities, each responsible for their own quality control, metadata specifications, and content selection. The IT architecture implemented at the San Diego Supercomputer Center (SDSC) streamlines the integration of additional projects in other disciplines with a suite of metadata management and collection-building tools for "arbitrary digital objects." Metadata are automatically harvested from data files into domain-specific metadata blocks, and mapped into various specification standards as needed. Metadata can be browsed and objects can be viewed onscreen or downloaded for further analysis, with automatic proprietary-hold request management.

  15. ALE: automated label extraction from GEO metadata.

    PubMed

    Giles, Cory B; Brown, Chase A; Ripperger, Michael; Dennis, Zane; Roopnarinesingh, Xiavan; Porter, Hunter; Perz, Aleksandra; Wren, Jonathan D

    2017-12-28

    NCBI's Gene Expression Omnibus (GEO) is a rich community resource containing millions of gene expression experiments from human, mouse, rat, and other model organisms. However, information about each experiment (metadata) is in the format of an open-ended, non-standardized textual description provided by the depositor. Thus, classification of experiments for meta-analysis by factors such as gender, age of the sample donor, and tissue of origin is not feasible without assigning labels to the experiments. Automated approaches are preferable for this, primarily because of the size and volume of the data to be processed, but also because they ensure standardization and consistency. While some of these labels can be extracted directly from the textual metadata, much of the available data does not contain explicit text informing the researcher about the age and gender of the subjects in the study. To bridge this gap, machine-learning methods can be trained to use the gene expression patterns associated with the text-derived labels to refine label-prediction confidence. Our analysis shows that only 26% of metadata text contains information about gender and 21% about age. In order to ameliorate the lack of available labels for these data sets, we first extract labels from the textual metadata for each GEO RNA dataset and evaluate the performance against a gold standard of manually curated labels. We then use machine-learning methods to predict labels based upon the gene expression of the samples and compare this to the text-based method. Here we present an automated method to extract labels for age, gender, and tissue from textual metadata and GEO data using both a heuristic approach and machine learning. We show that the two methods together improve the accuracy of label assignment to GEO samples.
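
    The heuristic, text-based part of such a pipeline can be as simple as pattern matching over the free-text sample descriptions. A sketch with illustrative patterns; these are not the authors' actual extraction rules:

```python
import re

# Illustrative heuristics only -- not the published extraction rules.
AGE_PATTERN = re.compile(r"age[:\s]+(\d{1,3})(?:\s*(?:years?|yrs?|y))?", re.I)
SEX_PATTERN = re.compile(r"\b(male|female|m|f)\b", re.I)

def extract_labels(metadata_text):
    """Pull age and gender labels out of free-text GEO metadata when the
    depositor happened to state them explicitly."""
    labels = {}
    age = AGE_PATTERN.search(metadata_text)
    if age:
        labels["age"] = int(age.group(1))
    sex = SEX_PATTERN.search(metadata_text)
    if sex:
        male = sex.group(1).lower() in ("male", "m")
        labels["gender"] = "male" if male else "female"
    return labels

print(extract_labels("tissue: liver; age: 54 years; Sex: F"))
# -> {'age': 54, 'gender': 'female'}
```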

  16. A document centric metadata registration tool constructing earth environmental data infrastructure

    NASA Astrophysics Data System (ADS)

    Ichino, M.; Kinutani, H.; Ono, M.; Shimizu, T.; Yoshikawa, M.; Masuda, K.; Fukuda, K.; Kawamoto, H.

    2009-12-01

    DIAS (Data Integration and Analysis System) is one of the GEOSS activities in Japan. It is also a leading part of the GEOSS task with the same name defined in the GEOSS Ten-Year Implementation Plan. The main mission of DIAS is to construct a data infrastructure that can effectively integrate earth environmental data such as observation data, numerical model outputs, and socio-economic data provided from the fields of climate, water cycle, ecosystem, ocean, biodiversity and agriculture. Some of DIAS's data products are available at http://www.jamstec.go.jp/e/medid/dias. Most earth environmental data commonly have spatial and temporal attributes, such as the covered geographic scope or the creation date. The metadata standards covering these common attributes are published by the geographic information technical committee (TC211) of ISO (the International Organization for Standardization) as the specifications ISO 19115:2003 and ISO 19139:2007. Accordingly, DIAS metadata are developed based on the ISO/TC211 metadata standards. From the viewpoint of data users, metadata are useful not only for data retrieval and analysis but also for interoperability and information sharing among experts, beginners and nonprofessionals. On the other hand, from the viewpoint of data providers, two problems were pointed out in discussions. One is that data providers prefer to minimize the extra tasks and time spent creating metadata. The other is that data providers want to manage and publish documents that explain their data sets more comprehensively. To solve these problems, we have been developing a document-centric metadata registration tool. The features of our tool are that the generated documents are available instantly and that there is no extra cost for data providers to generate metadata. The tool is also implemented as a Web application, so it does not require any additional software on the data provider's side beyond a web browser. The interface of the tool

  17. Developing a prenatal nursing care International Classification for Nursing Practice catalogue.

    PubMed

    Liu, L; Coenen, A; Tao, H; Jansen, K R; Jiang, A L

    2017-09-01

    This study aimed to develop a prenatal nursing care catalogue of the International Classification for Nursing Practice. As a programme of the International Council of Nurses, the International Classification for Nursing Practice aims to support standardized electronic nursing documentation and facilitate the collection of comparable nursing data across settings. This initiative enables the study of relationships among nursing diagnoses, nursing interventions and nursing outcomes for best practice, healthcare management decisions, and policy development. The catalogues are usually focused on target populations; pregnant women are the nursing population addressed in this project. According to the guidelines for catalogue development, three research steps were adopted: (a) identifying relevant nursing diagnoses, interventions and outcomes; (b) developing a conceptual framework for the catalogue; (c) expert validation. This project established a prenatal nursing care catalogue with 228 terms in total, including 69 nursing diagnoses, 92 nursing interventions and 67 nursing outcomes; among them, 57 nursing terms were newly developed. All terms in the catalogue were organized by a framework with two main categories, i.e. Expected Changes of Pregnancy and Pregnancy at Risk. Each category had four domains, representing the physical, psychological, behavioral and environmental perspectives of nursing practice. This catalogue can ease the documentation workload for prenatal care nurses, and facilitate the storage and retrieval of standardized data for many purposes, such as quality improvement, administrative decision support and research. The documentation of prenatal care provides data that can be more fluently communicated, compared and evaluated across various healthcare providers and clinical settings. © 2016 International Council of Nurses.

  18. Gaia DR1 documentation Chapter 7: Catalogue consolidation and validation

    NASA Astrophysics Data System (ADS)

    Arenou, F.; Babusiaux, C.; Blanco-Cuaresma, S.; Borrachero, R.; Cantat-Gaudin, T.; Fabricius, C.; Findeisen, K.; Helmi, A.; Hutton, A.; Luri, X.; Marrese, P.; Marinoni, S.; Marrese, P.; Robin, A.; Sordo, R.; Soria, S.; Turon, C.; Utrilla Molina, E.; Vallenari, A.

    2017-12-01

    The Gaia Catalogue not only represents a wealth of data, it is also the product of complex processing before a catalogue can be issued. The main data processing is handled by three DPAC Coordination Units: CU3 for the astrometric data, CU5 for the photometric data and CU6 for the spectroscopic data. Three further Coordination Units then analyse the processed data: CU4 for optical or binary stars, solar system objects and extended objects, CU7 for variable stars, and CU8 for classification. Finally, CU9 takes care of the intermediate and final publication of the Gaia data. For Gaia DR1, the situation was simplified in the sense that CU4, CU6 and CU8 did not contribute to the first Catalogue. At the last step, several data fields may have been computed by several Coordination Units (e.g., parallaxes computed by CU3, then again by CU4 with a fit of an astrometric + binary model if the star happens to have a significant binary motion; or a mean magnitude computed by CU5 may be superseded by another estimation from CU7 if the star happens to be a periodic variable; etc.), in several Data Processing Centres, so an (a) homogeneous, (b) convenient, (c) consistent Catalogue has to be built. First, astrometric and photometric information is attached to a so-called CompleteSource, and possible variability information is then integrated, producing a homogeneous Catalogue. Second, sources that do not meet some minimum astrometric or photometric quality are filtered out. The filters applied are described in Section 4 of Gaia Collaboration et al. (2016a). Third, while flat files are kept for further operations, the data are integrated into the Gaia Archive Core System (GACS) database; a crossmatch with external catalogues is also performed, providing convenient access to the data. Fourth, the consistency of the Catalogue is obtained through a dedicated validation of its content. Sources that do not pass the validation criteria are then filtered out. This chapter describes these

  19. HerMES: point source catalogues from Herschel-SPIRE observations II

    NASA Astrophysics Data System (ADS)

    Wang, L.; Viero, M.; Clarke, C.; Bock, J.; Buat, V.; Conley, A.; Farrah, D.; Guo, K.; Heinis, S.; Magdis, G.; Marchetti, L.; Marsden, G.; Norberg, P.; Oliver, S. J.; Page, M. J.; Roehlly, Y.; Roseboom, I. G.; Schulz, B.; Smith, A. J.; Vaccari, M.; Zemcov, M.

    2014-11-01

    The Herschel Multi-tiered Extragalactic Survey (HerMES) is the largest Guaranteed Time Key Programme on the Herschel Space Observatory. With a wedding-cake survey strategy, it consists of nested fields with varying depth and area totalling ~380 deg². In this paper, we present deep point source catalogues extracted from Herschel-Spectral and Photometric Imaging Receiver (SPIRE) observations of all HerMES fields, except for the later addition of the 270 deg² HerMES Large-Mode Survey (HeLMS) field. These catalogues constitute the second Data Release (DR2), made in 2013 October. A subset of these catalogues, consisting of bright sources extracted from Herschel-SPIRE observations completed by 2010 May 1 (covering ~74 deg²), was released earlier in the first extensive data release in 2012 March. Two different methods are used to generate the point source catalogues: the SUSSEXTRACTOR point source extractor used in two earlier data releases (EDR and EDR2), and a new source detection and photometry method. The latter combines an iterative source detection algorithm, STARFINDER, and a De-blended SPIRE Photometry algorithm. We use end-to-end Herschel-SPIRE simulations with realistic number counts and clustering properties to characterize basic properties of the point source catalogues, such as the completeness, reliability, photometric and positional accuracy. Over 500,000 catalogue entries in HerMES fields (except HeLMS) are released to the public through the HeDAM (Herschel Database in Marseille) website (http://hedam.lam.fr/HerMES).

  20. OlyMPUS - The Ontology-based Metadata Portal for Unified Semantics

    NASA Astrophysics Data System (ADS)

    Huffer, E.; Gleason, J. L.

    2015-12-01

    The Ontology-based Metadata Portal for Unified Semantics (OlyMPUS), funded by the NASA Earth Science Technology Office Advanced Information Systems Technology program, is an end-to-end system designed to support data consumers and data providers, enabling the latter to register their data sets and provision them with the semantically rich metadata that drives the Ontology-Driven Interactive Search Environment for Earth Sciences (ODISEES). OlyMPUS leverages the semantics and reasoning capabilities of ODISEES to provide data producers with a semi-automated interface for producing the semantically rich metadata needed to support ODISEES' data discovery and access services. It integrates the ODISEES metadata search system with multiple NASA data delivery tools to enable data consumers to create customized data sets for download to their computers, or, for registered users of the NASA Advanced Supercomputing (NAS) facility, directly to NAS storage resources for access by applications running on NAS supercomputers. A core function of NASA's Earth Science Division is research and analysis that uses the full spectrum of data products available in NASA archives. Scientists need to perform complex analyses that identify correlations and non-obvious relationships across all types of Earth System phenomena. Comprehensive analytics are hindered, however, by the fact that many Earth science data products are disparate and hard to synthesize. Variations in how data are collected, processed, gridded, and stored create challenges for data interoperability and synthesis, which are exacerbated by the sheer volume of available data. Robust, semantically rich metadata can support tools for data discovery and facilitate machine-to-machine transactions with services such as data subsetting, regridding, and reformatting. Such capabilities are critical to enabling the research activities integral to NASA's strategic plans. However, as metadata requirements increase and competing standards emerge

  1. Metadata management for high content screening in OMERO

    PubMed Central

    Li, Simon; Besson, Sébastien; Blackburn, Colin; Carroll, Mark; Ferguson, Richard K.; Flynn, Helen; Gillen, Kenneth; Leigh, Roger; Lindner, Dominik; Linkert, Melissa; Moore, William J.; Ramalingam, Balaji; Rozbicki, Emil; Rustici, Gabriella; Tarkowska, Aleksandra; Walczysko, Petr; Williams, Eleanor; Allan, Chris; Burel, Jean-Marie; Moore, Josh; Swedlow, Jason R.

    2016-01-01

    High content screening (HCS) experiments create a classic data management challenge—multiple, large sets of heterogeneous structured and unstructured data that must be integrated and linked to produce a set of “final” results. These different data include images, reagents, protocols, analytic output, and phenotypes, all of which must be stored, linked and made accessible for users, scientists, collaborators and, where appropriate, the wider community. The OME Consortium has built several open source tools for managing, linking and sharing these different types of data. The OME Data Model is a metadata specification that supports the image data and metadata recorded in HCS experiments. Bio-Formats is a Java library that reads recorded image data and metadata and includes support for several HCS screening systems. OMERO is an enterprise data management application that integrates image data, experimental and analytic metadata and makes them accessible for visualization, mining, sharing and downstream analysis. We discuss how Bio-Formats and OMERO handle these different data types, and how they can be used to integrate, link and share HCS experiments in facilities and public data repositories. OME specifications and software are open source and are available at https://www.openmicroscopy.org. PMID:26476368

  2. Metadata for data rescue and data at risk

    USGS Publications Warehouse

    Anderson, William L.; Faundeen, John L.; Greenberg, Jane; Taylor, Fraser

    2011-01-01

    Scientific data age, become stale, fall into disuse and run tremendous risks of being forgotten and lost. These problems can be addressed by archiving and managing scientific data over time, and by establishing practices that facilitate data discovery and reuse. Metadata documentation is integral to this work and essential for measuring and assessing high-priority data preservation cases. The International Council for Science: Committee on Data for Science and Technology (CODATA) has a newly appointed Data-at-Risk Task Group (DARTG), participating in the general arena of rescuing data. The DARTG's primary objective is building an inventory of scientific data that are at risk of being lost forever. As part of this effort, the DARTG is testing an approach for documenting endangered datasets. The DARTG is developing a minimal and easy-to-use set of metadata properties for sufficiently describing endangered data, which will aid global data rescue missions. The DARTG metadata framework supports rapid capture and easy documentation across an array of scientific domains. This paper reports on the goals and principles supporting the DARTG metadata schema, and provides a description of the preliminary implementation.
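
    A minimal property set of the kind the DARTG describes might look like the sketch below; the field names are hypothetical illustrations, not the task group's actual schema:

```python
from dataclasses import dataclass, asdict

@dataclass
class AtRiskDatasetRecord:
    """Hypothetical minimal record for an endangered dataset, in the
    spirit of the DARTG's 'minimal and easy to use' property set."""
    title: str
    custodian: str        # who currently holds the data
    discipline: str
    medium: str           # e.g. "paper", "magnetic tape", "film"
    location: str
    condition: str        # e.g. "deteriorating", "analog only"
    access_notes: str = ""

record = AtRiskDatasetRecord(
    title="Atlantic core-top clay mineralogy, 1964",
    custodian="University archive",
    discipline="marine geology",
    medium="paper",
    location="New Haven, CT",
    condition="analog only",
)
print(asdict(record))
```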

  3. A novel framework for assessing metadata quality in epidemiological and public health research settings

    PubMed Central

    McMahon, Christiana; Denaxas, Spiros

    2016-01-01

    Metadata are critical in epidemiological and public health research. However, a lack of biomedical metadata quality frameworks and limited awareness of the implications of poor-quality metadata render data analyses problematic. In this study, we created and evaluated a novel framework to assess the metadata quality of epidemiological and public health research datasets. We performed a literature review and surveyed stakeholders to enhance our understanding of biomedical metadata quality assessment. The review identified 11 studies and nine quality dimensions, none of which were specifically aimed at biomedical metadata. 96 individuals completed the survey; of those who submitted data, most assessed metadata quality only sometimes, and eight did not at all. Our framework has four sections: a) general information; b) tools and technologies; c) usability; and d) management and curation. We evaluated the framework using three test cases and sought expert feedback. The framework can assess biomedical metadata quality systematically and robustly. PMID:27570670

  4. A novel framework for assessing metadata quality in epidemiological and public health research settings.

    PubMed

    McMahon, Christiana; Denaxas, Spiros

    2016-01-01

    Metadata are critical in epidemiological and public health research. However, a lack of biomedical metadata quality frameworks and limited awareness of the implications of poor-quality metadata render data analyses problematic. In this study, we created and evaluated a novel framework to assess the metadata quality of epidemiological and public health research datasets. We performed a literature review and surveyed stakeholders to enhance our understanding of biomedical metadata quality assessment. The review identified 11 studies and nine quality dimensions, none of which were specifically aimed at biomedical metadata. 96 individuals completed the survey; of those who submitted data, most assessed metadata quality only sometimes, and eight did not at all. Our framework has four sections: a) general information; b) tools and technologies; c) usability; and d) management and curation. We evaluated the framework using three test cases and sought expert feedback. The framework can assess biomedical metadata quality systematically and robustly.

  5. Catalogue of UV sources in the Galaxy

    NASA Astrophysics Data System (ADS)

    Beitia-Antero, L.; Gómez de Castro, A. I.

    2017-03-01

    The Galaxy Evolution Explorer (GALEX) ultraviolet (UV) database contains the largest photometric catalogue in the ultraviolet range; as a result, the GALEX photometric bands, the near-UV band (NUV) and the far-UV band (FUV), have become standards. Nevertheless, the GALEX catalogue includes neither bright UV sources, owing to the high sensitivity of its detectors, nor sources in the Galactic plane. In order to extend the GALEX database for future UV missions, we have obtained synthetic FUV and NUV photometry using the database of UV spectra generated by the International Ultraviolet Explorer (IUE). This database contains 63,755 spectra in the low-dispersion mode (λ/δλ ~ 300) obtained during its 18-year lifetime. For stellar sources in the IUE database, we have selected spectra with a high signal-to-noise ratio (SNR) and computed FUV and NUV magnitudes using the GALEX transmission curves along with the conversion equations between flux and magnitudes provided by the mission. In addition, we have performed variability tests to determine whether the sources were variable (during the IUE observations). As a result, we have generated two different catalogues: one for non-variable stars and another for variable sources. The former contains FUV and NUV magnitudes, while the latter gives the basic information and the FUV magnitude for each observation. The consistency of the magnitudes has been tested using white dwarfs contained in both the GALEX and IUE samples. The catalogues are available through the Centre de Données Stellaires. The sources are distributed throughout the whole sky, with special coverage of the Galactic plane.
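
    Synthetic photometry of this kind integrates each spectrum against a filter transmission curve. A generic sketch of the usual photon-weighted, AB-system recipe; the mission-provided flux-to-magnitude equations the authors actually used would replace the plain AB zero point:

```python
import numpy as np

def _trapz(y, x):
    """Trapezoidal integration, kept explicit for NumPy-version independence."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def synthetic_ab_magnitude(wavelength, flux, band_wl, band_throughput):
    """Photon-weighted synthetic AB magnitude of a spectrum through a filter
    curve; wavelength in Angstrom, flux in erg/s/cm^2/A."""
    t = np.interp(wavelength, band_wl, band_throughput, left=0.0, right=0.0)
    # Photon-counting average of f_lambda over the band.
    f_lam = (_trapz(flux * t * wavelength, wavelength)
             / _trapz(t * wavelength, wavelength))
    # Convert to f_nu at the band's pivot wavelength, then to an AB magnitude.
    pivot_sq = (_trapz(t * wavelength, wavelength)
                / _trapz(t / wavelength, wavelength))
    f_nu = f_lam * pivot_sq / 2.998e18        # speed of light in Angstrom/s
    return -2.5 * np.log10(f_nu) - 48.6
```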

  6. The great contribution: Index Medicus, Index-Catalogue, and IndexCat

    PubMed Central

    Greenberg, Stephen J.; Gallagher, Patricia E.

    2009-01-01

    Objective: The systematic indexing of medical literature by the Library of the Surgeon-General's Office (now the National Library of Medicine) has been called “America's greatest contribution to medical knowledge.” In the 1870s, the library launched two indexes: the Index Medicus and the Index-Catalogue of the Library of the Surgeon-General's Office. Index Medicus is better remembered today as the forerunner of MEDLINE, but Index Medicus began as the junior partner of what the library saw as its major publication, the Index-Catalogue. However, the Index-Catalogue had been largely overlooked by many medical librarians until 2004, when the National Library of Medicine released IndexCat, the online version of Index-Catalogue. Access to this huge amount of material raised new questions: What was the coverage of the Index-Catalogue? How did it compare and overlap with the Index Medicus? Method: Over 1,000 randomly generated Index Medicus citations were cross-referenced in IndexCat. Results: Inclusion, form, content, authority control, and subject headings were evaluated, revealing that the relationship between the two publications was neither simple nor static through time. In addition, the authors found interesting anomalies that shed light on how medical literature was selected and indexed in “America's greatest contribution to medical knowledge.” PMID:19404501

  7. Studies of Big Data metadata segmentation between relational and non-relational databases

    NASA Astrophysics Data System (ADS)

    Golosova, M. V.; Grigorieva, M. A.; Klimentov, A. A.; Ryabinkin, E. A.; Dimitrov, G.; Potekhin, M.

    2015-12-01

    In recent years the concept of Big Data has become well established in IT. Systems managing large data volumes produce metadata that describe the data and workflows. These metadata are used to obtain information about the current system state and for statistical and trend analysis of the processes these systems drive. Over time the amount of stored metadata can grow dramatically. In this article we present our studies demonstrating how metadata storage scalability and performance can be improved by using a hybrid RDBMS/NoSQL architecture.
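
    The segmentation idea is that stable, frequently queried fields stay relational while the variable remainder is stored schemalessly. A toy sketch, with sqlite3 standing in for the RDBMS and a JSON-encoded column standing in for a separate NoSQL document store; the field names are illustrative:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE task_metadata (
    task_id  INTEGER PRIMARY KEY,
    status   TEXT NOT NULL,
    created  TEXT NOT NULL,
    extra    TEXT               -- schemaless remainder, JSON-encoded
)""")

extra = {"input_datasets": ["ds1", "ds2"], "site": "SITE-A", "attempt": 3}
conn.execute("INSERT INTO task_metadata VALUES (?, ?, ?, ?)",
             (42, "running", "2015-09-01T12:00:00Z", json.dumps(extra)))

# Relational predicates serve the hot fields; the document part is
# decoded only on demand.
row = conn.execute("SELECT task_id, extra FROM task_metadata "
                   "WHERE status = 'running'").fetchone()
print(row[0], json.loads(row[1])["site"])   # 42 SITE-A
```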

  8. Metadata improvements driving new tools and services at a NASA data center

    NASA Astrophysics Data System (ADS)

    Moroni, D. F.; Hausman, J.; Foti, G.; Armstrong, E. M.

    2011-12-01

    The NASA Physical Oceanography DAAC (PO.DAAC) is responsible for distributing and maintaining satellite-derived oceanographic data from a number of NASA and non-NASA missions for the physical disciplines of ocean winds, sea surface temperature, ocean topography and gravity. Currently its holdings consist of over 600 datasets with a data archive in excess of 200 terabytes. The PO.DAAC has recently embarked on a metadata quality and completeness project to migrate, update and improve metadata records for over 300 public datasets. An interactive database management tool has been developed to allow data scientists to enter, update and maintain metadata records. This tool communicates directly with PO.DAAC's Data Management and Archiving System (DMAS), which serves as the new archival and distribution backbone as well as a permanent repository of dataset- and granule-level metadata. Although we will briefly discuss the tool, the more important ramifications are the ability to now expose, propagate and leverage the metadata in a number of ways. First, the metadata are exposed through a faceted and free-text search interface directly from Drupal-based PO.DAAC web pages, allowing for quick browsing and data discovery, especially by "drilling" through the various facet levels that organize datasets by time/space resolution, processing level, sensor, measurement type, etc. Furthermore, the metadata can now be exposed through web services to produce metadata records in a number of different formats, such as FGDC and ISO 19115, or potentially propagated to visualization and subsetting tools and other discovery interfaces. The fundamental concept is that the metadata form the essential bridge between the user and the tool or discovery mechanism for a broad range of ocean earth science data records.

  9. Inter-University Upper Atmosphere Global Observation Network (IUGONET) Metadata Database and Its Interoperability

    NASA Astrophysics Data System (ADS)

    Yatagai, A. I.; Iyemori, T.; Ritschel, B.; Koyama, Y.; Hori, T.; Abe, S.; Tanaka, Y.; Shinbori, A.; Umemura, N.; Sato, Y.; Yagi, M.; Ueno, S.; Hashiguchi, N. O.; Kaneda, N.; Belehaki, A.; Hapgood, M. A.

    2013-12-01

    IUGONET is a Japanese program to build a metadata database for ground-based observations of the upper atmosphere [1]. The project began in 2009 with five Japanese institutions which archive data observed by radars, magnetometers, photometers, radio telescopes, helioscopes, and so on, at various altitudes from the Earth's surface to the Sun. Systems have been developed to allow searching of the above-described metadata. We have been updating the system and adding new and updated metadata. The IUGONET development team adopted the SPASE metadata model [2] to describe the upper atmosphere data. This model is used as the common metadata format by the virtual observatories for solar-terrestrial physics. It includes metadata referring to each data file (called a 'Granule'), which enables a search for data files as well as data sets. Further details are described in [2] and [3]. Currently, three additional Japanese institutions are being incorporated into IUGONET. Furthermore, metadata of observations of the troposphere, taken at the observatories of the middle and upper atmosphere radar at Shigaraki and the meteor radar in Indonesia, have been incorporated. These additions will contribute to efficient interdisciplinary scientific research. At the beginning of 2013, the registration of the 'Observatory' and 'Instrument' metadata was completed, which makes it easy to get an overview of the metadata database. The number of registered metadata records as of the end of July totalled 8.8 million, including 793 observatories and 878 instruments. It is important to promote interoperability and/or metadata exchange between the database development groups. A memorandum of agreement has been signed with the European Near-Earth Space Data Infrastructure for e-Science (ESPAS) project, which has objectives similar to IUGONET's with regard to a framework for formal collaboration. Furthermore, observations by satellites and the International Space Station are being incorporated with a view for

  10. Interoperable Solar Data and Metadata via LISIRD 3

    NASA Astrophysics Data System (ADS)

    Wilson, A.; Lindholm, D. M.; Pankratz, C. K.; Snow, M. A.; Woods, T. N.

    2015-12-01

    LISIRD 3 is a major upgrade of the LASP Interactive Solar Irradiance Data Center (LISIRD), which serves several dozen space-based solar irradiance and related data products to the public. Through interactive plots, LISIRD 3 provides data browsing supported by data subsetting and aggregation. Incorporating a semantically enabled metadata repository, LISIRD 3 shows users current, vetted, consistent information about the datasets offered. Users can now also search for datasets based on metadata fields such as dataset type and/or spectral or temporal range. This semantic database enables metadata browsing, so users can discover the relationships between datasets, instruments, spacecraft, mission and PI. The database also enables the creation and publication of metadata records in a variety of formats, such as SPASE or ISO, making these datasets more discoverable. The database also opens up the possibility of a public SPARQL endpoint, making the metadata browsable in an automated fashion. LISIRD 3's data access middleware, LaTiS, provides dynamic, on-demand reformatting of data and timestamps, subsetting and aggregation, and other server-side functionality via a RESTful, OPeNDAP-compliant API, enabling interoperability between LASP datasets and many common tools. LISIRD 3's templated front-end design, coupled with the uniform data interface offered by LaTiS, allows easy integration of new datasets. Consequently the number and variety of datasets offered by LISIRD has grown to encompass several dozen, with many more to come. This poster will discuss the design and implementation of LISIRD 3, including tools used, capabilities enabled, and issues encountered.
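
    Server-side subsetting through a RESTful, OPeNDAP-style API means a client can fetch exactly the slice it needs in a convenient format. A hedged sketch: the base URL, dataset identifier and variable names below are assumptions for illustration, not verified LISIRD endpoints:

```python
import requests

BASE = "https://lasp.colorado.edu/lisird/latis/dap"   # assumed base URL
DATASET = "example_irradiance_dataset"                # assumed dataset id

# OPeNDAP-style constraint expression: project two variables, select a time
# range, and let the ".csv" suffix request server-side reformatting to CSV.
query = "time,irradiance&time>=2010-01-01&time<2010-02-01"
response = requests.get(f"{BASE}/{DATASET}.csv?{query}", timeout=30)
response.raise_for_status()
print(response.text.splitlines()[:3])   # header plus the first data rows
```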

  11. Construction of the Second Quito Astrolabe Catalogue

    NASA Astrophysics Data System (ADS)

    Kolesnik, Y. B.

    1994-03-01

    A method for astrolabe catalogue construction is presented. It is based on classical concepts, but the model of conditional equations for the group reduction is modified, with additional parameters introduced in the stepwise regressions. The chain adjustment is neglected, and the advantages of this approach are discussed. The method has been applied to the data obtained with the astrolabe of the Quito Astronomical Observatory from 1964 to 1983. Various characteristics of the catalogue produced with this method are compared with those from the rigorous classical method. Some improvement in both systematic and random errors is outlined.

  12. Integrated Array/Metadata Analytics

    NASA Astrophysics Data System (ADS)

    Misev, Dimitar; Baumann, Peter

    2015-04-01

    Data come in various forms and types, and integration usually presents a problem that is often simply ignored or solved with ad hoc solutions. Multidimensional arrays are a ubiquitous data type that we find at the core of virtually all science and engineering domains, as sensor, model, image, and statistics data. Naturally, arrays are richly described by and intertwined with additional metadata (alphanumeric relational data, XML, JSON, etc.). Database systems, however, a fundamental building block of what we call "Big Data", lack adequate support for modelling and expressing these array data/metadata relationships. Array analytics is hence quite primitive or non-existent in modern relational DBMSs. Recognizing this, we extended SQL with a new SQL/MDA part, seamlessly integrating multidimensional array analytics into the standard database query language. We demonstrate the benefits of SQL/MDA with real-world examples executed in ASQLDB, an open-source mediator system based on HSQLDB and rasdaman that already implements SQL/MDA.

  13. A case for user-generated sensor metadata

    NASA Astrophysics Data System (ADS)

    Nüst, Daniel

    2015-04-01

    Cheap and easy-to-use sensing technology and new developments in ICT towards a global network of sensors and actuators promise previously unthought-of changes in our understanding of the environment. Large professional as well as amateur sensor networks exist, and they are used for specific yet diverse applications across domains such as hydrology, meteorology and early warning systems. However, the impact this "abundance of sensors" has had so far is somewhat disappointing. There is a gap between (community-driven) sensor networks that could provide very useful data and the users of those data. In our presentation, we argue that this is due to a lack of metadata that would allow determining the fitness for use of a dataset. Syntactic and semantic interoperability for sensor webs have made great progress and continue to be an active field of research, yet they are often quite complex, which is of course due to the complexity of the problem at hand. Still, we see the most generic information for determining fitness for use as a dataset's provenance, because it allows users to make up their own minds independently of existing classification schemes for data quality. In this work we make the case for how curated, user-contributed metadata has the potential to improve this situation. This especially applies to scenarios in which an observed property is applicable in different domains, and to set-ups where the understanding of metadata concepts and (meta-)data quality differs between data provider and user. On the one hand, a citizen does not understand ISO provenance metadata. On the other hand, a researcher might find issues in publicly accessible time series published by citizens, which the latter might not be aware of or care about. Because users will have to determine fitness for use for each application on their own anyway, we suggest an online collaboration platform for user-generated metadata based on an extremely simplified data model. In the most basic fashion

  14. Seismic Catalogue and Seismic Network in Haiti

    NASA Astrophysics Data System (ADS)

    Belizaire, D.; Benito, B.; Carreño, E.; Meneses, C.; Huerfano, V.; Polanco, E.; McCormack, D.

    2013-05-01

    The destructive earthquake that occurred on January 12, 2010 in Haiti highlighted the country's lack of preparedness for seismic phenomena. At the moment of the earthquake, there was no seismic network operating in the country, and only partial knowledge of past seismicity was available, due to the absence of a national catalogue. After the 2010 earthquake, some advances began towards the installation of a national network and the elaboration of a seismic catalogue providing the necessary input for seismic hazard studies. This paper presents the state of the work carried out covering both aspects. First, a seismic catalogue has been built, compiling data on historical and instrumental events that occurred on Hispaniola Island and its surroundings, in the frame of the SISMO-HAITI project, supported by the Technical University of Madrid (UPM) and developed in cooperation with the Observatoire National de l'Environnement et de la Vulnérabilité of Haiti (ONEV). Data from different agencies all over the world were gathered, with a relevant role played by the Dominican Republic and Puerto Rico seismological services, which provided local data from their national networks. Almost 30,000 events recorded in the area from 1551 to 2011 were compiled in a first catalogue, among them 7,700 events with Mw ranging between 4.0 and 8.3. Since different magnitude scales were used by the different agencies (Ms, mb, MD, ML), this first catalogue was affected by important heterogeneity in the size parameter. It was then homogenized to moment magnitude Mw using the empirical equations developed by Bonzoni et al. (2011) for the eastern Caribbean. At present, this is the most exhaustive catalogue of the country, although it is difficult to assess its degree of completeness. Regarding the seismic network, three stations were installed just after the 2010 earthquake by the Canadian Government. The data were sent by telemetry through the Canadian system CARINA. In 2012, the Spanish IGN together
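
    Homogenizing mixed magnitude scales to Mw is typically done with per-scale linear regressions of the form Mw = a*M + b. The mechanics are sketched below with placeholder coefficients; the catalogue itself used the empirical equations of Bonzoni et al. (2011) for the eastern Caribbean:

```python
# Placeholder (a, b) coefficients per magnitude scale -- NOT the values of
# Bonzoni et al. (2011), which the SISMO-HAITI catalogue actually used.
CONVERSIONS = {
    "Ms": (0.67, 2.10),
    "mb": (0.85, 1.03),
    "ML": (0.90, 0.60),
}

def to_mw(magnitude, scale):
    """Convert a magnitude on the given scale to moment magnitude Mw."""
    if scale == "Mw":
        return magnitude
    a, b = CONVERSIONS[scale]
    return a * magnitude + b

print(to_mw(5.5, "mb"))   # about 5.7 with the placeholder coefficients
```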

  15. The Metadata Cloud: The Last Piece of a Distributed Data System Model

    NASA Astrophysics Data System (ADS)

    King, T. A.; Cecconi, B.; Hughes, J. S.; Walker, R. J.; Roberts, D.; Thieman, J. R.; Joy, S. P.; Mafi, J. N.; Gangloff, M.

    2012-12-01

    Distributed data systems have existed ever since systems were networked together. Over the years the model for distributed data systems has evolved from basic file transfer to client-server to multi-tiered to grid and finally to cloud-based systems. Initially metadata were tightly coupled to the data, either by embedding the metadata in the same file containing the data or by co-locating the metadata in commonly named files. As the sources of data multiplied, data volumes increased, and services specialized to improve efficiency, a cloud system model has emerged. In a cloud system, computing and storage are provided as services, with accessibility emphasized over physical location. Computation and data clouds are common implementations. Effectively using the data and computation capabilities requires metadata. When metadata are stored separately from the data, a metadata cloud is formed. With a metadata cloud, information and knowledge about data resources can migrate efficiently from system to system, enabling services and allowing the data to remain efficiently stored until used. This is especially important with "Big Data", where movement of the data is limited by bandwidth. We examine how the metadata cloud completes a general distributed data system model, how standards play a role, and relate this to the existing types of cloud computing. We also look at the major science data systems in existence and compare each to the generalized cloud system model.

  16. Creating Access Points to Instrument-Based Atmospheric Data: Perspectives from the ARM Metadata Manager

    NASA Astrophysics Data System (ADS)

    Troyan, D.

    2016-12-01

    The Atmospheric Radiation Measurement (ARM) program has been collecting data from instruments in diverse climate regions for nearly twenty-five years. These data are made available to all interested parties at no cost via specially designed tools found on the ARM website (www.arm.gov). Metadata are created and applied to the various datastreams to facilitate information retrieval using the ARM website, the ARM Data Discovery Tool, and data quality reporting tools. Over the last year, the Metadata Manager - a relatively new position within the ARM program - created two documents that summarize the state of ARM metadata processes: ARM Metadata Workflow and ARM Metadata Standards. These documents serve as guides to the creation and management of ARM metadata. With many of ARM's data functions spread around the Department of Energy national laboratory complex, and with many of the original architects of the metadata structure no longer working for ARM, these documents have become increasingly important for resolving issues ranging from data-flow bottlenecks and inaccurate metadata to improving data discovery and organizing web pages. This presentation will provide examples from the workflow and standards documents. The examples will illustrate the complexity of the ARM metadata processes and the efficiency with which the metadata team works towards achieving the goal of providing access to data collected under the auspices of the ARM program.

  17. Finding Atmospheric Composition (AC) Metadata

    NASA Technical Reports Server (NTRS)

    Strub, Richard F..; Falke, Stefan; Fiakowski, Ed; Kempler, Steve; Lynnes, Chris; Goussev, Oleg

    2015-01-01

    The Atmospheric Composition Portal (ACP) is an aggregator and curator of information related to remotely sensed atmospheric composition data and analysis. It uses existing tools and technologies and, where needed, enhances those capabilities to provide interoperable access, tools, and contextual guidance for scientists and value-adding organizations using remotely sensed atmospheric composition data. The initial focus is on Essential Climate Variables identified by the Global Climate Observing System: CH4, CO, CO2, NO2, O3, SO2 and aerosols. This poster addresses our efforts in building the ACP Data Table, an interface to help discover and understand remotely sensed data that are related to atmospheric composition science and applications. We harvested the GCMD, CWIC and GEOSS metadata catalogs using machine-to-machine technologies - OpenSearch and Web Services. We also manually investigated the plethora of CEOS data provider portals and other catalogs where those data might be aggregated. This poster presents our experience of the excellence, variety, and challenges we encountered. Conclusions: (1) The significant benefits that the major catalogs provide are their machine-to-machine tools, like OpenSearch and Web Services, rather than any GUI usability improvements, owing to the large amount of data in their catalogs. (2) There is a trend at the large catalogs towards simulating small data-provider portals through advanced services. (3) Populating metadata catalogs using ISO 19115 is too complex for users to do in a consistent way, difficult to parse visually or with XML libraries, and too complex for Java XML binders like Castor. (4) The ability to search for IDs first and then for data (GCMD and ECHO) is better for machine-to-machine operations than the timeouts experienced when returning the entire metadata entry at once. (5) Metadata harvest and export activities between the major catalogs have led to a significant amount of duplication (this is currently being addressed). (6) Most (if not all

  18. Separation of metadata and pixel data to speed DICOM tag morphing.

    PubMed

    Ismail, Mahmoud; Philbin, James

    2013-01-01

    The DICOM information model combines pixel data and metadata in a single DICOM object. It is not possible to access the metadata separately from the pixel data. There are use cases where only the metadata is accessed, and the current DICOM object format increases the running time of those use cases. Tag morphing is one such use case. Tag morphing includes deletion, insertion or manipulation of one or more of the metadata attributes. It is typically used for order reconciliation on study acquisition, or to localize the issuer of patient ID (IPID) and the patient ID attributes when data from one domain are transferred to a different domain. In this work, we propose using Multi-Series DICOM (MSD) objects, which separate metadata from pixel data and remove duplicate attributes, to reduce the time required for tag morphing. The time required to update a set of study attributes in each format is compared. The results show that the MSD format significantly reduces the time required for tag morphing.
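
    The speed argument rests on never touching the bulky pixel data when only attributes change. As an illustration of that metadata/pixel split (not of the MSD format itself), pydicom can stop parsing a file before its pixel data when tags are all that is needed; the file path and attribute values below are illustrative:

```python
import pydicom

# Read only the header: parsing stops before the (large) PixelData element.
ds = pydicom.dcmread("study_image.dcm", stop_before_pixels=True)

# "Tag morphing": localize the patient ID and its issuer for a new domain.
ds.PatientID = "LOCAL-000123"
ds.IssuerOfPatientID = "LOCAL-DOMAIN"

print(ds.PatientID, ds.IssuerOfPatientID)
```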

  19. Mercury- Distributed Metadata Management, Data Discovery and Access System

    NASA Astrophysics Data System (ADS)

    Palanisamy, Giri; Wilson, Bruce E.; Devarakonda, Ranjeet; Green, James M.

    2007-12-01

    Mercury is a federated metadata harvesting, search and retrieval tool based on both open-source and ORNL-developed software. It was originally developed for NASA, and the Mercury development consortium now includes funding from NASA, USGS, and DOE. Mercury supports various metadata standards including XML, Z39.50, FGDC, Dublin-Core, Darwin-Core, EML, and ISO-19115 (under development). Mercury provides a single portal to information contained in disparate data management systems. It collects metadata and key data from contributing project servers distributed around the world and builds a centralized index. The Mercury search interfaces then allow users to perform simple, fielded, spatial and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury supports various projects including ORNL DAAC, NBII, DADDI, LBA, NARSTO, CDIAC, OCEAN, I3N, IAI, ESIP and ARM. The new Mercury system is based on a Service Oriented Architecture and supports various services such as a Thesaurus Service, a Gazetteer Web Service and UDDI Directory Services. The system also provides various search services including RSS, Geo-RSS, OpenSearch, Web Services and Portlets. Other features include filtering and dynamic sorting of search results, book-markable search results, and the ability to save, retrieve, and modify search criteria.

  20. Version 2000 of the Catalogue of Galactic Planetary Nebulae

    NASA Astrophysics Data System (ADS)

    Kohoutek, L.

    2001-11-01

    The "Catalogue of Galactic Planetary Nebulae (Version 2000)" appears in Abhandlungen aus der Hamburger Sternwarte, Band XII, in the year 2001. It is a continuation of CGPN(1967) and contains 1510 objects classified as galactic PNe up to the end of 1999. Lists of possible pre-PNe and possible post-PNe are also given. The catalogue is restricted to data concerning the location and identification of the objects. It gives identification charts of PNe discovered since 1965 (published in the supplements to CGPN) and charts of objects discovered earlier whose identification was wrong or uncertain. The question "what is a planetary nebula" is discussed, and the typical values of PNe and of their central stars are summarized. Short statistics about the discoveries of PNe are given. The catalogue is also available at the Centre de Données, Strasbourg, and at Hamburg Observatory via the internet. The Catalogue is only available in electronic form at the CDS via anonymous ftp to cdsarc.u-strasbg.fr (130.79.128.5) or via http://cdsweb.u-strasbg.fr/cgi-bin/qcat?J/A+A/378/843

  1. Catalogue of ptyctimous mites (Acari, Oribatida) of the world.

    PubMed

    NiedbaŁa, Wojciech; Liu, Dong

    2018-03-11

    As important representatives of Oribatida (Acari), ptyctimous mites comprise more than 1400 described species in 40 genera and subgenera, with a nearly cosmopolitan distribution except for the Arctic and Antarctic regions. They are capable of folding the aspidosoma under the opisthosoma to protect their appendages, and are primarily soil and litter inhabitants, feeding on fungi and decaying plant remains with various levels of specificity. Our purpose was to provide a detailed catalogue of all known ptyctimous mite species in the world, with information on distribution, taxonomic issues and some remarks. Data on known juvenile instars of ptyctimous mites which were not included in Norton & Ermilov (2014) were added. We hope that our catalogue with bibliography will be helpful in taxonomic and ecological studies. The catalogue presents taxonomic information and the geographic distribution of 1431 known species of the world, belonging to 42 genera and eight families (not including genera and species inquirenda, nomina nuda and species without author name). Among them, 261 species are listed as synonyms, 43 as species inquirenda and nine as homonyms; 17 new synonyms, one new subgenus (Mahuntritia subgen. nov.) and three new names are included in the catalogue.

  2. Modern Special Collections Cataloguing: A University of London Case Study

    ERIC Educational Resources Information Center

    Attar, K. E.

    2013-01-01

    Recent years have seen a growing emphasis on modern special collections (in themselves no new phenomenon), with a dichotomy between guidance for detailed cataloguing in "Descriptive Cataloging of Rare Materials (Books)" (DCRM(B), 2007) and the value of clearing cataloguing backlogs expeditiously. This article describes the De la Mare…

  3. Cytometry metadata in XML

    NASA Astrophysics Data System (ADS)

    Leif, Robert C.; Leif, Stephanie H.

    2016-04-01

    Introduction: The International Society for Advancement of Cytometry (ISAC) has created a standard for the Minimum Information about a Flow Cytometry Experiment (MIFlowCyt 1.0). CytometryML will serve as a common metadata standard for flow and image cytometry (digital microscopy). Methods: The MIFlowCyt data-types were created, as is the rest of CytometryML, in the XML Schema Definition Language (XSD 1.1). The datatypes are primarily based on the Flow Cytometry and the Digital Imaging and Communications in Medicine (DICOM) standards. A small section of the code was formatted with standard HTML formatting elements (p, h1, h2, etc.). Results: 1) The part of MIFlowCyt that describes the Experimental Overview, including the specimen, and substantial parts of several other major elements have been implemented as CytometryML XML schemas (www.cytometryml.org). 2) The feasibility of using MIFlowCyt to provide the combination of an overview, table of contents, and/or an index of a scientific paper or a report has been demonstrated. Previously, a sample electronic publication, EPUB, was created that could contain both MIFlowCyt metadata and the binary data. Conclusions: The use of CytometryML technology together with XHTML5 and CSS permits the metadata to be directly formatted and, together with the binary data, to be stored in an EPUB container. This will facilitate formatting, data mining, presentation, data verification, and inclusion in structured research, clinical, and regulatory documents; it will also demonstrate a publication's adherence to the MIFlowCyt standard, promote interoperability, and should result in the textual and numeric data being published using web technology without any change in composition.
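
    To make the schema-based approach concrete, here is a minimal validation sketch using lxml. The file names are placeholders, and lxml implements XSD 1.0 rather than the XSD 1.1 used by CytometryML, so this illustrates only the general validation pattern.

    ```python
    # Validate a metadata instance against an XML Schema with lxml.
    # File names are placeholders; lxml supports XSD 1.0, so this is a
    # pattern sketch rather than a full CytometryML (XSD 1.1) validator.
    from lxml import etree

    schema = etree.XMLSchema(etree.parse("cytometryml.xsd"))   # placeholder
    doc = etree.parse("experiment_overview.xml")               # placeholder
    if schema.validate(doc):
        print("metadata instance conforms to the schema")
    else:
        for error in schema.error_log:
            print(error.line, error.message)
    ```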

  4. The PMA Catalogue: 420 million positions and absolute proper motions

    NASA Astrophysics Data System (ADS)

    Akhmetov, V. S.; Fedorov, P. N.; Velichko, A. B.; Shulga, V. M.

    2017-07-01

    We present a catalogue that contains about 420 million absolute proper motions of stars. It was derived from the combination of positions from Gaia DR1 and 2MASS, with a mean difference of epochs of about 15 yr. Most of the systematic zonal errors inherent in the 2MASS Catalogue were eliminated before deriving the absolute proper motions. The absolute calibration procedure (zero-pointing of the proper motions) was carried out using about 1.6 million positions of extragalactic sources. The mean formal error of the absolute calibration is less than 0.35 mas/yr. The derived proper motions cover the whole celestial sphere without gaps for a range of stellar magnitudes from 8 to 21 mag. In the sky areas where the extragalactic sources are invisible (the avoidance zone), a dedicated procedure was used that transforms the relative proper motions into absolute ones. The rms error of the proper motions depends on stellar magnitude and ranges from 2-5 mas/yr for stars with 10 mag < G < 17 mag to 5-10 mas/yr for fainter ones. The present catalogue contains the Gaia DR1 positions of stars for the J2015 epoch. The system of the PMA proper motions does not depend on the systematic errors of the 2MASS positions, and in the range from 14 to 21 mag it represents an independent realization of a quasi-inertial reference frame in the optical and near-infrared wavelength range. The Catalogue also contains stellar magnitudes taken from the Gaia DR1 and 2MASS catalogues. A comparison of the PMA proper motions of stars with similar data from certain recent catalogues has been undertaken.

  5. Documentation for the machine-readable character coded version of the SKYMAP catalogue

    NASA Technical Reports Server (NTRS)

    Warren, W. H., Jr.

    1981-01-01

    The SKYMAP catalogue is a compilation of astronomical data prepared primarily for purposes of attitude guidance for satellites. In addition to the SKYMAP Master Catalogue data base, a software package of data base management and utility programs is available. The tape version of the SKYMAP Catalogue, as received by the Astronomical Data Center (ADC), contains logical records consisting of a combination of binary and EBCDIC data. Certain character coded data in each record are redundant in that the same data are present in binary form. In order to facilitate wider use of all SKYMAP data by the astronomical community, a formatted (character) version was prepared by eliminating all redundant character data and converting all binary data to character form. The character version of the catalogue is described. The document is intended to describe the formatted tape fully, so that users can process the data without problems and guesswork; it should be distributed with any character version of the catalogue.

  6. VizieR Online Data Catalog: PMA Catalogue (Akhmetov+, 2017)

    NASA Astrophysics Data System (ADS)

    Akhmetov, V. S.; Fedorov, P. N.; Velichko, A. B.; Shulga, V. M.

    2017-06-01

    The idea for creating the catalogue is very simple. The PMA catalogue has been derived from a combination of two catalogues, namely 2MASS and Gaia DR1. The difference of epochs of observations for these catalogues is approximately 15 yr. The positions of objects in the Gaia DR1 catalogue are referred to the reference frame, which is consistent with the ICRF to better than 0.1 mas for the J2015.0 epoch. The positions of objects in 2MASS are referred to the HCRF, which, as was shown in Kovalevsky et al. (1997A&A...323..620K), is aligned with the ICRF to within ±0.6 mas at the epoch 1991.25 and is non-rotating with respect to distant extragalactic objects to within ±0.25 mas/yr. By comparing the positions of the common objects contained in the catalogues, it is possible to determine their proper motions within their common range of stellar magnitudes by dividing the differences of positions by the time interval between their observations. Formally, proper motions derived in such a way are given in the ICRF system, because the positions of both Gaia DR1 stars and those of 2MASS objects (through Hipparcos/Tycho-2 stars) are given in the ICRF and cover the whole sphere without gaps. We designate them further in this paper as relative, with the aim of discriminating them from absolute ones, which refer to the reference frame defined by the positions of about 1.6 million galaxies from Gaia DR1. There is no possibility of obtaining estimates of the individual errors of proper motions of stars for the PMA Catalogue from the intrinsic convergence, because direct errors for positions are not indicated in 2MASS. Therefore we use some indirect methods to obtain estimates of the uncertainties of the proper motions. After elimination of the systematic errors, the root-mean-squared deviation of the coordinate differences of extended sources is about 200 mas, and the mean number of galaxies inside each pixel is about 1300, so we expect the error of the absolute calibration to be 0.35 mas/yr
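
    A minimal sketch of the computation described above, with synthetic numbers: position differences divided by the epoch difference give relative proper motions. The real pipeline additionally removes systematic zonal errors and applies the absolute (extragalactic) calibration, neither of which is shown here.

    ```python
    # Relative proper motion from two epochs of positions (synthetic values).
    import numpy as np

    def relative_proper_motion(ra1, dec1, ra2, dec2, dt_years):
        """Return (mu_alpha*, mu_delta) in mas/yr.

        ra/dec in degrees; mu_alpha* includes the cos(dec) factor.
        """
        deg_to_mas = 3.6e6  # 1 deg = 3600 arcsec = 3.6e6 mas
        mu_delta = (dec2 - dec1) * deg_to_mas / dt_years
        mu_alpha = (ra2 - ra1) * np.cos(np.radians(dec2)) * deg_to_mas / dt_years
        return mu_alpha, mu_delta

    # e.g. a 2MASS (epoch ~2000) vs Gaia DR1 (epoch 2015.0) pair, dt ~ 15 yr:
    mu_a, mu_d = relative_proper_motion(10.000000, 20.000000,
                                        10.000020, 20.000010, 15.0)
    print(f"mu_alpha* = {mu_a:.1f} mas/yr, mu_delta = {mu_d:.1f} mas/yr")
    ```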

  7. NIED seismic moment tensor catalogue for regional earthquakes around Japan: quality test and application

    NASA Astrophysics Data System (ADS)

    Kubo, Atsuki; Fukuyama, Eiichi; Kawai, Hiroyuki; Nonomura, Ken'ichi

    2002-10-01

    We have examined the quality of the National Research Institute for Earth Science and Disaster Prevention (NIED) seismic moment tensor (MT) catalogue obtained using a regional broadband seismic network (FREESIA). First, using synthetic waveforms, we examined the robustness of the solutions with respect to data noise as well as to errors in the velocity structure and focal location. Then, to estimate the reliability, robustness and validity of the catalogue, we compared it with the Harvard centroid moment tensor (CMT) catalogue as well as the Japan Meteorological Agency (JMA) focal mechanism catalogue. We found that the NIED catalogue is consistent with the Harvard and JMA catalogues within uncertainties of 0.1 in moment magnitude, 10 km in depth, and 15° in the direction of the stress axes. The NIED MT catalogue succeeded in reducing the lower limit of moment magnitude above which the moment tensor can be reliably estimated to 3.5. Finally, we estimated the stress tensors in several different regions by using the NIED MT catalogue. This enables us to elucidate the stress/deformation field in and around the Japanese islands and to understand the mode of deformation and applied stress. Moreover, we identified a region of abnormal stress in a swarm area from the stress tensor estimates.

  8. Scalable PGAS Metadata Management on Extreme Scale Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chavarría-Miranda, Daniel; Agarwal, Khushbu; Straatsma, TP

    Programming models intended to run on exascale systems have a number of challenges to overcome, especially the sheer size of the system as measured by the number of concurrent software entities created and managed by the underlying runtime. It is clear from the size of these systems that any state maintained by the programming model has to be strictly sub-linear in size, in order not to overwhelm memory usage with pure overhead. A principal feature of Partitioned Global Address Space (PGAS) models is providing easy access to global-view distributed data structures. In order to provide efficient access to these distributed data structures, PGAS models must keep track of metadata such as where array sections are located with respect to processes/threads running on the HPC system. As PGAS models and applications become ubiquitous on very large trans-petascale systems, a key component of their performance and scalability will be efficient and judicious use of memory for model overhead (metadata) compared to application data. We present an evaluation of several strategies to manage PGAS metadata that exhibit different space/time tradeoffs. We use two real-world PGAS applications to capture metadata usage patterns and gain insight into their communication behavior.

  9. Metadata management for high content screening in OMERO.

    PubMed

    Li, Simon; Besson, Sébastien; Blackburn, Colin; Carroll, Mark; Ferguson, Richard K; Flynn, Helen; Gillen, Kenneth; Leigh, Roger; Lindner, Dominik; Linkert, Melissa; Moore, William J; Ramalingam, Balaji; Rozbicki, Emil; Rustici, Gabriella; Tarkowska, Aleksandra; Walczysko, Petr; Williams, Eleanor; Allan, Chris; Burel, Jean-Marie; Moore, Josh; Swedlow, Jason R

    2016-03-01

    High content screening (HCS) experiments create a classic data management challenge: multiple, large sets of heterogeneous structured and unstructured data that must be integrated and linked to produce a set of "final" results. These different data include images, reagents, protocols, analytic output, and phenotypes, all of which must be stored, linked and made accessible to users, scientists, collaborators and, where appropriate, the wider community. The OME Consortium has built several open source tools for managing, linking and sharing these different types of data. The OME Data Model is a metadata specification that supports the image data and metadata recorded in HCS experiments. Bio-Formats is a Java library that reads recorded image data and metadata and includes support for several HCS screening systems. OMERO is an enterprise data management application that integrates image data, experimental and analytic metadata and makes them accessible for visualization, mining, sharing and downstream analysis. We discuss how Bio-Formats and OMERO handle these different data types, and how they can be used to integrate, link and share HCS experiments in facilities and public data repositories. OME specifications and software are open source and are available at https://www.openmicroscopy.org. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.

  10. OSCAR/Surface: Metadata for the WMO Integrated Observing System WIGOS

    NASA Astrophysics Data System (ADS)

    Klausen, Jörg; Pröscholdt, Timo; Mannes, Jürg; Cappelletti, Lucia; Grüter, Estelle; Calpini, Bertrand; Zhang, Wenjian

    2016-04-01

    The World Meteorological Organization (WMO) Integrated Global Observing System (WIGOS) is a key WMO priority underpinning all WMO Programs and new initiatives such as the Global Framework for Climate Services (GFCS). It does this by better integrating WMO and co-sponsored observing systems, as well as partner networks. For this, an important aspect is the description of observational capabilities by way of structured metadata. The 17th Congress of the World Meteorological Organization (Cg-17) endorsed the semantic WIGOS metadata standard (WMDS) developed by the Task Team on WIGOS Metadata (TT-WMD). The standard comprises a set of metadata classes that are considered to be of critical importance for the interpretation of observations and the evolution of observing systems relevant to WIGOS. The WMDS serves all recognized WMO Application Areas, and its use for all internationally exchanged observational data generated by WMO Members is mandatory. The standard will be introduced in three phases between 2016 and 2020. The Observing Systems Capability Analysis and Review (OSCAR) platform operated by MeteoSwiss on behalf of WMO is the official repository of WIGOS metadata and an implementation of the WMDS. OSCAR/Surface deals with all surface-based observations from land, air and oceans, combining metadata managed by a number of complementary, more domain-specific systems (e.g., GAWSIS for the Global Atmosphere Watch, JCOMMOPS for the marine domain, the WMO Radar database). It is a modern, web-based client-server application with extended information search, filtering and mapping capabilities, including a fully developed management console to add and edit observational metadata. In addition, a powerful application programming interface (API) is being developed to allow machine-to-machine metadata exchange. The API is based on an ISO/OGC-compliant XML schema for the WMDS using the Observations and Measurements (ISO 19156) conceptual model. The purpose of the

  11. The SPM Kinematic Catalogue of Planetary Nebulae

    NASA Astrophysics Data System (ADS)

    López, J. A.; Richer, M. G.; Riesgo, H.; Steffen, W.; García-Segura, G.; Meaburn, J.; Bryce, M.

    The San Pedro Mártir Kinematic Catalogue of Planetary Nebulae aims at providing detailed kinematic information for galactic planetary nebulae (PNe) and bright PNe in the Local Group. The database provides long-slit Echelle spectra and images in which the location of the slits on the nebula is indicated. As a tool to help interpret the 2D line profiles or position-velocity data, an atlas of synthetic emission-line spectra accompanies the Catalogue. The atlas has been produced with the code SHAPE and contains synthetic spectra for all the main morphological groups over a wide range of spatial orientations and slit locations on the nebula.

  12. Fast processing of digital imaging and communications in medicine (DICOM) metadata using multiseries DICOM format.

    PubMed

    Ismail, Mahmoud; Philbin, James

    2015-04-01

    The digital imaging and communications in medicine (DICOM) information model combines pixel data and its metadata in a single object. There are user scenarios that only need metadata manipulation, such as deidentification and study migration. Most picture archiving and communication systems use a database to store and update the metadata rather than updating the raw DICOM files themselves. The multiseries DICOM (MSD) format separates metadata from pixel data and eliminates duplicate attributes. This work promotes storing DICOM studies in MSD format to reduce the metadata processing time. A set of experiments are performed that update the metadata of a set of DICOM studies for deidentification and migration. The studies are stored in both the traditional single frame DICOM (SFD) format and the MSD format. The results show that it is faster to update studies' metadata in MSD format than in SFD format because the bulk data is separated in MSD and is not retrieved from the storage system. In addition, it is space-efficient to store the deidentified studies in MSD format as they share the same bulk data object with the original study. In summary, separation of metadata from pixel data using the MSD format provides fast metadata access and speeds up applications that process only the metadata.

  13. Fast processing of digital imaging and communications in medicine (DICOM) metadata using multiseries DICOM format

    PubMed Central

    Ismail, Mahmoud; Philbin, James

    2015-01-01

    The digital imaging and communications in medicine (DICOM) information model combines pixel data and its metadata in a single object. There are user scenarios that only need metadata manipulation, such as deidentification and study migration. Most picture archiving and communication systems use a database to store and update the metadata rather than updating the raw DICOM files themselves. The multiseries DICOM (MSD) format separates metadata from pixel data and eliminates duplicate attributes. This work promotes storing DICOM studies in MSD format to reduce the metadata processing time. A set of experiments are performed that update the metadata of a set of DICOM studies for deidentification and migration. The studies are stored in both the traditional single frame DICOM (SFD) format and the MSD format. The results show that it is faster to update studies' metadata in MSD format than in SFD format because the bulk data is separated in MSD and is not retrieved from the storage system. In addition, it is space-efficient to store the deidentified studies in MSD format as they share the same bulk data object with the original study. In summary, separation of metadata from pixel data using the MSD format provides fast metadata access and speeds up applications that process only the metadata. PMID:26158117

  14. ISO, FGDC, DIF and Dublin Core - Making Sense of Metadata Standards for Earth Science Data

    NASA Astrophysics Data System (ADS)

    Jones, P. R.; Ritchey, N. A.; Peng, G.; Toner, V. A.; Brown, H.

    2014-12-01

    Metadata standards provide common definitions of metadata fields for information exchange across user communities. Despite the broad adoption of metadata standards for Earth science data, there are still heterogeneous and incompatible representations of information due to differences between the many standards in use and how each standard is applied. Federal agencies are required to manage and publish metadata in different metadata standards and formats for various data catalogs. In 2014, the NOAA National Climatic Data Center (NCDC) managed metadata for its scientific datasets in ISO 19115-2 in XML, GCMD Directory Interchange Format (DIF) in XML, DataCite Schema in XML, Dublin Core in XML, and Data Catalog Vocabulary (DCAT) in JSON, with more standards and profiles of standards planned. Of these standards, the ISO 19115-series metadata is the most complete and feature-rich, and for this reason it is used by NCDC as the source for the other metadata standards. We will discuss the capabilities of metadata standards and how these standards are being implemented to document datasets. Successful implementations include developing translations and displays using XSLTs, creating links to related data and resources, documenting dataset lineage, and establishing best practices. Benefits, gaps, and challenges will be highlighted with suggestions for improved approaches to metadata storage and maintenance.
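
    The XSLT-based translations mentioned above can be illustrated with a short sketch using lxml; the stylesheet and record file names are placeholders, not NCDC's actual assets.

    ```python
    # Transform one metadata standard into another via an XSLT stylesheet.
    # "iso2dif.xsl" and the input record are placeholder file names.
    from lxml import etree

    transform = etree.XSLT(etree.parse("iso2dif.xsl"))   # placeholder stylesheet
    iso_record = etree.parse("dataset_iso19115.xml")     # placeholder ISO record
    dif_record = transform(iso_record)
    print(str(dif_record))
    ```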

  15. The Use of Metadata Visualisation Assist Information Retrieval

    DTIC Science & Technology

    2007-10-01

    Music files have metadata tags, in a format called ID3. This usually contains information such as the artist, the song title, the album title, the track length and the genre of music. Any of these pieces of information can be used to quickly search and locate specific tracks, to provide more information about the entire music collection, or to find similar or diverse tracks within the collection.
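
    As a small illustration of the ID3 metadata discussed above, the sketch below parses the fixed-layout ID3v1 tag stored in the last 128 bytes of an MP3 file, using only the standard library (modern files usually carry richer ID3v2 tags, for which a dedicated library would be used instead).

    ```python
    # Read the legacy ID3v1 tag: a fixed 128-byte block at the end of an MP3.
    def read_id3v1(path):
        with open(path, "rb") as f:
            f.seek(-128, 2)            # ID3v1 lives in the final 128 bytes
            tag = f.read(128)
        if not tag.startswith(b"TAG"):
            return None                # no ID3v1 tag present

        def text(raw):
            return raw.split(b"\x00")[0].decode("latin-1").strip()

        return {
            "title":  text(tag[3:33]),
            "artist": text(tag[33:63]),
            "album":  text(tag[63:93]),
            "year":   text(tag[93:97]),
            "genre":  tag[127],        # numeric index into the ID3v1 genre list
        }

    # print(read_id3v1("song.mp3"))    # placeholder file name
    ```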

  16. VizieR Online Data Catalog: Catalogue of variable stars in open clusters (Zejda+, 2012)

    NASA Astrophysics Data System (ADS)

    Zejda, M.; Paunzen, E.; Baumann, B.; Mikulasek, Z.; Liska, J.

    2012-08-01

    The catalogue of variable stars in open clusters was prepared by cross-matching the Variable Stars Index (http://www.aavso.org/vsx), version Apr 29, 2012 (available online, Cat. B/vsx), against the version 3.1 catalogue of open clusters DAML02 (Dias et al. 2002A&A...389..871D, Cat. B/ocl), available on the website http://www.astro.iag.usp.br/~wilton. The open clusters were divided into two categories according to their size, with a limiting diameter of 60 arcmin. Lists of all suspected variables and variable stars located within the fields of open clusters, up to two times the given cluster radius, were generated (Table 1): 8938 and 9127 variable stars are given in 461 "smaller" and 74 "larger" clusters, respectively. All found variable stars were matched against the PPMXL catalog of positions and proper motions within the ICRS (Roeser et al., 2010AJ....139.2440R, Cat. I/317), and proper motion data were included in our catalogue. Unfortunately, a homogeneous data set of mean cluster proper motions has not been available until now. Therefore we used the following sources (sorted alphabetically) to compile a new catalogue:
    - Baumgardt et al. (2000, Cat. J/A+AS/146/251): based on the Hipparcos catalogue
    - Beshenov & Loktin (2004A&AT...23..103B): based on the Tycho-2 catalogue
    - Dias et al. (2001, Cat. J/A+A/376/441, 2002A&A...389..871D, Cat. B/ocl): based on the Tycho-2 catalogue
    - Dias et al. (2006, Cat. J/A+A/446/949): based on the UCAC2 catalog (Zacharias et al., 2004AJ....127.3043Z, Cat. I/289)
    - Frinchaboy & Majewski (2008, Cat. J/AJ/136/118): based on the Tycho-2 catalogue
    - Kharchenko et al. (2005, J/A+A/438/1163): based on the ASCC2.5 catalogue (Kharchenko, 2001KFNT...17..409K, Cat. I/280)
    - Krone-Martins et al. (2010, Cat. J/A+A/516/A3): based on the Bordeaux PM2000 proper motion catalogue (Ducourant et al., 2006A&A...448.1235D, Cat. I/300)
    - Robichon et al. (1999, Cat. J/A+A/345/471): based on the Hipparcos catalogue
    - van Leeuwen (2009A&A...497..209V): based on the new

  17. Establishing semantic interoperability of biomedical metadata registries using extended semantic relationships.

    PubMed

    Park, Yu Rang; Yoon, Young Jo; Kim, Hye Hyeon; Kim, Ju Han

    2013-01-01

    Achieving semantic interoperability is critical for biomedical data sharing between individuals, organizations and systems. The ISO/IEC 11179 MetaData Registry (MDR) standard has been recognized as one of the solutions for this purpose. The standard model, however, is limited: representing concepts that consist of two or more values, for instance blood pressure with its systolic and diastolic values, is not allowed. We addressed the structural limitations of ISO/IEC 11179 with an integrated metadata object model in our previous research. In the present study, we introduce semantic extensions for the model by defining three new types of semantic relationships: dependency, composite and variable relationships. To evaluate our extensions in a real-world setting, we measured the efficiency of metadata reduction by means of mapping to existing elements. We extracted metadata from the College of American Pathologists Cancer Protocols and then evaluated our extensions. With no semantic loss, one third of the extracted metadata could be successfully eliminated, suggesting a better strategy for implementing clinical MDRs with improved efficiency and utility.
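
    A speculative sketch of how the three extended relationship types could be represented alongside metadata elements; the class and field names here are ours for illustration, not the paper's actual model.

    ```python
    # Hypothetical representation of the three extended semantic relationships;
    # names and structure are illustrative assumptions, not the paper's model.
    from dataclasses import dataclass, field
    from enum import Enum

    class RelationType(Enum):
        DEPENDENCY = "dependency"
        COMPOSITE = "composite"
        VARIABLE = "variable"

    @dataclass
    class DataElement:
        name: str
        relations: list = field(default_factory=list)  # (RelationType, DataElement)

        def relate(self, rel_type, other):
            self.relations.append((rel_type, other))

    # Example from the abstract: blood pressure as a composite concept.
    bp = DataElement("blood pressure")
    bp.relate(RelationType.COMPOSITE, DataElement("systolic value"))
    bp.relate(RelationType.COMPOSITE, DataElement("diastolic value"))
    ```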

  18. Design and implementation of a fault-tolerant and dynamic metadata database for clinical trials

    NASA Astrophysics Data System (ADS)

    Lee, J.; Zhou, Z.; Talini, E.; Documet, J.; Liu, B.

    2007-03-01

    In recent imaging-based clinical trials, quantitative image analysis (QIA) and computer-aided diagnosis (CAD) methods are increasing in productivity due to higher-resolution imaging capabilities. A radiology core doing clinical trials has been analyzing more treatment methods, and there is a growing quantity of metadata that needs to be stored and managed. These radiology centers are also collaborating with many off-site imaging field sites and need a way to communicate metadata with one another in a secure infrastructure. Our solution is to implement a data storage grid with a fault-tolerant and dynamic metadata database design to unify metadata from different clinical trial experiments and field sites. Although metadata from images follow the DICOM standard, clinical trials also produce metadata specific to regions-of-interest and quantitative image analysis. We have implemented a data access and integration (DAI) server layer through which multiple field sites can access multiple metadata databases in the data grid via a single web-based grid service. The centralization of metadata database management simplifies the task of adding new databases to the grid and also decreases the risk of configuration errors seen in peer-to-peer grids. In this paper, we address the design and implementation of a data grid metadata storage that has fault tolerance and dynamic integration for imaging-based clinical trials.

  19. Recipes for Semantic Web Dog Food — The ESWC and ISWC Metadata Projects

    NASA Astrophysics Data System (ADS)

    Möller, Knud; Heath, Tom; Handschuh, Siegfried; Domingue, John

    Semantic Web conferences such as ESWC and ISWC offer prime opportunities to test and showcase semantic technologies. Conference metadata about people, papers and talks is diverse in nature and neither too small to be uninteresting nor too big to be unmanageable. Many metadata-related challenges that may arise in the Semantic Web at large are also present here. Metadata must be generated from sources which are often unstructured and hard to process, and may originate from many different players; therefore suitable workflows must be established. Moreover, the generated metadata must use appropriate formats and vocabularies, and be served in a way that is consistent with the principles of linked data. This paper reports on the metadata efforts from ESWC and ISWC, identifies specific issues and barriers encountered during the projects, and discusses how these were approached. Recommendations are made as to how these may be addressed in the future, and we discuss how these solutions may generalize to metadata production for the Semantic Web at large.

  20. Metadata to Describe Genomic Information.

    PubMed

    Delgado, Jaime; Naro, Daniel; Llorente, Silvia; Gelpí, Josep Lluís; Royo, Romina

    2018-01-01

    Interoperable metadata is key for the management of genomic information. We propose a flexible approach that we are contributing to the ISO/IEC standardization of a new format for the efficient and secure compressed storage and transmission of genomic information.

  1. Restful Implementation of Catalogue Service for Geospatial Data Provenance

    NASA Astrophysics Data System (ADS)

    Jiang, L. C.; Yue, P.; Lu, X. C.

    2013-10-01

    Provenance, also known as lineage, is important in understanding the derivation history of data products. Geospatial data provenance helps data consumers to evaluate the quality and reliability of geospatial data. In a service-oriented environment, where data are often consumed or produced by distributed services, provenance can be managed by following the same service-oriented paradigm. The Open Geospatial Consortium (OGC) Catalogue Service for the Web (CSW) is used for the registration and query of geospatial data provenance by extending the ebXML Registry Information Model (ebRIM). The recent advance of the REpresentational State Transfer (REST) paradigm has shown great promise for the easy integration of distributed resources. RESTful Web Services aim to provide a standard way for Web clients to communicate with servers based on REST principles. The existing approach to a provenance catalogue service can be improved by adopting the RESTful design. This paper presents the design and implementation of a catalogue service for geospatial data provenance following the RESTful architecture style. A middleware named REST Converter is added on top of the legacy catalogue service to support a RESTful-style interface. The REST Converter is composed of a resource request dispatcher and six resource handlers. A prototype service is developed to demonstrate the applicability of the approach.
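
    In the spirit of the REST Converter described above, the sketch below shows a single resource handler behind a RESTful route using Flask; the route layout and the CSW translation step are assumptions made for illustration, not the paper's actual interface.

    ```python
    # Illustrative RESTful facade over a legacy CSW catalogue service.
    # Route names and the translation stub are assumptions, not the paper's API.
    from flask import Flask, jsonify

    app = Flask(__name__)

    def csw_get_record_by_id(record_id):
        # Placeholder: a real converter would issue a CSW GetRecordById
        # request to the legacy service and translate the XML response.
        return {"id": record_id, "type": "provenance"}

    @app.route("/records/<record_id>", methods=["GET"])
    def get_record(record_id):
        # One of several resource handlers the dispatcher routes to.
        return jsonify(csw_get_record_by_id(record_id))

    if __name__ == "__main__":
        app.run(port=8080)
    ```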

  2. Normalized Metadata Generation for Human Retrieval Using Multiple Video Surveillance Cameras.

    PubMed

    Jung, Jaehoon; Yoon, Inhye; Lee, Seungwon; Paik, Joonki

    2016-06-24

    Since it is impossible for surveillance personnel to keep monitoring videos from a multiple camera-based surveillance system, an efficient technique is needed to help recognize important situations by retrieving the metadata of an object-of-interest. In a multiple camera-based surveillance system, an object detected in a camera has a different shape in another camera, which is a critical issue of wide-range, real-time surveillance systems. In order to address the problem, this paper presents an object retrieval method by extracting the normalized metadata of an object-of-interest from multiple, heterogeneous cameras. The proposed metadata generation algorithm consists of three steps: (i) generation of a three-dimensional (3D) human model; (ii) human object-based automatic scene calibration; and (iii) metadata generation. More specifically, an appropriately-generated 3D human model provides the foot-to-head direction information that is used as the input of the automatic calibration of each camera. The normalized object information is used to retrieve an object-of-interest in a wide-range, multiple-camera surveillance system in the form of metadata. Experimental results show that the 3D human model matches the ground truth, and automatic calibration-based normalization of metadata enables a successful retrieval and tracking of a human object in the multiple-camera video surveillance system.

  3. Normalized Metadata Generation for Human Retrieval Using Multiple Video Surveillance Cameras

    PubMed Central

    Jung, Jaehoon; Yoon, Inhye; Lee, Seungwon; Paik, Joonki

    2016-01-01

    Since it is impossible for surveillance personnel to keep monitoring videos from a multiple camera-based surveillance system, an efficient technique is needed to help recognize important situations by retrieving the metadata of an object-of-interest. In a multiple camera-based surveillance system, an object detected in a camera has a different shape in another camera, which is a critical issue of wide-range, real-time surveillance systems. In order to address the problem, this paper presents an object retrieval method by extracting the normalized metadata of an object-of-interest from multiple, heterogeneous cameras. The proposed metadata generation algorithm consists of three steps: (i) generation of a three-dimensional (3D) human model; (ii) human object-based automatic scene calibration; and (iii) metadata generation. More specifically, an appropriately-generated 3D human model provides the foot-to-head direction information that is used as the input of the automatic calibration of each camera. The normalized object information is used to retrieve an object-of-interest in a wide-range, multiple-camera surveillance system in the form of metadata. Experimental results show that the 3D human model matches the ground truth, and automatic calibration-based normalization of metadata enables a successful retrieval and tracking of a human object in the multiple-camera video surveillance system. PMID:27347961

  4. Sensor metadata blueprints and computer-aided editing for disciplined SensorML

    NASA Astrophysics Data System (ADS)

    Tagliolato, Paolo; Oggioni, Alessandro; Fugazza, Cristiano; Pepe, Monica; Carrara, Paola

    2016-04-01

    The need for continuous, accurate, and comprehensive environmental knowledge has led to an increase in sensor observation systems and networks. The Sensor Web Enablement (SWE) initiative has been promoted by the Open Geospatial Consortium (OGC) to foster interoperability among sensor systems. The provision of metadata according to the prescribed SensorML schema is a key component for achieving this; nevertheless, the availability of correct and exhaustive metadata cannot be taken for granted. On the one hand, it is awkward for users to provide sensor metadata because of the lack of user-oriented, dedicated tools. On the other, the specification of invariant information for a given sensor category or model (e.g., observed properties and units of measurement, manufacturer information, etc.) can be labor- and time-consuming. Moreover, the provision of these details is error-prone and subjective, i.e., it may differ greatly across distinct descriptions of the same system. We provide a user-friendly, template-driven metadata authoring tool composed of a backend web service and an HTML5/JavaScript client. This results in a form-based user interface that conceals the high complexity of the underlying format. The tool also allows for plugging in external data sources that provide authoritative definitions for the aforementioned invariant information. Leveraging these functionalities, we compiled a set of SensorML profiles, that is, sensor metadata blueprints allowing end users to focus only on the metadata items that are related to their specific deployment. The natural extension of this scenario is the involvement of end users and sensor manufacturers in the crowd-sourced evolution of this collection of prototypes. We describe the components and workflow of our framework for computer-aided management of sensor metadata.

  5. Quality Assurance for Digital Learning Object Repositories: Issues for the Metadata Creation Process

    ERIC Educational Resources Information Center

    Currier, Sarah; Barton, Jane; O'Beirne, Ronan; Ryan, Ben

    2004-01-01

    Metadata enables users to find the resources they require, therefore it is an important component of any digital learning object repository. Much work has already been done within the learning technology community to assure metadata quality, focused on the development of metadata standards, specifications and vocabularies and their implementation…

  6. Impact of magnitude uncertainties on seismic catalogue properties

    NASA Astrophysics Data System (ADS)

    Leptokaropoulos, K. M.; Adamaki, A. K.; Roberts, R. G.; Gkarlaouni, C. G.; Paradisopoulou, P. M.

    2018-05-01

    Catalogue-based studies are of central importance in seismological research, to investigate the temporal, spatial and size distribution of earthquakes in specified study areas. Methods for estimating the fundamental catalogue parameters like the Gutenberg-Richter (G-R) b-value and the completeness magnitude (Mc) are well established and routinely applied. However, the magnitudes reported in seismicity catalogues contain measurement uncertainties which may significantly distort the estimation of the derived parameters. In this study, we use numerical simulations of synthetic data sets to assess the reliability of different methods for determining the b-value and Mc, assuming the validity of the G-R law. After contaminating the synthetic catalogues with Gaussian noise (with selected standard deviations), the analysis is performed for numerous data sets of different sample size (N). The noise introduced to the data generally leads to a systematic overestimation of magnitudes close to and above Mc. This causes an increase in the average number of events above Mc, which in turn leads to an apparent decrease of the b-value. This may result in a significant overestimation of the seismicity rate even well above the actual completeness level. The b-value can in general be reliably estimated even for relatively small data sets (N < 1000) when only magnitudes higher than the actual completeness level are used. Nevertheless, a correction of the total number of events belonging to each magnitude class (i.e. 0.1 unit) should be considered, to deal with the magnitude uncertainty effect. Because magnitude uncertainties (here in the form of Gaussian noise) are inevitable in all instrumental catalogues, this finding is fundamental for seismicity rate and seismic hazard assessment analyses. Also important is that for some data analyses significant bias cannot necessarily be avoided by choosing a high Mc value for analysis. In such cases, there may be a risk of severe miscalculation of
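
    For reference, the Gutenberg-Richter relation assumed throughout, with a note on the bias mechanism (standard seismological background stated for the reader, not quoted from the paper):

    ```latex
    % Cumulative form of the Gutenberg-Richter law, valid above the
    % completeness magnitude M_c:
    \log_{10} N(M) = a - b\,M, \qquad M \ge M_c
    % Gaussian magnitude noise scatters events across the M_c threshold;
    % because smaller events are more numerous, N(M) just above M_c is
    % inflated, biasing the estimated b-value low and the rate high.
    ```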

  7. Shared Geospatial Metadata Repository for Ontario University Libraries: Collaborative Approaches

    ERIC Educational Resources Information Center

    Forward, Erin; Leahey, Amber; Trimble, Leanne

    2015-01-01

    Successfully providing access to special collections of digital geospatial data in academic libraries relies upon complete and accurate metadata. Creating and maintaining metadata using specialized standards is a formidable challenge for libraries. The Ontario Council of University Libraries' Scholars GeoPortal project, which created a shared…

  8. Compilation of the data-base of the star catalogue by ADABAS.

    NASA Astrophysics Data System (ADS)

    Ishikawa, T.

    A data-base of the FK4 Star Catalogue is compiled by using the HITAC M-280H in the Computer Center of Tokyo University and a commercial data-base management system (DBMS), ADABAS. The purpose of this attempt is to examine whether ADABAS, which can be regarded as representative of the currently available DBMSs developed mainly for business and information retrieval purposes, proves itself useful for handling mass numerical data like star catalogue data. It is concluded that the data-base could indeed be a convenient way of storing and utilizing star catalogue data.

  9. [Physiology in the mirror of systematic catalogue of Russian Academy of Sciences Library].

    PubMed

    Orlov, I V; Lazurkina, V B

    2011-07-01

    The representation of publications on general human and animal physiology in the systematic catalogue of the Library of the Russian Academy of Sciences is considered. The organization of the catalogue as applied to the problems of physiology, built on the basis of the library-bibliographic classification used in Russian universal scientific libraries, is described. The card files of the systematic catalogue of the Library contain about 8 million cards. Topics that reflect the problems of general physiology comprise 39 headings. For the full range of sciences, including physiology, tables of general types of divisions were developed; they are marked by indexes using lower-case letters of the Russian alphabet, and decimal symbols are used for the further subdivision of these indexes. The indexes are attached directly to the field-of-knowledge index. With the current relatively easy availability of network resources, the value and relevance of any catalogue are reduced; however, this concerns journal articles much more than the reference books, proceedings of various conferences, bibliographies, personalia, and especially the monographs contained in the systematic catalogue. The card systematic catalogue of the Library remains an important source of information on general physiology, as well as on its narrower subsections.

  10. A statistical metadata model for clinical trials' data management.

    PubMed

    Vardaki, Maria; Papageorgiou, Haralambos; Pentaris, Fragkiskos

    2009-08-01

    We introduce a statistical, process-oriented metadata model to describe the process of medical research data collection, management, results analysis and dissemination. Our approach explicitly provides a structure for the pieces of information used in Clinical Study Data Management Systems, enabling a more active role for any associated metadata. Using the object-oriented paradigm, we describe the classes of our model that participate in the design of a clinical trial and the subsequent collection and management of the relevant data. The advantage of our approach is that we focus on presenting the structural inter-relation of these classes during dataset manipulation, by proposing transformations that model the simultaneous processing of both data and metadata. Our solution reduces the possibility of human error and allows for the tracking of all changes made during a dataset's lifecycle. The explicit modeling of processing steps improves data quality and assists in the problem of handling data collected in different clinical trials. The case study illustrates the applicability of the proposed framework, demonstrating conceptually the simultaneous handling of datasets collected during two randomized clinical studies. Finally, we provide the main considerations for implementing the proposed framework in a modern metadata-enabled information system.

  11. Developing a Metadata Infrastructure to facilitate data driven science gateway and to provide Inspire/GEMINI compliance for CLIPC

    NASA Astrophysics Data System (ADS)

    Mihajlovski, Andrej; Plieger, Maarten; Som de Cerff, Wim; Page, Christian

    2016-04-01

    …indicators. Key is the availability of standardized metadata describing indicator data and services. This will enable standardization and interoperability between the different distributed services of CLIPC. To disseminate CLIPC indicator data, transformed data products enabling impact assessments, and climate change impact indicators, a standardized metadata infrastructure is provided. The challenge is that the compliance of existing metadata with INSPIRE ISO standards and GEMINI standards needs to be extended to further allow the web portal to be generated from the available metadata blueprint. The information provided in the headers of netCDF files available through multiple catalogues allows us to generate ISO-compliant metadata, which is in turn used to generate web-based interface content, as well as OGC-compliant web services such as WCS and WMS for the front end and WPS interactions for the scientific users to combine and generate new datasets. The goal of the metadata infrastructure is to provide a blueprint for creating a data-driven science portal, generated from the underlying GIS data, web services and processing infrastructure. In the presentation we will present the results and lessons learned.
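
    A minimal sketch of the harvesting step described above: reading the global attributes from a netCDF header and mapping a few of them onto ISO-like fields. The attribute names follow the common CF/ACDD conventions, and the output dictionary is a simplified stand-in for a full INSPIRE/GEMINI ISO record.

    ```python
    # Harvest netCDF global attributes as the seed for standards-compliant
    # metadata. Attribute names (title, summary, keywords) follow common
    # CF/ACDD conventions; the mapping below is illustrative only.
    from netCDF4 import Dataset

    def harvest_nc_header(path):
        ds = Dataset(path)
        try:
            return {name: getattr(ds, name) for name in ds.ncattrs()}
        finally:
            ds.close()

    def to_minimal_iso(attrs):
        # Map a few global attributes onto ISO-like fields (simplified).
        return {
            "title": attrs.get("title", "unknown"),
            "abstract": attrs.get("summary", ""),
            "keywords": attrs.get("keywords", "").split(","),
        }

    # record = to_minimal_iso(harvest_nc_header("indicator.nc"))  # placeholder
    ```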

  12. Reproducibility and replicability of rodent phenotyping in preclinical studies.

    PubMed

    Kafkafi, Neri; Agassi, Joseph; Chesler, Elissa J; Crabbe, John C; Crusio, Wim E; Eilam, David; Gerlai, Robert; Golani, Ilan; Gomez-Marin, Alex; Heller, Ruth; Iraqi, Fuad; Jaljuli, Iman; Karp, Natasha A; Morgan, Hugh; Nicholson, George; Pfaff, Donald W; Richter, S Helene; Stark, Philip B; Stiedl, Oliver; Stodden, Victoria; Tarantino, Lisa M; Tucci, Valter; Valdar, William; Williams, Robert W; Würbel, Hanno; Benjamini, Yoav

    2018-04-01

    The scientific community is increasingly concerned with the proportion of published "discoveries" that are not replicated in subsequent studies. The field of rodent behavioral phenotyping was one of the first to raise this concern, and to relate it to other methodological issues: the complex interaction between genotype and environment; the definitions of behavioral constructs; and the use of laboratory mice and rats as model species for investigating human health and disease mechanisms. In January 2015, researchers from various disciplines gathered at Tel Aviv University to discuss these issues. The general consensus was that the issue is prevalent and of concern, and should be addressed at the statistical, methodological and policy levels, but is not so severe as to call into question the validity and the usefulness of model organisms as a whole. Well-organized community efforts, coupled with improved data and metadata sharing, have a key role in identifying specific problems and promoting effective solutions. Replicability is closely related to validity, may affect generalizability and translation of findings, and has important ethical implications. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.

  13. To Teach or Not to Teach: The Ethics of Metadata

    ERIC Educational Resources Information Center

    Barnes, Cynthia; Cavaliere, Frank

    2009-01-01

    Metadata is information about computer-generated documents that is often inadvertently transmitted to others. The problems associated with metadata have become more acute over time as word processing and other popular programs have become more receptive to the concept of collaboration. As more people become involved in the preparation of…

  14. Using URIs to effectively transmit sensor data and metadata

    NASA Astrophysics Data System (ADS)

    Kokkinaki, Alexandra; Buck, Justin; Darroch, Louise; Gardner, Thomas

    2017-04-01

    Autonomous ocean observation is massively increasing the number of sensors in the ocean. Accordingly, the continuing increase in the datasets produced makes selecting sensors that are fit for purpose a growing challenge. Decision making on selecting quality sensor data is based on the sensor's metadata, i.e. manufacturer specifications, history of calibrations, etc. The Open Geospatial Consortium (OGC) has developed the Sensor Web Enablement (SWE) standards to facilitate the integration and interoperability of sensor data and metadata. The World Wide Web Consortium (W3C) Semantic Web technologies enable machine comprehensibility, promoting sophisticated linking and processing of data published on the web. Linking a sensor's data and metadata according to the above-mentioned standards can present practical difficulties, because of internal hardware bandwidth restrictions and a requirement to constrain data transmission costs. Our approach addresses these practical difficulties by uniquely identifying sensor and platform models and instances through URIs, which resolve via content negotiation to either OGC's sensor meta-language, SensorML, or W3C's Linked Data. Data transmitted by a sensor incorporate the sensor's unique URI to refer to its metadata. Sensor and platform model URIs and descriptions are created and hosted by the British Oceanographic Data Centre (BODC) linked systems service. The sensor owner creates the sensor and platform instance URIs prior to and during sensor deployment, through an updatable web form, the Sensor Instance Form (SIF). The SIF enables model and instance URI association, but also platform and sensor linking. The use of URIs, which are dynamically generated through the SIF, offers both practical and economic benefits to the implementation of the SWE and Linked Data standards in near-real-time systems. Data can be linked to metadata dynamically in situ while saving the costs associated with the transmission of long metadata descriptions. The transmission
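
    A small sketch of the content negotiation described above: the same sensor-instance URI served as either SensorML or Linked Data depending on the Accept header. The URI and the exact media types are assumptions for illustration, not the documented behavior of the BODC linked systems service.

    ```python
    # Resolve a sensor-instance URI by HTTP content negotiation.
    # URI and media types are placeholder assumptions.
    import requests

    SENSOR_URI = "https://linkedsystems.example.org/sensor/instance/42"

    def resolve(uri, media_type):
        resp = requests.get(uri, headers={"Accept": media_type}, timeout=30)
        resp.raise_for_status()
        return resp.text

    sensorml = resolve(SENSOR_URI, "application/xml")    # OGC SensorML view
    linked_data = resolve(SENSOR_URI, "text/turtle")     # W3C Linked Data view
    ```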

  15. A new catalogue of ultraluminous X-ray sources (and more!)

    NASA Astrophysics Data System (ADS)

    Roberts, T.; Earnshaw, H.; Walton, D.; Middleton, M.; Mateos, S.

    2017-10-01

    Many of the critical issues of ultraluminous X-ray source (ULX) science - for example the prevalence of IMBH and/or ULX pulsar candidates within the wider ULX population - can only be addressed by studying statistical samples of ULXs. Similarly, characterising the range of properties displayed by ULXs, and so understanding their accretion physics, requires large samples of objects. To this end, we introduce a new catalogue of 376 ultraluminous X-ray sources and 1092 less luminous point X-ray sources associated with nearby galaxies, derived from the 3XMM-DR4 catalogue. We highlight applications of this catalogue, for example the identification of new IMBH candidates from the most luminous ULXs, and examining the physics of objects at the Eddington threshold, where their luminosities of ~10^39 erg/s indicate their accretion rates are ~Eddington. We also show how the catalogue can be used to start to examine a wider range of lower-luminosity (sub-ULX) point sources in star-forming galaxies than previously accessible through spectral stacking, and argue why this is important for galaxy formation in the high-redshift Universe.

  16. Inferring Metadata for a Semantic Web Peer-to-Peer Environment

    ERIC Educational Resources Information Center

    Brase, Jan; Painter, Mark

    2004-01-01

    Learning Objects Metadata (LOM) aims at describing educational resources in order to allow better reusability and retrieval. In this article we show how additional inference rules allows us to derive additional metadata from existing ones. Additionally, using these rules as integrity constraints helps us to define the constraints on LOM elements,…

  17. Ontology-Based Search of Genomic Metadata.

    PubMed

    Fernandez, Javier D; Lenzerini, Maurizio; Masseroli, Marco; Venco, Francesco; Ceri, Stefano

    2016-01-01

    The Encyclopedia of DNA Elements (ENCODE) is a huge and still expanding public repository of more than 4,000 experiments and 25,000 data files, assembled by a large international consortium since 2007; unknown biological knowledge can be extracted from these huge and largely unexplored data, leading to data-driven genomic, transcriptomic, and epigenomic discoveries. Yet search of relevant datasets for knowledge discovery is only weakly supported: the metadata describing ENCODE datasets are quite simple and incomplete, and are not described by a coherent underlying ontology. Here, we show how to overcome this limitation by adopting an ENCODE metadata searching approach which uses high-quality ontological knowledge and state-of-the-art indexing technologies. Specifically, we developed S.O.S. GeM (http://www.bioinformatics.deib.polimi.it/SOSGeM/), a system supporting effective semantic search and retrieval of ENCODE datasets. First, we constructed a Semantic Knowledge Base by starting with concepts extracted from ENCODE metadata, matched to and expanded on biomedical ontologies integrated in the well-established Unified Medical Language System. We prove that this inference method is sound and complete. Then, we leveraged the Semantic Knowledge Base to semantically search ENCODE data from arbitrary biologists' queries. This allows correctly finding more datasets than those extracted by a purely syntactic search, as supported by the other available systems. We empirically show the relevance of the found datasets to the biologists' queries.

  18. Mercury: Reusable software application for Metadata Management, Data Discovery and Access

    NASA Astrophysics Data System (ADS)

    Devarakonda, Ranjeet; Palanisamy, Giri; Green, James; Wilson, Bruce E.

    2009-12-01

    Mercury is a federated metadata harvesting, data discovery and access tool based on both open source packages and custom developed software. It was originally developed for NASA, and the Mercury development consortium now includes funding from NASA, USGS, and DOE. Mercury is itself a reusable toolset for metadata, with current use in 12 different projects. Mercury also supports the reuse of metadata by enabling searching across a range of metadata specifications and standards including XML, Z39.50, FGDC, Dublin Core, Darwin Core, EML, and ISO 19115. Mercury provides a single portal to information contained in distributed data management systems. It collects metadata and key data from contributing project servers distributed around the world and builds a centralized index. The Mercury search interfaces then allow users to perform simple, fielded, spatial and temporal searches across these metadata sources. One of the major goals of the recent redesign of Mercury was to improve software reusability across the projects which currently fund the continuing development of Mercury. These projects span a range of land, atmosphere, and ocean ecological communities and have a number of common needs for metadata searches, but they also have a number of needs specific to one or a few projects. To balance these common and project-specific needs, Mercury's architecture includes three major reusable components: a harvester engine, an indexing system and a user interface component. The harvester engine is responsible for harvesting metadata records from various distributed servers around the USA and around the world. The harvester software was packaged in such a way that all the Mercury projects use the same harvester scripts, with each project driven by a set of configuration files. The harvested files are then passed to the indexing system, where each of the fields in these structured metadata records is indexed properly, so that the query engine can perform
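
    As a toy illustration of the harvest-then-index pipeline described above, the sketch below builds a fielded inverted index over harvested metadata records; Mercury's real indexing system is far richer, so this conveys only the idea.

    ```python
    # Toy fielded inverted index over harvested metadata records.
    from collections import defaultdict

    index = defaultdict(set)        # (field, term) -> set of record ids
    records = {}

    def add_record(rec_id, metadata):
        records[rec_id] = metadata
        for field_name, value in metadata.items():
            for term in str(value).lower().split():
                index[(field_name, term)].add(rec_id)

    def fielded_search(field_name, term):
        return [records[r] for r in index[(field_name, term.lower())]]

    add_record("doc1", {"title": "Soil respiration flux", "project": "ORNL DAAC"})
    add_record("doc2", {"title": "Aerosol optical depth", "project": "ARM"})
    print(fielded_search("project", "arm"))
    ```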

  19. Serving Fisheries and Ocean Metadata to Communities Around the World

    NASA Technical Reports Server (NTRS)

    Meaux, Melanie

    2006-01-01

    NASA's Global Change Master Directory (GCMD) assists the oceanographic community in the discovery, access, and sharing of scientific data by serving on-line fisheries and ocean metadata to users around the globe. As of January 2006, the directory holds more than 16,300 Earth Science data descriptions and over 1,300 services descriptions. Of these, nearly 4,000 unique ocean-related metadata records are available to the public, with many having direct links to the data. In 2005, the GCMD averaged over 5 million hits a month, with nearly a half million unique hosts for the year. Through the GCMD portal (http://gcmd.nasa.gov/), users can search vast and growing quantities of data and services using controlled keywords, free-text searches, or a combination of both. Users may now refine a search based on topic, location, instrument, platform, project, data center, and spatial and temporal coverage. The directory also offers data holders a means to post and search their data through customized portals, i.e. online customized subset metadata directories. The discovery metadata standard used is the Directory Interchange Format (DIF), adopted in 1994. This format has evolved to accommodate other national and international standards such as FGDC and ISO 19115. Users can submit metadata through easy-to-use online and offline authoring tools. The directory, which also serves as a coordinating node of the International Directory Network (IDN), has been active at the international, regional and national levels for many years through its involvement with the Committee on Earth Observation Satellites (CEOS), federal agencies (such as NASA, NOAA, and USGS), international agencies (such as IOC/IODE, UN, and JAXA) and partnerships (such as ESIP, IOOS/DMAC, GOSIC, GLOBEC, OBIS, and GoMODP), sharing experience and knowledge related to metadata and/or data management and interoperability.

  20. Roma-BZCAT: a multifrequency catalogue of blazars

    NASA Astrophysics Data System (ADS)

    Massaro, E.; Giommi, P.; Leto, C.; Marchegiani, P.; Maselli, A.; Perri, M.; Piranomonte, S.; Sclavi, S.

    2009-02-01

    We present a new catalogue of blazars based on multifrequency surveys and on an extensive review of the literature. Blazars are classified as BL Lacertae objects, as flat spectrum radio quasars or as blazars of uncertain/transitional type. Each object is identified by a root name, coded as BZB, BZQ and BZU for these three subclasses respectively, and by its coordinates. This catalogue is being built as a tool useful for the identification of the extragalactic sources that will be detected by present and future experiments for X and gamma-ray astronomy, like Swift, AGILE, Fermi-GLAST and Simbol-X. An electronic version is available from the ASI Science Data Center web site at http://www.asdc.asi.it/bzcat.

  1. Using RDF and Git to Realize a Collaborative Metadata Repository.

    PubMed

    Stöhr, Mark R; Majeed, Raphael W; Günther, Andreas

    2018-01-01

    The German Center for Lung Research (DZL) is a research network with the aim of researching respiratory diseases. The participating study sites' register data differs in terms of software and coding system as well as data field coverage. To perform meaningful consortium-wide queries through one single interface, a uniform conceptual structure is required covering the DZL common data elements. No single existing terminology includes all our concepts. Potential candidates such as LOINC and SNOMED only cover specific subject areas or are not granular enough for our needs. To achieve a broadly accepted and complete ontology, we developed a platform for collaborative metadata management. The DZL data management group formulated detailed requirements regarding the metadata repository and the user interfaces for metadata editing. Our solution builds upon existing standard technologies allowing us to meet those requirements. Its key parts are RDF and the distributed version control system Git. We developed a software system to publish updated metadata automatically and immediately after performing validation tests for completeness and consistency.
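
    The validate-before-publish workflow the abstract describes can be sketched with standard tools. Below is a minimal Python sketch using rdflib; the Turtle file name and the required-predicate completeness rule are illustrative assumptions, not the DZL repository's actual checks.

        from rdflib import Graph, Namespace

        DCT = Namespace("http://purl.org/dc/terms/")
        REQUIRED = [DCT.title, DCT.description]  # assumed completeness rule

        def validate(path):
            # Parsing raises on RDF syntax errors; then check that every
            # subject carries the required predicates.
            g = Graph()
            g.parse(path, format="turtle")
            return [(s, p) for s in set(g.subjects()) for p in REQUIRED
                    if (s, p, None) not in g]

        if __name__ == "__main__":
            problems = validate("metadata.ttl")  # hypothetical repository file
            if problems:
                raise SystemExit(f"validation failed: {problems}")
            print("metadata OK - safe to commit and publish")

    In a Git-based setup such as the one described, a check of this kind would typically run as a pre-commit or server-side hook, so that only metadata passing the completeness and consistency tests is published.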

  2. OpenFlow arbitrated programmable network channels for managing quantum metadata

    DOE PAGES

    Dasari, Venkat R.; Humble, Travis S.

    2016-10-10

    Quantum networks must classically exchange complex metadata between devices in order to carry out protocols such as teleportation, super-dense coding, and quantum key distribution. Demonstrating the integration of these new communication methods with existing network protocols, channels, and data forwarding mechanisms remains an open challenge. Software-defined networking (SDN) offers robust and flexible strategies for managing diverse network devices and uses. We adapt the principles of SDN to the deployment of quantum networks, which are composed from unique devices that operate according to the laws of quantum mechanics. We show how quantum metadata can be managed within a software-defined network using the OpenFlow protocol, and we describe how OpenFlow management of classical optical channels is compatible with emerging quantum communication protocols. We next give an example specification of the metadata needed to manage and control quantum physical layer (QPHY) behavior and we extend the OpenFlow interface to accommodate this quantum metadata. We conclude by discussing near-term experimental efforts that can realize SDN's principles for quantum communication.
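
    As a purely illustrative aid, the sketch below shows one possible shape for such classical metadata about a quantum channel. The field names are assumptions made for illustration; the paper defines its own QPHY metadata specification and OpenFlow extensions.

        from dataclasses import dataclass, asdict
        import json

        @dataclass
        class QPhyMetadata:
            channel_id: int        # classical identifier of the optical channel
            wavelength_nm: float   # carrier wavelength of the quantum signal
            protocol: str          # e.g. "BB84" or "teleportation"
            basis_schedule: str    # assumed basis-negotiation descriptor
            qber_threshold: float  # abort threshold on quantum bit error rate

        msg = QPhyMetadata(7, 1550.12, "BB84", "random", 0.11)
        print(json.dumps(asdict(msg)))  # serialized for a control channel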

  3. OpenFlow arbitrated programmable network channels for managing quantum metadata

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dasari, Venkat R.; Humble, Travis S.

    Quantum networks must classically exchange complex metadata between devices in order to carry out protocols such as teleportation, super-dense coding, and quantum key distribution. Demonstrating the integration of these new communication methods with existing network protocols, channels, and data forwarding mechanisms remains an open challenge. Software-defined networking (SDN) offers robust and flexible strategies for managing diverse network devices and uses. We adapt the principles of SDN to the deployment of quantum networks, which are composed from unique devices that operate according to the laws of quantum mechanics. We show how quantum metadata can be managed within a software-defined network using the OpenFlow protocol, and we describe how OpenFlow management of classical optical channels is compatible with emerging quantum communication protocols. We next give an example specification of the metadata needed to manage and control quantum physical layer (QPHY) behavior and we extend the OpenFlow interface to accommodate this quantum metadata. We conclude by discussing near-term experimental efforts that can realize SDN's principles for quantum communication.

  4. Describing environmental public health data: implementing a descriptive metadata standard on the environmental public health tracking network.

    PubMed

    Patridge, Jeff; Namulanda, Gonza

    2008-01-01

    The Environmental Public Health Tracking (EPHT) Network provides an opportunity to bring together diverse environmental and health effects data by integrating local, state, and national databases of environmental hazards, environmental exposures, and health effects. To help users locate data on the EPHT Network, the network will utilize descriptive metadata that provide critical information as to the purpose, location, content, and source of these data. Since 2003, the Centers for Disease Control and Prevention's EPHT Metadata Subgroup has been working to initiate the creation and use of descriptive metadata. Efforts undertaken by the group include the adoption of a metadata standard, creation of an EPHT-specific metadata profile, development of an open-source metadata creation tool, and promotion of the creation of descriptive metadata by changing the perception of metadata in the public health culture.

  5. Extraction of CT dose information from DICOM metadata: automated Matlab-based approach.

    PubMed

    Dave, Jaydev K; Gingold, Eric L

    2013-01-01

    The purpose of this study was to extract exposure parameters and dose-relevant indexes of CT examinations from information embedded in DICOM metadata. DICOM dose report files were identified and retrieved from a PACS. An automated software program was used to extract information relevant to exposure from the structured elements in the DICOM metadata of these files. Extracting information from DICOM metadata eliminated potential errors inherent in techniques based on optical character recognition, yielding 100% accuracy.
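
    The study used an automated Matlab program; the Python sketch below illustrates the same idea with pydicom, walking a dose structured report's content tree and collecting named numeric values. The file name is hypothetical, and the traversal is a simplified stand-in for the authors' extraction logic.

        import pydicom

        def walk(items, found):
            # Recursively visit SR content items, collecting named numeric values.
            for item in items:
                name = getattr(item, "ConceptNameCodeSequence", None)
                if name is not None and hasattr(item, "MeasuredValueSequence"):
                    found.append((name[0].CodeMeaning,
                                  item.MeasuredValueSequence[0].NumericValue))
                if hasattr(item, "ContentSequence"):
                    walk(item.ContentSequence, found)
            return found

        ds = pydicom.dcmread("dose_report.dcm")  # hypothetical file from a PACS
        for meaning, value in walk(ds.ContentSequence, []):
            print(meaning, value)  # e.g. CTDIvol and DLP entries

    Because the values come from structured DICOM elements rather than from pixels rendered on a dose screen, no optical character recognition step is needed, which is the source of the accuracy gain the abstract reports.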

  6. VizieR Online Data Catalog: SWASP catalogue of RR Lyrae stars (Greer+, 2017)

    NASA Astrophysics Data System (ADS)

    Greer, P. A.; Payne, S. G.; Norton, A. J.; Maxted, P. F. L.; Smalley, B.; West, P. J.; Wheatley, R. G.; Kolb, U. C.

    2017-07-01

    The SuperWASP RR Lyrae catalogue contains 4963 RRab type RR Lyrae objects from the SuperWASP archive. Each entry includes the unique SWASP identifier, pulsation period in days, pulsation amplitude in mags, median light curve amplitude in mags, the corresponding GCVS name, and corresponding SSS/CRTS names, where known. The SWASP Blazhko candidate catalogue contains 1324 rows of Blazhko periods for 983 unique objects from the SWASP RRab catalogue. (2 data files).

  7. The Value of Data and Metadata Standardization for Interoperability in Giovanni Or: Why Your Product's Metadata Causes Us Headaches!

    NASA Technical Reports Server (NTRS)

    Smit, Christine; Hegde, Mahabaleshwara; Strub, Richard; Bryant, Keith; Li, Angela; Petrenko, Maksym

    2017-01-01

    Giovanni is a data exploration and visualization tool at the NASA Goddard Earth Sciences Data Information Services Center (GES DISC). It has been around in one form or another for more than 15 years. Giovanni calculates simple statistics and produces 22 different visualizations for more than 1600 geophysical parameters from more than 90 satellite and model products. Giovanni relies on external data format standards to ensure interoperability, including the NetCDF CF Metadata Conventions. Unfortunately, these standards were insufficient to make Giovanni's internal data representation truly simple to use. Finding and working with dimensions can be convoluted with the CF Conventions. Furthermore, the CF Conventions are silent on machine-friendly descriptive metadata such as the parameter's source product and product version. In order to simplify analyzing disparate earth science data parameters in a unified way, we developed Giovanni's internal standard. First, the format standardizes parameter dimensions and variables so they can be easily found. Second, the format adds all the machine-friendly metadata Giovanni needs to present our parameters to users in a consistent and clear manner. At a glance, users can grasp all the pertinent information about parameters both during parameter selection and after visualization.
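
    The internal-standard idea, normalizing dimension names and attaching machine-friendly provenance attributes, can be illustrated briefly. The xarray sketch below is an assumption-laden illustration; the attribute names are not Giovanni's actual internal format.

        import numpy as np
        import xarray as xr

        da = xr.DataArray(np.zeros((2, 3, 4)),
                          dims=("Time", "Lat", "Lon"),  # product's original names
                          name="precip")

        # Normalize dimension names so downstream code can always find them,
        # then attach provenance attributes that CF alone does not mandate.
        std = da.rename({"Time": "time", "Lat": "lat", "Lon": "lon"})
        std.attrs.update({
            "source_product": "EXAMPLE_L3_PRODUCT",  # hypothetical short name
            "product_version": "7",
            "quantity_type": "Precipitation",
        })
        print(std.dims, std.attrs)

    With every parameter carrying the same dimension names and descriptive attributes, downstream statistics and visualization code can treat all 1600-plus parameters uniformly, which is the point of the internal standard.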

  8. MMI's Metadata and Vocabulary Solutions: 10 Years and Growing

    NASA Astrophysics Data System (ADS)

    Graybeal, J.; Gayanilo, F.; Rueda-Velasquez, C. A.

    2014-12-01

    The Marine Metadata Interoperability project (http://marinemetadata.org) held its public opening at AGU's 2004 Fall Meeting. For 10 years since that debut, the MMI guidance and vocabulary sites have served over 100,000 visitors, with 525 community members and continuous Steering Committee leadership. Originally funded by the National Science Foundation, over the years multiple organizations have supported the MMI mission: "Our goal is to support collaborative research in the marine science domain, by simplifying the incredibly complex world of metadata into specific, straightforward guidance. MMI encourages scientists and data managers at all levels to apply good metadata practices from the start of a project, by providing the best guidance and resources for data management, and developing advanced metadata tools and services needed by the community." Now hosted by the Harte Research Institute at Texas A&M University at Corpus Christi, MMI continues to provide guidance and services to the community, and is planning for marine science and technology needs for the next 10 years. In this presentation we will highlight our major accomplishments, describe our recent achievements and imminent goals, and propose a vision for improving marine data interoperability for the next 10 years, including Ontology Registry and Repository (http://mmisw.org/orr) advancements and applications (http://mmisw.org/cfsn).

  9. Advancements in Large-Scale Data/Metadata Management for Scientific Data.

    NASA Astrophysics Data System (ADS)

    Guntupally, K.; Devarakonda, R.; Palanisamy, G.; Frame, M. T.

    2017-12-01

    Scientific data often comes with complex and diverse metadata which are critical for data discovery and users. The Online Metadata Editor (OME) tool, which was developed by an Oak Ridge National Laboratory team, effectively manages diverse scientific datasets across several federal data centers, such as DOE's Atmospheric Radiation Measurement (ARM) Data Center and USGS's Core Science Analytics, Synthesis, and Libraries (CSAS&L) project. This presentation will focus mainly on recent developments and future strategies for refining the OME tool within these centers. The ARM OME is a standards-based tool (https://www.archive.arm.gov/armome) that allows scientists to create and maintain metadata about their data products. The tool has been improved with new workflows that help metadata coordinators and submitting investigators submit and review their data more efficiently. The ARM Data Center's newly upgraded Data Discovery Tool (http://www.archive.arm.gov/discovery) uses rich metadata generated by the OME to enable search and discovery of thousands of datasets, while also providing a citation generator and modern order-delivery techniques like Globus (using GridFTP), Dropbox and THREDDS. The Data Discovery Tool also supports incremental indexing, which allows users to find new data as and when they are added. The USGS CSAS&L search catalog employs a custom version of the OME (https://www1.usgs.gov/csas/ome), which has been upgraded with high-level Federal Geographic Data Committee (FGDC) validations and the ability to reserve and mint Digital Object Identifiers (DOIs). The USGS's Science Data Catalog (SDC) (https://data.usgs.gov/datacatalog) allows users to discover a myriad of science data holdings through a web portal. Recent major upgrades to the SDC and ARM Data Discovery Tool include improved harvesting performance and migration to new search software, such as Apache Solr 6.0, for serving up data/metadata to scientific communities. Our presentation will highlight

  10. Pathogen metadata platform: software for accessing and analyzing pathogen strain information.

    PubMed

    Chang, Wenling E; Peterson, Matthew W; Garay, Christopher D; Korves, Tonia

    2016-09-15

    Pathogen metadata includes information about where and when a pathogen was collected and the type of environment it came from. Along with genomic nucleotide sequence data, this metadata is growing rapidly and becoming a valuable resource not only for research but for biosurveillance and public health. However, current freely available tools for analyzing this data are geared towards bioinformaticians and/or do not provide summaries and visualizations needed to readily interpret results. We designed a platform to easily access and summarize data about pathogen samples. The software includes a PostgreSQL database that captures metadata useful for disease outbreak investigations, and scripts for downloading and parsing data from NCBI BioSample and BioProject into the database. The software provides a user interface to query metadata and obtain standardized results in an exportable, tab-delimited format. To visually summarize results, the user interface provides a 2D histogram for user-selected metadata types and mapping of geolocated entries. The software is built on the LabKey data platform, an open-source data management platform, which enables developers to add functionalities. We demonstrate the use of the software in querying for a pathogen serovar and for genome sequence identifiers. This software enables users to create a local database for pathogen metadata, populate it with data from NCBI, easily query the data, and obtain visual summaries. Some of the components, such as the database, are modular and can be incorporated into other data platforms. The source code is freely available for download at https://github.com/wchangmitre/bioattribution .
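
    The core pattern, a relational table of sample metadata queried by serovar, can be sketched compactly. In the sketch below, sqlite3 stands in for the platform's PostgreSQL database, and the column names and accession value are illustrative, not the platform's actual schema.

        import sqlite3

        con = sqlite3.connect(":memory:")
        con.execute("""CREATE TABLE biosample (
            accession TEXT PRIMARY KEY,
            organism TEXT,
            serovar TEXT,
            collection_date TEXT,
            geo_loc_name TEXT)""")
        con.execute("INSERT INTO biosample VALUES (?, ?, ?, ?, ?)",
                    ("SAMN-EXAMPLE-1", "Salmonella enterica", "Typhimurium",
                     "2014-06-01", "USA: Ohio"))  # invented record

        # Query the metadata by serovar, as an outbreak investigator might.
        rows = con.execute("SELECT accession, geo_loc_name FROM biosample "
                           "WHERE serovar = ?", ("Typhimurium",)).fetchall()
        print(rows)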

  11. Social tagging in the life sciences: characterizing a new metadata resource for bioinformatics.

    PubMed

    Good, Benjamin M; Tennis, Joseph T; Wilkinson, Mark D

    2009-09-25

    Academic social tagging systems, such as Connotea and CiteULike, provide researchers with a means to organize personal collections of online references with keywords (tags) and to share these collections with others. One of the side-effects of the operation of these systems is the generation of large, publicly accessible metadata repositories describing the resources in the collections. In light of the well-known expansion of information in the life sciences and the need for metadata to enhance its value, these repositories present a potentially valuable new resource for application developers. Here we characterize the current contents of two scientifically relevant metadata repositories created through social tagging. This investigation helps to establish how such socially constructed metadata might be used as it stands currently and to suggest ways that new social tagging systems might be designed that would yield better aggregate products. We assessed the metadata that users of CiteULike and Connotea associated with citations in PubMed with the following metrics: coverage of the document space, density of metadata (tags) per document, rates of inter-annotator agreement, and rates of agreement with MeSH indexing. CiteULike and Connotea were very similar on all of the measurements. In comparison to PubMed, document coverage and per-document metadata density were much lower for the social tagging systems. Inter-annotator agreement within the social tagging systems and the agreement between the aggregated social tagging metadata and MeSH indexing was low though the latter could be increased through voting. The most promising uses of metadata from current academic social tagging repositories will be those that find ways to utilize the novel relationships between users, tags, and documents exposed through these systems. For more traditional kinds of indexing-based applications (such as keyword-based search) to benefit substantially from socially generated metadata in
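
    Two of the metrics used here, per-document tag density and inter-annotator agreement, are easy to illustrate. The Python sketch below uses invented toy annotations and a simple Jaccard overlap as the agreement measure, which is an assumption; the paper's exact agreement statistic may differ.

        from itertools import combinations

        # document -> {user: set of tags}; invented toy data
        annotations = {
            "pmid:1": {"u1": {"genomics", "review"}, "u2": {"genomics"}},
            "pmid:2": {"u1": {"proteomics"}},
        }

        density = sum(len(tags) for users in annotations.values()
                      for tags in users.values()) / len(annotations)

        agreements = [len(a & b) / len(a | b)   # Jaccard overlap per user pair
                      for users in annotations.values()
                      for (_, a), (_, b) in combinations(users.items(), 2)]

        print(f"tags per document: {density:.2f}")
        print(f"mean inter-annotator agreement: "
              f"{sum(agreements) / len(agreements):.2f}")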

  12. Towards a semantic medical Web: HealthCyberMap's tool for building an RDF metadata base of health information resources based on the Qualified Dublin Core Metadata Set.

    PubMed

    Boulos, Maged N; Roudsari, Abdul V; Carson, Ewart R

    2002-07-01

    HealthCyberMap (http://healthcybermap.semanticweb.org/) aims at mapping Internet health information resources in novel ways for enhanced retrieval and navigation. This is achieved by collecting appropriate resource metadata in an unambiguous form that preserves semantics. We modelled a qualified Dublin Core (DC) metadata set ontology with extra elements for resource quality and geographical provenance in Protégé-2000. A metadata collection form helps acquire resource instance data within Protégé. The DC subject field is populated with UMLS terms directly imported from the UMLS Knowledge Source Server using UMLS tab, a Protégé-2000 plug-in. The project is saved in RDFS/RDF. The ontology and associated form serve as a free tool for building and maintaining an RDF medical resource metadata base. The UMLS tab enables browsing and searching for concepts that best describe a resource, and importing them to DC subject fields. The resultant metadata base can be used with a search and inference engine, and have textual and/or visual navigation interface(s) applied to it, to ultimately build a medical Semantic Web portal. Different ways of exploiting Protégé-2000 RDF output are discussed. By making the context and semantics of resources, not merely their raw text and formatting, amenable to computer 'understanding,' we can build a Semantic Web that is more useful to humans than the current Web. This requires proper use of metadata and ontologies. Clinical codes can reliably describe the subjects of medical resources, establish the semantic relationships (as defined by underlying coding scheme) between related resources, and automate their topical categorisation.

  13. Serious Games for Health: The Potential of Metadata.

    PubMed

    Göbel, Stefan; Maddison, Ralph

    2017-02-01

    Numerous serious games and health games exist, either as commercial products (typically with a focus on entertaining a broad user group) or smaller games and game prototypes, often resulting from research projects (typically tailored to a smaller user group with a specific health characteristic). A major drawback of existing health games is that they are not very well described and attributed with (machine-readable, quantitative, and qualitative) metadata such as the characterizing goal of the game, the target user group, or expected health effects well proven in scientific studies. This makes it difficult or even impossible for end users to find and select the most appropriate game for a specific situation (e.g., health needs). Therefore, the aim of this article was to motivate the need and potential/benefit of metadata for the description and retrieval of health games and to describe a descriptive model for the qualitative description of games for health. It was not the aim of the article to describe a stable, running system (portal) for health games. This will be addressed in future work. Building on previous work toward a metadata format for serious games, a descriptive model for the formal description of games for health is introduced. For the conceptualization of this model, classification schemata of different existing health game repositories are considered. The classification schema consists of three levels: a core set of mandatory descriptive fields relevant for all games for health application areas, a detailed level with more comprehensive, optional information about the games, and so-called extension as level three with specific descriptive elements relevant for dedicated health games application areas, for example, cardio training. A metadata format provides a technical framework to describe, find, and select appropriate health games matching the needs of the end user. Future steps to improve, apply, and promote the metadata format in the health games
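
    The three-level classification schema can be pictured as nested record types. The dataclass sketch below is illustrative only; the field names are assumptions about what a core, detail, and cardio-training extension level might contain.

        from dataclasses import dataclass, field
        from typing import Optional

        @dataclass
        class CoreMetadata:            # level 1: mandatory for every health game
            title: str
            health_goal: str
            target_group: str

        @dataclass
        class DetailMetadata:          # level 2: optional richer description
            platforms: list = field(default_factory=list)
            evidence_studies: list = field(default_factory=list)

        @dataclass
        class CardioExtension:         # level 3: dedicated application area
            target_heart_rate_zone: str = "unspecified"

        @dataclass
        class HealthGameRecord:
            core: CoreMetadata
            detail: Optional[DetailMetadata] = None
            extension: Optional[CardioExtension] = None

        rec = HealthGameRecord(CoreMetadata("ExampleBike", "endurance", "seniors"))
        print(rec)

    Splitting the schema this way keeps the mandatory core small enough that every repository can fill it in, while domain-specific extensions stay optional.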

  14. A catalogue of photometric sequences (suppl. 3) [for astronomical photograph calibration]

    NASA Technical Reports Server (NTRS)

    Argue, A. N.; Miller, E. W.; Warren, W. H., Jr.

    1983-01-01

    In stellar photometry studies, certain difficulties have arisen because of the lack of suitable photometric sequences for calibrating astronomical photographs. In order to eliminate these difficulties, active observers were contacted with a view to drawing up lists of suitable sequences. Replies from 63 authors offering data on 412 sequences were received. Most data were in the UBV system and had been obtained between 1968 and 1973. These were included in the original catalogue. The Catalogue represents a continuation of the earlier Photometric Catalogue compiled by Sharov and Jakimova (1970). A small supplement containing 69 sequences was issued in 1973. Supplement 2 was produced in 1976 and contained 320 sequences. Supplement 3 has now been compiled. It contains 1271 sequences.

  15. The Planck Catalogue of Galactic Cold Clumps : PGCC

    NASA Astrophysics Data System (ADS)

    Montier, L.

    The Planck satellite has provided an unprecedented view of the submm sky, allowing us to search for the dust emission of Galactic cold sources. Combining Planck-HFI all-sky maps in the high frequency channels with the IRAS map at 100 µm, we built the Planck catalogue of Galactic Cold Clumps (PGCC, Planck 2015 results. XXVIII), counting 13188 sources distributed over the whole sky, and following mainly the Galactic structures at low and intermediate latitudes. This is the first all-sky catalogue of Galactic cold sources obtained with a single instrument at this resolution and sensitivity, which opens a new window on star-formation processes in our Galaxy.

  16. The Belgian Union Catalogue of Periodicals

    ERIC Educational Resources Information Center

    Goedeme, G.; And Others

    1976-01-01

    Describes the edition, on computer output microfiche, of the supplement to the 1965 Union catalogue of foreign periodicals in Belgian and Luxemburgian libraries and documentation centers. The microfiches contain location information of 28,000 periodicals in 300 libraries and are edited in a rich typography. (Author)

  17. Improving Earth Science Metadata: Modernizing ncISO

    NASA Astrophysics Data System (ADS)

    O'Brien, K.; Schweitzer, R.; Neufeld, D.; Burger, E. F.; Signell, R. P.; Arms, S. C.; Wilcox, K.

    2016-12-01

    ncISO is a package of tools developed at NOAA's National Center for Environmental Information (NCEI) that facilitates the generation of ISO 19115-2 metadata from NetCDF data sources. The tool currently exists in two iterations: a command line utility and a web-accessible service within the THREDDS Data Server (TDS). Several projects, including NOAA's Unified Access Framework (UAF), depend upon ncISO to generate the ISO-compliant metadata from their data holdings and use the resulting information to populate discovery tools such as NCEI's ESRI Geoportal and NOAA's data.noaa.gov CKAN system. In addition to generating ISO 19115-2 metadata, the tool calculates a rubric score based on how well the dataset follows the Attribute Conventions for Dataset Discovery (ACDD). The result of this rubric calculation, along with information about what has been included and what is missing, is displayed in an HTML document generated by the ncISO software package. Recently ncISO has fallen behind in terms of supporting updates to conventions such as the ACDD. With the blessing of the original programmer, NOAA's UAF has been working to modernize the ncISO software base. In addition to upgrading ncISO to utilize version 1.3 of the ACDD, we have been working with partners at Unidata and IOOS to unify the tool's code base. In essence, we are merging the command line capabilities into the same software that will now be used by the TDS service, allowing easier updates when conventions such as ACDD are updated in the future. In this presentation, we will discuss the work the UAF project has done to support updated conventions within ncISO, as well as describe how the updated tool is helping to improve metadata throughout the earth and ocean sciences.
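
    The rubric idea, scoring a file by how well its global attributes follow ACDD, can be sketched in a few lines. The Python sketch below uses the netCDF4 library and a small, assumed subset of ACDD attribute names; ncISO's actual rubric is considerably richer.

        import netCDF4

        # Small, assumed subset of ACDD attributes for illustration.
        HIGHLY_RECOMMENDED = ["title", "summary", "keywords"]
        RECOMMENDED = ["license", "creator_name", "time_coverage_start"]

        def rubric_score(path):
            ds = netCDF4.Dataset(path)
            attrs = set(ds.ncattrs())       # the file's global attributes
            ds.close()
            wanted = HIGHLY_RECOMMENDED + RECOMMENDED
            missing = [a for a in wanted if a not in attrs]
            return len(wanted) - len(missing), missing

        score, missing = rubric_score("example.nc")  # hypothetical data file
        print(f"rubric score {score}; missing attributes: {missing}")

    Reporting the missing attributes alongside the score, as ncISO's HTML output does, gives data providers a concrete checklist for improving their metadata.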

  18. Solutions for extracting file level spatial metadata from airborne mission data

    NASA Astrophysics Data System (ADS)

    Schwab, M. J.; Stanley, M.; Pals, J.; Brodzik, M.; Fowler, C.; Icebridge Engineering/Spatial Metadata

    2011-12-01

    Data sets acquired from satellites and aircraft may differ in many ways. We will focus on the differences in spatial coverage between the two platforms. Satellite data sets over a given period typically cover large geographic regions. These data are collected in a consistent, predictable and well understood manner due to the uniformity of satellite orbits. Since satellite data collection paths are typically smooth and uniform, the data from satellite instruments can usually be described with simple spatial metadata. Subsequently, these spatial metadata can be stored and searched easily and efficiently. Conversely, aircraft have significantly more freedom to change paths, circle, overlap, and vary altitude, all of which add complexity to the spatial metadata. Aircraft are also subject to wind and other elements that result in even more complicated and unpredictable spatial coverage areas. This unpredictability and complexity makes it more difficult to extract usable spatial metadata from data sets collected on aircraft missions. It is not feasible to use all of the location data from aircraft mission data sets as spatial metadata: the number of data points in typical data sets poses serious performance problems for spatial searching. In order to provide efficient spatial searching of the large number of files cataloged in our systems, we need to extract approximate spatial descriptions as geo-polygons with a small number of vertices (fewer than two hundred). We present some of the challenges and solutions for creating airborne mission-derived spatial metadata. We are implementing these methods to create the spatial metadata for insertion of IceBridge mission data into ECS for public access through NSIDC and ECHO, but they are
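
    One plausible way to reduce a dense flight track to a compact search polygon is to buffer the path and simplify the result until it fits the vertex budget mentioned above. The shapely sketch below is an assumed approach with invented coordinates, not the team's actual algorithm.

        from shapely.geometry import LineString

        # Invented flight path; real tracks have many thousands of points.
        track = LineString([(0, 0), (1, 0.2), (2, 0.1), (2.5, 1.0), (1.2, 1.4)])

        footprint = track.buffer(0.1)       # swath polygon around the path
        tolerance = 0.01
        while len(footprint.exterior.coords) > 200:   # stated vertex budget
            footprint = footprint.simplify(tolerance)
            tolerance *= 2                  # coarsen until under the budget

        print(len(footprint.exterior.coords), "vertices in search polygon")

    The trade-off is the one the abstract describes: a coarser polygon searches faster but admits more false matches, so the vertex budget sets the balance between precision and query performance.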

  19. CruiseViewer: SIOExplorer Graphical Interface to Metadata and Archives.

    NASA Astrophysics Data System (ADS)

    Sutton, D. W.; Helly, J. J.; Miller, S. P.; Chase, A.; Clark, D.

    2002-12-01

    We are introducing "CruiseViewer" as a prototype graphical interface for the SIOExplorer digital library project, part of the overall NSF National Science Digital Library (NSDL) effort. When complete, CruiseViewer will provide access to nearly 800 cruises, as well as 100 years of documents and images from the archives of the Scripps Institution of Oceanography (SIO). The project emphasizes data object accessibility, a rich metadata format, efficient uploading methods and interoperability with other digital libraries. The primary function of CruiseViewer is to provide a human interface to the metadata database and to storage systems filled with archival data. The system schema is based on the concept of an "arbitrary digital object" (ADO). Arbitrary in that if the object can be stored on a computer system then SIOExplorer can manage it. Common examples are a multibeam swath bathymetry file, a .pdf cruise report, or a tar file containing all the processing scripts used on a cruise. We require a metadata file for every ADO in an ascii "metadata interchange format" (MIF), which has proven to be highly useful for operability and extensibility. Bulk ADO storage is managed using the Storage Resource Broker, SRB, data handling middleware developed at the San Diego Supercomputer Center that centralizes management and access to distributed storage devices. MIF metadata are harvested from several sources and housed in a relational (Oracle) database. For CruiseViewer, cgi scripts resident on an Apache server are the primary communication and service request handling tools. Along with the CruiseViewer Java application, users can query, access and download objects via a separate method that operates through standard web browsers, http://sioexplorer.ucsd.edu. Both provide the functionality to query and view object metadata, and select and download ADOs. For the CruiseViewer application Java 2D is used to add a geo-referencing feature that allows users to select basemap images

  20. International Metadata Standards and Enterprise Data Quality Metadata Systems

    NASA Technical Reports Server (NTRS)

    Habermann, Ted

    2016-01-01

    Well-documented data quality is critical in situations where scientists and decision-makers need to combine multiple datasets from different disciplines and collection systems to address scientific questions or difficult decisions. Standardized data quality metadata could be very helpful in these situations. Many efforts at developing data quality standards falter because of the diversity of approaches to measuring and reporting data quality. The one-size-fits-all paradigm does not generally work well in this situation. I will describe capabilities of ISO 19157 that accommodate this diversity, with examples of how they are being used to describe data quality across the NASA EOS Enterprise, and also compare these approaches with other standards.

  1. Metadata from data: identifying holidays from anesthesia data.

    PubMed

    Starnes, Joseph R; Wanderer, Jonathan P; Ehrenfeld, Jesse M

    2015-05-01

    The increasingly large databases available to researchers necessitate high-quality metadata that is not always available. We describe a method for generating this metadata independently. Cluster analysis and expectation-maximization were used to separate days into holidays/weekends and regular workdays using anesthesia data from Vanderbilt University Medical Center from 2004 to 2014. This classification was then used to describe differences between the two sets of days over time. We evaluated 3802 days and correctly categorized 3797 based on anesthesia case time (representing an error rate of 0.13%). Use of other metrics for categorization, such as billed anesthesia hours and number of anesthesia cases per day, led to similar results. Analysis of the two categories showed that surgical volume increased more quickly with time for non-holidays than holidays (p < 0.001). We were able to successfully generate metadata from data by distinguishing holidays based on anesthesia data. This data can then be used for economic analysis and scheduling purposes. It is possible that the method can be expanded to similar bimodal and multimodal variables.
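
    The bimodal-separation idea can be illustrated with a two-component mixture model on synthetic daily case-hours. The sketch below uses scikit-learn's GaussianMixture as a stand-in for the paper's cluster analysis and expectation-maximization; all numbers are invented.

        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(0)
        workdays = rng.normal(120, 15, 260)   # invented daily case-hours
        holidays = rng.normal(25, 8, 105)
        hours = np.concatenate([workdays, holidays]).reshape(-1, 1)

        gm = GaussianMixture(n_components=2, random_state=0).fit(hours)
        labels = gm.predict(hours)
        low_mode = np.argmin(gm.means_.ravel())  # component with fewer hours
        print("days flagged as holiday/weekend:", int((labels == low_mode).sum()))

    Because regular workdays and holidays produce clearly separated workload distributions, even this simple unsupervised split recovers the calendar labels with very few errors, which mirrors the 0.13% error rate the study reports.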

  2. Turning Data into Information: Assessing and Reporting GIS Metadata Integrity Using Integrated Computing Technologies

    ERIC Educational Resources Information Center

    Mulrooney, Timothy J.

    2009-01-01

    A Geographic Information System (GIS) serves as the tangible and intangible means by which spatially related phenomena can be created, analyzed and rendered. GIS metadata serves as the formal framework to catalog information about a GIS data set. Metadata is independent of the encoded spatial and attribute information. GIS metadata is a subset of…

  3. Analyzing handwriting biometrics in metadata context

    NASA Astrophysics Data System (ADS)

    Scheidat, Tobias; Wolf, Franziska; Vielhauer, Claus

    2006-02-01

    In this article, methods for user recognition by online handwriting are experimentally analyzed using a combination of demographic data of users in relation to their handwriting habits. Online handwriting as a biometric method is characterized by high variations of characteristics, which influence the reliability and security of the method. These variations have not been researched in detail so far. Especially in cross-cultural applications it is urgent to reveal the impact of personal background on security aspects in biometrics. Metadata represent the background of writers by introducing cultural, biological and conditional (changing) aspects like first language, country of origin, gender, handedness, and experiences that influence handwriting and language skills. The goal is the revelation of intercultural impacts on handwriting in order to achieve higher security in biometric systems. In our experiments, in order to achieve relatively high coverage, 48 different handwriting tasks accomplished by 47 users from three countries (Germany, India and Italy) have been investigated with respect to the relations of metadata and biometric recognition performance. For this purpose, hypotheses have been formulated and evaluated using well-known recognition error rates from biometrics. The evaluation addressed both system reliability and security threats posed by skilled forgeries. For the latter purpose, a novel forgery type is introduced, which applies the personal metadata to security aspects and includes new methods of security tests. Finally, we formulate recommendations for specific user groups and handwriting samples.
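
    The evaluation rests on standard biometric recognition error rates. The sketch below illustrates how an equal error rate (EER) can be located by sweeping a decision threshold over genuine and forgery match scores; the scores are invented and the procedure is a generic illustration, not the authors' exact protocol.

        import numpy as np

        genuine = np.array([0.81, 0.74, 0.90, 0.68, 0.77])  # same-writer scores
        forgery = np.array([0.35, 0.52, 0.60, 0.41, 0.58])  # skilled forgeries

        best = None
        for t in np.linspace(0.0, 1.0, 1001):
            far = np.mean(forgery >= t)   # false accept rate at threshold t
            frr = np.mean(genuine < t)    # false reject rate at threshold t
            if best is None or abs(far - frr) < best[0]:
                best = (abs(far - frr), t, (far + frr) / 2)

        _, threshold, eer = best
        print(f"EER ~ {eer:.2f} at threshold {threshold:.2f}")

    Comparing EERs across user groups split by metadata (e.g. first language or handedness) is one way to expose the intercultural effects the article investigates.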

  4. Data Service: Distributed Data Capture and Replication

    NASA Astrophysics Data System (ADS)

    Warner, P. B.; Pietrowicz, S. R.

    2007-10-01

    Data Service is a critical component of the NOAO Data Management and Science Support (DMaSS) Solutions Platform, which is based on a service-oriented architecture, and is to replace the current NOAO Data Transport System. Its responsibilities include capturing data from NOAO and partner telescopes and instruments and replicating the data across multiple (currently six) storage sites. Java 5 was chosen as the implementation language, and Java EE as the underlying enterprise framework. Application metadata persistence is performed using EJB and Hibernate on the JBoss Application Server, with PostgreSQL as the persistence back-end. Although potentially any underlying mass storage system may be used as the Data Service file persistence technology, DTS deployments and Data Service test deployments currently use the Storage Resource Broker from SDSC. This paper presents an overview and high-level design of the Data Service, including aspects of deployment, e.g., for the LSST Data Challenge at the NCSA computing facilities.

  5. Improvements to the Ontology-based Metadata Portal for Unified Semantics (OlyMPUS)

    NASA Astrophysics Data System (ADS)

    Linsinbigler, M. A.; Gleason, J. L.; Huffer, E.

    2016-12-01

    The Ontology-based Metadata Portal for Unified Semantics (OlyMPUS), funded by the NASA Earth Science Technology Office Advanced Information Systems Technology program, is an end-to-end system designed to support Earth Science data consumers and data providers, enabling the latter to register data sets and provision them with the semantically rich metadata that drives the Ontology-Driven Interactive Search Environment for Earth Sciences (ODISEES). OlyMPUS complements the ODISEES' data discovery system with an intelligent tool to enable data producers to auto-generate semantically enhanced metadata and upload it to the metadata repository that drives ODISEES. Like ODISEES, the OlyMPUS metadata provisioning tool leverages robust semantics, a NoSQL database and query engine, an automated reasoning engine that performs first- and second-order deductive inferencing, and uses a controlled vocabulary to support data interoperability and automated analytics. The ODISEES data discovery portal leverages this metadata to provide a seamless data discovery and access experience for data consumers who are interested in comparing and contrasting the multiple Earth science data products available across NASA data centers. OlyMPUS will support scientists' services and tools for performing complex analyses and identifying correlations and non-obvious relationships across all types of Earth System phenomena using the full spectrum of NASA Earth Science data available. By providing an intelligent discovery portal that supplies users - both human users and machines - with detailed information about data products, their contents and their structure, ODISEES will reduce the level of effort required to identify and prepare large volumes of data for analysis. This poster will explain how OlyMPUS leverages deductive reasoning and other technologies to create an integrated environment for generating and exploiting semantically rich metadata.

  6. Predicting age groups of Twitter users based on language and metadata features.

    PubMed

    Morgan-Lopez, Antonio A; Kim, Annice E; Chew, Robert F; Ruddle, Paul

    2017-01-01

    Health organizations are increasingly using social media, such as Twitter, to disseminate health messages to target audiences. Determining the extent to which the target audience (e.g., age groups) was reached is critical to evaluating the impact of social media education campaigns. The main objective of this study was to examine the separate and joint predictive validity of linguistic and metadata features in predicting the age of Twitter users. We created a labeled dataset of Twitter users across different age groups (youth, young adults, adults) by collecting publicly available birthday announcement tweets using the Twitter Search application programming interface. We manually reviewed results and, for each age-labeled handle, collected the 200 most recent publicly available tweets and user handles' metadata. The labeled data were split into training and test datasets. We created separate models to examine the predictive validity of language features only, metadata features only, language and metadata features, and words/phrases from another age-validated dataset. We estimated accuracy, precision, recall, and F1 metrics for each model. An L1-regularized logistic regression model was conducted for each age group, and predicted probabilities between the training and test sets were compared for each age group. Cohen's d effect sizes were calculated to examine the relative importance of significant features. Models containing both Tweet language features and metadata features performed the best (74% precision, 74% recall, 74% F1) while the model containing only Twitter metadata features was least accurate (58% precision, 60% recall, and 57% F1 score). Top predictive features included use of terms such as "school" for youth and "college" for young adults. Overall, it was more challenging to predict older adults accurately. These results suggest that examining linguistic and Twitter metadata features to predict youth and young adult Twitter users may be helpful for
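
    The modelling setup, bag-of-words language features concatenated with account metadata and fed to an L1-regularized logistic regression, can be sketched on toy data. Everything in the sketch below (tweets, the follower-count feature, labels) is invented for illustration.

        import numpy as np
        from scipy.sparse import csr_matrix, hstack
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.linear_model import LogisticRegression

        tweets = ["off to school early", "college finals week",
                  "meeting before work", "school trip tomorrow",
                  "new job after college"]
        followers = np.array([[120], [340], [980], [150], [400]])  # metadata
        ages = ["youth", "young_adult", "adult", "youth", "young_adult"]

        X_lang = CountVectorizer().fit_transform(tweets)     # language features
        X = hstack([X_lang, csr_matrix(followers)])          # plus metadata

        clf = LogisticRegression(penalty="l1", solver="liblinear").fit(X, ages)
        print(clf.predict(X))

    The L1 penalty drives uninformative feature weights to zero, which is useful when a sparse vocabulary is combined with a handful of metadata columns, as in the study's best-performing model.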

  7. NERIES: Seismic Data Gateways and User Composed Datasets Metadata Management

    NASA Astrophysics Data System (ADS)

    Spinuso, Alessandro; Trani, Luca; Kamb, Linus; Frobert, Laurent

    2010-05-01

    One of the NERIES EC project's main objectives is to establish and improve the networking of seismic waveform data exchange and access among four main data centers in Europe: INGV, GFZ, ORFEUS and IPGP. Besides the implementation of the data backbone, several investigations and developments have been conducted in order to offer users the data available from this network, either programmatically or interactively. One of the challenges is to understand how to enable users' activities such as discovering, aggregating, describing and sharing datasets to obtain a decrease in the replication of similar data queries towards the network, exempting the data centers from guessing and creating useful pre-packed products. We've started to transfer this task more and more towards the users community, where the users' composed data products can be extensively re-used. The main link to the data is represented by a centralized webservice (SeismoLink) acting as a single access point to the whole data network. Users can download either waveform data or seismic station inventories directly from their own software routines by connecting to this webservice, which routes the request to the data centers. The provenance of the data is maintained and transferred to the users in the form of URIs, which identify the dataset and implicitly refer to the data provider. SeismoLink, combined with other webservices (e.g. the EMSC-QuakeML earthquake catalog service), is used from a community gateway such as the NERIES web portal (http://www.seismicportal.eu). Here the user interacts with a map-based portlet which allows the dynamic composition of a data product, binding seismic events' parameters with a set of seismic stations. The requested data is collected by the back-end processes of the portal, preserved and offered to the user in a personal data cart, where metadata can be generated interactively on-demand. The metadata, expressed in RDF, can also be remotely ingested. They offer rating

  8. [Radiological dose and metadata management].

    PubMed

    Walz, M; Kolodziej, M; Madsack, B

    2016-12-01

    This article describes the features of management systems currently available in Germany for extraction, registration and evaluation of metadata from radiological examinations, particularly in the digital imaging and communications in medicine (DICOM) environment. In addition, the probable relevant developments in this area concerning radiation protection legislation, terminology, standardization and information technology are presented.

  9. Automated Construction of Coverage Catalogues of Aster Satellite Image for Urban Areas of the World

    NASA Astrophysics Data System (ADS)

    Miyazaki, H.; Iwao, K.; Shibasaki, R.

    2012-07-01

    We developed an algorithm to determine a combination of satellite images according to observation extent and image quality. The algorithm tests whether each image is necessary for completing coverage of the search extent, excluding unnecessary images of low quality and preserving necessary images of good quality. The search conditions for the satellite images can be extended, meaning the catalogue can be constructed over specified periods as required for time series analysis. We applied the method to a database of metadata of ASTER satellite images archived in GEO Grid of the National Institute of Advanced Industrial Science and Technology (AIST), Japan. As indexes of populated places with geographical coordinates, we used a database of 3372 populated places with populations of more than 0.1 million, retrieved from GRUMP Settlement Points, a global gazetteer of cities which associates geographical names of populated places with geographical coordinates and population data. From the coordinates of the populated places, 3372 extents were generated with radii of 30 km, half the swath of ASTER satellite images. By merging extents overlapping each other, they were assembled into 2214 extents. As a result, we acquired combinations of good quality for 1244 extents, combinations of low quality for 96 extents, and incomplete combinations for 611 extents. Further improvements would be expected by introducing pixel-based cloud assessment and pixel value correction over seasonal variations.

  10. An Assistant for Loading Learning Object Metadata: An Ontology Based Approach

    ERIC Educational Resources Information Center

    Casali, Ana; Deco, Claudia; Romano, Agustín; Tomé, Guillermo

    2013-01-01

    In the last years, the development of different Repositories of Learning Objects has been increased. Users can retrieve these resources for reuse and personalization through searches in web repositories. The importance of high quality metadata is key for a successful retrieval. Learning Objects are described with metadata usually in the standard…

  11. Dark Energy Survey Year 1 Results: galaxy mock catalogues for BAO

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Avila, S.; et al.

    Mock catalogues are a crucial tool in the analysis of galaxy survey data, both for the accurate computation of covariance matrices, and for the optimisation of analysis methodology and validation of data sets. In this paper, we present a set of 1800 galaxy mock catalogues designed to match the Dark Energy Survey Year-1 BAO sample (Crocce et al. 2017) in abundance, observational volume, redshift distribution and uncertainty, and redshift dependent clustering. The simulated samples were built upon HALOGEN (Avila et al. 2015) halo catalogues, based on a 2LPT density field with an exponential bias. For each of them, a lightcone is constructed by the superposition of snapshots in the redshift range 0.45

  12. A metadata schema for data objects in clinical research.

    PubMed

    Canham, Steve; Ohmann, Christian

    2016-11-24

    A large number of stakeholders have accepted the need for greater transparency in clinical research and, in the context of various initiatives and systems, have developed a diverse and expanding number of repositories for storing the data and documents created by clinical studies (collectively known as data objects). To make the best use of such resources, we assert that it is also necessary for stakeholders to agree and deploy a simple, consistent metadata scheme. The relevant data objects and their likely storage are described, and the requirements for metadata to support data sharing in clinical research are identified. Issues concerning persistent identifiers, for both studies and data objects, are explored. A scheme is proposed that is based on the DataCite standard, with extensions to cover the needs of clinical researchers, specifically to provide (a) study identification data, including links to clinical trial registries; (b) data object characteristics and identifiers; and (c) data covering location, ownership and access to the data object. The components of the metadata scheme are described. The metadata schema is proposed as a natural extension of a widely agreed standard to fill a gap not tackled by other standards related to clinical research (e.g., Clinical Data Interchange Standards Consortium, Biomedical Research Integrated Domain Group). The proposal could be integrated with, but is not dependent on, other moves to better structure data in clinical research.
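
    A record following the proposed scheme might look roughly like the sketch below: a DataCite-style core extended with (a) study identification, (b) data object characteristics, and (c) access and location information. All field names and values are illustrative assumptions, not the published schema.

        import json

        record = {
            "identifier": {"doi": "10.0000/example-object"},  # hypothetical DOI
            "title": "Statistical analysis plan, v2",
            "study": {                    # (a) study identification
                "registry": "ClinicalTrials.gov",
                "registry_id": "NCT00000000",                 # hypothetical ID
            },
            "data_object": {              # (b) object characteristics
                "object_type": "study document",
                "format": "application/pdf",
            },
            "access": {                   # (c) location, ownership, access
                "repository_url": "https://repository.example.org/obj/123",
                "owner": "Example Sponsor",
                "access_type": "managed",
            },
        }
        print(json.dumps(record, indent=2))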

  13. ATLAS Metadata Infrastructure Evolution for Run 2 and Beyond

    NASA Astrophysics Data System (ADS)

    van Gemmeren, P.; Cranshaw, J.; Malon, D.; Vaniachine, A.

    2015-12-01

    ATLAS developed and employed for Run 1 of the Large Hadron Collider a sophisticated infrastructure for metadata handling in event processing jobs. This infrastructure profits from a rich feature set provided by the ATLAS execution control framework, including standardized interfaces and invocation mechanisms for tools and services, segregation of transient data stores with concomitant object lifetime management, and mechanisms for handling occurrences asynchronous to the control framework's state machine transitions. This metadata infrastructure is evolving and being extended for Run 2 to allow its use and reuse in downstream physics analyses, analyses that may or may not utilize the ATLAS control framework. At the same time, multiprocessing versions of the control framework and the requirements of future multithreaded frameworks are leading to redesign of components that use an incident-handling approach to asynchrony. The increased use of scatter-gather architectures, both local and distributed, requires further enhancement of metadata infrastructure in order to ensure semantic coherence and robust bookkeeping. This paper describes the evolution of ATLAS metadata infrastructure for Run 2 and beyond, including the transition to dual-use tools—tools that can operate inside or outside the ATLAS control framework—and the implications thereof. It further examines how the design of this infrastructure is changing to accommodate the requirements of future frameworks and emerging event processing architectures.

  14. Transport Infrastructure in the Process of Cataloguing Brownfields

    NASA Astrophysics Data System (ADS)

    Kramářová, Zuzana

    2017-10-01

    The identification and subsequent revitalisation of brownfields is a pressing issue in territorial planning as well as in construction engineering. The phenomenon occurs not only in the Czech Republic and Europe; experts world-wide investigate it carefully. These issues may be divided into several areas: first, identifying and cataloguing individual localities; next, the complex process of locality revitalisation. The legislative framework represents a separate area, which is highly specific to individual countries in accordance with their existing laws, norms and regulations (mainly concerning territorial planning and the segmentation of territory into administrative units). The legislative base of the Czech Republic was analysed in an article at WMCAUS in 2016. Within the legislation of the Czech Republic, the identification and subsequent cataloguing of brownfields is carried out in the form of regional studies. Because of the huge scale of the issues to be tackled, their content is only loosely defined with regard to the Building Act and its implementing regulations, e.g. examining the layout of future construction in the area, locating architecturally or otherwise interesting objects, transport or technical infrastructure management, tourism, socially excluded localities etc. Since no common legislative base exists, there is no common method for identifying and cataloguing brownfields, and individual catalogue lists are subject to customers' requirements. All the same, the relevant information contained in such a database may always be examined. One part concerns transport infrastructure, where the information may be divided into three subareas: information on the transport accessibility of the locality, information on the actual infrastructure in the locality, and information on the transport accessibility of human resources.

  15. Metadata-Driven SOA-Based Application for Facilitation of Real-Time Data Warehousing

    NASA Astrophysics Data System (ADS)

    Pintar, Damir; Vranić, Mihaela; Skočir, Zoran

    Service-oriented architecture (SOA) has already been widely recognized as an effective paradigm for achieving integration of diverse information systems. SOA-based applications can cross boundaries of platforms, operating systems and proprietary data standards, commonly through the usage of Web Services technology. On the other hand, metadata is also commonly referred to as a potential integration tool, given that standardized metadata objects can provide useful information about the specifics of unknown information systems with which one wishes to communicate, using an approach commonly called "model-based integration". This paper presents the result of research regarding possible synergy between those two integration facilitators. This is accomplished with a vertical example of a metadata-driven SOA-based business process that provides ETL (Extraction, Transformation and Loading) and metadata services to a data warehousing system in need of real-time ETL support.

  16. The ground truth about metadata and community detection in networks

    PubMed Central

    Peel, Leto; Larremore, Daniel B.; Clauset, Aaron

    2017-01-01

    Across many scientific domains, there is a common need to automatically extract a simplified view or coarse-graining of how a complex system’s components interact. This general task is called community detection in networks and is analogous to searching for clusters in independent vector data. It is common to evaluate the performance of community detection algorithms by their ability to find so-called ground truth communities. This works well in synthetic networks with planted communities because these networks’ links are formed explicitly based on those known communities. However, there are no planted communities in real-world networks. Instead, it is standard practice to treat some observed discrete-valued node attributes, or metadata, as ground truth. We show that metadata are not the same as ground truth and that treating them as such induces severe theoretical and practical problems. We prove that no algorithm can uniquely solve community detection, and we prove a general No Free Lunch theorem for community detection, which implies that there can be no algorithm that is optimal for all possible community detection tasks. However, community detection remains a powerful tool and node metadata still have value, so a careful exploration of their relationship with network structure can yield insights of genuine worth. We illustrate this point by introducing two statistical techniques that can quantify the relationship between metadata and community structure for a broad class of models. We demonstrate these techniques using both synthetic and real-world networks, and for multiple types of metadata and community structures. PMID:28508065

  17. The ground truth about metadata and community detection in networks.

    PubMed

    Peel, Leto; Larremore, Daniel B; Clauset, Aaron

    2017-05-01

    Across many scientific domains, there is a common need to automatically extract a simplified view or coarse-graining of how a complex system's components interact. This general task is called community detection in networks and is analogous to searching for clusters in independent vector data. It is common to evaluate the performance of community detection algorithms by their ability to find so-called ground truth communities. This works well in synthetic networks with planted communities because these networks' links are formed explicitly based on those known communities. However, there are no planted communities in real-world networks. Instead, it is standard practice to treat some observed discrete-valued node attributes, or metadata, as ground truth. We show that metadata are not the same as ground truth and that treating them as such induces severe theoretical and practical problems. We prove that no algorithm can uniquely solve community detection, and we prove a general No Free Lunch theorem for community detection, which implies that there can be no algorithm that is optimal for all possible community detection tasks. However, community detection remains a powerful tool and node metadata still have value, so a careful exploration of their relationship with network structure can yield insights of genuine worth. We illustrate this point by introducing two statistical techniques that can quantify the relationship between metadata and community structure for a broad class of models. We demonstrate these techniques using both synthetic and real-world networks, and for multiple types of metadata and community structures.
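
    One simple way to quantify the metadata/community relationship discussed above is to compare a metadata partition against detected communities with an information-theoretic score. The sketch below does this on the karate club graph using greedy modularity communities and normalized mutual information; it illustrates the general idea, not the paper's specific statistical techniques.

        import networkx as nx
        from networkx.algorithms.community import greedy_modularity_communities
        from sklearn.metrics import normalized_mutual_info_score

        G = nx.karate_club_graph()
        metadata = [G.nodes[v]["club"] for v in G]   # node attribute as metadata

        detected = [0] * G.number_of_nodes()
        for label, members in enumerate(greedy_modularity_communities(G)):
            for v in members:
                detected[v] = label

        nmi = normalized_mutual_info_score(metadata, detected)
        print(f"NMI(metadata, detected communities) = {nmi:.2f}")

    A low score here does not mean the algorithm failed; as the paper argues, it may equally mean the metadata simply do not align with the network's structural communities.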

  18. A Digital Broadcast Item (DBI) enabling metadata repository for digital, interactive television (digiTV) feedback channel networks

    NASA Astrophysics Data System (ADS)

    Lugmayr, Artur R.; Mailaparampil, Anurag; Tico, Florina; Kalli, Seppo; Creutzburg, Reiner

    2003-01-01

    Digital television (digiTV) is an additional multimedia environment, where metadata is one key element for the description of arbitrary content. This implies adequate structures for content description, which is provided by XML metadata schemes (e.g. MPEG-7, MPEG-21). Content and metadata management is the task of a multimedia repository, from which digiTV clients - equipped with an Internet connection - can access rich additional multimedia types over an "All-HTTP" protocol layer. Within this research work, we focus on conceptual design issues of a metadata repository for the storage of metadata, accessible from the feedback channel of a local set-top box. Our concept describes the whole heterogeneous life-cycle chain of XML metadata from the service provider to the digiTV equipment, device independent representation of content, accessing and querying the metadata repository, management of metadata related to digiTV, and interconnection of basic system components (http front-end, relational database system, and servlet container). We present our conceptual test configuration of a metadata repository that is aimed at a real-world deployment, done within the scope of the future interaction (fiTV) project at the Digital Media Institute (DMI) Tampere (www.futureinteraction.tv).

  19. Incorporating clinical metadata with digital image features for automated identification of cutaneous melanoma.

    PubMed

    Liu, Z; Sun, J; Smith, M; Smith, L; Warr, R

    2013-11-01

    Computer-assisted diagnosis (CAD) of malignant melanoma (MM) has been advocated to help clinicians achieve a more objective and reliable assessment. However, conventional CAD systems examine only the features extracted from digital photographs of lesions. Failure to incorporate patients' personal information constrains the applicability in clinical settings. The aim was to develop a new CAD system that improves the performance of automatic diagnosis of melanoma and, for the first time, incorporates digital features of lesions together with important patient metadata into the learning process. Thirty-two features were extracted from digital photographs to characterize skin lesions. Patients' personal information, such as age, gender and lesion site, and their combinations, was quantified as metadata. The integration of digital features and metadata was realized through an extended Laplacian eigenmap, a dimensionality-reduction method grouping lesions with similar digital features and metadata into the same classes. The diagnosis reached 82.1% sensitivity and 86.1% specificity when only multidimensional digital features were used, but improved to 95.2% sensitivity and 91.0% specificity after metadata were incorporated appropriately. The proposed system achieves a level of sensitivity comparable with that of experienced dermatologists aided by conventional dermoscopes. This demonstrates the potential of our method for assisting clinicians in diagnosing melanoma, and the benefit it could provide to patients and hospitals by greatly reducing unnecessary excisions of benign naevi. This paper proposes an enhanced CAD system incorporating clinical metadata into the learning process for automatic classification of melanoma. Results demonstrate that the additional metadata and the mechanism to incorporate them are useful for improving CAD of melanoma. © 2013 British Association of Dermatologists.
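
    As a rough illustration of the fusion idea, not the authors' extended Laplacian eigenmap, one can concatenate scaled image features with quantified metadata before a standard spectral embedding. A sketch with synthetic placeholder data (scikit-learn assumed):

        # Generic sketch of fusing image features with quantified patient metadata
        # before a spectral (Laplacian-eigenmap-style) embedding. This is NOT the
        # authors' extended method; it only illustrates the idea.
        import numpy as np
        from sklearn.preprocessing import StandardScaler
        from sklearn.manifold import SpectralEmbedding
        from sklearn.neighbors import KNeighborsClassifier

        rng = np.random.default_rng(0)
        image_features = rng.normal(size=(200, 32))   # 32 lesion features per photo
        metadata = rng.integers(0, 2, size=(200, 3))  # e.g. binned age, sex, site
        labels = rng.integers(0, 2, size=200)         # benign vs malignant (fake)

        # Scale, then concatenate so both sources shape the affinity graph.
        X = np.hstack([StandardScaler().fit_transform(image_features), metadata])

        # Standard Laplacian eigenmap; the paper extends this step.
        embedded = SpectralEmbedding(n_components=5).fit_transform(X)

        clf = KNeighborsClassifier().fit(embedded[:150], labels[:150])
        print("held-out accuracy:", clf.score(embedded[150:], labels[150:]))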

  20. The accuracy of the measurements in Ulugh Beg's star catalogue

    NASA Astrophysics Data System (ADS)

    Krisciunas, K.

    1992-12-01

    The star catalogue compiled by Ulugh Beg and his collaborators in Samarkand (ca. 1437) is the only catalogue primarily based on original observations between the times of Ptolemy and Tycho Brahe. Evans (1987) has given convincing evidence that Ulugh Beg's star catalogue was based on measurements made with a zodiacal armillary sphere graduated to 15′, with interpolation to 0.2 units. He and Shevchenko (1990) were primarily interested in the systematic errors in ecliptic longitude. Shevchenko's analysis of the random errors was limited to the twelve zodiacal constellations. We have analyzed all 843 ecliptic longitudes and latitudes attributed to Ulugh Beg by Knobel (1917). This required multiplying all the longitude errors by the respective values of the cosine of the celestial latitudes. We find a random error of ±17.7′ for ecliptic longitude and ±16.5′ for ecliptic latitude. On the whole, the random errors are largest near the ecliptic, decreasing towards the ecliptic poles. For all of Ulugh Beg's measurements (excluding outliers) the mean systematic error is -10.8′ ± 0.8′ for ecliptic longitude and 7.5′ ± 0.7′ for ecliptic latitude, with the errors in the sense "computed minus Ulugh Beg". For the brighter stars (those designated α, β, and γ in the respective constellations), the mean systematic errors are -11.3′ ± 1.9′ for ecliptic longitude and 9.4′ ± 1.5′ for ecliptic latitude. Within the errors this matches the systematic error in both coordinates for α Vir. With greater confidence we may conclude that α Vir was the principal reference star in the catalogues of Ulugh Beg and Ptolemy. Evans, J. 1987, J. Hist. Astr. 18, 155. Knobel, E. B. 1917, Ulugh Beg's Catalogue of Stars, Washington, D. C.: Carnegie Institution. Shevchenko, M. 1990, J. Hist. Astr. 21, 187.
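
    The cosine correction described above has a simple numerical form: a fixed error in ecliptic longitude corresponds to a smaller great-circle displacement on the sky at higher ecliptic latitude. A short worked sketch (illustrative numbers, not values from Knobel's edition):

        # Worked example of the correction: raw longitude errors are multiplied
        # by the cosine of the ecliptic latitude to get on-sky displacement.
        import numpy as np

        lon_error_arcmin = 20.0            # raw error in ecliptic longitude
        latitudes_deg = np.array([0.0, 30.0, 60.0])

        corrected = lon_error_arcmin * np.cos(np.radians(latitudes_deg))
        for lat, err in zip(latitudes_deg, corrected):
            print(f"beta = {lat:4.0f} deg -> effective longitude error {err:5.1f}'")
        # At beta = 0 deg the error is the full 20.0'; at 60 deg the same raw
        # error corresponds to only 10.0' on the sky.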

  1. Principles of metadata organization at the ENCODE data coordination center

    PubMed Central

    Hong, Eurie L.; Sloan, Cricket A.; Chan, Esther T.; Davidson, Jean M.; Malladi, Venkat S.; Strattan, J. Seth; Hitz, Benjamin C.; Gabdank, Idan; Narayanan, Aditi K.; Ho, Marcus; Lee, Brian T.; Rowe, Laurence D.; Dreszer, Timothy R.; Roe, Greg R.; Podduturi, Nikhil R.; Tanaka, Forrest; Hilton, Jason A.; Cherry, J. Michael

    2016-01-01

    The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center (DCC) is responsible for organizing, describing and providing access to the diverse data generated by the ENCODE project. The description of these data, known as metadata, includes the biological sample used as input, the protocols and assays performed on these samples, the data files generated from the results and the computational methods used to analyze the data. Here, we outline the principles and philosophy used to define the ENCODE metadata in order to create a metadata standard that can be applied to diverse assays and multiple genomic projects. In addition, we present how the data are validated and used by the ENCODE DCC in creating the ENCODE Portal (https://www.encodeproject.org/). Database URL: www.encodeproject.org PMID:26980513

  2. Northern Hemisphere observations of ICRF sources on the USNO stellar catalogue frame

    NASA Astrophysics Data System (ADS)

    Fienga, A.; Andrei, A. H.

    2004-06-01

    The most recent USNO stellar catalogue, the USNO B1.0 (Monet et al. 2003), provides positions for 1 042 618 261 objects, with a published astrometric accuracy of 200 mas and five-band magnitudes with a 0.3 mag accuracy. Its completeness is believed to extend to magnitude 21 in the V band. Such a catalogue is a very good tool for astrometric reduction. This work investigates the accuracy of the USNO B1.0 link to the ICRF and gives an estimate of its internal and external accuracies by comparison with different catalogues, and by computation of ICRF source positions using USNO B1.0 star positions.

  3. Interactive Tools to Access the HELCATS Catalogues

    NASA Astrophysics Data System (ADS)

    Rouillard, Alexis; Plotnikov, Illya; Pinto, Rui; Génot, Vincent; Bouchemit, Myriam; Davies, Jackie

    2017-04-01

    The propagation tool is a web-based interface written in java that allows users to propagate Coronal Mass Ejections (CMEs), Corotating Interaction Regions (CIRs) and Solar Energetic Particles (SEPs) in the inner heliosphere. The tool displays unique datasets and catalogues through a 2-D visualisation of the trajectories of these heliospheric structures in relation to the orbital position of probes/planets and the pointing direction and extent of different imaging instruments. Summary plots of in-situ data or images of the solar corona and planetary aurorae stored at the CDPP, MEDOC and APIS databases, respectively, can be used to verify the presence of heliospheric structures at the estimated launch or impact times. A great novelty of the tool is the immediate visualisation of J-maps and the possibility to superpose on these maps the HELCATS CME and CIR catalogues.

  4. Interactive Tools to Access the HELCATS Catalogues

    NASA Astrophysics Data System (ADS)

    Rouillard, A.; Génot, V.; Bouchemit, M.; Pinto, R.

    2017-09-01

    The propagation tool is a web-based interface written in java that allows users to propagate Coronal Mass Ejections (CMEs), Corotating Interaction Regions (CIRs) and Solar Energetic Particles (SEPs) in the inner heliosphere. The tool displays unique datasets and catalogues through a 2-D visualisation of the trajectories of these heliospheric structures in relation to the orbital position of probes/planets and the pointing direction and extent of different imaging instruments. Summary plots of in-situ data or images of the solar corona and planetary aurorae stored at the CDPP, MEDOC and APIS databases, respectively, can be used to verify the presence of heliospheric structures at the estimated launch or impact times. A great novelty of the tool is the immediate visualisation of J-maps and the possibility to superpose on these maps the HELCATS CME and CIR catalogues.

  5. Dark Energy Survey Year 1 Results: galaxy mock catalogues for BAO

    NASA Astrophysics Data System (ADS)

    Avila, S.; Crocce, M.; Ross, A. J.; García-Bellido, J.; Percival, W. J.; Banik, N.; Camacho, H.; Kokron, N.; Chan, K. C.; Andrade-Oliveira, F.; Gomes, R.; Gomes, D.; Lima, M.; Rosenfeld, R.; Salvador, A. I.; Friedrich, O.; Abdalla, F. B.; Annis, J.; Benoit-Lévy, A.; Bertin, E.; Brooks, D.; Carrasco Kind, M.; Carretero, J.; Castander, F. J.; Cunha, C. E.; da Costa, L. N.; Davis, C.; De Vicente, J.; Doel, P.; Fosalba, P.; Frieman, J.; Gerdes, D. W.; Gruen, D.; Gruendl, R. A.; Gutierrez, G.; Hartley, W. G.; Hollowood, D.; Honscheid, K.; James, D. J.; Kuehn, K.; Kuropatkin, N.; Miquel, R.; Plazas, A. A.; Sanchez, E.; Scarpine, V.; Schindler, R.; Schubnell, M.; Sevilla-Noarbe, I.; Smith, M.; Sobreira, F.; Suchyta, E.; Swanson, M. E. C.; Tarle, G.; Thomas, D.; Walker, A. R.; Dark Energy Survey Collaboration

    2018-05-01

    Mock catalogues are a crucial tool in the analysis of galaxy survey data, both for the accurate computation of covariance matrices and for the optimisation of analysis methodology and validation of data sets. In this paper, we present a set of 1800 galaxy mock catalogues designed to match the Dark Energy Survey Year-1 BAO sample (Crocce et al. 2017) in abundance, observational volume, redshift distribution and uncertainty, and redshift-dependent clustering. The simulated samples were built upon HALOGEN (Avila et al. 2015) halo catalogues, based on a 2LPT density field with an empirical halo bias. For each of them, a lightcone is constructed by the superposition of snapshots in the redshift range 0.45 < z < 1.4. Uncertainties introduced by photometric redshift estimators were modelled with a double-skewed-Gaussian curve fitted to the data. We populate halos with galaxies by introducing a hybrid Halo Occupation Distribution - Halo Abundance Matching model with two free parameters. These are adjusted to achieve a galaxy bias evolution b(zph) that matches the data at the 1σ level in the range 0.6 < zph < 1.0. We further analyse the galaxy mock catalogues and compare their clustering to the data using the angular correlation function w(θ), the comoving transverse separation clustering ξ_{μ<0.8}(s⊥) and the angular power spectrum Cℓ, finding them in agreement. This is the first large set of three-dimensional {ra, dec, z} galaxy mock catalogues able to reproduce simultaneously and accurately both the photometric redshift uncertainties and the galaxy clustering.
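
    The double-skewed-Gaussian photo-z model can be sketched as a two-component mixture of skew-normal distributions; the parameters below are invented placeholders, not the fitted DES Y1 values (Python with scipy assumed):

        # Sketch of modelling photometric-redshift scatter with a "double skewed
        # Gaussian": a two-component mixture of skew-normal distributions.
        import numpy as np
        from scipy.stats import skewnorm

        rng = np.random.default_rng(1)

        def sample_zphot(z_true, n):
            """Draw n photo-z values around a true redshift z_true."""
            # Component 1: narrow core; component 2: broader, oppositely skewed tail.
            core = skewnorm.rvs(a=2.0, loc=z_true, scale=0.03, size=n, random_state=rng)
            tail = skewnorm.rvs(a=-4.0, loc=z_true, scale=0.08, size=n, random_state=rng)
            pick_tail = rng.random(n) < 0.2   # 20% tail weight (assumed, not fitted)
            return np.where(pick_tail, tail, core)

        zphot = sample_zphot(0.8, 100_000)
        print(f"mean {zphot.mean():.3f}, scatter {zphot.std():.3f}")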

  6. Cleaning by clustering: methodology for addressing data quality issues in biomedical metadata.

    PubMed

    Hu, Wei; Zaveri, Amrapali; Qiu, Honglei; Dumontier, Michel

    2017-09-18

    The ability to efficiently search and filter datasets depends on access to high quality metadata. While most biomedical repositories require data submitters to provide a minimal set of metadata, some, such as the Gene Expression Omnibus (GEO), allow users to specify additional metadata in the form of textual key-value pairs (e.g. sex: female). However, since there is no structured vocabulary to guide submitters regarding the metadata terms to use, the 44,000,000+ key-value pairs in GEO suffer from numerous quality issues including redundancy, heterogeneity, inconsistency, and incompleteness. Such issues hinder the ability of scientists to hone in on datasets that meet their requirements and point to a need for accurate, structured and complete description of the data. In this study, we propose a clustering-based approach to address data quality issues in biomedical, specifically gene expression, metadata. First, we present three different kinds of similarity measures to compare metadata keys. Second, we design a scalable agglomerative clustering algorithm to cluster similar keys together. Our agglomerative clustering algorithm identified metadata keys that were similar to each other, based on (i) name, (ii) core concept and (iii) value similarities, and grouped them together. We evaluated our method using a manually created gold standard in which 359 keys were grouped into 27 clusters based on six types of characteristics: (i) age, (ii) cell line, (iii) disease, (iv) strain, (v) tissue and (vi) treatment. As a result, the algorithm generated 18 clusters containing 355 keys (four clusters with only one key were excluded). In the 18 clusters, most keys were correctly identified as related to their cluster, but 13 keys were not related to their assigned cluster. We compared our approach with four other published methods. Our approach significantly outperformed them for most metadata keys and achieved the best average F-Score (0
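
    The core pipeline, pairwise key similarity followed by agglomerative clustering, can be sketched in a few lines. Here difflib's string ratio stands in for the paper's three richer similarity measures, and the clustering threshold is a guess:

        # Minimal sketch: measure pairwise string similarity between metadata
        # keys, then cluster them agglomeratively.
        from difflib import SequenceMatcher
        import numpy as np
        from scipy.cluster.hierarchy import linkage, fcluster
        from scipy.spatial.distance import squareform

        keys = ["age", "age (years)", "patient age", "tissue", "tissue type",
                "cell line", "cellline", "disease state", "disease"]

        n = len(keys)
        dist = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                sim = SequenceMatcher(None, keys[i], keys[j]).ratio()
                dist[i, j] = dist[j, i] = 1.0 - sim   # similarity -> distance

        Z = linkage(squareform(dist), method="average")
        labels = fcluster(Z, t=0.55, criterion="distance")  # threshold is a guess
        for cluster in sorted(set(labels)):
            print(cluster, [k for k, l in zip(keys, labels) if l == cluster])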

  7. Planck 2013 results. XXXII. The updated Planck catalogue of Sunyaev-Zeldovich sources

    NASA Astrophysics Data System (ADS)

    Planck Collaboration; Ade, P. A. R.; Aghanim, N.; Armitage-Caplan, C.; Arnaud, M.; Ashdown, M.; Atrio-Barandela, F.; Aumont, J.; Aussel, H.; Baccigalupi, C.; Banday, A. J.; Barreiro, R. B.; Barrena, R.; Bartelmann, M.; Bartlett, J. G.; Battaner, E.; Benabed, K.; Benoît, A.; Benoit-Lévy, A.; Bernard, J.-P.; Bersanelli, M.; Bielewicz, P.; Bikmaev, I.; Bobin, J.; Bock, J. J.; Böhringer, H.; Bonaldi, A.; Bond, J. R.; Borrill, J.; Bouchet, F. R.; Bridges, M.; Bucher, M.; Burenin, R.; Burigana, C.; Butler, R. C.; Cardoso, J.-F.; Carvalho, P.; Catalano, A.; Challinor, A.; Chamballu, A.; Chary, R.-R.; Chen, X.; Chiang, H. C.; Chiang, L.-Y.; Chon, G.; Christensen, P. R.; Churazov, E.; Church, S.; Clements, D. L.; Colombi, S.; Colombo, L. P. L.; Comis, B.; Couchot, F.; Coulais, A.; Crill, B. P.; Curto, A.; Cuttaia, F.; Da Silva, A.; Dahle, H.; Danese, L.; Davies, R. D.; Davis, R. J.; de Bernardis, P.; de Rosa, A.; de Zotti, G.; Delabrouille, J.; Delouis, J.-M.; Démoclès, J.; Désert, F.-X.; Dickinson, C.; Diego, J. M.; Dolag, K.; Dole, H.; Donzelli, S.; Doré, O.; Douspis, M.; Dupac, X.; Efstathiou, G.; Enßlin, T. A.; Eriksen, H. K.; Feroz, F.; Ferragamo, A.; Finelli, F.; Flores-Cacho, I.; Forni, O.; Frailis, M.; Franceschi, E.; Fromenteau, S.; Galeotta, S.; Ganga, K.; Génova-Santos, R. T.; Giard, M.; Giardino, G.; Gilfanov, M.; Giraud-Héraud, Y.; González-Nuevo, J.; Górski, K. M.; Grainge, K. J. B.; Gratton, S.; Gregorio, A.; Groeneboom, N., E.; Gruppuso, A.; Hansen, F. K.; Hanson, D.; Harrison, D.; Hempel, A.; Henrot-Versillé, S.; Hernández-Monteagudo, C.; Herranz, D.; Hildebrandt, S. R.; Hivon, E.; Hobson, M.; Holmes, W. A.; Hornstrup, A.; Hovest, W.; Huffenberger, K. M.; Hurier, G.; Hurley-Walker, N.; Jaffe, A. H.; Jaffe, T. R.; Jones, W. C.; Juvela, M.; Keihänen, E.; Keskitalo, R.; Khamitov, I.; Kisner, T. S.; Kneissl, R.; Knoche, J.; Knox, L.; Kunz, M.; Kurki-Suonio, H.; Lagache, G.; Lähteenmäki, A.; Lamarre, J.-M.; Lasenby, A.; Laureijs, R. J.; Lawrence, C. R.; Leahy, J. P.; Leonardi, R.; León-Tavares, J.; Lesgourgues, J.; Li, C.; Liddle, A.; Liguori, M.; Lilje, P. B.; Linden-Vørnle, M.; López-Caniego, M.; Lubin, P. M.; Macías-Pérez, J. F.; MacTavish, C. J.; Maffei, B.; Maino, D.; Mandolesi, N.; Maris, M.; Marshall, D. J.; Martin, P. G.; Martínez-González, E.; Masi, S.; Massardi, M.; Matarrese, S.; Matthai, F.; Mazzotta, P.; Mei, S.; Meinhold, P. R.; Melchiorri, A.; Melin, J.-B.; Mendes, L.; Mennella, A.; Migliaccio, M.; Mikkelsen, K.; Mitra, S.; Miville-Deschênes, M.-A.; Moneti, A.; Montier, L.; Morgante, G.; Mortlock, D.; Munshi, D.; Murphy, J. A.; Naselsky, P.; Nastasi, A.; Nati, F.; Natoli, P.; Nesvadba, N. P. H.; Netterfield, C. B.; Nørgaard-Nielsen, H. U.; Noviello, F.; Novikov, D.; Novikov, I.; O'Dwyer, I. J.; Olamaie, M.; Osborne, S.; Oxborrow, C. A.; Paci, F.; Pagano, L.; Pajot, F.; Paoletti, D.; Pasian, F.; Patanchon, G.; Pearson, T. J.; Perdereau, O.; Perotto, L.; Perrott, Y. C.; Perrotta, F.; Piacentini, F.; Piat, M.; Pierpaoli, E.; Pietrobon, D.; Plaszczynski, S.; Pointecouteau, E.; Polenta, G.; Ponthieu, N.; Popa, L.; Poutanen, T.; Pratt, G. W.; Prézeau, G.; Prunet, S.; Puget, J.-L.; Rachen, J. P.; Reach, W. T.; Rebolo, R.; Reinecke, M.; Remazeilles, M.; Renault, C.; Ricciardi, S.; Riller, T.; Ristorcelli, I.; Rocha, G.; Rosset, C.; Roudier, G.; Rowan-Robinson, M.; Rubiño-Martín, J. A.; Rumsey, C.; Rusholme, B.; Sandri, M.; Santos, D.; Saunders, R. D. E.; Savini, G.; Schammel, M. P.; Scott, D.; Seiffert, M. D.; Shellard, E. P. S.; Shimwell, T. W.; Spencer, L. 
D.; Starck, J.-L.; Stolyarov, V.; Stompor, R.; Streblyanska, A.; Sudiwala, R.; Sunyaev, R.; Sureau, F.; Sutton, D.; Suur-Uski, A.-S.; Sygnet, J.-F.; Tauber, J. A.; Tavagnacco, D.; Terenzi, L.; Toffolatti, L.; Tomasi, M.; Tramonte, D.; Tristram, M.; Tucci, M.; Tuovinen, J.; Türler, M.; Umana, G.; Valenziano, L.; Valiviita, J.; Van Tent, B.; Vibert, L.; Vielva, P.; Villa, F.; Vittorio, N.; Wade, L. A.; Wandelt, B. D.; White, M.; White, S. D. M.; Yvon, D.; Zacchei, A.; Zonca, A.

    2015-09-01

    We update the all-sky Planck catalogue of 1227 clusters and cluster candidates (PSZ1) published in March 2013, derived from detections of the Sunyaev-Zeldovich (SZ) effect using the first 15.5 months of Planck satellite observations. As an addendum, we deliver an updated version of the PSZ1 catalogue, reporting the further confirmation of 86 Planck-discovered clusters. In total, the PSZ1 now contains 947 confirmed clusters, of which 214 were confirmed as newly discovered clusters through follow-up observations undertaken by the Planck Collaboration. The updated PSZ1 contains redshifts for 913 systems, of which 736 (~ 80.6%) are spectroscopic, and associated mass estimates derived from the Yz mass proxy. We also provide a new SZ quality flag for the remaining 280 candidates. This flag was derived from a novel artificial neural-network classification of the SZ signal. Based on this assessment, the purity of the updated PSZ1 catalogue is estimated to be 94%. In this release, we provide the full updated catalogue and an additional readme file with further information on the Planck SZ detections. The catalogue is only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (ftp://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/581/A14

  8. The British Film Catalogue: 1895-1970.

    ERIC Educational Resources Information Center

    Gifford, Denis

    This reference book catalogues nearly every commercial film produced in Britain for public entertainment from 1895 to 1970. The entries are listed chronologically by year and month. Each entry is limited to a single film and contains a cross index code number, exhibition date, main title, length, color system, production company, distribution…

  9. Twiddlenet: Metadata Tagging and Data Dissemination in Mobile Device Networks

    DTIC Science & Technology

    2007-09-01

    hosting a distributed data dissemination application. Stated simply, there are a multitude of handheld devices on the market that can communicate in...content (UGC) across a network of distributed devices. This sharing is accomplished through the use of descriptive metadata tags that are assigned to a...file once it has been shared. These metadata files are uploaded to a centralized portal and arranged for efficient UGC location and searching

  10. Catalogue Use by the Petherick Readers of the National Library of Australia

    ERIC Educational Resources Information Center

    Hider, Philip

    2007-01-01

    An online questionnaire survey was distributed amongst the Petherick Readers of the National Library of Australia, a user group of scholars and researchers. The survey asked questions about the readers' use and appreciation of the NLA catalogue. This group of users clearly appreciated the library catalogue and demonstrated that there are still…

  11. phosphorus retention data and metadata

    EPA Pesticide Factsheets

    phosphorus retention in wetlands data and metadata. This dataset is associated with the following publication: Lane, C., and B. Autrey. Phosphorus retention of forested and emergent marsh depressional wetlands in differing land uses in Florida, USA. Wetlands Ecology and Management, 24(1): 45-60, (2016).

  12. NCPP's Use of Standard Metadata to Promote Open and Transparent Climate Modeling

    NASA Astrophysics Data System (ADS)

    Treshansky, A.; Barsugli, J. J.; Guentchev, G.; Rood, R. B.; DeLuca, C.

    2012-12-01

    The National Climate Predictions and Projections (NCPP) Platform is developing comprehensive regional and local information about the evolving climate to inform decision making and adaptation planning. This includes both creating metadata about the models and processes used to create its derived data products, and providing tools for doing so. NCPP is using the Common Information Model (CIM), an ontology developed by a broad set of international partners in climate research, as its metadata language. This use of a standard ensures interoperability within the climate community as well as permitting access to the ecosystem of tools and services emerging alongside the CIM. The CIM itself is divided into a general-purpose (UML & XML) schema which structures metadata documents, and a project or community-specific (XML) Controlled Vocabulary (CV) which constrains the content of metadata documents. NCPP has already modified the CIM Schema to accommodate downscaling models, simulations, and experiments. NCPP is currently developing a CV for use by the downscaling community. Incorporating downscaling into the CIM will lead to several benefits: easy access to the existing CIM Documents describing CMIP5 models and simulations that are being downscaled, access to software tools that have been developed in order to search, manipulate, and visualize CIM metadata, and coordination with national and international efforts such as ES-DOC that are working to make climate model descriptions and datasets interoperable. Providing detailed metadata descriptions which include the full provenance of derived data products will contribute to making that data (and the models and processes which generated that data) more open and transparent to the user community.

  13. New Tools to Document and Manage Data/Metadata: Example NGEE Arctic and ARM

    NASA Astrophysics Data System (ADS)

    Crow, M. C.; Devarakonda, R.; Killeffer, T.; Hook, L.; Boden, T.; Wullschleger, S.

    2017-12-01

    Tools used for documenting, archiving, cataloging, and searching data are critical pieces of informatics. This poster describes tools being used in several projects at Oak Ridge National Laboratory (ORNL), with a focus on the U.S. Department of Energy's Next Generation Ecosystem Experiment in the Arctic (NGEE Arctic) and Atmospheric Radiation Measurements (ARM) project, and their usage at different stages of the data lifecycle. The Online Metadata Editor (OME) is used for the documentation and archival stages while a Data Search tool supports indexing, cataloging, and searching. The NGEE Arctic OME Tool [1] provides a method by which researchers can upload their data and provide original metadata with each upload while adhering to standard metadata formats. The tool is built upon a Java SPRING framework to parse user input into, and from, XML output. Many aspects of the tool require use of a relational database including encrypted user-login, auto-fill functionality for predefined sites and plots, and file reference storage and sorting. The Data Search Tool conveniently displays each data record in a thumbnail containing the title, source, and date range, and features a quick view of the metadata associated with that record, as well as a direct link to the data. The search box incorporates autocomplete capabilities for search terms and sorted keyword filters are available on the side of the page, including a map for geo-searching. These tools are supported by the Mercury [2] consortium (funded by DOE, NASA, USGS, and ARM) and developed and managed at Oak Ridge National Laboratory. Mercury is a set of tools for collecting, searching, and retrieving metadata and data. Mercury collects metadata from contributing project servers, then indexes the metadata to make it searchable using Apache Solr, and provides access to retrieve it from the web page. Metadata standards that Mercury supports include: XML, Z39.50, FGDC, Dublin-Core, Darwin-Core, EML, and ISO-19115.

  14. System for Earth Sample Registration SESAR: Services for IGSN Registration and Sample Metadata Management

    NASA Astrophysics Data System (ADS)

    Chan, S.; Lehnert, K. A.; Coleman, R. J.

    2011-12-01

    SESAR, the System for Earth Sample Registration, is an online registry for physical samples collected for Earth and environmental studies. SESAR generates and administers the International Geo Sample Number IGSN, a unique identifier for samples that is dramatically advancing interoperability amongst information systems for sample-based data. SESAR was developed to provide the complete range of registry services, including definition of IGSN syntax and metadata profiles, registration and validation of name spaces requested by users, tools for users to submit and manage sample metadata, validation of submitted metadata, generation and validation of the unique identifiers, archiving of sample metadata, and public or private access to the sample metadata catalog. With the development of SESAR v3, we placed particular emphasis on creating enhanced tools that make metadata submission easier and more efficient for users, and that provide superior functionality for users to manage metadata of their samples in their private workspace MySESAR. For example, SESAR v3 includes a module where users can generate custom spreadsheet templates to enter metadata for their samples, then upload these templates online for sample registration. Once the content of the template is uploaded, it is displayed online in an editable grid format. Validation rules are executed in real-time on the grid data to ensure data integrity. Other new features of SESAR v3 include the capability to transfer ownership of samples to other SESAR users, the ability to upload and store images and other files in a sample metadata profile, and the tracking of changes to sample metadata profiles. In the next version of SESAR (v3.5), we will further improve the discovery, sharing, and registration of samples. For example, we are developing a more comprehensive suite of web services that will allow discovery and registration access to SESAR from external systems. Both batch and individual registrations will be possible.
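
    The kind of real-time validation described for the editable grid can be illustrated with a small rule-checking function; the field names and rules below are hypothetical, not SESAR's actual schema:

        # Hedged sketch: each uploaded row of sample metadata is checked against
        # simple rules before registration. Fields and rules are illustrative.
        REQUIRED = ("sample_name", "sample_type", "latitude", "longitude")

        def validate_row(row: dict) -> list[str]:
            errors = [f"missing field: {f}" for f in REQUIRED if not row.get(f)]
            try:
                if not -90 <= float(row.get("latitude", "")) <= 90:
                    errors.append("latitude out of range")
                if not -180 <= float(row.get("longitude", "")) <= 180:
                    errors.append("longitude out of range")
            except ValueError:
                errors.append("coordinates must be numeric")
            return errors

        row = {"sample_name": "KA-17-03", "sample_type": "rock",
               "latitude": "95.2", "longitude": "-71.1"}
        print(validate_row(row))   # -> ['latitude out of range']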

  15. C3: A Command-line Catalogue Cross-matching tool for modern astrophysical survey data

    NASA Astrophysics Data System (ADS)

    Riccio, Giuseppe; Brescia, Massimo; Cavuoti, Stefano; Mercurio, Amata; di Giorgio, Anna Maria; Molinari, Sergio

    2017-06-01

    In the current data-driven science era, data analysis techniques must evolve quickly to cope with data volumes that have grown to the Petabyte scale. In particular, since modern astrophysics is based on multi-wavelength data organized into large catalogues, astronomical catalogue cross-matching methods, whose cost depends strongly on catalogue size, must ensure efficiency, reliability and scalability. Furthermore, multi-band data are archived and reduced in different ways, so the resulting catalogues may differ from each other in format, resolution, data structure, etc., requiring cross-matching features of the greatest generality. We present C3 (Command-line Catalogue Cross-match), a multi-platform application designed to efficiently cross-match massive catalogues from modern surveys. Conceived as a stand-alone command-line process or as a module within a generic data reduction/analysis pipeline, it provides maximum flexibility in terms of portability, configuration, coordinate systems and cross-matching types, and ensures high performance by using a multi-core parallel processing paradigm and a sky partitioning algorithm.
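
    The essence of positional cross-matching can be sketched by mapping RA/Dec to unit vectors and querying a k-d tree within a chord-length radius; C3 layers sky partitioning and multi-core parallelism on top of this kind of core step. A toy version with synthetic catalogues (scipy assumed):

        # Toy positional cross-match between two catalogues.
        import numpy as np
        from scipy.spatial import cKDTree

        def radec_to_xyz(ra_deg, dec_deg):
            ra, dec = np.radians(ra_deg), np.radians(dec_deg)
            return np.column_stack([np.cos(dec) * np.cos(ra),
                                    np.cos(dec) * np.sin(ra),
                                    np.sin(dec)])

        rng = np.random.default_rng(2)
        cat_a = rng.uniform([0, -30], [10, 30], size=(5000, 2))    # ra, dec (deg)
        cat_b = cat_a + rng.normal(scale=2e-4, size=cat_a.shape)   # jittered copy

        tree = cKDTree(radec_to_xyz(cat_b[:, 0], cat_b[:, 1]))
        radius = 2 * np.sin(np.radians(1.0 / 3600) / 2)            # 1 arcsec chord
        dists, idx = tree.query(radec_to_xyz(cat_a[:, 0], cat_a[:, 1]),
                                distance_upper_bound=radius)
        # idx gives the matched row in cat_b; dists is inf where nothing matched.
        print("matched:", np.sum(np.isfinite(dists)), "of", len(cat_a))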

  16. Principles of metadata organization at the ENCODE data coordination center.

    PubMed

    Hong, Eurie L; Sloan, Cricket A; Chan, Esther T; Davidson, Jean M; Malladi, Venkat S; Strattan, J Seth; Hitz, Benjamin C; Gabdank, Idan; Narayanan, Aditi K; Ho, Marcus; Lee, Brian T; Rowe, Laurence D; Dreszer, Timothy R; Roe, Greg R; Podduturi, Nikhil R; Tanaka, Forrest; Hilton, Jason A; Cherry, J Michael

    2016-01-01

    The Encyclopedia of DNA Elements (ENCODE) Data Coordinating Center (DCC) is responsible for organizing, describing and providing access to the diverse data generated by the ENCODE project. The description of these data, known as metadata, includes the biological sample used as input, the protocols and assays performed on these samples, the data files generated from the results and the computational methods used to analyze the data. Here, we outline the principles and philosophy used to define the ENCODE metadata in order to create a metadata standard that can be applied to diverse assays and multiple genomic projects. In addition, we present how the data are validated and used by the ENCODE DCC in creating the ENCODE Portal (https://www.encodeproject.org/). Database URL: www.encodeproject.org. © The Author(s) 2016. Published by Oxford University Press.

  17. Leveraging Metadata to Create Interactive Images... Today!

    NASA Astrophysics Data System (ADS)

    Hurt, Robert L.; Squires, G. K.; Llamas, J.; Rosenthal, C.; Brinkworth, C.; Fay, J.

    2011-01-01

    The image gallery for NASA's Spitzer Space Telescope has been newly rebuilt to fully support the Astronomy Visualization Metadata (AVM) standard to create a new user experience both on the website and in other applications. We encapsulate all the key descriptive information for a public image, including color representations and astronomical and sky coordinates and make it accessible in a user-friendly form on the website, but also embed the same metadata within the image files themselves. Thus, images downloaded from the site will carry with them all their descriptive information. Real-world benefits include display of general metadata when such images are imported into image editing software (e.g. Photoshop) or image catalog software (e.g. iPhoto). More advanced support in Microsoft's WorldWide Telescope can open a tagged image after it has been downloaded and display it in its correct sky position, allowing comparison with observations from other observatories. An increasing number of software developers are implementing AVM support in applications and an online image archive for tagged images is under development at the Spitzer Science Center. Tagging images following the AVM offers ever-increasing benefits to public-friendly imagery in all its standard forms (JPEG, TIFF, PNG). The AVM standard is one part of the Virtual Astronomy Multimedia Project (VAMP); http://www.communicatingastronomy.org
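
    The principle that descriptive metadata travels inside the image file can be demonstrated with PNG text chunks via Pillow; real AVM tagging uses XMP fields with a defined vocabulary, so the keys below are simplified stand-ins:

        # Illustration of metadata embedded in (and surviving with) an image file.
        from PIL import Image
        from PIL.PngImagePlugin import PngInfo

        img = Image.new("RGB", (100, 100))
        info = PngInfo()
        info.add_text("Title", "Example nebula")
        info.add_text("Spatial.ReferenceValue", "83.82,-5.39")  # RA, Dec (deg)
        img.save("tagged.png", pnginfo=info)

        reread = Image.open("tagged.png")
        print(reread.text)   # the metadata survives the round trip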

  18. The ASAS-SN bright supernova catalogue - III. 2016

    NASA Astrophysics Data System (ADS)

    Holoien, T. W.-S.; Brown, J. S.; Stanek, K. Z.; Kochanek, C. S.; Shappee, B. J.; Prieto, J. L.; Dong, Subo; Brimacombe, J.; Bishop, D. W.; Bose, S.; Beacom, J. F.; Bersier, D.; Chen, Ping; Chomiuk, L.; Falco, E.; Godoy-Rivera, D.; Morrell, N.; Pojmanski, G.; Shields, J. V.; Strader, J.; Stritzinger, M. D.; Thompson, Todd A.; Woźniak, P. R.; Bock, G.; Cacella, P.; Conseil, E.; Cruz, I.; Fernandez, J. M.; Kiyota, S.; Koff, R. A.; Krannich, G.; Marples, P.; Masi, G.; Monard, L. A. G.; Nicholls, B.; Nicolas, J.; Post, R. S.; Stone, G.; Wiethoff, W. S.

    2017-11-01

    This catalogue summarizes information for all supernovae discovered by the All-Sky Automated Survey for SuperNovae (ASAS-SN) and all other bright (mpeak ≤ 17), spectroscopically confirmed supernovae discovered in 2016. We then gather the near-infrared through ultraviolet magnitudes of all host galaxies and the offsets of the supernovae from the centres of their hosts from public data bases. We illustrate the results using a sample that now totals 668 supernovae discovered since 2014 May 1, including the supernovae from our previous catalogues, with type distributions closely matching those of the ideal magnitude limited sample from Li et al. This is the third of a series of yearly papers on bright supernovae and their hosts from the ASAS-SN team.

  19. Characteristics of health interventions: a systematic analysis of the Austrian Procedure Catalogue.

    PubMed

    Neururer, Sabrina B; Pfeiffer, Karl-Peter

    2012-01-01

    The Austrian Procedure Catalogue contains 1,500 codes for health interventions used for performance-oriented hospital financing in Austria. It offers a multiaxial taxonomy. The aim of this study is to identify characteristics of medical procedures. Therefore a definition analysis followed by a typological analysis was conducted. Search strings were generated from the code descriptions regarding the heart, large vessels and cardiovascular system. Their definitions were looked up in the Pschyrembel Clinical Dictionary and documented. From these definitions, types representing characteristics of health interventions were abstracted. The three axes of the Austrian Procedure Catalogue were confirmed, and new, relevant information was identified. The results are the foundation for a further enhancement of the Austrian Procedure Catalogue.

  20. Making Information Visible, Accessible, and Understandable: Meta-Data and Registries

    DTIC Science & Technology

    2007-07-01

    the data created, the length of play time, album name, and the genre. Without resource metadata, portable digital music players would not be so...notion of a catalog card in a library. An example of metadata is the description of a music file specifying the creator, the artist that performed the song...describe structure and formatting which are critical to interoperability and the management of databases. Going back to the portable music player example

  1. The Sixth Catalogue of galactic Wolf-Rayet stars, their past and present

    NASA Technical Reports Server (NTRS)

    Van Der Hucht, K. A.; Conti, P. S.; Lundstrom, I.; Stenholm, B.

    1981-01-01

    This paper presents the Sixth Catalogue of galactic Wolf-Rayet stars (Pop. I), a short history on the five earlier WR catalogues, improved spectral classification, finding charts, a discussion on related objects, and a review of the current status of Wolf-Rayet star research. The appendix presents a bibliography on most of the Wolf-Rayet literature published since 1867.

  2. VizieR Online Data Catalog: The Super-CLASS GMRT catalogue - SCG (Riseley+, 2016)

    NASA Astrophysics Data System (ADS)

    Riseley, C. J.; Scaife, A. M. M.; Hales, C. A.; Harrison, I.; Birkinshaw, M.; Battye, R. A.; Beswick, R. J.; Brown, M. L.; Casey, C. M.; Chapman, S. C.; Demetroullas, C.; Hung, C.-L.; Jackson, N. J.; Muxlow, T.; Watson, B.

    2016-06-01

    The Super-CLASS GMRT (SCG) catalogue is the low-frequency counterpart of the Super-Cluster Assisted Shear Survey. It is a survey at 13-arcsec resolution, with a limiting 5σ flux density of 170 μJy. The catalogue comprises 3257 sources. (1 data file).

  3. Increasing the international visibility of research data by a joint metadata schema

    NASA Astrophysics Data System (ADS)

    Svoboda, Nikolai; Zoarder, Muquit; Gärtner, Philipp; Hoffmann, Carsten; Heinrich, Uwe

    2017-04-01

    The BonaRes Project ("Soil as a sustainable resource for the bioeconomy") was launched in 2015 to promote sustainable soil management and to avoid fragmentation of efforts (Wollschläger et al., 2016). For this purpose, an IT infrastructure is being developed to upload, manage, store, and provide research data and its associated metadata. The research data provided by the BonaRes data centre are, in principle, not subject to any restrictions on reuse. For all research data considerable standardized metadata are the key enablers for the effective use of these data. Providing proper metadata is often viewed as an extra burden with further work and resources consumed. In our lecture we underline the benefits of structured and interoperable metadata like: accessibility of data, discovery of data, interpretation of data, linking data and several more and we counter these advantages with the effort of time, personnel and further costs. Building on this, we describe the framework of metadata in BonaRes combining the standards of OGC for description, visualization, exchange and discovery of geodata as well as the schema of DataCite for the publication and citation of this research data. This enables the generation of a DOI, a unique identifier that provides a permanent link to the citable research data. By using OGC standards, data and metadata become interoperable with numerous research data provided via INSPIRE. It enables further services like CSW for harvesting WMS for visualization and WFS for downloading. We explain the mandatory fields that result from our approach and we give a general overview about our metadata architecture implementation. Literature: Wollschläger, U; Helming, K.; Heinrich, U.; Bartke, S.; Kögel-Knabner, I.; Russell, D.; Eberhardt, E. & Vogel, H.-J.: The BonaRes Centre - A virtual institute for soil research in the context of a sustainable bio-economy. Geophysical Research Abstracts, Vol. 18, EGU2016-9087, 2016.

  4. Predicting age groups of Twitter users based on language and metadata features

    PubMed Central

    Morgan-Lopez, Antonio A.; Chew, Robert F.; Ruddle, Paul

    2017-01-01

    Health organizations are increasingly using social media, such as Twitter, to disseminate health messages to target audiences. Determining the extent to which the target audience (e.g., age groups) was reached is critical to evaluating the impact of social media education campaigns. The main objective of this study was to examine the separate and joint predictive validity of linguistic and metadata features in predicting the age of Twitter users. We created a labeled dataset of Twitter users across different age groups (youth, young adults, adults) by collecting publicly available birthday announcement tweets using the Twitter Search application programming interface. We manually reviewed results and, for each age-labeled handle, collected the 200 most recent publicly available tweets and user handles’ metadata. The labeled data were split into training and test datasets. We created separate models to examine the predictive validity of language features only, metadata features only, language and metadata features, and words/phrases from another age-validated dataset. We estimated accuracy, precision, recall, and F1 metrics for each model. An L1-regularized logistic regression model was conducted for each age group, and predicted probabilities between the training and test sets were compared for each age group. Cohen’s d effect sizes were calculated to examine the relative importance of significant features. Models containing both Tweet language features and metadata features performed the best (74% precision, 74% recall, 74% F1) while the model containing only Twitter metadata features were least accurate (58% precision, 60% recall, and 57% F1 score). Top predictive features included use of terms such as “school” for youth and “college” for young adults. Overall, it was more challenging to predict older adults accurately. These results suggest that examining linguistic and Twitter metadata features to predict youth and young adult Twitter users may be
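
    A bare-bones analogue of the modelling step, an L1-regularized logistic regression over combined language and metadata features, can be sketched with scikit-learn; the feature matrix here is synthetic, not the study's tweet data:

        # Sketch of an L1-regularized logistic regression, as used per age group.
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import classification_report

        rng = np.random.default_rng(3)
        X = rng.normal(size=(1000, 50))               # language + metadata features
        y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int) # fake "young adult" label

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
        clf.fit(X_tr, y_tr)
        print(classification_report(y_te, clf.predict(X_te), digits=2))
        # L1 regularization zeroes out uninformative features:
        print("nonzero coefficients:", np.count_nonzero(clf.coef_))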

  5. ExoData: A Python package to handle large exoplanet catalogue data

    NASA Astrophysics Data System (ADS)

    Varley, Ryan

    2016-10-01

    Exoplanet science often involves using the system parameters of real exoplanets for tasks such as simulations, fitting routines, and target selection for proposals. Several exoplanet catalogues are already well established but often lack a version history and code-friendly interfaces. Software that bridges the gap between the catalogues and code enables users to improve the repeatability of results by facilitating the retrieval of the exact system parameters used in published results, along with unifying the equations and software used. As exoplanet science moves towards large data, gone are the days when researchers could recall the current population from memory. An interface able to query the population becomes invaluable for target selection and population analysis. ExoData is a Python interface and exploratory analysis tool for the Open Exoplanet Catalogue. It allows exoplanet systems to be loaded into Python as objects (Planet, Star, Binary, etc.) from which common orbital and system equations can be calculated and measured parameters retrieved. This allows researchers to use tested code for the common equations they require (with units) and provides a large science input catalogue of planets for easy plotting and use in research. Advanced querying of targets is possible using the database and the Python programming language. ExoData is also able to parse spectral types and fill in missing parameters according to programmable specifications and equations. Example use cases include the integration of equations into data reduction pipelines, selecting planets for observing proposals, and serving as an input catalogue for large-scale simulation and analysis of planets. ExoData is a Python package available freely on GitHub.

  6. A new catalogue of Strömgren-Crawford uvbyβ photometry

    NASA Astrophysics Data System (ADS)

    Paunzen, E.

    2015-08-01

    Context. The uvbyβ photometric system is widely used for the study of various Galactic and extragalactic objects. It measures the colour due to temperature differences, the Balmer discontinuity, and blanketing absorption due to metals. Aims: A new all-sky catalogue of all available uvbyβ measurements from the literature was generated. Methods: The data for the individual stars were cross-checked on the basis of the Tycho-2 catalogue. This catalogue includes very precise celestial coordinates, but is magnitude and spatial resolution limited. However, the loss of objects is only marginal and is compensated for by the gain of homogeneity. Results: In total, 298 639 measurements of 60 668 stars were used to derive unweighted mean indices and their errors. Photoelectric and CCD observations were treated in the same way. Conclusions: The presented data set can be used for various applications such as new calibrations of astrophysical parameters, the standardization of new observations, and as additional information for ongoing and forthcoming all-sky surveys. The catalogue is only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (ftp://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/580/A23 http://vizier.u-strasbg.fr/viz-bin/VizieR

  7. Power spectrum estimation from peculiar velocity catalogues

    NASA Astrophysics Data System (ADS)

    Macaulay, E.; Feldman, H. A.; Ferreira, P. G.; Jaffe, A. H.; Agarwal, S.; Hudson, M. J.; Watkins, R.

    2012-09-01

    The peculiar velocities of galaxies are an inherently valuable cosmological probe, providing an unbiased estimate of the distribution of matter on scales much larger than the depth of the survey. Much research interest has been motivated by the high dipole moment of our local peculiar velocity field, which suggests a large-scale excess in the matter power spectrum and can appear to be in some tension with the Λ cold dark matter (ΛCDM) model. We use a composite catalogue of 4537 peculiar velocity measurements with a characteristic depth of 33 h-1 Mpc to estimate the matter power spectrum. We compare the constraints with this method, directly studying the full peculiar velocity catalogue, to results by Macaulay et al., studying minimum variance moments of the velocity field, as calculated by Feldman, Watkins & Hudson. We find good agreement with the ΛCDM model on scales of k > 0.01 h Mpc-1. We find an excess of power on scales of k < 0.01 h Mpc-1 with a 1σ uncertainty which includes the ΛCDM model. We find that the uncertainty in excess at these scales is larger than an alternative result studying only moments of the velocity field, which is due to the minimum variance weights used to calculate the moments. At small scales, we are able to clearly discriminate between linear and non-linear clustering in simulated peculiar velocity catalogues and find some evidence (although less clear) for linear clustering in the real peculiar velocity data.

  8. Informal information for web-based engineering catalogues

    NASA Astrophysics Data System (ADS)

    Allen, Richard D.; Culley, Stephen J.; Hicks, Ben J.

    2001-10-01

    Success is highly dependent on the ability of a company to efficiently produce optimal designs. In order to achieve this companies must minimize time to market and possess the ability to make fully informed decisions at the early phase of the design process. Such decisions may include the choice of component and suppliers, as well as cost and maintenance considerations. Computer modeling and electronic catalogues are becoming the preferred medium for the selection and design of mechanical components. In utilizing these techniques, the designer demands the capability to identify, evaluate and select mechanical components both quantitatively and qualitatively. Quantitative decisions generally encompass performance data included in the formal catalogue representation. It is in the area of qualitative decisions that the use of what the authors call 'Informal Information' is of crucial importance. Thus, 'Informal Information' must often be incorporated into the selection process and selection systems. This would enable more informed decisions to be made quicker, without the need for information retrieval via discussion with colleagues in the design environment. This paper provides an overview of the use of electronic information in the design of mechanical systems, including a discussion of limitations of current technology. The importance of Informal Information is discussed and the requirements for association with web based electronic catalogues are developed. This system is based on a flexible XML schema and enables the storage, classification and recall of Informal Information packets. Furthermore, a strategy for the inclusion of Informal Information is proposed, and an example case is used to illustrate the benefits.
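
    A flexible XML packet of the sort described can be mocked up with the standard library; the element names below are invented for illustration and are not the paper's actual schema:

        # Hypothetical shape of an "Informal Information packet" attached to a
        # catalogue entry, built with xml.etree from the standard library.
        import xml.etree.ElementTree as ET

        packet = ET.Element("informalInformation", component="bearing-6204")
        note = ET.SubElement(packet, "note", author="j.smith", date="2001-06-14")
        note.text = "Supplier lead time doubled last quarter; consider alternative."
        ET.SubElement(packet, "classification").text = "supplier-experience"

        print(ET.tostring(packet, encoding="unicode"))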

  9. The compilation of the instrumental seismic catalogue of Italy: 1975-1984

    NASA Astrophysics Data System (ADS)

    Giardini, D.; Velonà, M. A.; Boschi, E.

    1992-12-01

    We compile a homogeneous and complete catalogue of the seismicity of the Italian region for 1975-1984, the period marking the transition from standard analogue seismometry to the new digital era. The work is developed in three phases: (1) the creation of a uniform digital databank of all seismic station readings, unifying the database available at the Istituto Nazionale di Geofisica with the catalogue of the International Seismological Centre; (2) the preparation of numerical procedures for automatic association of arrival data and for hypocentre location, using arrivals from local, regional and teleseismic stations in a spherical geometry; (3) the introduction of lateral heterogeneity by calibrating regional travel-time curves and station corrections. The first two phases have been completed, providing a new instrumental catalogue obtained using a spherical Earth model; the third phase is presented here in a preliminary stage.

  10. Manifestations of Metadata: From Alexandria to the Web--Old is New Again

    ERIC Educational Resources Information Center

    Kennedy, Patricia

    2008-01-01

    This paper is a discussion of the use of metadata, in its various manifestations, to access information. Information management standards are discussed. The connection between the ancient world and the modern world is highlighted. Individual perspectives are paramount in fulfilling information seeking. Metadata is interpreted and reflected upon in…

  11. A Metadata Standard for Hydroinformatic Data Conforming to International Standards

    NASA Astrophysics Data System (ADS)

    Notay, Vikram; Carstens, Georg; Lehfeldt, Rainer

    2017-04-01

    The affordable availability of computing power and digital storage has been a boon for the scientific community. The hydroinformatics community has also benefitted from the so-called digital revolution, which has enabled the tackling of more and more complex physical phenomena using hydroinformatic models, instruments, sensors, etc. With models getting more and more complex, computational domains getting larger and the resolution of computational grids and measurement data getting finer, a large amount of data is generated and consumed in any hydroinformatics related project. The ubiquitous availability of internet also contributes to this phenomenon with data being collected through sensor networks connected to telecommunications networks and the internet long before the term Internet of Things existed. Although generally good, this exponential increase in the number of available datasets gives rise to the need to describe this data in a standardised way to not only be able to get a quick overview about the data but to also facilitate interoperability of data from different sources. The Federal Waterways Engineering and Research Institute (BAW) is a federal authority of the German Federal Ministry of Transport and Digital Infrastructure. BAW acts as a consultant for the safe and efficient operation of the German waterways. As part of its consultation role, BAW operates a number of physical and numerical models for sections of inland and marine waterways. In order to uniformly describe the data produced and consumed by these models throughout BAW and to ensure interoperability with other federal and state institutes on the one hand and with EU countries on the other, a metadata profile for hydroinformatic data has been developed at BAW. The metadata profile is composed in its entirety using the ISO 19115 international standard for metadata related to geographic information. Due to the widespread use of the ISO 19115 standard in the existing geodata infrastructure

  12. Raising orphans from a metadata morass: A researcher's guide to re-use of public 'omics data.

    PubMed

    Bhandary, Priyanka; Seetharam, Arun S; Arendsee, Zebulun W; Hur, Manhoi; Wurtele, Eve Syrkin

    2018-02-01

    More than 15 petabases of raw RNAseq data is now accessible through public repositories. Acquisition of other 'omics data types is expanding, though most lack a centralized archival repository. Data-reuse provides tremendous opportunity to extract new knowledge from existing experiments, and offers a unique opportunity for robust, multi-'omics analyses by merging metadata (information about experimental design, biological samples, protocols) and data from multiple experiments. We illustrate how predictive research can be accelerated by meta-analysis with a study of orphan (species-specific) genes. Computational predictions are critical to infer orphan function because their coding sequences provide very few clues. The metadata in public databases is often confusing; a test case with Zea mays mRNA seq data reveals a high proportion of missing, misleading or incomplete metadata. This metadata morass significantly diminishes the insight that can be extracted from these data. We provide tips for data submitters and users, including specific recommendations to improve metadata quality by more use of controlled vocabulary and by metadata reviews. Finally, we advocate for a unified, straightforward metadata submission and retrieval system. Copyright © 2017 Elsevier B.V. All rights reserved.
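
    The controlled-vocabulary reviews the authors recommend amount to checking submitted key-value pairs against agreed term lists. A minimal sketch (the vocabulary fragment is made up):

        # Flag metadata values that fall outside an agreed controlled vocabulary.
        CONTROLLED = {
            "organism": {"Zea mays", "Arabidopsis thaliana"},
            "tissue": {"leaf", "root", "seed"},
        }

        def review(metadata: dict) -> dict:
            """Return {field: problem} for every non-conforming entry."""
            problems = {}
            for field, allowed in CONTROLLED.items():
                value = metadata.get(field)
                if value is None:
                    problems[field] = "missing"
                elif value not in allowed:
                    problems[field] = f"unrecognised value: {value!r}"
            return problems

        print(review({"organism": "Zea mays", "tissue": "leaf tissue"}))
        # -> {'tissue': "unrecognised value: 'leaf tissue'"}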

  13. NASA space geodesy program: Catalogue of site information

    NASA Technical Reports Server (NTRS)

    Bryant, M. A.; Noll, C. E.

    1993-01-01

    This is the first edition of the NASA Space Geodesy Program: Catalogue of Site Information. This catalogue supersedes all previous versions of the Crustal Dynamics Project: Catalogue of Site Information, last published in May 1989. This document is prepared under the direction of the Space Geodesy and Altimetry Projects Office (SGAPO), Code 920.1, Goddard Space Flight Center. SGAPO has assumed the responsibilities of the Crustal Dynamics Project, which officially ended December 31, 1991. The catalog contains information on all NASA-supported sites as well as sites from cooperating international partners. This catalog is designed to provide descriptions and occupation histories of high-accuracy geodetic measuring sites employing space-related techniques. The emphasis of the catalog has been in the past, and continues to be with this edition, station information for facilities and remote locations utilizing the Satellite Laser Ranging (SLR), Lunar Laser Ranging (LLR), and Very Long Baseline Interferometry (VLBI) techniques. With the proliferation of high-quality Global Positioning System (GPS) receivers and Doppler Orbitography and Radiopositioning Integrated by Satellite (DORIS) transponders, many co-located at established SLR and VLBI observatories, the requirement for accurate station and localized survey information for an ever broadening base of scientists and engineers has been recognized. It is our objective to provide accurate station information to scientific groups interested in these facilities.

  14. Catalogue of Tenebrionidae (Coleoptera) of North America

    USDA-ARS?s Scientific Manuscript database

    The catalogue presented herein includes all scientific names (family-, genus- and species-group) for the darkling beetle (Coleoptera: Tenebrionidae) fauna occurring in North America. Data on extant, subfossil and fossil taxa are given. For each name the author, year and page number are provided with...

  15. Crustal Dynamics Project: Catalogue of site information

    NASA Technical Reports Server (NTRS)

    1985-01-01

    This document represents a catalogue of site information for the Crustal Dynamics Project. It contains information and descriptions of those sites used by the Project as observing stations for making the precise geodetic measurements useful for studies of the Earth's crustal movements and deformation.

  16. Effective use of metadata in the integration and analysis of multi-dimensional optical data

    NASA Astrophysics Data System (ADS)

    Pastorello, G. Z.; Gamon, J. A.

    2012-12-01

    Data discovery and integration relies on adequate metadata. However, creating and maintaining metadata is time consuming and often poorly addressed or avoided altogether, leading to problems in later data analysis and exchange. This is particularly true for research fields in which metadata standards do not yet exist or are under development, or within smaller research groups without enough resources. Vegetation monitoring using in-situ and remote optical sensing is an example of such a domain. In this area, data are inherently multi-dimensional, with spatial, temporal and spectral dimensions usually being well characterized. Other equally important aspects, however, might be inadequately translated into metadata. Examples include equipment specifications and calibrations, field/lab notes and field/lab protocols (e.g., sampling regimen, spectral calibration, atmospheric correction, sensor view angle, illumination angle), data processing choices (e.g., methods for gap filling, filtering and aggregation of data), quality assurance, and documentation of data sources, ownership and licensing. Each of these aspects can be important as metadata for search and discovery, but they can also be used as key data fields in their own right. If each of these aspects is also understood as an "extra dimension," it is possible to take advantage of them to simplify the data acquisition, integration, analysis, visualization and exchange cycle. Simple examples include selecting data sets of interest early in the integration process (e.g., only data collected according to a specific field sampling protocol) or applying appropriate data processing operations to different parts of a data set (e.g., adaptive processing for data collected under different sky conditions). More interesting scenarios involve guided navigation and visualization of data sets based on these extra dimensions, as well as partitioning data sets to highlight relevant subsets to be made available for exchange. The

  17. An asynchronous traversal engine for graph-based rich metadata management

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dai, Dong; Carns, Philip; Ross, Robert B.

    Rich metadata in high-performance computing (HPC) systems contains extended information about users, jobs, data files, and their relationships. Property graphs are a promising data model to represent heterogeneous rich metadata flexibly. Specifically, a property graph can use vertices to represent different entities and edges to record the relationships between vertices with unique annotations. The high-volume HPC use case, with millions of entities and relationships, naturally requires an out-of-core distributed property graph database, which must support live updates (to ingest production information in real time), low-latency point queries (for frequent metadata operations such as permission checking), and large-scale traversals (for provenance data mining). Among these needs, large-scale property graph traversals are particularly challenging for distributed graph storage systems. Most existing graph systems implement a "level synchronous" breadth-first search algorithm that relies on global synchronization in each traversal step. This performs well in many problem domains; but a rich metadata management system is characterized by imbalanced graphs, long traversal lengths, and concurrent workloads, each of which has the potential to introduce or exacerbate stragglers (i.e., abnormally slow steps or servers in a graph traversal) that lead to low overall throughput for synchronous traversal algorithms. Previous research indicated that the straggler problem can be mitigated by using asynchronous traversal algorithms, and many graph-processing frameworks have successfully demonstrated this approach. Such systems require the graph to be loaded into a separate batch-processing framework instead of being iteratively accessed, however. In this work, we investigate a general asynchronous graph traversal engine that can operate atop a rich metadata graph in its native format. We outline a traversal-aware query language and key optimizations (traversal
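
    As a rough illustration of the data model described above, the following single-process Python sketch stores a toy property graph (annotated vertices and edges for jobs, users and files) and drains a single work queue with no barrier between traversal "levels". The real engine is distributed and operates on the metadata graph in its native storage format; the entity names below are invented.

```python
# Toy property graph plus a traversal driven by one work queue: items are
# processed as they arrive, interleaving depths instead of synchronizing
# level by level as in "level synchronous" BFS.
from collections import deque

# Vertices carry properties; edges carry annotations (relationship labels).
vertices = {
    "job42":  {"type": "job"},
    "user7":  {"type": "user"},
    "out.h5": {"type": "file"},
}
edges = {
    "job42":  [("user7", "run_by"), ("out.h5", "wrote")],
    "user7":  [],
    "out.h5": [],
}

def traverse(start, max_depth):
    """Yield reachable vertices up to max_depth, with no per-level barrier."""
    work, seen = deque([(start, 0)]), {start}
    while work:
        v, depth = work.popleft()
        yield v, vertices[v], depth
        if depth == max_depth:
            continue
        for nbr, _label in edges[v]:
            if nbr not in seen:
                seen.add(nbr)
                work.append((nbr, depth + 1))

for v, props, d in traverse("job42", max_depth=2):
    print(d, v, props)
```

    In a distributed setting each server would drain its own queue independently, which is what lets fast branches proceed without waiting on stragglers.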

  18. An asynchronous traversal engine for graph-based rich metadata management

    DOE PAGES

    Dai, Dong; Carns, Philip; Ross, Robert B.; ...

    2016-06-23

    Rich metadata in high-performance computing (HPC) systems contains extended information about users, jobs, data files, and their relationships. Property graphs are a promising data model to represent heterogeneous rich metadata flexibly. Specifically, a property graph can use vertices to represent different entities and edges to record the relationships between vertices with unique annotations. The high-volume HPC use case, with millions of entities and relationships, naturally requires an out-of-core distributed property graph database, which must support live updates (to ingest production information in real time), low-latency point queries (for frequent metadata operations such as permission checking), and large-scale traversals (for provenance data mining). Among these needs, large-scale property graph traversals are particularly challenging for distributed graph storage systems. Most existing graph systems implement a "level synchronous" breadth-first search algorithm that relies on global synchronization in each traversal step. This performs well in many problem domains; but a rich metadata management system is characterized by imbalanced graphs, long traversal lengths, and concurrent workloads, each of which has the potential to introduce or exacerbate stragglers (i.e., abnormally slow steps or servers in a graph traversal) that lead to low overall throughput for synchronous traversal algorithms. Previous research indicated that the straggler problem can be mitigated by using asynchronous traversal algorithms, and many graph-processing frameworks have successfully demonstrated this approach. Such systems require the graph to be loaded into a separate batch-processing framework instead of being iteratively accessed, however. In this work, we investigate a general asynchronous graph traversal engine that can operate atop a rich metadata graph in its native format. We outline a traversal-aware query language and key optimizations (traversal

  19. Metadata Repository for Improved Data Sharing and Reuse Based on HL7 FHIR.

    PubMed

    Ulrich, Hannes; Kock, Ann-Kristin; Duhm-Harbeck, Petra; Habermann, Jens K; Ingenerf, Josef

    2016-01-01

    Unreconciled data structures and formats are a common obstacle to the urgently required sharing and reuse of data within healthcare and medical research. Within the North German Tumor Bank of Colorectal Cancer, clinical and sample data, based on a harmonized data set, is collected and can be pooled by using a hospital-integrated Research Data Management System supporting biobank and study management. Adding further partners who are not using the core data set requires manual adaptations and mapping of data elements. To reduce this manual intervention, and with a focus on the reuse of heterogeneous healthcare instance data (value level) and data elements (metadata level), a metadata repository has been developed. The metadata repository is an ISO 11179-3 conformant server application built for annotating and mediating data elements. The implemented architecture includes the translation of metadata information about data elements into the FHIR standard, using the FHIR Data Element resource with the ISO 11179 Data Element Extensions. The FHIR-based processing allows exchange of data elements with clinical and research IT systems as well as with other metadata systems. With increasingly annotated and harmonized data elements, data quality and integration can be improved for successfully enabling data analytics and decision support.
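
    As an illustration of the exchange format described above, here is a hand-written, heavily simplified FHIR (STU3) DataElement resource with an ISO 11179-style annotation attached as an extension. The extension URL, coding system and codes are indicative placeholders, not taken from the paper.

```python
# A schematic FHIR DataElement resource as a Python dict; all identifiers,
# codes and the extension URL are illustrative assumptions.
import json

data_element = {
    "resourceType": "DataElement",
    "id": "tumor-grade",           # hypothetical element id
    "status": "active",
    "name": "TumorGrade",
    "element": [{
        "path": "TumorGrade",
        "definition": "Histopathological grading of the tumor (G1-G4).",
        "type": [{"code": "CodeableConcept"}],
        # ISO 11179-style annotation carried as a FHIR extension
        "extension": [{
            "url": "http://hl7.org/fhir/StructureDefinition/11179-objectClass",
            "valueCoding": {"system": "http://example.org/codes",  # placeholder
                            "code": "colon-tumor",
                            "display": "Colorectal tumor"}
        }]
    }]
}

# Serialized JSON of this kind is what clinical/research systems would exchange.
print(json.dumps(data_element, indent=2))
```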

  20. AKARI North Ecliptic Pole Deep Survey. Revision of the catalogue via a new image analysis

    NASA Astrophysics Data System (ADS)

    Murata, K.; Matsuhara, H.; Wada, T.; Arimatsu, K.; Oi, N.; Takagi, T.; Oyabu, S.; Goto, T.; Ohyama, Y.; Malkan, M.; Pearson, C.; Małek, K.; Solarz, A.

    2013-11-01

    Context. We present the revised near- to mid-infrared catalogue of the AKARI North Ecliptic Pole deep survey. The survey has the unique advantage of continuous filter coverage from 2 to 24 μm over nine photometric bands, but the initial version of the survey catalogue leaves room for improvement in the image analysis stage; the original images are strongly contaminated by the behaviour of the detector and the optical system. Aims: The purpose of this study is to devise new image analysis methods and to improve the detection limit and reliability of the source extraction. Methods: We removed the scattered light and stray light from the Earth limb, and corrected for artificial patterns in the images by creating appropriate templates. We also removed any artificial sources due to bright sources by using their properties or masked them out visually. In addition, for the mid-infrared source extraction, we created detection images by stacking all six bands. This reduced the sky noise and enabled us to detect fainter sources more reliably. For the near-infrared source catalogue, we considered only objects with counterparts from ground-based catalogues to avoid fake sources. For our ground-based catalogues, we used catalogues based on the CFHT/MegaCam z' band, CFHT/WIRCam Ks band and Subaru/Scam z' band. Objects with multiple counterparts were all listed in the catalogue with a merged flag for the AKARI flux. Results: The detection limits of all mid-infrared bands were improved by ~20%, and the total number of detected objects was increased by ~2000 compared with the previous version of the catalogue; it now has 9560 objects. The 5σ detection limits in our catalogue are 11, 9, 10, 30, 34, 57, 87, 93, and 256 μJy in the N2, N3, N4, S7, S9W, S11, L15, L18W, and L24 bands, respectively. The astrometric accuracies of these band detections are 0.48, 0.52, 0.55, 0.99, 0.95, 1.1, 1.2, 1.3, and 1.6 arcsec, respectively. The false-detection rate of all nine bands was decreased

  1. Serving Fisheries and Ocean Metadata to Communities Around the World

    NASA Technical Reports Server (NTRS)

    Meaux, Melanie F.

    2007-01-01

    NASA's Global Change Master Directory (GCMD) assists the oceanographic community in the discovery, access, and sharing of scientific data by serving on-line fisheries and ocean metadata to users around the globe. As of January 2006, the directory holds more than 16,300 Earth Science data descriptions and over 1,300 services descriptions. Of these, nearly 4,000 unique ocean-related metadata records are available to the public, with many having direct links to the data. In 2005, the GCMD averaged over 5 million hits a month, with nearly a half million unique hosts for the year. Through the GCMD portal (http://gcmd.nasa.gov/), users can search vast and growing quantities of data and services using controlled keywords, free-text searches, or a combination of both. Users may now refine a search based on topic, location, instrument, platform, project, data center, spatial and temporal coverage, and data resolution for selected datasets. The directory also offers data holders a means to advertise and search their data through customized portals, which are subset views of the directory. The discovery metadata standard used is the Directory Interchange Format (DIF), adopted in 1988. This format has evolved to accommodate other national and international standards such as FGDC and ISO 19115. Users can submit metadata through easy-to-use online and offline authoring tools. The directory, which also serves as the International Directory Network (IDN), has been providing its services and sharing its experience and knowledge of metadata at the international, national, regional, and local level for many years. Active partners include the Committee on Earth Observation Satellites (CEOS), federal agencies (such as NASA, NOAA, and USGS), international agencies (such as IOC/IODE, UN, and JAXA) and organizations (such as ESIP, IOOS/DMAC, GOSIC, GLOBEC, OBIS, and GoMODP).

  2. Cataloguing E-Books in UK Higher Education Libraries: Report of a Survey

    ERIC Educational Resources Information Center

    Belanger, Jacqueline

    2007-01-01

    Purpose: The purpose of this paper is to discuss the results of a 2006 survey of UK Higher Education OPACs in order to provide a snapshot of cataloguing practices for e-books. Design/methodology/approach: The OPACs of 30 UK HE libraries were examined in July/August 2006 to determine which e-books were catalogued, and the level of cataloguing…

  3. VizieR Online Data Catalog: RASS-6dFGS catalogue (Mahony+, 2010)

    NASA Astrophysics Data System (ADS)

    Mahony, E. K.; Croom, S. M.; Boyle, B. J.; Edge, A. C.; Mauch, T.; Sadler, E. M.

    2014-09-01

    Objects were selected such that the dominant source of X-ray emission originates from an AGN. The target list was selected from the southern sources (δ<=0°) of the RBSC, a total of 9578 sources. Sources were then checked for optical identifications via a visual inspection process using Digitized Sky Survey (DSS) images. The majority of the optical positions were taken from the United States Naval Observatory (USNO) database, with the remainder taken from either the Automated Plate Measuring (APM) or DSS catalogues. Positions from these latter catalogues were used when the USNO appeared to give an incorrect position according to the DSS images. Optical magnitudes were taken from the USNO-A2.0 catalogue (Monet 1998, Cat. I/252). (2 data files).

  4. Automatic meta-data collection of STP observation data

    NASA Astrophysics Data System (ADS)

    Ishikura, S.; Kimura, E.; Murata, K.; Kubo, T.; Shinohara, I.

    2006-12-01

    For geoscience and STP (Solar-Terrestrial Physics) studies, various observations have been made by satellites and ground-based observatories. These data are saved and managed by many organizations, but there is no common procedure or rule for providing and/or sharing the data files. Researchers have had difficulty searching for and analyzing such different types of data distributed over the Internet. To support such cross-over analyses of observation data, we have developed the STARS (Solar-Terrestrial data Analysis and Reference System). The STARS consists of a client application (STARS-app), the meta-database (STARS-DB), the portal Web service (STARS-WS) and the download agent Web service (STARS DLAgent-WS). The STARS-DB includes directory information, access permissions, protocol information for retrieving data files, hierarchy information for mission/team/data, and user information. Users of the STARS are able to download observation data files without knowing the locations of the files, thanks to the STARS-DB. We have implemented the Portal-WS to retrieve meta-data from the meta-database. One reason we use a Web service is to overcome the variety of firewall restrictions, which have become stricter in recent years: it is now difficult for the STARS client application to access the STARS-DB by sending SQL queries directly. Using the Web service, we placed the STARS-DB behind the Portal-WS, preventing it from being exposed on the Internet. The STARS accesses the Portal-WS by sending a SOAP (Simple Object Access Protocol) request over HTTP; meta-data are received as a SOAP response. The STARS DLAgent-WS provides clients with data files downloaded from data sites. The data files are provided with a variety of protocols (e.g., FTP, HTTP, FTPS and SFTP), individually selected at each site. The clients send a SOAP request with download request messages and receive observation data files as a SOAP Response with
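
    The SOAP-over-HTTP pattern described here is generic, so a minimal sketch is easy to give; note that the endpoint URL, XML namespace and operation name below are hypothetical, not the actual STARS Portal-WS interface.

```python
# Posting a SOAP envelope over HTTP and reading the SOAP response.
# Endpoint, namespace and operation are invented placeholders.
import requests

ENDPOINT = "https://example.org/stars/portal-ws"   # hypothetical Portal-WS URL

envelope = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <SearchMetadata xmlns="http://example.org/stars">
      <mission>GEOTAIL</mission>
      <start>2006-01-01</start>
      <end>2006-01-31</end>
    </SearchMetadata>
  </soap:Body>
</soap:Envelope>"""

resp = requests.post(ENDPOINT, data=envelope.encode("utf-8"),
                     headers={"Content-Type": "text/xml; charset=utf-8"})
print(resp.status_code)
print(resp.text[:500])   # the SOAP response carrying the meta-data
```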

  5. Virtual Environments for Visualizing Structural Health Monitoring Sensor Networks, Data, and Metadata.

    PubMed

    Napolitano, Rebecca; Blyth, Anna; Glisic, Branko

    2018-01-16

    Visualization of sensor networks, data, and metadata is becoming one of the most pivotal aspects of the structural health monitoring (SHM) process. Without the ability to communicate efficiently and effectively between disparate groups working on a project, an SHM system can be underused, misunderstood, or even abandoned. For this reason, this work seeks to evaluate visualization techniques in the field, identify flaws in current practices, and devise a new method for visualizing and accessing SHM data and metadata in 3D. More precisely, the work presented here reflects a method and digital workflow for integrating SHM sensor networks, data, and metadata into a virtual reality environment by combining spherical imaging and informational modeling. Both intuitive and interactive, this method fosters communication on a project, enabling diverse practitioners of SHM to efficiently consult and use the sensor networks, data, and metadata. The method is presented through its implementation on a case study, Streicker Bridge at Princeton University campus. To illustrate the efficiency of the new method, the time and data file size were compared to other potential methods used for visualizing and accessing SHM sensor networks, data, and metadata in 3D. Additionally, feedback from civil engineering students familiar with SHM is used for validation. Recommendations on how different groups working together on an SHM project can create an SHM virtual environment and convey data to the proper audiences are also included.

  6. Virtual Environments for Visualizing Structural Health Monitoring Sensor Networks, Data, and Metadata

    PubMed Central

    Napolitano, Rebecca; Blyth, Anna; Glisic, Branko

    2018-01-01

    Visualization of sensor networks, data, and metadata is becoming one of the most pivotal aspects of the structural health monitoring (SHM) process. Without the ability to communicate efficiently and effectively between disparate groups working on a project, an SHM system can be underused, misunderstood, or even abandoned. For this reason, this work seeks to evaluate visualization techniques in the field, identify flaws in current practices, and devise a new method for visualizing and accessing SHM data and metadata in 3D. More precisely, the work presented here reflects a method and digital workflow for integrating SHM sensor networks, data, and metadata into a virtual reality environment by combining spherical imaging and informational modeling. Both intuitive and interactive, this method fosters communication on a project, enabling diverse practitioners of SHM to efficiently consult and use the sensor networks, data, and metadata. The method is presented through its implementation on a case study, Streicker Bridge at Princeton University campus. To illustrate the efficiency of the new method, the time and data file size were compared to other potential methods used for visualizing and accessing SHM sensor networks, data, and metadata in 3D. Additionally, feedback from civil engineering students familiar with SHM is used for validation. Recommendations on how different groups working together on an SHM project can create an SHM virtual environment and convey data to the proper audiences are also included. PMID:29337877

  7. Philosophy and updating of the asteroid photometric catalogue

    NASA Technical Reports Server (NTRS)

    Magnusson, Per; Barucci, M. Antonietta; Capria, M. T.; Dahlgren, Mats; Fulchignoni, Marcello; Lagerkvist, C. I.

    1992-01-01

    The Asteroid Photometric Catalogue now contains photometric lightcurves for 584 asteroids. We discuss some of the guiding principles behind it. This concerns both observers who offer input to it and users of the product.

  8. The new OGC Catalogue Services 3.0 specification - status of work

    NASA Astrophysics Data System (ADS)

    Bigagli, Lorenzo; Voges, Uwe

    2013-04-01

    We report on the work of the Open Geospatial Consortium Catalogue Services 3.0 Standards Working Group (OGC Cat 3.0 SWG for short), started in March 2008 with the purpose of processing change requests on the Catalogue Services 2.0.2 Implementation Specification (OGC 07-006r1) and producing a revised version thereof, comprising the related XML schemas and abstract test suite. The work was initially intended as a minor revision (version 2.1), but was later retargeted as a major update of the standard and rescheduled (the anticipated roadmap ended in 2008). The target audience of Catalogue Services 3.0 includes: • Implementors of catalogue services solutions. • Designers and developers of catalogue services profiles. • Providers/users of catalogue services. The two main general areas of enhancement were: restructuring the specification document according to the OGC standard for modular specifications (OGC 08-131r3, also known as the Core and Extension model); and incorporating the current mass-market technologies for discovery on the Web, namely OpenSearch. The document was initially split into four parts: the general model and the three protocol bindings HTTP, Z39.50, and CORBA. The CORBA binding, which was very rarely implemented, and the Z39.50 binding have since been dropped. Parts of the Z39.50 binding, namely Search/Retrieve via URL (SRU; same semantics as Z39.50, but stateless), have been provided as a discussion paper (OGC 12-082) for possibly developing a future SRU profile. The Catalogue Services 3.0 specification is structured as follows: • Part 1: General Model (Core) • Part 2: HTTP Protocol Binding (CSW) In CSW, the GET/KVP encoding is mandatory. The POST/XML encoding is optional. SOAP is supported as a special case of the POST/XML encoding. OpenSearch must always be supported, regardless of the implemented profiles, along with the OpenSearch Geospatial and Temporal Extensions (OGC 10-032r2). The latter specifies spatial (e.g. point-plus-radius, bounding
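
    For flavour, an OpenSearch request against such a catalogue is just a templated URL. In the sketch below the base URL is hypothetical, and the parameter names follow common URL-template renderings of the searchTerms, geo:box and time:start/time:end parameters from OGC 10-032r2; a given catalogue's OpenSearch description document defines the exact names it accepts.

```python
# Building an OpenSearch query with Geo and Temporal extension parameters.
from urllib.parse import urlencode

base = "https://catalogue.example.org/opensearch"   # hypothetical CSW front end
params = {
    "q": "sea surface temperature",    # free-text searchTerms
    "bbox": "-10.0,35.0,5.0,45.0",     # geo:box (west,south,east,north)
    "start": "2012-01-01",             # time:start
    "end": "2012-12-31",               # time:end
    "format": "atom",
}
print(f"{base}?{urlencode(params)}")
```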

  9. 100+ years of instrumental seismology: the example of the ISC-GEM Global Earthquake Instrumental Catalogue

    NASA Astrophysics Data System (ADS)

    Storchak, Dmitry; Di Giacomo, Domenico

    2015-04-01

    Systematic seismological observations of earthquakes using seismic instruments on a global scale began more than 100 years ago. Since then, seismologists have made many discoveries about the Earth's interior and the physics of earthquakes, thanks in part to major developments in the seismic instrumentation deployed around the world. Moreover, since the establishment of the first global networks (the Milne and Jesuit networks), seismologists around the world have stored and exchanged the results of routine observations (e.g., picking of arrival times, amplitude-period measurements, etc.) or more sophisticated analyses (e.g., moment tensor inversion) in seismological bulletins/catalogues. With a project funded by the GEM Foundation (www.globalquakemodel.org), the ISC and a Team of International Experts released a new global earthquake catalogue, the ISC-GEM Global Instrumental Earthquake Catalogue (1900-2009) (www.isc.ac.uk/iscgem/index.php), which, unlike previous global seismic catalogues, has the unique feature of covering the entire period of instrumental seismology, with locations and magnitudes re-assessed using modern approaches for the global earthquakes selected for processing (approximately 21,000 in the current version). During the 110 years covered by the ISC-GEM catalogue, many seismological developments occurred in terms of instrumentation, seismological practice and knowledge of the physics of earthquakes. In this contribution we give a brief overview of the major milestones of the last 100+ years of instrumental seismology that were relevant for the production of the ISC-GEM catalogue, and of the major challenges we faced to obtain a catalogue as homogeneous as possible.

  10. VizieR Online Data Catalog: SDSS-based Polar Ring Catalogue (Moiseev+, 2011)

    NASA Astrophysics Data System (ADS)

    Moiseev, A. V.; Smirnova, K. I.; Smirnova, A. A.; Reshetnikov, V. P.

    2012-06-01

    Galaxies with polar rings (PRGs) are a unique class of extragalactic objects. Using these, we can investigate a wide range of problems linked to the formation and evolution of galaxies, and we can study the properties of their dark haloes. The progress that has been made in the study of PRGs has been constrained by the small number of known objects of this type. The Polar Ring Catalogue (PRC) by Whitmore et al. (1990AJ....100.1489W) and their photographic atlas of PRGs and related objects includes 157 galaxies. At present, there are only about two dozen kinematically confirmed galaxies in this PRG class, mostly from the PRC. We present a new catalogue of PRGs, supplementing the PRC and significantly increasing the number of known candidate PRGs. The catalogue is based on the results of the original Galaxy Zoo project, within which volunteers performed visual classifications of nearly a million galaxies from the Sloan Digital Sky Survey (SDSS). Based on the preliminary classifications of the Galaxy Zoo, we viewed more than 40000 images from the SDSS and selected 275 galaxies to include in our catalogue. (1 data file).

  11. High-performance metadata indexing and search in petascale data storage systems

    NASA Astrophysics Data System (ADS)

    Leung, A. W.; Shao, M.; Bisson, T.; Pasupathy, S.; Miller, E. L.

    2008-07-01

    Large-scale storage systems used for scientific applications can store petabytes of data and billions of files, making the organization and management of data in these systems a difficult, time-consuming task. The ability to search file metadata in a storage system can address this problem by allowing scientists to quickly navigate experiment data and code while allowing storage administrators to gather the information they need to properly manage the system. In this paper, we present Spyglass, a file metadata search system that achieves scalability by exploiting storage system properties, providing the scalability that existing file metadata search tools lack. In doing so, Spyglass can achieve search performance up to several thousand times faster than existing database solutions. We show that Spyglass enables important functionality that can aid data management for scientists and storage administrators.
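
    Spyglass derives its speed from storage-specific techniques such as hierarchical partitioning and snapshot-based index updates; the toy Python sketch below shows only the underlying idea of harvesting per-file metadata once and answering attribute queries from an index rather than re-scanning the file system.

```python
# Harvest file metadata into an in-memory index, then query the index.
# A toy illustration of the crawl-and-index concept, not Spyglass itself.
import os
from collections import defaultdict

def crawl(root):
    """Collect a few metadata attributes for every file under root."""
    for dirpath, _, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue   # skip files that vanish or deny access
            yield {"path": path, "size": st.st_size, "mtime": st.st_mtime,
                   "ext": os.path.splitext(name)[1]}

index = defaultdict(list)            # (attribute, value) -> matching records
for rec in crawl("."):
    index[("ext", rec["ext"])].append(rec)

# Attribute query answered from the index, not by re-walking the tree.
hits = sorted(index[("ext", ".py")], key=lambda r: r["size"], reverse=True)
for r in hits[:5]:
    print(r["size"], r["path"])
```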

  12. Utilizing Linked Open Data Sources for Automatic Generation of Semantic Metadata

    NASA Astrophysics Data System (ADS)

    Nummiaho, Antti; Vainikainen, Sari; Melin, Magnus

    In this paper we present an application that can be used to automatically generate semantic metadata for tags given as simple keywords. The application, which we have implemented in the Java programming language, creates the semantic metadata by linking the tags to concepts in different semantic knowledge bases (CrunchBase, DBpedia, Freebase, KOKO, Opencyc, Umbel and/or WordNet). The steps that our application takes in doing so include detecting possible languages, finding spelling suggestions, and finding meanings from amongst the proper nouns and common nouns separately. Currently, our application supports English, Finnish and Swedish words, but other languages could be included easily if the required lexical tools (spellcheckers, etc.) are available. The created semantic metadata can be of great use in, e.g., finding and combining similar contents, creating recommendations and targeting advertisements.
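
    The linking step can be illustrated against one of the named knowledge bases: the sketch below queries DBpedia's public SPARQL endpoint for resources whose English label matches a tag. The query shape is our assumption; the actual application is written in Java and also handles language detection and spelling suggestions.

```python
# Linking a keyword tag to candidate DBpedia concepts via SPARQL.
import requests

def dbpedia_concepts(tag, limit=5):
    """Return DBpedia resource URIs whose English label matches the tag."""
    query = (
        'PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> '
        f'SELECT DISTINCT ?concept WHERE {{ ?concept rdfs:label "{tag}"@en }} '
        f'LIMIT {limit}'
    )
    resp = requests.get("https://dbpedia.org/sparql",
                        params={"query": query,
                                "format": "application/sparql-results+json"})
    resp.raise_for_status()
    return [b["concept"]["value"] for b in resp.json()["results"]["bindings"]]

print(dbpedia_concepts("Helsinki"))
```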

  13. Use of a metadata documentation and search tool for large data volumes: The NGEE arctic example

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Devarakonda, Ranjeet; Hook, Leslie A; Killeffer, Terri S

    The Online Metadata Editor (OME) is a web-based tool to help document scientific data in a well-structured, popular scientific metadata format. In this paper, we will discuss the newest tool that Oak Ridge National Laboratory (ORNL) has developed to generate, edit, and manage metadata and how it is helping data-intensive science centers and projects, such as the U.S. Department of Energy's Next Generation Ecosystem Experiments (NGEE) in the Arctic, to prepare metadata and make their big data produce big science and lead to new discoveries.

  14. The XMM-Newton serendipitous survey. VII. The third XMM-Newton serendipitous source catalogue

    NASA Astrophysics Data System (ADS)

    Rosen, S. R.; Webb, N. A.; Watson, M. G.; Ballet, J.; Barret, D.; Braito, V.; Carrera, F. J.; Ceballos, M. T.; Coriat, M.; Della Ceca, R.; Denkinson, G.; Esquej, P.; Farrell, S. A.; Freyberg, M.; Grisé, F.; Guillout, P.; Heil, L.; Koliopanos, F.; Law-Green, D.; Lamer, G.; Lin, D.; Martino, R.; Michel, L.; Motch, C.; Nebot Gomez-Moran, A.; Page, C. G.; Page, K.; Page, M.; Pakull, M. W.; Pye, J.; Read, A.; Rodriguez, P.; Sakano, M.; Saxton, R.; Schwope, A.; Scott, A. E.; Sturm, R.; Traulsen, I.; Yershov, V.; Zolotukhin, I.

    2016-05-01

    Context. Thanks to the large collecting area (3 ×~1500 cm2 at 1.5 keV) and wide field of view (30' across in full field mode) of the X-ray cameras on board the European Space Agency X-ray observatory XMM-Newton, each individual pointing can result in the detection of up to several hundred X-ray sources, most of which are newly discovered objects. Since XMM-Newton has now been in orbit for more than 15 yr, hundreds of thousands of sources have been detected. Aims: Recently, many improvements in the XMM-Newton data reduction algorithms have been made. These include enhanced source characterisation and reduced spurious source detections, refined astrometric precision of sources, greater net sensitivity for source detection, and the extraction of spectra and time series for fainter sources, both with better signal-to-noise. Thanks to these enhancements, the quality of the catalogue products has been much improved over earlier catalogues. Furthermore, almost 50% more observations are in the public domain compared to 2XMMi-DR3, allowing the XMM-Newton Survey Science Centre to produce a much larger and better quality X-ray source catalogue. Methods: The XMM-Newton Survey Science Centre has developed a pipeline to reduce the XMM-Newton data automatically. Using the latest version of this pipeline, along with better calibration, a new version of the catalogue has been produced, using XMM-Newton X-ray observations made public on or before 2013 December 31. Manual screening of all of the X-ray detections ensures the highest data quality. This catalogue is known as 3XMM. Results: In the latest release of the 3XMM catalogue, 3XMM-DR5, there are 565 962 X-ray detections comprising 396 910 unique X-ray sources. Spectra and lightcurves are provided for the 133 000 brightest sources. For all detections, the positions on the sky, a measure of the quality of the detection, and an evaluation of the X-ray variability is provided, along with the fluxes and count rates in 7 X-ray energy

  15. Planck 2015 results: XXVI. The Second Planck Catalogue of Compact Sources

    DOE PAGES

    Ade, P. A. R.; Aghanim, N.; Argüeso, F.; ...

    2016-09-20

    The Second Planck Catalogue of Compact Sources is a list of discrete objects detected in single-frequency maps from the full duration of the Planck mission and supersedes previous versions. It consists of compact sources, both Galactic and extragalactic, detected over the entire sky. Compact sources detected in the lower frequency channels are assigned to the PCCS2, while at higher frequencies they are assigned to one of two subcatalogues, the PCCS2 or PCCS2E, depending on their location on the sky. The first of these (PCCS2) covers most of the sky and allows the user to produce subsamples at higher reliabilities than the target 80% integral reliability of the catalogue. The second (PCCS2E) contains sources detected in sky regions where the diffuse emission makes it difficult to quantify the reliability of the detections. Both the PCCS2 and PCCS2E include polarization measurements, in the form of polarized flux densities, or upper limits, and orientation angles for all seven polarization-sensitive Planck channels. The improved data-processing of the full-mission maps and their reduced noise levels allow us to increase the number of objects in the catalogue, improving its completeness for the target 80% reliability as compared with the previous versions, the PCCS and the Early Release Compact Source Catalogue (ERCSC).

  16. A Ks-band-selected catalogue of objects in the ALHAMBRA survey

    NASA Astrophysics Data System (ADS)

    Nieves-Seoane, L.; Fernandez-Soto, A.; Arnalte-Mur, P.; Molino, A.; Stefanon, M.; Ferreras, I.; Ascaso, B.; Ballesteros, F. J.; Cristóbal-Hornillos, D.; López-Sanjuán, C.; Hurtado-Gil, Ll.; Márquez, I.; Masegosa, J.; Aguerri, J. A. L.; Alfaro, E.; Aparicio-Villegas, T.; Benítez, N.; Broadhurst, T.; Cabrera-Caño, J.; Castander, F. J.; Cepa, J.; Cerviño, M.; González Delgado, R. M.; Husillos, C.; Infante, L.; Martínez, V. J.; Moles, M.; Olmo, A. del; Perea, J.; Pović, M.; Prada, F.; Quintana, J. M.; Troncoso-Iribarren, P.; Viironen, K.

    2017-02-01

    The original ALHAMBRA catalogue contained over 400 000 galaxies selected using a synthetic F814W image, to the magnitude limit AB(F814W) ≈ 24.5. Given the photometric redshift depth of the ALHAMBRA multiband data (⟨z⟩ = 0.86) and the approximately I-band selection, there is a noticeable bias against red objects at moderate redshift. We avoid this bias by creating a new catalogue selected in the Ks band. This newly obtained catalogue is certainly shallower in terms of apparent magnitude, but deeper in terms of redshift, with a significant population of red objects at z > 1. We select objects using the Ks-band images, which reach an approximate AB magnitude limit Ks ≈ 22. We generate masks and derive completeness functions to characterize the sample. We have tested the quality of the photometry and photometric redshifts using both internal and external checks. Our final catalogue includes ≈95 000 sources down to Ks ≈ 22, with a significant tail towards high redshift. We have checked that there is a large sample of objects with spectral energy distributions that correspond to that of massive, passively evolving galaxies at z > 1, reaching as far as z ≈ 2.5. We have tested the possibility of combining our data with deep infrared observations at longer wavelengths, particularly Spitzer IRAC data.

  17. Planck 2015 results. XXVI. The Second Planck Catalogue of Compact Sources

    NASA Astrophysics Data System (ADS)

    Planck Collaboration; Ade, P. A. R.; Aghanim, N.; Argüeso, F.; Arnaud, M.; Ashdown, M.; Aumont, J.; Baccigalupi, C.; Banday, A. J.; Barreiro, R. B.; Bartolo, N.; Battaner, E.; Beichman, C.; Benabed, K.; Benoît, A.; Benoit-Lévy, A.; Bernard, J.-P.; Bersanelli, M.; Bielewicz, P.; Bock, J. J.; Böhringer, H.; Bonaldi, A.; Bonavera, L.; Bond, J. R.; Borrill, J.; Bouchet, F. R.; Boulanger, F.; Bucher, M.; Burigana, C.; Butler, R. C.; Calabrese, E.; Cardoso, J.-F.; Carvalho, P.; Catalano, A.; Challinor, A.; Chamballu, A.; Chary, R.-R.; Chiang, H. C.; Christensen, P. R.; Clemens, M.; Clements, D. L.; Colombi, S.; Colombo, L. P. L.; Combet, C.; Couchot, F.; Coulais, A.; Crill, B. P.; Curto, A.; Cuttaia, F.; Danese, L.; Davies, R. D.; Davis, R. J.; de Bernardis, P.; de Rosa, A.; de Zotti, G.; Delabrouille, J.; Désert, F.-X.; Dickinson, C.; Diego, J. M.; Dole, H.; Donzelli, S.; Doré, O.; Douspis, M.; Ducout, A.; Dupac, X.; Efstathiou, G.; Elsner, F.; Enßlin, T. A.; Eriksen, H. K.; Falgarone, E.; Fergusson, J.; Finelli, F.; Forni, O.; Frailis, M.; Fraisse, A. A.; Franceschi, E.; Frejsel, A.; Galeotta, S.; Galli, S.; Ganga, K.; Giard, M.; Giraud-Héraud, Y.; Gjerløw, E.; González-Nuevo, J.; Górski, K. M.; Gratton, S.; Gregorio, A.; Gruppuso, A.; Gudmundsson, J. E.; Hansen, F. K.; Hanson, D.; Harrison, D. L.; Helou, G.; Henrot-Versillé, S.; Hernández-Monteagudo, C.; Herranz, D.; Hildebrandt, S. R.; Hivon, E.; Hobson, M.; Holmes, W. A.; Hornstrup, A.; Hovest, W.; Huffenberger, K. M.; Hurier, G.; Jaffe, A. H.; Jaffe, T. R.; Jones, W. C.; Juvela, M.; Keihänen, E.; Keskitalo, R.; Kisner, T. S.; Kneissl, R.; Knoche, J.; Kunz, M.; Kurki-Suonio, H.; Lagache, G.; Lähteenmäki, A.; Lamarre, J.-M.; Lasenby, A.; Lattanzi, M.; Lawrence, C. R.; Leahy, J. P.; Leonardi, R.; León-Tavares, J.; Lesgourgues, J.; Levrier, F.; Liguori, M.; Lilje, P. B.; Linden-Vørnle, M.; López-Caniego, M.; Lubin, P. M.; Macías-Pérez, J. F.; Maggio, G.; Maino, D.; Mandolesi, N.; Mangilli, A.; Maris, M.; Marshall, D. J.; Martin, P. G.; Martínez-González, E.; Masi, S.; Matarrese, S.; McGehee, P.; Meinhold, P. R.; Melchiorri, A.; Mendes, L.; Mennella, A.; Migliaccio, M.; Mitra, S.; Miville-Deschênes, M.-A.; Moneti, A.; Montier, L.; Morgante, G.; Mortlock, D.; Moss, A.; Munshi, D.; Murphy, J. A.; Naselsky, P.; Nati, F.; Natoli, P.; Negrello, M.; Netterfield, C. B.; Nørgaard-Nielsen, H. U.; Noviello, F.; Novikov, D.; Novikov, I.; Oxborrow, C. A.; Paci, F.; Pagano, L.; Pajot, F.; Paladini, R.; Paoletti, D.; Partridge, B.; Pasian, F.; Patanchon, G.; Pearson, T. J.; Perdereau, O.; Perotto, L.; Perrotta, F.; Pettorino, V.; Piacentini, F.; Piat, M.; Pierpaoli, E.; Pietrobon, D.; Plaszczynski, S.; Pointecouteau, E.; Polenta, G.; Pratt, G. W.; Prézeau, G.; Prunet, S.; Puget, J.-L.; Rachen, J. P.; Reach, W. T.; Rebolo, R.; Reinecke, M.; Remazeilles, M.; Renault, C.; Renzi, A.; Ristorcelli, I.; Rocha, G.; Rosset, C.; Rossetti, M.; Roudier, G.; Rowan-Robinson, M.; Rubiño-Martín, J. A.; Rusholme, B.; Sandri, M.; Sanghera, H. S.; Santos, D.; Savelainen, M.; Savini, G.; Scott, D.; Seiffert, M. D.; Shellard, E. P. S.; Spencer, L. D.; Stolyarov, V.; Sudiwala, R.; Sunyaev, R.; Sutton, D.; Suur-Uski, A.-S.; Sygnet, J.-F.; Tauber, J. A.; Terenzi, L.; Toffolatti, L.; Tomasi, M.; Tornikoski, M.; Tristram, M.; Tucci, M.; Tuovinen, J.; Türler, M.; Umana, G.; Valenziano, L.; Valiviita, J.; Van Tent, B.; Vielva, P.; Villa, F.; Wade, L. A.; Walter, B.; Wandelt, B. D.; Wehus, I. K.; Yvon, D.; Zacchei, A.; Zonca, A.

    2016-09-01

    The Second Planck Catalogue of Compact Sources is a list of discrete objects detected in single-frequency maps from the full duration of the Planck mission and supersedes previous versions. It consists of compact sources, both Galactic and extragalactic, detected over the entire sky. Compact sources detected in the lower frequency channels are assigned to the PCCS2, while at higher frequencies they are assigned to one of two subcatalogues, the PCCS2 or PCCS2E, depending on their location on the sky. The first of these (PCCS2) covers most of the sky and allows the user to produce subsamples at higher reliabilities than the target 80% integral reliability of the catalogue. The second (PCCS2E) contains sources detected in sky regions where the diffuse emission makes it difficult to quantify the reliability of the detections. Both the PCCS2 and PCCS2E include polarization measurements, in the form of polarized flux densities, or upper limits, and orientation angles for all seven polarization-sensitive Planck channels. The improved data-processing of the full-mission maps and their reduced noise levels allow us to increase the number of objects in the catalogue, improving its completeness for the target 80% reliability as compared with the previous versions, the PCCS and the Early Release Compact Source Catalogue (ERCSC).

  18. Planck 2015 results: XXVI. The Second Planck Catalogue of Compact Sources

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ade, P. A. R.; Aghanim, N.; Argüeso, F.

    The Second Planck Catalogue of Compact Sources is a list of discrete objects detected in single-frequency maps from the full duration of the Planck mission and supersedes previous versions. It consists of compact sources, both Galactic and extragalactic, detected over the entire sky. Compact sources detected in the lower frequency channels are assigned to the PCCS2, while at higher frequencies they are assigned to one of two subcatalogues, the PCCS2 or PCCS2E, depending on their location on the sky. The first of these (PCCS2) covers most of the sky and allows the user to produce subsamples at higher reliabilities than the target 80% integral reliability of the catalogue. The second (PCCS2E) contains sources detected in sky regions where the diffuse emission makes it difficult to quantify the reliability of the detections. Both the PCCS2 and PCCS2E include polarization measurements, in the form of polarized flux densities, or upper limits, and orientation angles for all seven polarization-sensitive Planck channels. The improved data-processing of the full-mission maps and their reduced noise levels allow us to increase the number of objects in the catalogue, improving its completeness for the target 80% reliability as compared with the previous versions, the PCCS and the Early Release Compact Source Catalogue (ERCSC).

  19. The Challenges in Metadata Management: 20+ Years of ESO Data

    NASA Astrophysics Data System (ADS)

    Vera, I.; Da Rocha, C.; Dobrzycki, A.; Micol, A.; Vuong, M.

    2015-09-01

    The European Southern Observatory Science Archive Facility has been in operation for more than 20 years. It contains data produced by ESO telescopes as well as the metadata needed for characterizing and distributing those data. These metadata are used to build the different services provided by the Archive. Over the years, services have been added, modified or even decommissioned, creating a cocktail of new, evolved and legacy data systems. The challenge for the Archive is to harmonize the differences among those data systems to provide the community with a homogeneous experience when using ESO data. In this paper, we present ESO's experience in three particularly challenging areas. The first discussion is dedicated to the problem of metadata quality over time, the second to integrating obsolete data models into current services, and the third to the challenges of ever-growing databases. We describe our experience dealing with those issues and the solutions adopted to mitigate them.

  20. The second ROSAT All-Sky Survey source catalogue: the deepest X-ray All-Sky Survey before eROSITA

    NASA Astrophysics Data System (ADS)

    Boller, T.; Freyberg, M.; Truemper, J.

    2014-07-01

    We present the second ROSAT all-sky survey source catalogue (RASS2; Boller, Freyberg & Truemper 2014, submitted). The RASS2 is an extension of the ROSAT Bright Source Catalogue (BSC) and the ROSAT Faint Source Catalogue (FSC). The total number of sources in the second RASS catalogue is 124489. The extensions include (i) the supply of new user data products, i.e., X-ray images, X-ray spectra, and X-ray light curves, (ii) a visual screening of each individual detection, and (iii) an improved detection algorithm compared to the SASS II processing. This results in a catalogue of point sources detected during the ROSAT survey observations that is as reliable and as complete as possible. We discuss for the first time the intra-day timing and spectral properties of the second RASS catalogue. We find new highly variable sources and discuss their timing properties. Power-law fits have been applied, which allow us to determine X-ray fluxes, X-ray absorbing columns, and X-ray photon indices. We give access to the second RASS catalogue and the associated data products via a web interface, to allow the community to perform further scientific exploration. The RASS2 catalogue provides the deepest X-ray all-sky survey before eROSITA data become available.

  1. Assessing colour-dependent occupation statistics inferred from galaxy group catalogues

    NASA Astrophysics Data System (ADS)

    Campbell, Duncan; van den Bosch, Frank C.; Hearin, Andrew; Padmanabhan, Nikhil; Berlind, Andreas; Mo, H. J.; Tinker, Jeremy; Yang, Xiaohu

    2015-09-01

    We investigate the ability of current implementations of galaxy group finders to recover colour-dependent halo occupation statistics. To test the fidelity of group catalogue inferred statistics, we run three different group finders used in the literature over a mock that includes galaxy colours in a realistic manner. Overall, the resulting mock group catalogues are remarkably similar, and most colour-dependent statistics are recovered with reasonable accuracy. However, it is also clear that certain systematic errors arise as a consequence of correlated errors in group membership determination, central/satellite designation, and halo mass assignment. We introduce a new statistic, the halo transition probability (HTP), which captures the combined impact of all these errors. As a rule of thumb, errors tend to equalize the properties of distinct galaxy populations (i.e. red versus blue galaxies or centrals versus satellites), and to result in inferred occupation statistics that are more accurate for red galaxies than for blue galaxies. A statistic that is particularly poorly recovered from the group catalogues is the red fraction of central galaxies as a function of halo mass. Group finders do a good job in recovering galactic conformity, but also have a tendency to introduce weak conformity when none is present. We conclude that proper inference of colour-dependent statistics from group catalogues is best achieved using forward modelling (i.e. running group finders over mock data) or by implementing a correction scheme based on the HTP, as long as the latter is not too strongly model dependent.

  2. Report from the International Conference on Dublin Core and Metadata Applications, 2001.

    ERIC Educational Resources Information Center

    Sugimoto, Shigeo; Adachi, Jun; Baker, Thomas; Weibel, Stuart

    This paper describes the International Conference on Dublin Core and Metadata Applications 2001 (DC-2001), the ninth major workshop of the Dublin Core Metadata Initiative (DCMI), which was held in Tokyo in October 2001. DC-2001 was a week-long event that included both a workshop and a conference. In the tradition of previous events, the workshop…

  3. Unified Science Information Model for SoilSCAPE using the Mercury Metadata Search System

    NASA Astrophysics Data System (ADS)

    Devarakonda, Ranjeet; Lu, Kefa; Palanisamy, Giri; Cook, Robert; Santhana Vannan, Suresh; Moghaddam, Mahta; Clewley, Dan; Silva, Agnelo; Akbar, Ruzbeh

    2013-12-01

    SoilSCAPE (Soil moisture Sensing Controller And oPtimal Estimator) introduces a new concept for a smart wireless sensor web technology for optimal measurements of surface-to-depth profiles of soil moisture using in-situ sensors. The objective is to enable a guided and adaptive sampling strategy for the in-situ sensor network to meet the measurement validation objectives of spaceborne soil moisture sensors such as the Soil Moisture Active Passive (SMAP) mission. This work is being carried out at the University of Michigan, the Massachusetts Institute of Technology, the University of Southern California, and Oak Ridge National Laboratory. At Oak Ridge National Laboratory we are using the Mercury metadata search system [1] for building a unified information system for the SoilSCAPE project. This unified portal primarily comprises three key pieces: distributed search/discovery; data collections and integration; and data dissemination. Mercury, a federally funded software system for metadata harvesting, indexing, and searching, is used for this module. Soil moisture data sources identified as part of this activity, such as SoilSCAPE and FLUXNET (in-situ sensors), AirMOSS (airborne retrieval), and SMAP (spaceborne retrieval), are being indexed and maintained by Mercury. Mercury will be the central repository of data sources for cal/val for soil moisture studies and will provide a mechanism to identify additional data sources. Relevant metadata from existing inventories such as the ORNL DAAC, USGS Clearinghouse, ARM, NASA ECHO, GCMD, etc. will be brought into this soil-moisture data search/discovery module. The SoilSCAPE [2] metadata records will also be published in broader metadata repositories such as GCMD and data.gov. Mercury can be configured to provide a single portal to soil moisture information contained in disparate data management systems located anywhere on the Internet. Mercury is able to extract metadata systematically from HTML pages or XML files using a variety of
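
    The harvest-index-search pipeline that Mercury provides can be caricatured in a few lines; the sketch below (not Mercury code) parses toy XML metadata records from two "sources" and builds an inverted index over their text for term queries.

```python
# Toy harvest-index-search over XML metadata records; record content and
# identifiers are invented for illustration.
import xml.etree.ElementTree as ET
from collections import defaultdict

records = {
    "soilscape-001": "<record><title>Soil moisture, site A</title>"
                     "<keyword>soil moisture</keyword></record>",
    "fluxnet-017":   "<record><title>Latent heat flux</title>"
                     "<keyword>eddy covariance</keyword></record>",
}

index = defaultdict(set)             # term -> record ids containing it
for rid, xml_text in records.items():
    root = ET.fromstring(xml_text)
    for el in root.iter():
        for token in (el.text or "").lower().replace(",", " ").split():
            index[token].add(rid)

def search(term):
    """Return ids of records containing the term, across all sources."""
    return sorted(index.get(term.lower(), set()))

print(search("moisture"))            # -> ['soilscape-001']
```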

  4. The International Learning Object Metadata Survey

    ERIC Educational Resources Information Center

    Friesen, Norm

    2004-01-01

    A wide range of projects and organizations is currently making digital learning resources (learning objects) available to instructors, students, and designers via systematic, standards-based infrastructures. One standard that is central to many of these efforts and infrastructures is known as Learning Object Metadata (IEEE 1484.12.1-2002, or LOM).…

  5. Catalogue Creation for Space Situational Awareness with Optical Sensors

    NASA Astrophysics Data System (ADS)

    Hobson, T.; Clarkson, I.; Bessell, T.; Rutten, M.; Gordon, N.; Moretti, N.; Morreale, B.

    2016-09-01

    In order to safeguard the continued use of space-based technologies, effective monitoring and tracking of man-made resident space objects (RSOs) is paramount. The diverse characteristics, behaviours and trajectories of RSOs make space surveillance a challenging application of the tracking and surveillance discipline. When surveillance systems are faced with non-canonical scenarios, it is common for human operators to intervene while researchers adapt and extend traditional tracking techniques in search of a solution. A complementary strategy for improving the robustness of space surveillance systems is to place greater emphasis on the anticipation of uncertainty: namely, give the system the intelligence necessary to autonomously react to unforeseen events and to intelligently and appropriately act on tenuous information rather than discard it. In this paper we build on our 2015 campaign and describe the progression of a low-cost intelligent space surveillance system capable of autonomously cataloguing and maintaining track of RSOs. It currently exploits robotic electro-optical sensors, high-fidelity state estimation and propagation, as well as constrained initial orbit determination (IOD), to intelligently and adaptively manage its sensors in order to maintain an accurate catalogue of RSOs. In a step towards fully autonomous cataloguing, the system has been tasked with maintaining surveillance of a portion of the geosynchronous (GEO) belt. Using a combination of survey and track-refinement modes, the system is capable of maintaining tracks of known RSOs and initiating tracks on previously unknown objects. Uniquely, due to the use of high-fidelity representations of a target's state uncertainty, as few as two images of previously unknown RSOs may be used to subsequently initiate autonomous search and reacquisition. To achieve this capability, particularly within the congested environment of the GEO belt, we use a constrained admissible region (CAR) to

  6. New catalogue of single-apparition comets discovered in the years 1901-1950. Part I

    NASA Astrophysics Data System (ADS)

    Królikowska, M.; Sitarski, G.; Pittich, E.; Szutowicz, S.; Ziołkowski, K.; Rickman, H.; Gabryszewski, R.; Rickman, B.

    2014-07-01

    A new catalogue of cometary orbits derived using a completely homogeneous method of data treatment, accurate methods of numerical integration, and a modern model of the Solar System is presented. We constructed a sample of near-parabolic comets from the first half of the twentieth century with original reciprocals of semimajor axes less than 0.000130 au^{-1} in the Marsden and Williams Catalogue of Cometary Orbits (2008, hereafter MW08), i.e., comets with original semimajor axes larger than 7700 au. We found 38 such comets in MW08, of which 32 have first-quality orbits (class 1A or 1B) and the remaining 6 have second-quality orbits (2A or 2B). We present satisfactory non-gravitational (hereafter NG) models for thirteen of the investigated comets. The four main features distinguishing this catalogue of orbits of single-apparition comets discovered in the early twentieth century from other catalogues of orbits of similarly old objects are the following. 1. Old cometary positional observations require a very careful analysis. For the purpose of this new catalogue, great emphasis has been placed on collecting sets of observations as complete as possible for the investigated comets. Moreover, for many observations, comet-minus-star-type measurements were also available. This type of data was particularly valuable, being the most original measurement of comet positions, and has allowed us to recalculate new comet positions using the PPM star catalogue. 2. Old cometary observations were prepared by observers usually as apparent positions in Right Ascension and Declination, or as positions reduced to the epoch of the beginning of the year of a given observation. This was a huge advantage of these data, because it allows us to uniformly take into account all the corrections associated with reducing the data to the standard epoch. 3. The osculating orbits of single-apparition comets discovered more than sixty years ago have been formerly determined with very different

  7. Multifractal Omori law for earthquake triggering: new tests on the California, Japan and worldwide catalogues

    NASA Astrophysics Data System (ADS)

    Ouillon, G.; Sornette, D.; Ribeiro, E.

    2009-07-01

    The Multifractal Stress-Activated model is a statistical model of triggered seismicity based on mechanical and thermodynamic principles. It predicts that, above a triggering magnitude cut-off M0, the exponent p of the Omori law for the time decay of the rate of aftershocks is a linearly increasing function p(M) = a0M + b0 of the main shock magnitude M. We previously reported empirical support for this prediction, using the Southern California Earthquake Center (SCEC) catalogue. Here, we confirm this observation using an updated, longer version of the same catalogue, as well as new methods to estimate p. One of these methods is the newly defined Scaling Function Analysis (SFA), adapted from the wavelet transform. This method is able to measure a mathematical singularity (hence a p-value) while erasing the possible regular part of a time-series. The SFA also proves particularly efficient at revealing the coexistence and superposition of several types of relaxation laws (typical Omori sequences and short-lived swarm sequences) which can be mixed within the same catalogue. Another new method consists of monitoring the largest aftershock magnitude observed in successive time intervals, which shortcuts the problem of events with small magnitudes missing from aftershock catalogues. The same methods are used on data from the worldwide Harvard Centroid Moment Tensor (CMT) catalogue and show results compatible with those for Southern California. For the Japan Meteorological Agency (JMA) catalogue, we still observe a linear dependence of p on M, but with a smaller slope. The SFA shows, however, that results for this catalogue may be biased by numerous swarm sequences, despite our efforts to remove them before the analysis.
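
    For reference, the prediction concerns the exponent of the standard Omori-Utsu aftershock rate; in the abstract's notation (with K and c empirical constants):

```latex
% Omori-Utsu aftershock rate with the magnitude-dependent exponent
% predicted by the Multifractal Stress-Activated model.
\[
  n(t) \;=\; \frac{K}{(t + c)^{\,p(M)}},
  \qquad
  p(M) \;=\; a_0\, M + b_0 \quad \text{for } M \ge M_0 .
\]
```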

  8. CellML metadata standards, associated tools and repositories

    PubMed Central

    Beard, Daniel A.; Britten, Randall; Cooling, Mike T.; Garny, Alan; Halstead, Matt D.B.; Hunter, Peter J.; Lawson, James; Lloyd, Catherine M.; Marsh, Justin; Miller, Andrew; Nickerson, David P.; Nielsen, Poul M.F.; Nomura, Taishin; Subramanium, Shankar; Wimalaratne, Sarala M.; Yu, Tommy

    2009-01-01

    The development of standards for encoding mathematical models is an important component of model building and model sharing among scientists interested in understanding multi-scale physiological processes. CellML provides such a standard, particularly for models based on biophysical mechanisms, and a substantial number of models are now available in the CellML Model Repository. However, there is an urgent need to extend the current CellML metadata standard to provide biological and biophysical annotation of the models in order to facilitate model sharing, automated model reduction and connection to biological databases. This paper gives a broad overview of a number of new developments on CellML metadata and provides links to further methodological details available from the CellML website. PMID:19380315

  9. A novel metadata management model to capture consent for record linkage in longitudinal research studies.

    PubMed

    McMahon, Christiana; Denaxas, Spiros

    2017-11-06

    Informed consent is an important feature of longitudinal research studies, as it enables the linking of baseline participant information with administrative data. The lack of standardized models to capture consent elements can lead to substantial challenges; a structured approach to capturing consent-related metadata can address these. Our aims were to: a) explore the state of the art for recording consent; b) identify the key elements of consent required for record linkage; and c) create and evaluate a novel metadata management model to capture consent-related metadata. The main methodological components of our work were: a) a systematic literature review and qualitative analysis of consent forms; and b) the development and evaluation of a novel metadata model. We qualitatively analyzed 61 manuscripts and 30 consent forms, and extracted the data elements related to obtaining consent for linkage. We created a novel metadata management model for consent and evaluated it by comparison with existing standards and by iteratively applying it to case studies. The developed model can facilitate the standardized recording of consent for linkage in longitudinal research studies and enable the linkage of external participant data. Furthermore, it can provide a structured way of recording consent-related metadata and facilitate the harmonization and streamlining of processes.
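
    A structured consent record of the kind the model argues for might look like the following Python sketch; the field names and the linkage-permission rule are our invention, and the published model is considerably richer.

```python
# A toy structured record of consent-for-linkage metadata.
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class LinkageConsent:
    participant_id: str
    consent_given: bool
    date_signed: date
    sources_allowed: List[str] = field(default_factory=list)  # e.g. hospital episodes
    expiry: Optional[date] = None             # None = no stated end date
    withdrawal_date: Optional[date] = None

    def permits_linkage(self, source: str, on: date) -> bool:
        """May this participant's record be linked to `source` on date `on`?"""
        if not self.consent_given or source not in self.sources_allowed:
            return False
        if self.withdrawal_date and on >= self.withdrawal_date:
            return False
        return self.expiry is None or on <= self.expiry

c = LinkageConsent("p-001", True, date(2015, 3, 2),
                   ["hospital_episodes"], expiry=date(2025, 3, 2))
print(c.permits_linkage("hospital_episodes", date(2024, 1, 1)))   # True
```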

  10. A Grid Metadata Service for Earth and Environmental Sciences

    NASA Astrophysics Data System (ADS)

    Fiore, Sandro; Negro, Alessandro; Aloisio, Giovanni

    2010-05-01

    Critical challenges for climate modeling researchers are strongly connected with increasingly complex simulation models and the huge quantities of datasets they produce. Future trends in climate modeling will only increase computational and storage requirements. For this reason, the ability to transparently access both computational and data resources for large-scale complex climate simulations must be considered a key requirement for Earth Science and Environmental distributed systems. From the data management perspective, (i) the quantity of data will continuously increase, (ii) data will become more and more distributed and widespread, (iii) data sharing/federation will represent a key challenge among different sites distributed worldwide, and (iv) the potential community of users (large and heterogeneous) will be interested in discovering experimental results, searching metadata, browsing collections of files, comparing different results, displaying output, etc. A key element for carrying out data search and discovery, and for managing and accessing huge and distributed amounts of data, is the metadata handling framework. What we propose for the management of distributed datasets is the GRelC service (a data grid solution focusing on metadata management). Unlike classical approaches, the proposed data-grid solution is able to address scalability, transparency, security, efficiency and interoperability. The GRelC service we propose provides access to metadata stored in different and widespread data sources (relational databases running on top of MySQL, Oracle, DB2, etc., leveraging SQL as the query language, as well as XML databases - XIndice, eXist, and libxml2-based documents, adopting either XPath or XQuery), providing a strong data virtualization layer in a grid environment; an example of this dual access is sketched below. Such a technological solution for distributed metadata management leverages well-known and widely adopted standards (W3C, OASIS, etc.) and supports role-based management (based on VOMS), which
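
    The dual SQL/XPath access mentioned above can be sketched as follows (toy schemas and invented field values; sqlite stands in for the relational back ends): the same logical question is answered from a relational source and from an XML source, which a service like GRelC would merge behind a single grid interface.

```python
# One logical metadata query answered from two kinds of back end.
import sqlite3
import xml.etree.ElementTree as ET

# --- relational metadata source (sqlite standing in for MySQL/Oracle/DB2) ---
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE dataset (name TEXT, model TEXT, year INT)")
db.execute("INSERT INTO dataset VALUES ('run-12', 'CMCC-CM', 2009)")
sql_hits = db.execute(
    "SELECT name FROM dataset WHERE model = ?", ("CMCC-CM",)).fetchall()

# --- XML metadata source, queried with an XPath-style expression ---
xml_doc = ET.fromstring(
    "<catalog><dataset model='CMCC-CM'><name>run-37</name></dataset></catalog>")
xpath_hits = [d.findtext("name")
              for d in xml_doc.findall(".//dataset[@model='CMCC-CM']")]

# A metadata service would merge such results behind one virtualized view.
print([n for (n,) in sql_hits] + xpath_hits)   # -> ['run-12', 'run-37']
```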

  11. Stop the Bleeding: the Development of a Tool to Streamline NASA Earth Science Metadata Curation Efforts

    NASA Astrophysics Data System (ADS)

    le Roux, J.; Baker, A.; Caltagirone, S.; Bugbee, K.

    2017-12-01

    The Common Metadata Repository (CMR) is a high-performance, high-quality repository for Earth science metadata records, and serves as the primary way to search NASA's growing 17.5 petabytes of Earth science data holdings. Released in 2015, CMR has the capability to support several different metadata standards already being utilized by NASA's combined network of Earth science data providers, or Distributed Active Archive Centers (DAACs). The Analysis and Review of CMR (ARC) Team located at Marshall Space Flight Center is working to improve the quality of records already in CMR with the goal of making records optimal for search and discovery. This effort entails a combination of automated and manual review, where each NASA record in CMR is checked for completeness, accuracy, and consistency. This effort is highly collaborative in nature, requiring communication and transparency of findings amongst NASA personnel, DAACs, the CMR team and other metadata curation teams. Through the evolution of this project it has become apparent that there is a need to document and report findings, as well as track metadata improvements in a more efficient manner. The ARC team has collaborated with Element 84 in order to develop a metadata curation tool to meet these needs. In this presentation, we will provide an overview of this metadata curation tool and its current capabilities. Challenges and future plans for the tool will also be discussed.
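
    The curation tool itself is not described at code level; as a hedged sketch of the automated part of such a review, a completeness check over metadata records might look like the following. The required field list and the findings format are assumptions, not the ARC team's actual rules.

        # Minimal sketch of an automated completeness check over metadata
        # records, in the spirit of the review workflow described above.
        REQUIRED_FIELDS = ["title", "abstract", "temporal_extent",
                           "spatial_extent", "data_format", "contact"]

        def review_record(record: dict) -> list:
            """Return a list of (field, problem) findings for one record."""
            findings = []
            for f in REQUIRED_FIELDS:
                value = record.get(f)
                if value is None or (isinstance(value, str) and not value.strip()):
                    findings.append((f, "missing or empty"))
            return findings

        # Example: a record missing its spatial extent.
        print(review_record({"title": "Sample dataset",
                             "abstract": "A short abstract.",
                             "temporal_extent": "2000-2010",
                             "data_format": "HDF5",
                             "contact": "help@example.org"}))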

  12. Development of an open metadata schema for prospective clinical research (openPCR) in China.

    PubMed

    Xu, W; Guan, Z; Sun, J; Wang, Z; Geng, Y

    2014-01-01

    In China, deployment of electronic data capture (EDC) and clinical data management systems (CDMS) for clinical research (CR) is in its very early stage, and about 90% of clinical studies still collect and submit clinical data manually. This work aims to build an open metadata schema for Prospective Clinical Research (openPCR) in China based on openEHR archetypes, in order to help Chinese researchers easily create specific data entry templates for registration, study design and clinical data collection. The Singapore Framework for Dublin Core Application Profiles (DCAP) is used to develop openPCR, following four steps: defining the core functional requirements and deducing the core metadata items; developing archetype models; defining metadata terms and creating archetype records; and developing the implementation syntax. The core functional requirements are divided into three categories: requirements for research registration, requirements for trial design, and requirements for case report forms (CRF). 74 metadata items are identified and their Chinese authority names are created. The minimum metadata set of openPCR includes 3 documents, 6 sections, 26 top-level data groups, 32 lower data groups and 74 data elements. The top-level container in openPCR is composed of public document, internal document and clinical document archetypes. A hierarchical structure of openPCR is established according to the Data Structure of Electronic Health Record Architecture and Data Standard of China (Chinese EHR Standard). Metadata attributes are grouped into six parts: identification, definition, representation, relation, usage guides, and administration. OpenPCR is an open metadata schema based on research registration standards, standards of the Clinical Data Interchange Standards Consortium (CDISC), and Chinese healthcare-related standards, and is to be made publicly available throughout China. It considers future integration of EHR and CR by adopting data structure and data

  13. Characterization of Educational Resources in e-Learning Systems Using an Educational Metadata Profile

    ERIC Educational Resources Information Center

    Solomou, Georgia; Pierrakeas, Christos; Kameas, Achilles

    2015-01-01

    The ability to effectively administer educational resources in terms of accessibility, reusability and interoperability lies in the adoption of an appropriate metadata schema capable of adequately describing them. A considerable number of different educational metadata schemas can be found in the literature, with the IEEE LOM being the most widely…

  15. VizieR Online Data Catalog: The XMM-Newton 2nd Incremental Source Catalogue (2XMMi) (XMM-SSC, 2008)

    NASA Astrophysics Data System (ADS)

    Xmm-Newton Survey Science Centre, Consortium

    2008-09-01

    The 2XMMi catalogue is the fourth publicly released XMM X-ray source catalogue produced by the XMM Survey Science Centre (SSC) consortium, following the 1XMM (Cat. IX/37, released in April 2003), 2XMMp (July 2006) and 2XMM (Cat. IX/39, August 2007) catalogues: 2XMMp was a preliminary version of 2XMM. 2XMMi is an incremental version of the 2XMM catalogue. The 2XMMi catalogue is about 17% larger than the 2XMM catalogue, which it supersedes, due to the 1-year longer baseline of observations included (it is about 8 times larger than the 1XMM catalogue). As such, it is the largest X-ray source catalogue ever produced, containing more than twice as many discrete sources as either the ROSAT survey or pointed catalogues. 2XMMi complements deeper Chandra and XMM-Newton small area surveys, probing a large sky area at the flux limit where the bulk of the objects that contribute to the X-ray background lie. The 2XMMi catalogue provides a rich resource for generating large, well-defined samples for specific studies, utilizing the fact that X-ray selection is a highly efficient (arguably the most efficient) way of selecting certain types of object, notably active galaxies (AGN), clusters of galaxies, interacting compact binaries and active stellar coronae. The large sky area covered by the serendipitous survey, or equivalently the large size of the catalogue, also means that 2XMMi is a superb resource for exploring the variety of the X-ray source population and identifying rare source types. The production of the 2XMMi catalogue has been undertaken by the XMM-Newton SSC consortium in fulfilment of one of its major responsibilities within the XMM-Newton project. The catalogue production process has been designed to exploit fully the capabilities of the XMM-Newton EPIC cameras and to ensure the integrity and quality of the resultant catalogue through rigorous screening of the data. The predecessor 2XMM catalogue was made from a subset of public observations emerging from a re

  16. Using a Simple Knowledge Organization System to facilitate Catalogue and Search for the ESA CCI Open Data Portal

    NASA Astrophysics Data System (ADS)

    Wilson, Antony; Bennett, Victoria; Donegan, Steve; Juckes, Martin; Kershaw, Philip; Petrie, Ruth; Stephens, Ag; Waterfall, Alison

    2016-04-01

    The ESA Climate Change Initiative (CCI) is a €75m programme running from 2009 to 2016, with the goal of providing stable, long-term, satellite-based essential climate variable (ECV) data products for climate modellers and researchers. As part of the CCI, ESA has funded the Open Data Portal project to establish a central repository that brings together the data from these multiple sources and makes them available in a consistent way, in order to maximise their dissemination amongst the international user community. Search capabilities are a critical component of attaining this goal. To this end, the project provides dataset-level metadata in the form of ISO 19115 records served via a standard OGC CSW interface. In addition, the Open Data Portal re-uses the search system from the Earth System Grid Federation (ESGF), successfully applied to support CMIP5 (5th Coupled Model Intercomparison Project) and obs4MIPs. This uses a tightly defined controlled vocabulary of metadata terms, the Data Reference Syntax (DRS), which encompasses different aspects of the data. This system has facilitated the construction of a powerful faceted search interface that enables users to discover data at the individual file level of granularity through ESGF's web portal frontend. The use of a consistent set of model experiments for CMIP5 allowed the definition of a uniform DRS for all model data served from ESGF. For CCI, however, there are thirteen ECVs, each of which is derived from multiple sources and different science communities, resulting in highly heterogeneous metadata. An analysis of the concepts in use has been undertaken, with the aim of producing a CCI DRS that could provide a single authoritative source for cataloguing and searching the CCI data for the Open Data Portal. SKOS (Simple Knowledge Organization System) and OWL (Web Ontology Language) are a natural fit for representing the DRS, providing controlled vocabularies as well as a way to represent relationships between
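
    To make the SKOS idea concrete, a fragment of a controlled vocabulary can be expressed with rdflib as below. The namespace URI and the terms are invented for illustration; the actual CCI DRS vocabulary is defined by the project.

        # Sketch of expressing part of a controlled vocabulary as SKOS with
        # rdflib. The namespace and terms are hypothetical, not the CCI DRS.
        from rdflib import Graph, Literal, Namespace
        from rdflib.namespace import RDF, SKOS

        CCI = Namespace("http://example.org/cci/vocab/")   # hypothetical

        g = Graph()
        g.add((CCI.ecv, RDF.type, SKOS.ConceptScheme))
        g.add((CCI.ecv, SKOS.prefLabel,
               Literal("Essential Climate Variable", lang="en")))

        g.add((CCI.sea_ice, RDF.type, SKOS.Concept))
        g.add((CCI.sea_ice, SKOS.prefLabel, Literal("Sea Ice", lang="en")))
        g.add((CCI.sea_ice, SKOS.inScheme, CCI.ecv))

        g.add((CCI.sea_ice_concentration, RDF.type, SKOS.Concept))
        g.add((CCI.sea_ice_concentration, SKOS.broader, CCI.sea_ice))

        print(g.serialize(format="turtle"))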

  17. The Benefits and Future of Standards: Metadata and Beyond

    NASA Astrophysics Data System (ADS)

    Stracke, Christian M.

    This article discusses the benefits and future of standards and presents the generic multi-dimensional Reference Model. First, the importance and the tasks of interoperability, as well as quality development and their relationship, are analyzed. Especially in e-Learning their connection and interdependence are evident: interoperability is one basic requirement for quality development. The paper shows how standards and specifications support these crucial issues. The upcoming ISO metadata standard MLR (Metadata for Learning Resources) is introduced and used as an example for identifying the requirements and needs of future standardization. In conclusion, a vision of the challenges and potentials of e-Learning standardization is outlined.

  18. Securing the AliEn File Catalogue - Enforcing authorization with accountable file operations

    NASA Astrophysics Data System (ADS)

    Schreiner, Steffen; Bagnasco, Stefano; Sankar Banerjee, Subho; Betev, Latchezar; Carminati, Federico; Vladimirovna Datskova, Olga; Furano, Fabrizio; Grigoras, Alina; Grigoras, Costin; Mendez Lorenzo, Patricia; Peters, Andreas Joachim; Saiz, Pablo; Zhu, Jianlin

    2011-12-01

    The AliEn Grid Services, as operated by the ALICE Collaboration in its global physics analysis grid framework, are based on a central File Catalogue together with a distributed set of storage systems and the possibility of registering links to external data resources. This paper describes several identified vulnerabilities in the AliEn File Catalogue access protocol regarding fraud and unauthorized file alteration, and presents a more secure and revised design: a new mechanism, called the LFN Booking Table, is introduced in order to keep track of access authorization in the transient state of files entering or leaving the File Catalogue. A simplification of the original Access Envelope mechanism for xrootd-protocol-based storage systems yielded fundamental computational improvements as well as a reduction of up to 50% in the credential's size. By extending the access protocol with signed status messages from the underlying storage system, the File Catalogue receives trusted information about a file's size and checksum, and the protocol is no longer dependent on client trust. Altogether, the revised design complies with atomic and consistent transactions and allows for accountable, authentic, and traceable file operations. This paper describes these changes as part of and beyond the development of AliEn version 2.19.
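
    The trust model behind signed status messages can be illustrated with a small sketch: the catalogue accepts a reported size and checksum only when the storage element's signature over them verifies. AliEn's actual protocol rests on grid credentials and the xrootd machinery, not on the shared-key HMAC used here purely for demonstration.

        # Illustration of the signed-status-message idea: the catalogue trusts
        # a file's size and checksum only if the storage element's signature
        # verifies. HMAC with a shared key is a stand-in for the real scheme.
        import hashlib, hmac, json

        SHARED_KEY = b"storage-element-key"   # stand-in for real credentials

        def sign_status(size: int, checksum: str) -> dict:
            msg = json.dumps({"size": size, "checksum": checksum},
                             sort_keys=True)
            sig = hmac.new(SHARED_KEY, msg.encode(), hashlib.sha256).hexdigest()
            return {"msg": msg, "sig": sig}

        def catalogue_accepts(status: dict) -> bool:
            expected = hmac.new(SHARED_KEY, status["msg"].encode(),
                                hashlib.sha256).hexdigest()
            return hmac.compare_digest(expected, status["sig"])

        status = sign_status(1048576, "a3f5c2")   # reported by the storage system
        assert catalogue_accepts(status)          # size/checksum now trusted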

  19. A catalogue of chromospherically active binary stars (third edition)

    NASA Astrophysics Data System (ADS)

    Eker, Z.; Ak, N. Filiz; Bilir, S.; Doǧru, D.; Tüysüz, M.; Soydugan, E.; Bakış, H.; Uǧraş, B.; Soydugan, F.; Erdem, A.; Demircan, O.

    2008-10-01

    The catalogue of chromospherically active binaries (CABs) has been revised and updated. With 203 new identifications, the number of CAB stars is increased to 409. The catalogue is available in electronic format, where each system has a number of lines (suborders) with a unique order number. The columns contain a limited number of selected cross-references, comments explaining peculiarities and, where the binary belongs to a multiple system, its position within it, classical identifications (RS Canum Venaticorum, BY Draconis), brightness and colours, photometric and spectroscopic data, a description of emission features (CaII H and K, Hα, ultraviolet, infrared), X-ray luminosity, radio flux, physical quantities and orbital information; each basic entry is referenced so users can go to the original sources.

  1. Why can't I manage my digital images like MP3s? The evolution and intent of multimedia metadata

    NASA Astrophysics Data System (ADS)

    Goodrum, Abby; Howison, James

    2004-12-01

    This paper considers the deceptively simple question: why can't digital images be managed in the simple and effective manner in which digital music files are managed? We make the case that the answer is different treatments of metadata in different domains with different goals. A central difference between the two formats stems from the fact that digital music metadata lookup services are collaborative and automate the movement from a digital file to the appropriate metadata, while image metadata services do not. To understand why this difference exists we examine the divergent evolution of metadata standards for digital music and digital images, and observe that the processes differ in interesting ways according to their intent. Specifically, music metadata was developed primarily for personal file management and community resource sharing, while the focus of image metadata has largely been on information retrieval. We argue that lessons from MP3 metadata can assist individuals facing their growing personal image management challenges. Our focus therefore is not on metadata for cultural heritage institutions or the publishing industry; it is limited to the personal libraries growing on our hard drives. This bottom-up approach to file management, combined with p2p distribution, radically altered the music landscape. Might such an approach have a similar impact on image publishing? This paper outlines plans for improving the personal management of digital images (doing image metadata and file management the MP3 way) and considers the likelihood of success.

  2. Biomedical word sense disambiguation with ontologies and metadata: automation meets accuracy

    PubMed Central

    Alexopoulou, Dimitra; Andreopoulos, Bill; Dietze, Heiko; Doms, Andreas; Gandon, Fabien; Hakenberg, Jörg; Khelif, Khaled; Schroeder, Michael; Wächter, Thomas

    2009-01-01

    Background: Ontology term labels can be ambiguous and have multiple senses. While this is not a problem for human annotators, it is a challenge to automated methods that identify ontology terms in text. Classical approaches to word sense disambiguation use co-occurring words or terms; however, most treat ontologies as simple terminologies, without making use of the ontology structure or the semantic similarity between terms. Another useful source of information for disambiguation is metadata. Here, we systematically compare three approaches to word sense disambiguation that use ontologies and metadata. Results: The 'Closest Sense' method assumes that the ontology defines multiple senses of the term; it computes the shortest path from co-occurring terms in the document to one of these senses. The 'Term Cooc' method defines a log-odds ratio for co-occurring terms, including co-occurrences inferred from the ontology structure. The 'MetaData' approach trains a classifier on metadata; it does not require any ontology, but requires training data, which the other methods do not. To evaluate these approaches we defined a manually curated training corpus of 2600 documents for seven ambiguous terms from the Gene Ontology and MeSH. Averaged over all conditions, the approaches achieve an 80% success rate. The 'MetaData' approach performed best with 96% when trained on high-quality data; its performance deteriorates as the quality of the training data decreases. The 'Term Cooc' approach performs better on the Gene Ontology (92% success) than on MeSH (73% success), as MeSH is not a strict is-a/part-of hierarchy but rather a loose is-related-to hierarchy. The 'Closest Sense' approach achieves an 80% success rate on average. Conclusion: Metadata is valuable for disambiguation, but requires high-quality training data. Closest Sense requires no training, but a large, consistently modelled ontology, which are two opposing conditions. Term Cooc achieves greater than 90% success given a consistently
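
    As a toy illustration of the log-odds idea behind 'Term Cooc' (the published method additionally infers co-occurrences through the ontology structure, which is omitted here), context words can be scored for how strongly they favour one sense over another:

        # Sketch of a log-odds co-occurrence score: how strongly do context
        # words favour one sense over another? Counts are toy values; the
        # published method also propagates counts through the ontology.
        import math

        # cooc[sense][word] = co-occurrence count from sense-labelled text
        cooc = {
            "GO:development": {"embryo": 40, "growth": 35, "web": 1},
            "software_dev":   {"embryo": 1,  "growth": 3,  "web": 50},
        }

        def log_odds(context_words, sense_a, sense_b, alpha=1.0):
            """Positive -> context favours sense_a; alpha smooths zero counts."""
            score = 0.0
            for w in context_words:
                pa = cooc[sense_a].get(w, 0) + alpha
                pb = cooc[sense_b].get(w, 0) + alpha
                score += math.log(pa / pb)
            return score

        print(log_odds(["embryo", "growth"], "GO:development", "software_dev"))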

  3. Organizing Scientific Data Sets: Studying Similarities and Differences in Metadata and Subject Term Creation

    ERIC Educational Resources Information Center

    White, Hollie C.

    2012-01-01

    Background: According to Salo (2010), the metadata entered into repositories are "disorganized" and the metadata schemes underlying repositories are "arcane". This creates a challenging repository environment with regard to personal information management (PIM) and knowledge organization systems (KOSs). This dissertation research is…

  4. Metadata: Pure and Simple, or Is It?

    ERIC Educational Resources Information Center

    Chalmers, Marilyn

    2002-01-01

    Discusses issues concerning metadata in Web pages based on experiences in a vocational education center library in Queensland (Australia). Highlights include Dublin Core elements; search engines; controlled vocabulary; performance measurement to assess usage patterns and provide quality control over the vocabulary; and considerations given the…

  5. Digital Preservation and Metadata: History, Theory, Practice.

    ERIC Educational Resources Information Center

    Lazinger, Susan S.

    This book addresses critical issues of digital preservation, providing guidelines on matters ranging from dealing with obsolescence to responsibilities, methods of preservation, cost, and metadata formats. It also describes numerous national and international institutions that provide frameworks for digital libraries and archives. The first…

  6. VizieR Online Data Catalog: WATCH Solar X-Ray Burst Catalogue (Crosby+ 1998)

    NASA Astrophysics Data System (ADS)

    Crosby, N.; Lund, N.; Vilmer, N.; Sunyaev, R.

    1998-01-01

    Catalogue containing solar X-ray bursts measured by the Danish Wide Angle Telescope for Cosmic Hard X-Rays (WATCH) experiment aboard the Russian satellite GRANAT in the deca-keV energy range. Table 1 lists the periods during which solar observations with WATCH are available (WATCH ON-TIME) and where the bursts listed in the catalogue have been observed. (2 data files).

  7. Empirical relations to convert magnitudes of the earthquake catalogue for the north western of Algeria

    NASA Astrophysics Data System (ADS)

    Belayadi, Ilyes; Bezzeghoud, Mourad; Fontiela, João; Nadji, Amansour

    2017-04-01

    North Algeria is one of the most seismically active regions of the western Mediterranean basin, its seismicity being related to the boundary between the Eurasian and Nubian plates. We compiled an earthquake catalogue for north-western Algeria, within the area 2°W-1°E and 34°N-37°N, for the time span 1790-2016. To compile the earthquake catalogue we merged all available national and international catalogues, then removed all duplicate and spurious events. The lower magnitude threshold of the catalogue is set at M = 2.5. The magnitudes reported in the source catalogues are ML, Ms, Mb, Mw and macroseismic intensity. We therefore develop new empirical relations, suited to the seismic hazard and geodynamic context of North Algeria, to convert the different magnitudes and intensities to Mw. Acknowledgements: Ilyes Belayadi is funded entirely by the University of Oran 2 Mohamed Ben Ahmed (Algeria). This work is co-financed by the European Union through the European Regional Development Fund under COMPETE 2020 (Operational Program for Competitiveness and Internationalization) through the ICT project (UID/GEO/04683/2013) under the reference POCI-01-0145-FEDER-007690.
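
    Such empirical conversions commonly take a linear form, e.g. Mw = a*ML + b. Purely as a sketch (with toy data, not the paper's coefficients), the fit can be done by least squares:

        # Sketch of deriving an empirical magnitude-conversion relation of the
        # common linear form Mw = a*ML + b. The data below are synthetic; the
        # paper derives its own relations for north-western Algeria.
        import numpy as np

        ml = np.array([3.0, 3.5, 4.0, 4.5, 5.0, 5.5])   # local magnitudes
        mw = np.array([3.2, 3.6, 4.1, 4.5, 5.1, 5.4])   # moment magnitudes

        a, b = np.polyfit(ml, mw, 1)                    # fit Mw = a*ML + b
        print(f"Mw = {a:.2f} * ML + {b:.2f}")

        def to_mw(m_local: float) -> float:
            """Convert a local magnitude to Mw with the fitted relation."""
            return a * m_local + b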

  8. The Solar Stormwatch CME catalogue: Results from the first space weather citizen science project

    NASA Astrophysics Data System (ADS)

    Barnard, L.; Scott, C.; Owens, M.; Lockwood, M.; Tucker-Hood, K.; Thomas, S.; Crothers, S.; Davies, J. A.; Harrison, R.; Lintott, C.; Simpson, R.; O'Donnell, J.; Smith, A. M.; Waterson, N.; Bamford, S.; Romeo, F.; Kukula, M.; Owens, B.; Savani, N.; Wilkinson, J.; Baeten, E.; Poeffel, L.; Harder, B.

    2014-12-01

    Solar Stormwatch was the first space weather citizen science project, the aim of which is to identify and track coronal mass ejections (CMEs) observed by the Heliospheric Imagers aboard the STEREO satellites. The project has now been running for approximately 4 years, with input from >16,000 citizen scientists, resulting in a data set of >38,000 time-elongation profiles of CME trajectories, observed over 18 preselected position angles. We present our method for reducing this data set into a CME catalogue. The resulting catalogue consists of 144 CMEs over the period January 2007 to February 2010, of which 110 were observed by STEREO-A and 77 were observed by STEREO-B. For each CME, the time-elongation profiles generated by the citizen scientists are averaged into a consensus profile along each position angle that the event was tracked. We consider this catalogue to be unique, being at present the only citizen science-generated CME catalogue, tracking CMEs over an elongation range from 4° out to a maximum of approximately 70°. Using single-spacecraft fitting techniques, we estimate the speed, direction, solar source region, and latitudinal width of each CME. This shows that at present the Solar Stormwatch catalogue (which covers only solar minimum years) contains almost exclusively slow CMEs, with a mean speed of approximately 350 km s^-1. The full catalogue is available for public access at www.met.reading.ac.uk/~spate/solarstormwatch. This includes, for each event, the unprocessed time-elongation profiles generated by Solar Stormwatch, the consensus time-elongation profiles, and a set of summary plots, as well as the estimated CME properties.
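
    The consensus-profile step can be illustrated with a minimal numpy sketch: interpolate each citizen-science track onto a common time grid and average. The tracks below are synthetic; the catalogue's actual reduction works per position angle on real, irregular tracks.

        # Sketch of forming a consensus time-elongation profile from several
        # citizen-science tracks by interpolation onto a common grid.
        import numpy as np

        tracks = [  # each track: (times in hours, elongations in degrees)
            (np.array([0.0, 2.0, 4.0, 6.0]), np.array([4.0, 10.0, 18.0, 27.0])),
            (np.array([0.5, 3.0, 5.5]),      np.array([5.0, 13.0, 24.0])),
            (np.array([1.0, 4.0, 7.0]),      np.array([6.0, 17.0, 30.0])),
        ]

        grid = np.linspace(1.0, 5.5, 10)     # overlap region of all tracks
        profiles = np.vstack([np.interp(grid, t, e) for t, e in tracks])
        consensus = profiles.mean(axis=0)    # consensus time-elongation profile
        scatter = profiles.std(axis=0)       # spread between citizen scientists
        print(np.c_[grid, consensus, scatter])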

  9. Discovery of Marine Datasets and Geospatial Metadata Visualization

    NASA Astrophysics Data System (ADS)

    Schwehr, K. D.; Brennan, R. T.; Sellars, J.; Smith, S.

    2009-12-01

    NOAA's National Geophysical Data Center (NGDC) provides the deep archive of US multibeam sonar hydrographic surveys. NOAA stores the data as Bathymetric Attributed Grids (BAG; http://www.opennavsurf.org/), which are HDF5-formatted files containing gridded bathymetry, gridded uncertainty, and XML metadata. While NGDC provides the deep store and a basic ESRI ArcIMS interface to the data, additional tools need to be created to increase the frequency with which researchers discover hydrographic surveys that might be beneficial for their research. Using Open Source tools, we have created a draft Google Earth visualization of NOAA's complete collection of BAG files as of March 2009. Each survey is represented as a bounding box, an optional preview image of the survey data, and a pop-up placemark. The placemark contains a brief summary of the metadata and links to directly download the BAG survey files and the complete metadata file. Each survey is time-tagged so that users can search both in space and time for surveys that meet their needs. By creating this visualization, we aim to make the entire process of data discovery, validation of relevance, and download much more efficient for research scientists who may not be familiar with NOAA's hydrographic survey efforts or the BAG format. In the process of creating this demonstration, we have identified a number of improvements that can be made to the hydrographic survey process in order to make the results easier to use, especially with respect to metadata generation. With the combination of the NGDC deep archiving infrastructure, a Google Earth virtual globe visualization, and GeoRSS feeds of updates, we hope to increase the utilization of these high-quality gridded bathymetry data. This workflow applies equally well to LIDAR topography and bathymetry. Additionally, with proper referencing and geotagging in journal publications, we hope to close the loop and help the community create a true “Geospatial Scholar
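
    A bounding-box placemark of the kind described is straightforward to emit as KML. The sketch below uses plain string templating; the field values and the download URL are invented, and the project's actual pipeline is not shown in the abstract.

        # Sketch of emitting one KML placemark per survey, carrying a bounding
        # box and a short metadata summary. Values and URL are hypothetical.
        def survey_placemark(name, south, west, north, east, summary, url):
            ring = (f"{west},{south},0 {east},{south},0 {east},{north},0 "
                    f"{west},{north},0 {west},{south},0")
            return f"""<Placemark>
          <name>{name}</name>
          <description><![CDATA[{summary}<br/><a href="{url}">Download BAG</a>]]></description>
          <Polygon><outerBoundaryIs><LinearRing>
            <coordinates>{ring}</coordinates>
          </LinearRing></outerBoundaryIs></Polygon>
        </Placemark>"""

        kml = ('<?xml version="1.0" encoding="UTF-8"?>\n'
               '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>\n'
               + survey_placemark("H11077", 41.2, -70.4, 41.5, -70.0,
                                  "Multibeam survey, 2 m grid",
                                  "https://example.org/bag/H11077.bag")
               + "\n</Document></kml>")
        print(kml)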

  10. From the inside-out: Retrospectives on a metadata improvement process to advance the discoverability of NASA's Earth science data

    NASA Astrophysics Data System (ADS)

    Hernández, B. E.; Bugbee, K.; le Roux, J.; Beaty, T.; Hansen, M.; Staton, P.; Sisco, A. W.

    2017-12-01

    Earth observation (EO) data collected as part of NASA's Earth Observing System Data and Information System (EOSDIS) are now searchable via the Common Metadata Repository (CMR). The Analysis and Review of CMR (ARC) Team at Marshall Space Flight Center has been tasked with reviewing all NASA metadata records in the CMR (approximately 7,000 records). Each collection-level record and its constituent granule-level metadata are reviewed both for completeness and for compliance with the CMR's set of metadata standards, as specified in the Unified Metadata Model (UMM). NASA's Distributed Active Archive Centers (DAACs) have been harmonizing priority metadata records within the context of the inter-agency federal Big Earth Data Initiative (BEDI), which seeks to improve the discoverability, accessibility, and usability of EO data. Thus, the first phase of this project constitutes reviewing BEDI metadata records, while the second phase will constitute reviewing the remaining non-BEDI records in CMR. This presentation will discuss the ARC team's findings in terms of the overall quality of BEDI records across all DAACs, as well as compliance with UMM standards. For instance, only a fifth of the collection-level metadata fields needed correction, compared to a quarter of the granule-level fields. It should be noted that the degree to which DAACs' metadata did not comply with the UMM standards may reflect multiple factors, such as recent changes in the UMM standards and the utilization of different metadata formats (e.g. DIF 10, ECHO 10, ISO 19115-1) across the DAACs. Insights, constructive criticism, and lessons learned from this metadata review process will be contributed by both ORNL and SEDAC. Further inquiry along such lines may lead to insights which improve the metadata curation process moving forward. In terms of the broader implications for metadata compliance with the UMM standards, this research has shown that a large proportion of the prioritized collections have already been

  11. Identification of stars and digital version of the catalogue of 1958 by Brodskaya and Shajn

    NASA Astrophysics Data System (ADS)

    Gorbunov, M. A.; Shlyapnikov, A. A.

    2017-12-01

    The following topics are considered: the identification of objects on search maps, the determination of their coordinates at the epoch 2000, and the conversion of the published version of the 1958 catalogue by Brodskaya and Shajn into a machine-readable format. The statistics of the photometric and spectral data from the original catalogue are presented. A digital version of the catalogue is described, as well as its presentation in HTML, VOTable and AJS formats, and the basic principles of working with it in an interactive application of the International Virtual Observatory, the Aladin Sky Atlas.

  12. Inconsistencies between Academic E-Book Platforms: A Comparison of Metadata and Search Results

    ERIC Educational Resources Information Center

    Wiersma, Gabrielle; Tovstiadi, Esta

    2017-01-01

    This article presents the results of a study of academic e-books that compared the metadata and search results from major academic e-book platforms. The authors collected data and performed a series of test searches designed to produce the same result regardless of platform. Testing, however, revealed metadata-related errors and significant…

  13. The Ontological Perspectives of the Semantic Web and the Metadata Harvesting Protocol: Applications of Metadata for Improving Web Search.

    ERIC Educational Resources Information Center

    Fast, Karl V.; Campbell, D. Grant

    2001-01-01

    Compares the implied ontological frameworks of the Open Archives Initiative Protocol for Metadata Harvesting and the World Wide Web Consortium's Semantic Web. Discusses current search engine technology, semantic markup, indexing principles of special libraries and online databases, and componentization and the distinction between data and…

  14. Sea Level Station Metadata for Tsunami Detection, Warning and Research

    NASA Astrophysics Data System (ADS)

    Stroker, K. J.; Marra, J.; Kari, U. S.; Weinstein, S. A.; Kong, L.

    2007-12-01

    The devastating earthquake and tsunami of December 26, 2004 greatly increased recognition of the need for water level data both from the coasts and from the deep ocean. In 2006, the National Oceanic and Atmospheric Administration (NOAA) completed a Tsunami Data Management Report describing the management of the data required to minimize the impact of tsunamis in the United States. One of the major gaps identified in this report is access to global coastal water level data. NOAA's National Geophysical Data Center (NGDC) and National Climatic Data Center (NCDC) are working cooperatively to bridge this gap. NOAA relies on a network of global data, acquired and processed in real time to support tsunami detection and warning, as well as high-quality global databases of archived data to support research and advanced scientific modeling. In 2005, parties interested in enhancing the access and use of sea level station data united under the NOAA NCDC's Integrated Data and Environmental Applications (IDEA) Center's Pacific Region Integrated Data Enterprise (PRIDE) program to develop a distributed metadata system describing sea level stations (Kari et al., 2006; Marra et al., in press). This effort started with pilot activities in a regional framework and is targeted at tsunami detection and warning systems being developed by various agencies. It includes development of the components of a prototype sea level station metadata web service and an accompanying Google Earth-based client application, which use an XML-based schema to expose, at a minimum, the information in the NOAA National Weather Service (NWS) Pacific Tsunami Warning Center (PTWC) station database needed to use the PTWC's Tide Tool application. As identified in the Tsunami Data Management Report, the need also exists for long-term retention of the sea level station data. NOAA envisions that the retrospective water level data and metadata will also be available through web services, using an XML-based schema. Five high

  15. Multiband Study of Radio Sources of the Rcr Catalogue with Virtual Observatory Tools

    NASA Astrophysics Data System (ADS)

    Zhelenkova, O. P.; Soboleva, N. S.; Majorova, E. K.; Temirova, A. V.

    We present early results of our multiband study of the RATAN Cold Revised (RCR) catalogue obtained from seven cycles of the "Cold" survey carried out with the RATAN-600 radio telescope at 7.6 cm in 1980-1999, at the declination of the source SS 433. We used the 2MASS and LAS UKIDSS infrared surveys, the DSS-II and SDSS DR7 optical surveys, as well as the USNO-B1 and GSC-II catalogues, and the VLSS, TXS, NVSS, FIRST and GB6 radio surveys, to accumulate information about the sources. For radio sources that have no detectable optical candidate in optical or infrared catalogues, we additionally looked through images in several bands from the SDSS, LAS UKIDSS, DPOSS and 2MASS surveys, and also used co-added frames in different bands. We reliably identified 76% of the radio sources of the RCR catalogue. We used the ALADIN and SAOImage DS9 scripting capabilities, the interoperability services of ALADIN and TOPCAT, and other Virtual Observatory (VO) tools and resources, such as CASJobs, NED, VizieR, and WSA, for effective data access, visualization and analysis. Without VO tools it would have been problematic to perform our study.

  16. Integrating XQuery-Enabled SCORM XML Metadata Repositories into an RDF-Based E-Learning P2P Network

    ERIC Educational Resources Information Center

    Qu, Changtao; Nejdl, Wolfgang

    2004-01-01

    Edutella is an RDF-based E-Learning P2P network that is aimed to accommodate heterogeneous learning resource metadata repositories in a P2P manner and further facilitate the exchange of metadata between these repositories based on RDF. Whereas Edutella provides RDF metadata repositories with a quite natural integration approach, XML metadata…

  17. The Value of Data and Metadata Standardization for Interoperability in Giovanni

    NASA Astrophysics Data System (ADS)

    Smit, C.; Hegde, M.; Strub, R. F.; Bryant, K.; Li, A.; Petrenko, M.

    2017-12-01

    Giovanni (https://giovanni.gsfc.nasa.gov/giovanni/) is a data exploration and visualization tool at the NASA Goddard Earth Sciences Data Information Services Center (GES DISC). It has been around in one form or another for more than 15 years. Giovanni calculates simple statistics and produces 22 different visualizations for more than 1600 geophysical parameters from more than 90 satellite and model products. Giovanni relies on external data format standards to ensure interoperability, including the NetCDF CF Metadata Conventions. Unfortunately, these standards were insufficient to make Giovanni's internal data representation truly simple to use. Finding and working with dimensions can be convoluted with the CF Conventions. Furthermore, the CF Conventions are silent on machine-friendly descriptive metadata such as the parameter's source product and product version. In order to simplify analyzing disparate earth science data parameters in a unified way, we developed Giovanni's internal standard. First, the format standardizes parameter dimensions and variables so they can be easily found. Second, the format adds all the machine-friendly metadata Giovanni needs to present our parameters to users in a consistent and clear manner. At a glance, users can grasp all the pertinent information about parameters both during parameter selection and after visualization. This poster gives examples of how our metadata and data standards, both external and internal, have both simplified our code base and improved our users' experiences.

  18. The Planetary Data System Information Model for Geometry Metadata

    NASA Astrophysics Data System (ADS)

    Guinness, E. A.; Gordon, M. K.

    2014-12-01

    The NASA Planetary Data System (PDS) has recently developed a new set of archiving standards based on a rigorously defined information model. An important part of the new PDS information model is the model for geometry metadata, which includes, for example, attributes of the lighting and viewing angles of observations, the position and velocity vectors of a spacecraft relative to the Sun and the observing body at the time of observation, and the location and orientation of an observation on the target. The PDS geometry model is based on requirements gathered from the planetary research community, data producers, and the software engineers who build search tools. A key requirement for the model is that it fully support the breadth of PDS archives, which include a wide range of data types from missions and instruments observing many types of solar system bodies, such as planets, ring systems, and smaller bodies (moons, comets, and asteroids). Thus, important design aspects of the geometry model are that it standardizes the definition of the geometry attributes and provides consistency of geometry metadata across planetary science disciplines. The model specification also includes parameters so that the context of values can be unambiguously interpreted. For example, the reference frame used for specifying geographic locations on a planetary body is explicitly included with the other geometry metadata parameters. The structure and content of the new PDS geometry model are designed to enable both science analysis and efficient development of search tools. The geometry model is implemented in XML, as is the main PDS information model, and uses XML Schema for validation. The initial version of the geometry model is focused on geometry for remote sensing observations conducted by flyby and orbiting spacecraft. Future releases will be expanded to include metadata for landed and rover spacecraft.

  19. Public release of the ISC-GEM Global Instrumental Earthquake Catalogue (1900-2009)

    USGS Publications Warehouse

    Storchak, Dmitry A.; Di Giacomo, Domenico; Bondár, István; Engdahl, E. Robert; Harris, James; Lee, William H.K.; Villaseñor, Antonio; Bormann, Peter

    2013-01-01

    The International Seismological Centre–Global Earthquake Model (ISC–GEM) Global Instrumental Earthquake Catalogue (1900–2009) is the result of a special effort to substantially extend and improve currently existing global catalogues to serve the requirements of specific user groups who assess and model seismic hazard and risk. The data from the ISC–GEM Catalogue will be used worldwide, and will prove especially essential in those regions where a high seismicity level strongly correlates with a high population density.

  20. Improving the accessibility and re-use of environmental models through provision of model metadata - a scoping study

    NASA Astrophysics Data System (ADS)

    Riddick, Andrew; Hughes, Andrew; Harpham, Quillon; Royse, Katherine; Singh, Anubha

    2014-05-01

    There has been increasing interest from both academic and commercial organisations over recent years in developing hydrologic and other environmental models in response to some of the major challenges facing the environment, for example environmental change and its effects, and ensuring water resource security. This has resulted in a significant investment in modelling by many organisations, both in terms of financial resources and intellectual capital. To capitalise on the effort of producing models, it is necessary for the models to be both discoverable and appropriately described; if this is not done, the effort of producing them will be wasted. However, whilst there are some recognised metadata standards relating to datasets, these may not completely address the needs of modellers, regarding input data for example. There also appears to be a lack of metadata schemes configured to encourage the discovery and re-use of the models themselves. The lack of an established standard for model metadata is considered to be a factor inhibiting the more widespread use of environmental models, particularly the use of linked model compositions which fuse together hydrologic models with models from other environmental disciplines. This poster presents the results of a Natural Environment Research Council (NERC) funded scoping study to understand the requirements of modellers and other end users for metadata about data and models. A user consultation exercise using an online questionnaire was undertaken to capture the views of a wide spectrum of stakeholders on how they currently manage metadata for modelling. This provided strong confirmation of our original supposition that there is a lack of systems and facilities to capture metadata about models. A number of specific gaps in current provision for data and model metadata were also identified, including the need for a standard means to record detailed information about the modelling

  1. VizieR Online Data Catalog: Catalogue of Galactic Planetary Nebulae (Kohoutek, 2001)

    NASA Astrophysics Data System (ADS)

    Kohoutek, L.

    2001-05-01

    The "Catalogue of Galactic Planetary Nebulae (Version 2000)" appears in Abhandlungen aus der Hamburger Sternwarte, Band XII in the year 2001. It is a continuation of CGPN(1967) and contains 1510 objects classified as galactic PNe up to the end of 1999. The lists of possible pre-PNe and possible post-PNe are also given. The catalogue is restricted only to the data belonging to the location and identification of the objects. It gives identification charts of PNe discovered since 1965 (published in the supplements to CGPN) and those charts of objects discovered earlier, which have wrong or uncertain identification. The question "what is a planetary nebula" is discussed and the typical values of PNe and of their central stars are summarized. Short statistics about the discoveries of PNe are given. The catalogue is also available in the Centre de Donnees, Strasbourg and at Hamburg Observatory via internet. (15 data files).

  2. A System for Automated Extraction of Metadata from Scanned Documents using Layout Recognition and String Pattern Search Models.

    PubMed

    Misra, Dharitri; Chen, Siyuan; Thoma, George R

    2009-01-01

    One of the most expensive aspects of archiving digital documents is the manual acquisition of context-sensitive metadata useful for the subsequent discovery of, and access to, the archived items. For certain types of textual documents, such as journal articles, pamphlets, official government records, etc., where the metadata is contained within the body of the documents, a cost effective method is to identify and extract the metadata in an automated way, applying machine learning and string pattern search techniques. At the U. S. National Library of Medicine (NLM) we have developed an automated metadata extraction (AME) system that employs layout classification and recognition models with a metadata pattern search model for a text corpus with structured or semi-structured information. A combination of Support Vector Machine and Hidden Markov Model is used to create the layout recognition models from a training set of the corpus, following which a rule-based metadata search model is used to extract the embedded metadata by analyzing the string patterns within and surrounding each field in the recognized layouts. In this paper, we describe the design of our AME system, with focus on the metadata search model. We present the extraction results for a historic collection from the Food and Drug Administration, and outline how the system may be adapted for similar collections. Finally, we discuss some ongoing enhancements to our AME system.

  3. A System for Automated Extraction of Metadata from Scanned Documents using Layout Recognition and String Pattern Search Models

    PubMed Central

    Misra, Dharitri; Chen, Siyuan; Thoma, George R.

    2010-01-01

    One of the most expensive aspects of archiving digital documents is the manual acquisition of context-sensitive metadata useful for the subsequent discovery of, and access to, the archived items. For certain types of textual documents, such as journal articles, pamphlets, official government records, etc., where the metadata is contained within the body of the documents, a cost effective method is to identify and extract the metadata in an automated way, applying machine learning and string pattern search techniques. At the U. S. National Library of Medicine (NLM) we have developed an automated metadata extraction (AME) system that employs layout classification and recognition models with a metadata pattern search model for a text corpus with structured or semi-structured information. A combination of Support Vector Machine and Hidden Markov Model is used to create the layout recognition models from a training set of the corpus, following which a rule-based metadata search model is used to extract the embedded metadata by analyzing the string patterns within and surrounding each field in the recognized layouts. In this paper, we describe the design of our AME system, with focus on the metadata search model. We present the extraction results for a historic collection from the Food and Drug Administration, and outline how the system may be adapted for similar collections. Finally, we discuss some ongoing enhancements to our AME system. PMID:21179386
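
    To make the second stage concrete: once a page layout has been recognized, a rule-based pattern search can be as simple as a set of regular expressions applied to the page text. The patterns below are invented examples in that spirit, not NLM's production rules, and the SVM/HMM layout-recognition stage is not shown.

        # Sketch of the rule-based string-pattern stage only: regular
        # expressions pull candidate metadata fields out of page text from an
        # already-recognized layout. Patterns are illustrative assumptions.
        import re

        PATTERNS = {
            "title":  re.compile(r"^Title:\s*(.+)$", re.M),
            "author": re.compile(r"^Authors?:\s*(.+)$", re.M),
            "date":   re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
        }

        def extract_metadata(page_text: str) -> dict:
            found = {}
            for field, pat in PATTERNS.items():
                m = pat.search(page_text)
                if m:
                    found[field] = m.group(1).strip()
            return found

        sample = "Title: Annual Report\nAuthor: J. Smith\nIssued 1998-03-14"
        print(extract_metadata(sample))  # {'title': ..., 'author': ..., 'date': ...}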

  4. The Canadian Environmental Education Catalogue: A Guide to Selected Resources and Materials. Premier Edition.

    ERIC Educational Resources Information Center

    Heinrichs, Wally; And Others

    Despite their large numbers, environmental education resources can be difficult to find. The purpose of this catalogue is to broaden the awareness of available resources among educators and curriculum developers and facilitate their accessibility. This first edition of the catalogue contains approximately 1,200 of the more than 4,000 titles that…

  5. Matching radio catalogues with realistic geometry: application to SWIRE and ATLAS

    NASA Astrophysics Data System (ADS)

    Fan, Dongwei; Budavári, Tamás; Norris, Ray P.; Hopkins, Andrew M.

    2015-08-01

    Cross-matching catalogues at different wavelengths is a difficult problem in astronomy, especially when the objects are not point-like. At radio wavelengths, an object can have several components corresponding, for example, to a core and lobes. Since not all radio detections correspond to visible or infrared sources, matching these catalogues can be challenging. Traditionally, this is done by eye for better quality, which does not scale to the large data volumes expected from the next generation of radio telescopes. We present a novel automated procedure, using Bayesian hypothesis testing, to achieve reliable associations by explicitly modelling a particular class of radio-source morphology. The new algorithm not only assesses the likelihood of an association between data at two different wavelengths, but also tries to assess whether different radio sources are physically associated, are double-lobed radio galaxies, or are just distinct nearby objects. Application to the Spitzer Wide-Area Infrared Extragalactic and Australia Telescope Large Area Survey CDF-S catalogues shows that this method performs well without human intervention.
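
    The building block of such Bayesian cross-matching is a positional Bayes factor comparing the 'same source' and 'distinct sources' hypotheses; the standard two-catalogue form (Budavari & Szalay 2008) is sketched below. The paper's method goes further by explicitly modelling core/lobe morphology, which this sketch does not attempt.

        # Sketch of the standard positional Bayes factor for cross-matching two
        # point-source detections with Gaussian positional errors.
        import math

        ARCSEC = math.pi / (180 * 3600)      # arcseconds -> radians

        def positional_bayes_factor(sep_arcsec, sigma1_arcsec, sigma2_arcsec):
            """B >> 1 favours 'same source'; B << 1 favours 'distinct'."""
            s2 = (sigma1_arcsec**2 + sigma2_arcsec**2) * ARCSEC**2
            psi = sep_arcsec * ARCSEC
            return (2.0 / s2) * math.exp(-psi**2 / (2.0 * s2))

        # A 1" separation with 0.5" and 1" positional errors:
        print(positional_bayes_factor(1.0, 0.5, 1.0))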

  6. The ASAS-SN bright supernova catalogue – III. 2016

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Holoien, T. W. -S.; Brown, J. S.; Stanek, K. Z.

    In this catalogue we summarize information for all supernovae discovered by the All-Sky Automated Survey for SuperNovae (ASAS-SN) and all other bright (m_peak ≤ 17), spectroscopically confirmed supernovae discovered in 2016. We then gather the near-infrared through ultraviolet magnitudes of all host galaxies and the offsets of the supernovae from the centres of their hosts from public databases. We illustrate the results using a sample that now totals 668 supernovae discovered since 2014 May 1, including the supernovae from our previous catalogues, with type distributions closely matching those of the ideal magnitude-limited sample from Li et al. This is the third of a series of yearly papers on bright supernovae and their hosts from the ASAS-SN team.

  7. The ASAS-SN bright supernova catalogue – III. 2016

    DOE PAGES

    Holoien, T. W. -S.; Brown, J. S.; Stanek, K. Z.; ...

    2017-08-18

    In this catalogue we summarize information for all supernovae discovered by the All-Sky Automated Survey for SuperNovae (ASAS-SN) and all other bright (m_peak ≤ 17), spectroscopically confirmed supernovae discovered in 2016. We then gather the near-infrared through ultraviolet magnitudes of all host galaxies and the offsets of the supernovae from the centres of their hosts from public databases. We illustrate the results using a sample that now totals 668 supernovae discovered since 2014 May 1, including the supernovae from our previous catalogues, with type distributions closely matching those of the ideal magnitude-limited sample from Li et al. This is the third of a series of yearly papers on bright supernovae and their hosts from the ASAS-SN team.

  8. A 2 epoch proper motion catalogue from the UKIDSS Large Area Survey

    NASA Astrophysics Data System (ADS)

    Smith, Leigh; Lucas, Phil; Burningham, Ben; Jones, Hugh; Pinfield, David; Smart, Ricky; Andrei, Alexandre

    2013-04-01

    The UKIDSS Large Area Survey (LAS) began in 2005, with the start of the UKIDSS program, as a 7-year effort to survey roughly 4000 square degrees at high galactic latitudes in the Y, J, H and K bands. The survey also included a significant quantity of 2-epoch J band observations, with epoch baselines ranging from 2 to 7 years. We present a proper motion catalogue for the 1500 square degrees of 2-epoch LAS data, which includes some 800,000 sources with motions detected above the 5σ level. We developed a bespoke proper motion pipeline which applies a source-unique second-order polynomial transformation to the UKIDSS array coordinates of each source to counter potential local non-uniformity in the focal plane. Our catalogue agrees well with the proper motion data supplied in the current WFCAM Science Archive (WSA) DR9 catalogue where there is overlap, and with various optical catalogues, but it benefits from some improvements. One improvement is that we provide absolute proper motions, using LAS galaxies for the relative-to-absolute correction. Also, by using unique, local, second-order polynomial transformations, as opposed to the linear transformations in the WSA, we correct better for any local distortions in the focal plane, excluding the radial distortion that is removed by their pipeline.
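
    As an illustration of the underlying technique (fitted globally here rather than per source as in the pipeline, and with invented variable names), a second-order 2D polynomial transformation between two epochs of array coordinates can be fitted by least squares:

        # Sketch of fitting a second-order 2D polynomial transformation between
        # two epochs of array coordinates by least squares.
        import numpy as np

        def design(x, y):
            """Second-order polynomial terms: 1, x, y, x^2, xy, y^2."""
            return np.column_stack([np.ones_like(x), x, y, x*x, x*y, y*y])

        def fit_transform(x1, y1, x2, y2):
            """Least-squares coefficients mapping epoch-1 coords onto epoch-2."""
            A = design(x1, y1)
            cx, *_ = np.linalg.lstsq(A, x2, rcond=None)
            cy, *_ = np.linalg.lstsq(A, y2, rcond=None)
            return cx, cy

        def apply_transform(cx, cy, x, y):
            """Map epoch-1 coordinates into the epoch-2 frame."""
            A = design(x, y)
            return A @ cx, A @ cy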

  9. DOIDB: Reusing DataCite's search software as metadata portal for GFZ Data Services

    NASA Astrophysics Data System (ADS)

    Elger, K.; Ulbricht, D.; Bertelmann, R.

    2016-12-01

    GFZ Data Services is the central service point for the publication of research data at the Helmholtz Centre Potsdam GFZ German Research Centre for Geosciences (GFZ). It provides data publishing services to scientists of GFZ, associated projects, and associated institutions. The publishing services aim to make research data and physical samples visible and citable by assigning persistent identifiers (DOI, IGSN) and by complementing existing IT infrastructure. To integrate several research domains, a modular software stack made of free software components has been created to manage data and metadata as well as to register persistent identifiers [1]. The pivotal component for the registration of DOIs is the DOIDB. It has been derived from three software components provided by DataCite [2] that moderate the registration of DOIs and the deposition of metadata, allow the dissemination of metadata, and provide a user interface to navigate and discover datasets. The DOIDB acts as a proxy to the DataCite infrastructure and, in addition to the DataCite metadata schema, allows the deposition and dissemination of metadata following the ISO 19139 and NASA GCMD DIF schemas. The search component has been modified to meet the requirements of a geosciences metadata portal. In particular, it has been altered to make use of Apache Solr's capability to index and query spatial coordinates. Furthermore, the user interface has been adjusted to provide a first impression of the data by showing a map, summary information and subjects. DOIDB and its components are available on GitHub [3]. We present a software solution for the registration of DOIs that allows the integration of existing data systems, keeps track of registered DOIs, and provides a metadata portal to discover datasets [4]. [1] Ulbricht, D.; Elger, K.; Bertelmann, R.; Klump, J. panMetaDocs, eSciDoc, and DOIDB—An Infrastructure for the Curation and Publication of File-Based Datasets for GFZ Data Services. ISPRS Int. J. Geo

  10. An Overview of Tools for Creating, Validating and Using PDS Metadata

    NASA Astrophysics Data System (ADS)

    King, T. A.; Hardman, S. H.; Padams, J.; Mafi, J. N.; Cecconi, B.

    2017-12-01

    NASA's Planetary Data System (PDS) has defined information models for creating metadata that describe the bundles, collections and products for all the assets acquired by planetary science projects. Version 3 of the PDS Information Model (commonly known as "PDS3") is widely used and describes most of the existing planetary archive. Recently, PDS released version 4 of the Information Model (commonly known as "PDS4"), which is designed to improve the consistency, efficiency and discoverability of information. To aid in creating, validating and using PDS4 metadata, the PDS and a few associated groups have developed a variety of tools. In addition, some third-party tools, both free and commercial, can be used to create and work with PDS4 metadata. We present an overview of these tools, describe those currently under development, and provide guidance as to which tools may be most useful for missions, instrument teams and individual researchers.

  11. Partially populated catalogue of measured properties of field sections.

    DOT National Transportation Integrated Search

    2014-10-01

    This catalogue documents the construction, monitoring, and mixture information of 11 test sections: four on SH 15 in north Amarillo, three on US 62 in Childress, and four on Loop 820 in Fort Worth.

  12. Halo substructure in the SDSS-Gaia catalogue: streams and clumps

    NASA Astrophysics Data System (ADS)

    Myeong, G. C.; Evans, N. W.; Belokurov, V.; Amorisco, N. C.; Koposov, S. E.

    2018-04-01

    We use the Sloan Digital Sky Survey (SDSS)-Gaia Catalogue to identify six new pieces of halo substructure. SDSS-Gaia is an astrometric catalogue that exploits SDSS data release 9 to provide first-epoch photometry for objects in the Gaia source catalogue. We use a version of the catalogue containing 245 316 stars with all phase-space coordinates within a heliocentric distance of ˜10 kpc. We devise a method to assess the significance of halo substructures based on their clustering in velocity space. The two most substantial structures are multiple wraps of a stream which has undergone considerable phase mixing (S1, with 94 members) and a kinematically cold stream (S2, with 61 members). The member stars of S1 have a median position of (X, Y, Z) = (8.12, -0.22, 2.75) kpc and a median metallicity of [Fe/H] = -1.78. The stars of S2 have median coordinates (X, Y, Z) = (8.66, 0.30, 0.77) kpc and a median metallicity of [Fe/H] = -1.91. They lie in velocity space close to some of the stars in the stream reported by Helmi et al. By modelling, we estimate that both structures had progenitors with virial masses ≈10^10 M⊙ and infall times ≳9 Gyr ago. Using abundance matching, these correspond to stellar masses between 10^6 and 10^7 M⊙. These are somewhat larger than the masses inferred through the mass-metallicity relation, by factors of 5 to 15. Additionally, we identify two further substructures (S3 and S4, with 55 and 40 members) and two clusters or moving groups (C1 and C2, with 24 and 12 members). In all six cases, clustering in kinematics is found to correspond to clustering in both configuration space and metallicity, adding credence to the reliability of our detections.
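
    The paper's own significance test is not reproduced here; as a minimal illustration of finding overdensities in velocity space, one could run a density-based clustering such as DBSCAN (our stand-in choice, not the authors' method) on toy (vx, vy, vz) data:

    ```python
    import numpy as np
    from sklearn.cluster import DBSCAN

    # Toy velocity-space data: a smooth halo plus one kinematically cold stream.
    rng = np.random.default_rng(1)
    halo = rng.normal(0.0, 150.0, size=(5000, 3))                 # km/s
    stream = rng.normal([50.0, -220.0, 30.0], 10.0, size=(60, 3))
    v = np.vstack([halo, stream])

    labels = DBSCAN(eps=25.0, min_samples=20).fit_predict(v)
    for k in sorted(set(labels) - {-1}):
        members = v[labels == k]
        print(f"group {k}: {len(members)} stars, mean v = {members.mean(axis=0).round(1)}")
    ```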

  13. White dwarf-main sequence binaries from LAMOST: the DR5 catalogue

    NASA Astrophysics Data System (ADS)

    Ren, J.-J.; Rebassa-Mansergas, A.; Parsons, S. G.; Liu, X.-W.; Luo, A.-L.; Kong, X.; Zhang, H.-T.

    2018-03-01

    We present the data release (DR) 5 catalogue of white dwarf-main sequence (WDMS) binaries from the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST). The catalogue contains 876 WDMS binaries, of which 757 are additions to our previous LAMOST DR1 sample and 357 are systems that have not been published before. We also describe a LAMOST-dedicated survey that aims at obtaining spectra of photometrically selected WDMS binaries from the Sloan Digital Sky Survey (SDSS) that are expected to contain cool white dwarfs and/or early-type M dwarf companions. This is a population under-represented in previous SDSS WDMS binary catalogues. We determine the stellar parameters (white dwarf effective temperatures, surface gravities and masses, and M dwarf spectral types) of the LAMOST DR5 WDMS binaries and make use of the parameter distributions to analyse the properties of the sample. We find that, despite our efforts, systems containing cool white dwarfs remain under-represented. Moreover, we make use of LAMOST DR5 and SDSS DR14 (when available) spectra to measure the Na I λλ 8183.27, 8194.81 absorption doublet and/or Hα emission radial velocities of our systems. This allows us to identify 128 binaries displaying significant radial velocity variations, 76 of which are new. Finally, we cross-match our catalogue with the Catalina Surveys and identify 57 systems displaying light-curve variations. These include 16 eclipsing systems, two of which are new, and nine binaries that are new eclipsing candidates. We calculate periodograms from the photometric data and measure (estimate) the orbital periods of 30 (15) WDMS binaries.
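
    A minimal sketch of the radial-velocity step, assuming line centres have already been fitted to the Na I doublet quoted above (the fitted centres below are invented for illustration, not the paper's measurements):

    ```python
    import numpy as np

    C_KMS = 299_792.458                       # speed of light, km/s
    NA_I_REST = np.array([8183.27, 8194.81])  # Na I doublet rest wavelengths, A

    def radial_velocity(observed_centres):
        """Mean non-relativistic Doppler velocity from the two doublet lines."""
        shifts = (np.asarray(observed_centres) - NA_I_REST) / NA_I_REST
        return float(C_KMS * shifts.mean())

    # Invented fitted line centres from two observing epochs:
    print(radial_velocity([8183.95, 8195.49]))  # ~ +25 km/s
    print(radial_velocity([8182.60, 8194.14]))  # ~ -25 km/s
    # A shift well beyond the measurement errors flags a close binary.
    ```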

  14. Metadata mapping and reuse in caBIG™

    PubMed Central

    Kunz, Isaac; Lin, Ming-Chin; Frey, Lewis

    2009-01-01

    Background This paper proposes that interoperability across biomedical databases can be improved by utilizing a repository of Common Data Elements (CDEs), UML model class-attributes and simple lexical algorithms to facilitate the building of domain models. This is examined in the context of an existing system, the National Cancer Institute (NCI)'s cancer Biomedical Informatics Grid (caBIG™). The goal is to demonstrate the deployment of open source tools that can be used to effectively map models and enable the reuse of existing information objects and CDEs in the development of new models for translational research applications. This effort is intended to help developers reuse appropriate CDEs to enable interoperability of their systems when developing within the caBIG™ framework or other frameworks that use metadata repositories. Results The Dice (di-grams) and Dynamic algorithms are compared, and both perform similarly when matching UML model class-attributes to CDE class object-property pairs. With the algorithms used, the baselines for automatically finding the matches are reasonable for the data models examined. This suggests that automatic mapping of UML models and CDEs is feasible within the caBIG™ framework and potentially any framework that uses a metadata repository. Conclusion This work opens up the possibility of using mapping algorithms to reduce the cost and time required to map local data models to a reference data model such as those used within caBIG™. This effort contributes to facilitating the development of interoperable systems within caBIG™ as well as other metadata frameworks. Such efforts are critical to addressing the need to develop systems that can handle the enormous amounts of diverse data that can be leveraged from new biomedical methodologies. PMID:19208192
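
    The record does not spell out the Dice implementation; as a minimal sketch, "Dice (di-grams)" conventionally denotes the Sørensen-Dice coefficient computed over character bigrams, which might look like this (the example strings are invented):

    ```python
    def bigrams(text):
        t = text.lower()
        return {t[i:i + 2] for i in range(len(t) - 1)}

    def dice(a, b):
        """Sorensen-Dice similarity of two strings over character bigrams."""
        ba, bb = bigrams(a), bigrams(b)
        if not ba and not bb:
            return 1.0
        return 2 * len(ba & bb) / (len(ba) + len(bb))

    # Matching a UML class-attribute name against candidate CDE names:
    print(dice("patientBirthDate", "Patient Date of Birth"))     # high score
    print(dice("patientBirthDate", "Specimen Collection Site"))  # near zero
    ```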

  15. Metadata mapping and reuse in caBIG.

    PubMed

    Kunz, Isaac; Lin, Ming-Chin; Frey, Lewis

    2009-02-05

    This paper proposes that interoperability across biomedical databases can be improved by utilizing a repository of Common Data Elements (CDEs), UML model class-attributes and simple lexical algorithms to facilitate the building of domain models. This is examined in the context of an existing system, the National Cancer Institute (NCI)'s cancer Biomedical Informatics Grid (caBIG). The goal is to demonstrate the deployment of open source tools that can be used to effectively map models and enable the reuse of existing information objects and CDEs in the development of new models for translational research applications. This effort is intended to help developers reuse appropriate CDEs to enable interoperability of their systems when developing within the caBIG framework or other frameworks that use metadata repositories. The Dice (di-grams) and Dynamic algorithms are compared, and both perform similarly when matching UML model class-attributes to CDE class object-property pairs. With the algorithms used, the baselines for automatically finding the matches are reasonable for the data models examined. This suggests that automatic mapping of UML models and CDEs is feasible within the caBIG framework and potentially any framework that uses a metadata repository. This work opens up the possibility of using mapping algorithms to reduce the cost and time required to map local data models to a reference data model such as those used within caBIG. This effort contributes to facilitating the development of interoperable systems within caBIG as well as other metadata frameworks. Such efforts are critical to addressing the need to develop systems that can handle the enormous amounts of diverse data that can be leveraged from new biomedical methodologies.

  16. linkedISA: semantic representation of ISA-Tab experimental metadata.

    PubMed

    González-Beltrán, Alejandra; Maguire, Eamonn; Sansone, Susanna-Assunta; Rocca-Serra, Philippe

    2014-01-01

    Reporting and sharing experimental metadata, such as the experimental design, characteristics of the samples, and procedures applied, along with the analysis results, in a standardised manner ensures that datasets are comprehensible and, in principle, reproducible, comparable and reusable. Furthermore, sharing datasets in formats designed for consumption by humans and machines will also maximize their use. The Investigation/Study/Assay (ISA) open source metadata tracking framework facilitates standards-compliant collection, curation, visualization, storage and sharing of datasets, leveraging other platforms to enable analysis and publication. The ISA software suite includes several components used in an increasingly diverse set of life science and biomedical domains; it is underpinned by a general-purpose format, ISA-Tab, and conversions exist into formats required by public repositories. While ISA-Tab works well mainly as a human-readable format, we have also implemented a linked data approach to semantically define the ISA-Tab syntax. We present a semantic web representation of the ISA-Tab syntax that complements ISA-Tab's syntactic interoperability with semantic interoperability. We introduce the linkedISA conversion tool from ISA-Tab to the Resource Description Framework (RDF), supporting mappings from the ISA syntax to multiple community-defined, open ontologies and capitalising on user-provided ontology annotations in the experimental metadata. We describe insights from the implementation and how annotations can be expanded driven by the metadata. We applied the conversion tool as part of Bio-GraphIIn, a web-based application supporting integration of the semantically rich experimental descriptions. Designed in a user-friendly manner, the Bio-GraphIIn interface hides most of the complexity from users, exposing a familiar tabular view of the experimental description to allow seamless interaction with the RDF representation, and visualising descriptors to
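
    A toy sketch of the general ISA-Tab-to-RDF idea using rdflib; the namespace and property names are invented, and the real linkedISA tool maps to community ontologies rather than an example vocabulary:

    ```python
    from rdflib import Graph, Literal, Namespace, RDF

    # Toy mapping of one ISA-Tab study-table row to RDF triples.
    EX = Namespace("http://example.org/isa/")
    g = Graph()
    g.bind("ex", EX)

    row = {"Source Name": "patient-01", "Characteristics[organism]": "Homo sapiens"}

    source = EX["source/" + row["Source Name"]]
    g.add((source, RDF.type, EX.Source))
    g.add((source, EX.organism, Literal(row["Characteristics[organism]"])))

    print(g.serialize(format="turtle"))
    ```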

  17. Life+ EnvEurope DEIMS - improving access to long-term ecosystem monitoring data in Europe

    NASA Astrophysics Data System (ADS)

    Kliment, Tomas; Peterseil, Johannes; Oggioni, Alessandro; Pugnetti, Alessandra; Blankman, David

    2013-04-01

    resources. 2) Discovery client: provides several search profiles for datasets, persons, research sites and external resources commonly used in the domain, e.g. the Catalogue of Life, based on several search patterns ranging from simple full-text search and glossary browsing to categorized faceted search. 3) Geo-Viewer: a map client that portrays boundaries and centroids of the research sites as Web Map Service (WMS) layers. Each layer provides a link to both the Metadata editor and the Discovery client in order to create or discover metadata describing the data collected within the individual research site. Sharing of the dataset metadata with DEIMS is ensured in two ways: XML export of individual metadata records according to the EML schema for inclusion in the international DataONE network, and periodic harvesting of metadata into a GeoNetwork catalogue, thus providing a catalogue service for the web (CSW), which can be invoked by remote clients. The final version of DEIMS will be a pilot implementation for the information system of LTER-Europe, which should establish a common information management framework within the European ecosystem research domain and provide valuable environmental information to other European information infrastructures such as SEIS, Copernicus and INSPIRE.
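
    A hedged sketch of how a remote client might invoke such a CSW endpoint with OWSLib; the URL is a placeholder, not DEIMS's actual service:

    ```python
    from owslib.csw import CatalogueServiceWeb

    # Placeholder GeoNetwork CSW endpoint; the real DEIMS URL is not given here.
    csw = CatalogueServiceWeb("https://example.org/geonetwork/srv/eng/csw")
    csw.getrecords2(maxrecords=10)  # plain GetRecords request, first 10 hits
    for rid, rec in csw.records.items():
        print(rid, "-", rec.title)
    ```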

  18. Building a semantic web-based metadata repository for facilitating detailed clinical modeling in cancer genome studies.

    PubMed

    Sharma, Deepak K; Solbrig, Harold R; Tao, Cui; Weng, Chunhua; Chute, Christopher G; Jiang, Guoqian

    2017-06-05

    Detailed Clinical Models (DCMs) have been regarded as the basis for retaining computable meaning when data are exchanged between heterogeneous computer systems. To better support clinical cancer data capturing and reporting, there is an emerging need to develop informatics solutions for standards-based clinical models in cancer study domains. The objective of this study is to develop and evaluate a cancer genome study metadata management system that serves as a key infrastructure in supporting clinical information modeling in cancer genome study domains. We leveraged a Semantic Web-based metadata repository enhanced with both the ISO 11179 metadata standard and the Clinical Information Modeling Initiative (CIMI) Reference Model. We used the common data elements (CDEs) defined in The Cancer Genome Atlas (TCGA) data dictionary, and extracted the metadata of the CDEs using the NCI Cancer Data Standards Repository (caDSR) CDE dataset rendered in the Resource Description Framework (RDF). The ITEM/ITEM_GROUP pattern defined in the latest CIMI Reference Model is used to represent reusable model elements (mini-Archetypes). We produced a metadata repository with 38 clinical cancer genome study domains, comprising a rich collection of mini-Archetype pattern instances. We performed a case study of the domain "clinical pharmaceutical" in the TCGA data dictionary and demonstrated that the enriched data elements in the metadata repository are very useful in support of building detailed clinical models. Our informatics approach leveraging Semantic Web technologies provides an effective way to build a CIMI-compliant metadata repository that would facilitate detailed clinical modeling to support use cases beyond TCGA in clinical cancer study domains.

  19. iLOG: A Framework for Automatic Annotation of Learning Objects with Empirical Usage Metadata

    ERIC Educational Resources Information Center

    Miller, L. D.; Soh, Leen-Kiat; Samal, Ashok; Nugent, Gwen

    2012-01-01

    Learning objects (LOs) are digital or non-digital entities used for learning, education or training, commonly stored in repositories searchable by their associated metadata. Unfortunately, based on the current standards, such metadata are often missing or incorrectly entered, making search difficult or impossible. In this paper, we investigate…

  20. Investigation of the FK5 system in the equatorial zone. Application to the instrumental system of the Second Quito astrolabe catalogue.

    NASA Astrophysics Data System (ADS)

    Kolesnik, Y. B.

    1995-12-01

    15 catalogues produced in the 1980s and 12 catalogues made from 1960 to 1978 have been used to assess the consistency of the FK5 system with observations in the declination zone from -30° to +30°. Classical δ-dependent and α-dependent systematic differences (Cat-FK5) have been formed for the individual instrumental systems of the catalogues. The weighted mean instrumental systems for two subsets of catalogues centred at the epochs 1970 and 1987 have been constructed. The external systematic and random accuracy of the catalogues under analysis and the errors of the mean instrumental systems for both selections of catalogues have been estimated and presented in tables. The individual systematic differences of the catalogues and the mean instrumental systems are shown in figures. Numerical values of the total systematic deviations for both mean instrumental systems are given in tables. The results of the intercomparison are discussed to assess the actual systematic deviations of the FK5 at the respective epochs and its actual random accuracy. It has been found that the mutual consistency of the individual instrumental systems of the catalogues of the 1980s, with respect to zonal systematic differences in both right ascension and declination, is significantly better than for the earlier catalogues. The consistency of the two catalogue subsets is comparable with respect to α-dependent systematic differences. It is shown that the claimed random errors of the FK5 positions and proper motions are rather realistic, while the deviations of the FK5 right ascension and declination system in the equatorial zone for both mean epochs exceed those expected from formal considerations. Rapid degradation of the FK5 system with time is detected in right ascension. The results in declination are recognized to be less reliable, due to the larger inconsistency of the individual instrumental systems. The system of the Second Quito Astrolabe Catalogue (QAC 2) has been investigated by comparison with two subsets of

  1. A Rich Metadata Filesystem for Scientific Data

    ERIC Educational Resources Information Center

    Bui, Hoang

    2012-01-01

    As scientific research becomes more data intensive, there is an increasing need for scalable, reliable, and high performance storage systems. Such data repositories must provide both data archival services and rich metadata, and cleanly integrate with large scale computing resources. ROARS is a hybrid approach to distributed storage that provides…

  2. Tracking Actual Usage: The Attention Metadata Approach

    ERIC Educational Resources Information Center

    Wolpers, Martin; Najjar, Jehad; Verbert, Katrien; Duval, Erik

    2007-01-01

    The information overload in learning and teaching scenarios is a major hindrance to efficient and effective learning. New methods are needed to help teachers and students deal with the vast amount of available information and learning material. Our approach aims to utilize contextualized attention metadata to capture behavioural…

  3. Metadata registry and management system based on ISO 11179 for cancer clinical trials information system

    PubMed Central

    Park, Yu Rang; Kim, Ju Han

    2006-01-01

    Standardized management of data elements (DEs) for Case Report Forms (CRFs) is crucial in a Clinical Trials Information System (CTIS). Traditional CTISs utilize organization-specific definitions and storage methods for DEs and CRFs. We developed a metadata-based DE management system for clinical trials, the Clinical and Histopathological Metadata Registry (CHMR), using the international standard for metadata registries (ISO 11179) for the management of cancer clinical trials information. CHMR was evaluated in cancer clinical trials with 1625 DEs extracted from the College of American Pathologists Cancer Protocols for 20 major cancers. PMID:17238675

  4. The AKARI IRC asteroid flux catalogue: updated diameters and albedos

    NASA Astrophysics Data System (ADS)

    Alí-Lagoa, V.; Müller, T. G.; Usui, F.; Hasegawa, S.

    2018-05-01

    The AKARI IRC all-sky survey provided more than twenty thousand thermal infrared observations of over five thousand asteroids. Diameters and albedos were obtained by fitting an empirically calibrated version of the standard thermal model to these data. Following the publication of the flux catalogue in October 2016, our aim here is to present the AKARI IRC all-sky survey data and discuss valuable scientific applications in the field of small-body physical properties. As an example, we update the catalogue of asteroid diameters and albedos based on AKARI using the near-Earth asteroid thermal model (NEATM). We fit the NEATM to derive asteroid diameters and, whenever possible, infrared beaming parameters. We fit groups of observations taken for the same object at different epochs of the survey separately, so we compute more than one diameter for approximately half of the catalogue. We obtained a total of 8097 diameters and albedos for 5170 asteroids, and we fitted the beaming parameter for almost two thousand of them. When it was not possible to fit the beaming parameter, we used a straight-line fit to our sample's beaming-parameter-versus-phase-angle plot to set the default value for each fit individually, instead of using a single average value. Our diameters agree with stellar-occultation-based diameters well within the accuracy expected for the model. They also match the previous AKARI-based catalogue at phase angles lower than 50°, but we find a systematic deviation at higher phase angles, at which near-Earth and Mars-crossing asteroids were observed. The AKARI IRC all-sky survey is an essential source of information about asteroids, especially the large ones, since it provides observations at different observing geometries, rotational coverages and aspect angles. For example, by comparing in more detail a few asteroids for which dimensions were derived from occultations, we discuss how the multiple observations per object may already provide three

  5. A catalogue of the small transients observed in STEREO HI-A and their associated in-situ measurements

    NASA Astrophysics Data System (ADS)

    Sanchez-Diaz, Eduardo; Rouillard, Alexis P.; Davies, Jackie A.; Kilpua, Emilia; Plotnikov, Illya

    2017-04-01

    The systematic monitoring of the solar wind in high-cadence, high-resolution heliospheric images taken by the Solar-Terrestrial Relations Observatory (STEREO) spacecraft permits the study of the spatial and temporal evolution of variable solar wind flows from the Sun out to 1 AU, and beyond. As part of the EU Framework 7 (FP7) Heliospheric Cataloguing, Analysis and Techniques Service (HELCATS) project, Plotnikov et al. (2016) created a catalogue of 190 Stream Interaction Regions (SIRs) well observed in images taken by the Heliospheric Imager (HI) instruments onboard STEREO-A (ST-A). This catalogue has been made available online on the official HELCATS website (https://www.helcats-fp7.eu/catalogues/wp5_cat.html) and included in the propagation tool (http://propagationtool.cdpp.eu). Several transients, known as blobs, are observed entrained in each SIR. We complete this catalogue with the trajectories of individual blobs and with the latitudinal extent of each SIR. For every SIR, we report whether the trajectory of any of the entrained blobs impacts a spacecraft in the heliosphere. For the cases where a blob is predicted to impact one or more spacecraft, we include in the catalogue the predicted arrival time and the date and time of the visually recognized blob closest to the predicted arrival time. This new catalogue was also made available online on the HELCATS project website. This work was funded by the HELCATS project under FP7 EU contract number 606692.
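
    To first order, an arrival-time prediction like those in the catalogue can follow from constant-speed radial propagation; a minimal sketch under that assumption (not necessarily the HELCATS fitting method, and with invented numbers):

    ```python
    from datetime import datetime, timedelta

    AU_KM = 1.495978707e8

    def arrival_time(t0, r0_au, v_kms, r_target_au=1.0):
        """Arrival time assuming constant-speed radial propagation."""
        dt_s = (r_target_au - r0_au) * AU_KM / v_kms
        return t0 + timedelta(seconds=dt_s)

    # A blob tracked at 0.3 AU moving at 350 km/s, predicted at 1 AU:
    print(arrival_time(datetime(2010, 3, 1, 12, 0), 0.3, 350.0).isoformat())
    ```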

  6. A global catalogue of Ceres impact craters ≥ 1 km and preliminary analysis

    NASA Astrophysics Data System (ADS)

    Gou, Sheng; Yue, Zongyu; Di, Kaichang; Liu, Zhaoqin

    2018-03-01

    The orbital data products of Ceres, including the global LAMO image mosaic and the global HAMO DTM with resolutions of 35 m/pixel and 135 m/pixel respectively, are utilized in this research to create a global catalogue of impact craters with diameter ≥ 1 km, and their morphometric parameters are calculated. Statistics show: (1) There are 29,219 craters in the catalogue, and the craters have various morphologies, e.g., polygonal craters, floor-fractured craters, complex craters with central peaks, etc.; (2) The smallest identifiable crater size is extended to 1 km and the crater numbers have been updated compared with the crater catalogue (D ≥ 20 km) released by the Dawn Science Team; (3) The d/D ratios for fresh simple craters, obviously degraded simple craters and polygonal simple craters are 0.11 ± 0.04, 0.05 ± 0.04 and 0.14 ± 0.02 respectively; (4) The d/D ratios for non-polygonal complex craters and polygonal complex craters are 0.08 ± 0.04 and 0.09 ± 0.03. The global crater catalogue created in this work can be further applied to many other scientific investigations, such as comparing d/D ratios with those of other bodies, inferring subsurface properties, determining surface ages, and estimating average erosion rates.
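
    A minimal sketch of how per-class d/D statistics like those above can be computed from a crater table (the depth and diameter values below are invented for illustration):

    ```python
    import numpy as np

    def d_over_D(depth_km, diameter_km):
        """Mean and standard deviation of the depth/diameter ratio."""
        r = np.asarray(depth_km) / np.asarray(diameter_km)
        return r.mean(), r.std()

    # Invented measurements for a handful of fresh simple craters:
    mean, std = d_over_D([0.22, 0.35, 0.11], [2.0, 3.1, 1.0])
    print(f"d/D = {mean:.2f} +/- {std:.2f}")
    ```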

  7. Evolution of Web Services in EOSDIS: Search and Order Metadata Registry (ECHO)

    NASA Technical Reports Server (NTRS)

    Mitchell, Andrew; Ramapriyan, Hampapuram; Lowe, Dawn

    2009-01-01

    During 2005 through 2008, NASA defined and implemented a major evolutionary change in its Earth Observing System Data and Information System (EOSDIS) to modernize its capabilities. This implementation was based on a vision for 2015 developed during 2005. The EOSDIS 2015 Vision emphasizes increased end-to-end data system efficiency and operability; increased data usability; improved support for end users; and decreased operations costs. One key feature of the evolution plan was achieving higher operational maturity (ingest, reconciliation, search and order, performance, error handling) for NASA's Earth Observing System Clearinghouse (ECHO). The ECHO system is an operational metadata registry through which the scientific community can easily discover and exchange NASA's Earth science data and services. ECHO contains metadata for 2,726 data collections comprising over 87 million individual data granules and 34 million browse images from NASA's EOSDIS Data Centers and the United States Geological Survey's Landsat Project holdings. ECHO is a middleware component based on a Service Oriented Architecture (SOA). The system comprises a set of infrastructure services that enable the fundamental SOA functions: publish, discover, and access Earth science resources. It also provides additional services such as user management, data access control, and order management. The ECHO system has a data registry and a services registry. The data registry enables organizations to publish EOS and other Earth-science-related data holdings to a common metadata model. These holdings are described through metadata in terms of datasets (types of data) and granules (specific data items of those types). ECHO also supports browse images, which provide a visual representation of the data. The published metadata can be mapped to and from existing standards (e.g., FGDC, ISO 19115). With ECHO, users can find the metadata stored in the data registry and then access the data either

  8. Asymmetric programming: a highly reliable metadata allocation strategy for MLC NAND flash memory-based sensor systems.

    PubMed

    Huang, Min; Liu, Zhaoqing; Qiao, Liyan

    2014-10-10

    While NAND flash memory is widely used as the storage medium in modern sensor systems, the aggressive shrinking of process geometry and the increase in the number of bits stored in each memory cell will inevitably degrade its reliability. In particular, it is critical to enhance the reliability of metadata, which occupies only a small portion of the storage space but maintains the critical information of the file system and the address translations of the storage system. Metadata damage will cause the system to crash or a large amount of data to be lost. This paper presents Asymmetric Programming, a highly reliable metadata allocation strategy for MLC NAND flash memory storage systems. Our technique exploits, for the first time, the multi-page architecture of MLC NAND flash memory to improve the reliability of metadata. The basic idea is to keep metadata in most significant bit (MSB) pages, which are more reliable than least significant bit (LSB) pages. Thus, we can achieve relatively low bit error rates for metadata. Based on this idea, we propose two strategies to optimize address mapping and garbage collection. We have implemented Asymmetric Programming on a real hardware platform. The experimental results show that Asymmetric Programming can achieve a reduction in the number of page errors of up to 99.05% with the baseline error correction scheme.
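
    A toy model of the allocation idea, assuming a simplified even/odd MSB-LSB page pairing (real MLC pairing schemes are device-specific, so this is a sketch, not the paper's implementation):

    ```python
    # Even pages stand in for the more reliable MSB pages, odd pages for LSB.
    class Block:
        PAGES = 128

        def __init__(self):
            self.used = [False] * self.PAGES

        @staticmethod
        def is_msb(page):
            return page % 2 == 0

        def allocate(self, is_metadata):
            """Steer metadata to MSB pages and ordinary data to LSB pages."""
            for p in range(self.PAGES):
                if not self.used[p] and self.is_msb(p) == is_metadata:
                    self.used[p] = True
                    return p
            return None  # no page of the preferred reliability class left

    blk = Block()
    print("metadata ->", blk.allocate(True))   # page 0 (MSB)
    print("data     ->", blk.allocate(False))  # page 1 (LSB)
    ```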

  9. Asymmetric Programming: A Highly Reliable Metadata Allocation Strategy for MLC NAND Flash Memory-Based Sensor Systems

    PubMed Central

    Huang, Min; Liu, Zhaoqing; Qiao, Liyan

    2014-01-01

    While NAND flash memory is widely used as the storage medium in modern sensor systems, the aggressive shrinking of process geometry and the increase in the number of bits stored in each memory cell will inevitably degrade its reliability. In particular, it is critical to enhance the reliability of metadata, which occupies only a small portion of the storage space but maintains the critical information of the file system and the address translations of the storage system. Metadata damage will cause the system to crash or a large amount of data to be lost. This paper presents Asymmetric Programming, a highly reliable metadata allocation strategy for MLC NAND flash memory storage systems. Our technique exploits, for the first time, the multi-page architecture of MLC NAND flash memory to improve the reliability of metadata. The basic idea is to keep metadata in most significant bit (MSB) pages, which are more reliable than least significant bit (LSB) pages. Thus, we can achieve relatively low bit error rates for metadata. Based on this idea, we propose two strategies to optimize address mapping and garbage collection. We have implemented Asymmetric Programming on a real hardware platform. The experimental results show that Asymmetric Programming can achieve a reduction in the number of page errors of up to 99.05% with the baseline error correction scheme. PMID:25310473

  10. VizieR Online Data Catalog: Catalogue of HI maps of galaxies. I. (Martin 1998)

    NASA Astrophysics Data System (ADS)

    Martin, M. C.

    1998-03-01

    A catalogue is presented of galaxies having large-scale observations in the HI line. This catalogue collects from the literature the information that characterizes the observations in the 21-cm line and the way these data were presented by means of maps, graphics and tables showing the distribution and kinematics of the gas. It furthermore contains a measure of the HI extent detected at the level of the maximum sensitivity reached in the observations. This catalogue is intended as a guide to references on the HI maps published in the literature from 1953 to 1995 and is the basis for the analysis of the data presented in Paper II (Cat. ). (4 data files).

  11. Scalable Metadata Management for a Large Multi-Source Seismic Data Repository

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gaylord, J. M.; Dodge, D. A.; Magana-Zook, S. A.

    In this work, we implemented the key metadata management components of a scalable seismic data ingestion framework to address limitations in our existing system and to position it for anticipated growth in volume and complexity. We began the effort with an assessment of open source data flow tools from the Hadoop ecosystem. We then began the construction of a layered architecture that is specifically designed to address many of the scalability and data quality issues we experience with our current pipeline. This included implementing basic functionality in each of the layers, such as establishing a data lake, designing a unified metadata schema, tracking provenance, and calculating data quality metrics. Our original intent was to test and validate the new ingestion framework with data from a large-scale field deployment in a temporary network. This delivered somewhat unsatisfying results, since the new system immediately identified fatal flaws in the data relatively early in the pipeline. Although this is a correct result, it did not allow us to sufficiently exercise the whole framework. We then widened our scope to process all available metadata from over a dozen online seismic data sources to further test the implementation and validate the design. This experiment also uncovered a higher than expected frequency of certain types of metadata issues that challenged us to further tune our data management strategy to handle them. Our result from this project is a greatly improved understanding of real-world data issues, a validated design, and prototype implementations of major components of an eventual production framework. This successfully forms the basis of future development for the Geophysical Monitoring Program data pipeline, which is a critical asset supporting multiple programs. It also positions us very well to deliver valuable metadata management expertise to our sponsors, and has already resulted in an NNSA Office of Defense Nuclear

  12. Nebula observations. Catalogues and archive of photoplates

    NASA Astrophysics Data System (ADS)

    Shlyapnikov, A. A.; Smirnova, M. A.; Elizarova, N. V.

    2017-12-01

    A process of data systematization based on "Academician G.A. Shajn's Plan" for studying the structure of the Galaxy through nebula observations is considered. The creation of digital versions of catalogues of observations and publications is described, as well as their presentation in HTML, VOTable and AJS formats, and the basic principles of working with the Aladin Sky Atlas, an interactive application of the International Virtual Observatory.

  13. White dwarf-main sequence binaries from LAMOST: the DR5 catalogue

    NASA Astrophysics Data System (ADS)

    Ren, J.-J.; Rebassa-Mansergas, A.; Parsons, S. G.; Liu, X.-W.; Luo, A.-L.; Kong, X.; Zhang, H.-T.

    2018-07-01

    We present the data release (DR) 5 catalogue of white dwarf-main sequence (WDMS) binaries from the Large Sky Area Multi-Object Fibre Spectroscopic Telescope (LAMOST). The catalogue contains 876 WDMS binaries, of which 757 are additions to our previous LAMOST DR1 sample and 357 are systems that have not been published before. We also describe a LAMOST-dedicated survey that aims at obtaining spectra of photometrically selected WDMS binaries from the Sloan Digital Sky Survey (SDSS) that are expected to contain cool white dwarfs and/or early-type M dwarf companions. This is a population under-represented in previous SDSS WDMS binary catalogues. We determine the stellar parameters (white dwarf effective temperatures, surface gravities and masses, and M dwarf spectral types) of the LAMOST DR5 WDMS binaries and make use of the parameter distributions to analyse the properties of the sample. We find that, despite our efforts, systems containing cool white dwarfs remain under-represented. Moreover, we make use of LAMOST DR5 and SDSS DR14 (when available) spectra to measure the Na I λλ 8183.27, 8194.81 absorption doublet and/or Hα emission radial velocities of our systems. This allows us to identify 128 binaries displaying significant radial velocity variations, 76 of which are new. Finally, we cross-match our catalogue with the Catalina Surveys and identify 57 systems displaying light-curve variations. These include 16 eclipsing systems, two of which are new, and nine binaries that are new eclipsing candidates. We calculate periodograms from the photometric data and measure (estimate) the orbital periods of 30 (15) WDMS binaries.

  14. Parallel file system with metadata distributed across partitioned key-value store

    DOEpatents

    Bent, John M.; Faibish, Sorin; Grider, Gary; Torres, Aaron

    2017-09-19

    Improved techniques are provided for storing metadata associated with a plurality of sub-files associated with a single shared file in a parallel file system. The shared file is generated by a plurality of applications executing on a plurality of compute nodes. A compute node implements a Parallel Log Structured File System (PLFS) library to store at least one portion of the shared file generated by an application executing on the compute node and metadata for the at least one portion of the shared file on one or more object storage servers. The compute node is also configured to implement a partitioned data store for storing a partition of the metadata for the shared file, wherein the partitioned data store communicates with partitioned data stores on other compute nodes using a message passing interface. The partitioned data store can be implemented, for example, using Multidimensional Data Hashing Indexing Middleware (MDHIM).
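
    A minimal sketch of the partitioning idea, hashing (file, offset) keys across per-node stores; MDHIM's actual key-space partitioning and MPI transport are not modeled here, and all names are illustrative:

    ```python
    import hashlib

    # Toy model: shared-file metadata is spread over N partitions by hashing
    # the (file name, logical offset) key.
    N_PARTITIONS = 4
    stores = [dict() for _ in range(N_PARTITIONS)]

    def partition_for(shared_file, offset):
        key = f"{shared_file}:{offset}".encode()
        return int.from_bytes(hashlib.sha1(key).digest()[:4], "big") % N_PARTITIONS

    def put_metadata(shared_file, offset, record):
        stores[partition_for(shared_file, offset)][(shared_file, offset)] = record

    put_metadata("checkpoint.out", 0, {"writer_node": 17, "length": 1 << 20})
    put_metadata("checkpoint.out", 1 << 20, {"writer_node": 18, "length": 1 << 20})
    ```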

  15. Integrating Semantic Information in Metadata Descriptions for a Geoscience-wide Resource Inventory.

    NASA Astrophysics Data System (ADS)

    Zaslavsky, I.; Richard, S. M.; Gupta, A.; Valentine, D.; Whitenack, T.; Ozyurt, I. B.; Grethe, J. S.; Schachne, A.

    2016-12-01

    Integrating semantic information into legacy metadata catalogs is a challenging issue and so far has mostly been done on a limited scale. We present the experience of CINERGI (Community Inventory of EarthCube Resources for Geoscience Interoperability), an NSF EarthCube Building Block project, in creating a large cross-disciplinary catalog of geoscience information resources to enable cross-domain discovery. The project developed a pipeline for automatically augmenting resource metadata, in particular generating keywords that describe metadata documents harvested from multiple geoscience information repositories or contributed by geoscientists through various channels, including surveys and domain resource inventories. The pipeline examines available metadata descriptions using the text parsing, vocabulary management, semantic annotation and graph navigation services of GeoSciGraph. GeoSciGraph, in turn, relies on a large cross-domain ontology of geoscience terms, which bridges several independently developed ontologies or taxonomies, including SWEET, ENVO, YAGO, GeoSciML, GCMD, SWO, and CHEBI. The ontology content enables automatic extraction of keywords reflecting science domains, equipment used, geospatial features, measured properties, methods, processes, etc. We specifically focus on issues of cross-domain geoscience ontology creation, resolving several types of semantic conflicts among component ontologies or vocabularies, and constructing and managing facets for improved data discovery and navigation. The ontology and keyword generation rules are iteratively improved as pipeline results are presented to data managers for selective manual curation via a CINERGI Annotator user interface. We present lessons learned from applying the CINERGI metadata augmentation pipeline to a number of federal agency and academic data registries, in the context of several use cases that require data discovery and integration across multiple earth science data catalogs of varying quality
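
    A toy stand-in for the vocabulary-matching step of such a pipeline (the mini vocabulary below is invented; GeoSciGraph performs full ontology-based annotation rather than substring matching):

    ```python
    # Invented mini vocabulary mapping surface terms to faceted keywords.
    VOCAB = {
        "seismometer": "Instrument > Seismometer",
        "water temperature": "Property > Water Temperature",
        "estuary": "Feature > Estuary",
    }

    def keywords(abstract):
        text = abstract.lower()
        return sorted({facet for term, facet in VOCAB.items() if term in text})

    print(keywords("Hourly water temperature records from an estuary mooring."))
    ```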

  16. A Catalogue of Massive Young Stellar Objects

    NASA Astrophysics Data System (ADS)

    Chan, S. J.; Henning, Th.; Schreyer, K.

    1994-12-01

    We report on an ongoing project to compile a catalogue of massive young stellar objects (YSOs). Massive young stellar objects are compact and luminous infrared sources whose stellar core is still surrounded by optically thick dust shells (cf. Henning 1990, Fundamentals of Cosmic Physics, 14, 321). This catalogue, which contains about 250 objects, will provide comprehensive information such as infrared and radio flux densities, association with maser sources, and outflow phenomena. The objects were selected from the IRAS Point Source Catalogue based on the following criteria: (1) IRAS flux density qualities >= 2 in the 4 IRAS bands (12 microns, 25 microns, 60 microns and 100 microns); (2) F_nu(12 microns) <= F_nu(25 microns) <= F_nu(60 microns) <= F_nu(100 microns), with F_nu(100 microns) >= 1000 Jy; (3) IRAS colors (including an uncertainty of 0.15) should lie within the following color box: -0.15 <= R(12/25) <= 1.15, -0.15 <= R(25/60) <= 0.75, -0.35 <= R(60/100) <= 0.35, where R(i/j) = log10[j F_nu(i) / i F_nu(j)] (Henning et al. 1990, A&A, 227, 542); (4) IRAS idtype (type of object) != 1, i.e. the objects are not associated with galaxies or late-type stars, and |b| <= 10°. Our main goal is to collect the observational data for these sources as completely as possible. Flux densities from the near-infrared to the radio range are assembled (J, H, K bands, the IRAS bands, the 350 micron, 800 micron and 1.3 mm bands, and the 2 cm and 6 cm bands). Information on dust features (such as ice, silicate and PAH) comes from the IRAS Low Resolution Spectrometer Atlas and the literature (cf. Volk & Cohen, 1989, AJ, 98, 931). The maser sources (H2O, type I OH, CH3OH) and NH3, HCO+, and CS molecular line data towards these objects, where observed, are also reported. The CO outflow velocity is given if the object is found to be associated with an outflow.
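
    The selection criteria lend themselves to a direct filter. A sketch under two reconstruction assumptions (flagged because the printed record was garbled): the inequality directions in criterion (3) were inverted in the source, and the colour R(i/j) is taken as logarithmic, since the quoted bounds are negative while a plain flux ratio cannot be:

    ```python
    import numpy as np

    def iras_colour(i_um, j_um, f_i, f_j):
        """IRAS colour R(i/j) = log10[j * F_nu(i) / (i * F_nu(j))] (assumed log form)."""
        return np.log10((j_um * f_i) / (i_um * f_j))

    def passes_selection(f12, f25, f60, f100):
        """Flux and colour-box criteria (2)-(3), as reconstructed above."""
        rising = f12 <= f25 <= f60 <= f100 and f100 >= 1000.0  # Jy
        return (rising
                and -0.15 <= iras_colour(12, 25, f12, f25) <= 1.15
                and -0.15 <= iras_colour(25, 60, f25, f60) <= 0.75
                and -0.35 <= iras_colour(60, 100, f60, f100) <= 0.35)

    # An invented source with a steeply rising, YSO-like spectrum:
    print(passes_selection(f12=100.0, f25=150.0, f60=400.0, f100=1000.0))  # True
    ```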

  17. A Catalogue of Data in the Statistical Information Centre, March 1976. (Catalogue de donnees du Centre d'information statistique, Mars 1976.)

    ERIC Educational Resources Information Center

    Department of Indian Affairs and Northern Development, Ottawa (Ontario).

    Over 189 materials which cover aspects of the Administration, Parks Canada, Indian and Eskimo Affairs, and Northern Development Programs are cited in this bilingual catalogue (English and French). Information given for each entry is: reference number, statistics available, years covered, and whether the statistics are available by area, region,…

  18. Revision of IRIS/IDA Seismic Station Metadata

    NASA Astrophysics Data System (ADS)

    Xu, W.; Davis, P.; Auerbach, D.; Klimczak, E.

    2017-12-01

    Trustworthy data quality assurance has always been one of the goals of seismic network operators and data management centers. This task is considerably complex and evolving due to the huge quantities as well as the rapidly changing characteristics and complexities of seismic data. Published metadata usually reflect instrument response characteristics and their accuracies, which include the zero-frequency sensitivity of both the seismometer and the data logger as well as other, frequency-dependent elements. In this work, we mainly focus on studying the variation of seismometer sensitivity with time for IRIS/IDA seismic recording systems, with the goal of improving the metadata accuracy over the history of the network. There are several ways to measure the accuracy of seismometer sensitivity for seismic stations in service. An effective practice recently developed is to collocate a reference seismometer in proximity to verify the in-situ sensors' calibration. For those stations with a secondary broadband seismometer, IRIS' MUSTANG metric computation system introduced a transfer function metric to reflect the two sensors' gain ratio in the microseism frequency band. In addition, a simulation approach based on M2 tidal measurements has been proposed and proven to be effective. In this work, we compare and analyze the results from the three different methods, and conclude that the collocated-sensor method is the most stable and reliable, with the smallest uncertainties throughout. However, for epochs without both a collocated sensor and a secondary seismometer, we rely on the analysis results from the tide method. For the data since 1992 on IDA stations, we computed over 600 revised seismometer sensitivities covering all the IRIS/IDA network calibration epochs. Further revision procedures should help guarantee that the metadata of these stations accurately reflect the data.
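
    A hedged sketch of the gain-ratio idea behind such a transfer-function metric, using Welch PSDs over the secondary microseism band (roughly 0.125-0.25 Hz); the exact MUSTANG computation differs and the synthetic data are ours:

    ```python
    import numpy as np
    from scipy.signal import welch

    def gain_ratio(trace_a, trace_b, fs, band=(0.125, 0.25)):
        """Amplitude gain ratio of two collocated traces in the microseism band.

        Estimated as the square root of the median PSD ratio over the band;
        a ratio drifting away from 1.0 over time suggests a sensitivity change.
        """
        f, pa = welch(trace_a, fs=fs, nperseg=4096)
        _, pb = welch(trace_b, fs=fs, nperseg=4096)
        sel = (f >= band[0]) & (f <= band[1])
        return float(np.sqrt(np.median(pa[sel] / pb[sel])))

    # Synthetic check: a 2% gain difference on otherwise identical noise.
    fs = 20.0  # samples per second
    noise = np.random.default_rng(2).normal(size=int(3600 * fs))
    print(gain_ratio(1.02 * noise, noise, fs))  # ~1.02
    ```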

  19. Metadata and network API aspects of a framework for storing and retrieving civil infrastructure monitoring data

    NASA Astrophysics Data System (ADS)

    Wong, John-Michael; Stojadinovic, Bozidar

    2005-05-01

    A framework has been defined for storing and retrieving civil infrastructure monitoring data over a network. The framework consists of two primary components: metadata and network communications. The metadata component provides the descriptions and data definitions necessary for cataloging and searching monitoring data. The communications component provides Java classes for remotely accessing the data. Packages of Enterprise JavaBeans and data handling utility classes are written to use the underlying metadata information to build real-time monitoring applications. The utility of the framework was evaluated using wireless accelerometers on a shaking table earthquake simulation test of a reinforced concrete bridge column. The NEESgrid data and metadata repository services were used as a backend storage implementation. A web interface was created to demonstrate the utility of the data model and provides an example health monitoring application.

  20. Automatic publishing ISO 19115 metadata with PanMetaDocs using SensorML information

    NASA Astrophysics Data System (ADS)

    Stender, Vivien; Ulbricht, Damian; Schroeder, Matthias; Klump, Jens

    2014-05-01

    Terrestrial Environmental Observatories (TERENO) is an interdisciplinary, long-term research project spanning an Earth observation network across Germany. It includes four test sites, from the North German lowlands to the Bavarian Alps, and is operated by six research centers of the Helmholtz Association. The contribution of the participating research centers is organized as regional observatories. A challenge for TERENO and its observatories is to integrate all aspects of data management, data workflows, data modeling and visualization into the design of a monitoring infrastructure. TERENO Northeast is one of the sub-observatories of TERENO and is operated by the German Research Centre for Geosciences (GFZ) in Potsdam. This observatory investigates geoecological processes in the northeastern lowland of Germany by collecting large amounts of environmentally relevant data. The success of long-term projects like TERENO depends on well-organized data management, data exchange between the partners involved, and the availability of the captured data. Data discovery and dissemination are facilitated not only through the data portals of the regional TERENO observatories but also through a common spatial data infrastructure, TEODOOR (TEreno Online Data repOsitORry). TEODOOR bundles the data provided by the web services of the individual observatories and provides tools for data discovery, visualization and data access. The TERENO Northeast data infrastructure integrates data from more than 200 instruments and makes the data available through standard web services. Geographic sensor information and services are described using the ISO 19115 metadata schema. TEODOOR accesses the OGC Sensor Web Enablement (SWE) interfaces offered by the regional observatories. In addition to the SWE interface, TERENO Northeast also publishes data through DataCite. The necessary metadata are created in an automated process by extracting information from the SWE SensorML to
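
    A toy sketch of pulling one ISO 19115-relevant field out of a SensorML document with XPath; the embedded record and element paths are invented for illustration and real TERENO documents differ:

    ```python
    from lxml import etree

    # Invented minimal SensorML 1.0.1 document.
    DOC = b"""
    <sml:SensorML xmlns:sml="http://www.opengis.net/sensorML/1.0.1"
                  xmlns:gml="http://www.opengis.net/gml">
      <sml:member><sml:System>
        <gml:description>Soil moisture station (toy record)</gml:description>
      </sml:System></sml:member>
    </sml:SensorML>"""

    NS = {"sml": "http://www.opengis.net/sensorML/1.0.1",
          "gml": "http://www.opengis.net/gml"}

    root = etree.fromstring(DOC)
    # The gml:description is a natural candidate for the ISO 19115 abstract.
    print("abstract candidate:", root.findtext(".//gml:description", namespaces=NS))
    ```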