Sample records for automatic adaptive metadata

  1. A Prototype Publishing Registry for the Virtual Observatory

    NASA Astrophysics Data System (ADS)

    Williamson, R.; Plante, R.

    2004-07-01

    In the Virtual Observatory (VO), a registry helps users locate resources, such as data and services, in a distributed environment. A general framework for VO registries is now under development within the International Virtual Observatory Alliance (IVOA) Registry Working Group. We present a prototype of one component of this framework: the publishing registry. The publishing registry allows data providers to expose metadata descriptions of their resources to the VO environment. Searchable registries can harvest the metadata from many publishing registries and make them searchable by users. We have developed a prototype publishing registry that data providers can install at their sites to publish their resources. The descriptions are exposed using the Open Archives Initiative (OAI) Protocol for Metadata Harvesting. Automating the input of metadata into registries is critical when a provider wishes to describe many resources. We illustrate various strategies for such automation, both currently in use and planned for the future. We also describe how future versions of the registry can adapt automatically to evolving metadata schemas for describing resources.
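
    As a rough illustration of the harvesting side the abstract describes, the sketch below issues an OAI-PMH ListRecords request and reads Dublin Core titles from the response. The verb and metadataPrefix parameters are part of the real protocol; the registry URL is a hypothetical placeholder.

    ```python
    # Minimal OAI-PMH harvest sketch; the endpoint URL is hypothetical.
    import requests
    import xml.etree.ElementTree as ET

    OAI = "{http://www.openarchives.org/OAI/2.0/}"
    DC = "{http://purl.org/dc/elements/1.1/}"

    resp = requests.get(
        "https://example-registry.org/oai",  # hypothetical publishing registry
        params={"verb": "ListRecords", "metadataPrefix": "oai_dc"},
        timeout=30,
    )
    root = ET.fromstring(resp.content)
    for record in root.iter(OAI + "record"):
        ident = record.findtext(OAI + "header/" + OAI + "identifier")
        title = record.findtext(".//" + DC + "title")
        print(ident, title)
    ```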

  2. Design of Community Resource Inventories as a Component of Scalable Earth Science Infrastructure: Experience of the Earthcube CINERGI Project

    NASA Astrophysics Data System (ADS)

    Zaslavsky, I.; Richard, S. M.; Valentine, D. W., Jr.; Grethe, J. S.; Hsu, L.; Malik, T.; Bermudez, L. E.; Gupta, A.; Lehnert, K. A.; Whitenack, T.; Ozyurt, I. B.; Condit, C.; Calderon, R.; Musil, L.

    2014-12-01

    EarthCube is envisioned as a cyberinfrastructure that fosters new, transformational geoscience by enabling sharing, understanding, and scientifically sound and efficient re-use of formerly unconnected data resources, software, models, repositories, and computational power. Its purpose is to enable science enterprise and workforce development via an extensible and adaptable collaboration and resource integration framework. A key component of this vision is development of comprehensive inventories supporting resource discovery and re-use across geoscience domains. The goal of the EarthCube CINERGI (Community Inventory of EarthCube Resources for Geoscience Interoperability) project is to create a methodology and assemble a large inventory of high-quality information resources with standard metadata descriptions and traceable provenance. The inventory is compiled from metadata catalogs maintained by geoscience data facilities, as well as from user contributions. The latter mechanism relies on community resource viewers: online applications that support update and curation of metadata records. Once harvested into CINERGI, metadata records from domain catalogs and community resource viewers are loaded into a staging database implemented in MongoDB, and validated for compliance with the ISO 19139 metadata schema. Several types of metadata defects detected by the validation engine are automatically corrected with the help of several information extractors or flagged for manual curation. The metadata harvesting, validation and processing components generate provenance statements using W3C PROV notation, which are stored in a Neo4J database. The curated metadata, along with the provenance information, are then re-published and can be accessed programmatically or via a CINERGI online application. This presentation focuses on the role of resource inventories in a scalable and adaptable information infrastructure, and on the CINERGI metadata pipeline and its implementation challenges. Key project components are described at the project's website (http://workspace.earthcube.org/cinergi), which also provides access to the initial resource inventory, the inventory metadata model, metadata entry forms and a collection of the community resource viewers.
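
    An illustrative sketch (not the CINERGI code) of the staging step described above: flag missing ISO 19139 fields and attach a PROV-style statement before inserting into MongoDB. The field list, collection names, and local connection are assumptions.

    ```python
    # Staging sketch: validate a harvested record and record its provenance.
    from datetime import datetime, timezone
    from pymongo import MongoClient

    REQUIRED = ["title", "abstract", "contact", "extent"]  # simplified subset

    def stage(record, source_catalog):
        defects = [f for f in REQUIRED if not record.get(f)]
        record["_validation"] = {"defects": defects, "needsCuration": bool(defects)}
        record["_prov"] = {  # W3C PROV terms expressed informally as a dict
            "prov:wasDerivedFrom": source_catalog,
            "prov:wasGeneratedBy": "harvest-pipeline",
            "prov:generatedAtTime": datetime.now(timezone.utc).isoformat(),
        }
        MongoClient()["staging"]["records"].insert_one(record)  # local MongoDB

    stage({"title": "Gravity survey"}, "example-domain-catalog")
    ```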

  3. Supporting Semantic Annotation of Educational Content by Automatic Extraction of Hierarchical Domain Relationships

    ERIC Educational Resources Information Center

    Vrablecová, Petra; Šimko, Marián

    2016-01-01

    The domain model is an essential part of an adaptive learning system. For each educational course, it involves educational content and semantics, which is also viewed as a form of conceptual metadata about educational content. Due to the size of a domain model, manual domain model creation is a challenging and demanding task for teachers or…

  4. What Information Does Your EHR Contain? Automatic Generation of a Clinical Metadata Warehouse (CMDW) to Support Identification and Data Access Within Distributed Clinical Research Networks.

    PubMed

    Bruland, Philipp; Doods, Justin; Storck, Michael; Dugas, Martin

    2017-01-01

    Data dictionaries provide structural meta-information about data definitions in health information technology (HIT) systems. In this regard, reusing healthcare data for secondary purposes offers several advantages (e.g. reduced documentation times or increased data quality). Prerequisites for data reuse are data quality, availability, and identical meaning of data. In diverse projects, research data warehouses serve as core components between heterogeneous clinical databases and various research applications. Given the complexity (high number of data elements) and dynamics (regular updates) of electronic health record (EHR) data structures, we propose a clinical metadata warehouse (CMDW) based on a metadata registry standard. Metadata of two large hospitals were automatically inserted into two CMDWs containing 16,230 forms and 310,519 data elements. Automatic updates of metadata are possible as well as semantic annotations. A CMDW allows metadata discovery, data quality assessment and similarity analyses. Common data models for distributed research networks can be established based on similarity analyses.
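
    A minimal sketch of the kind of similarity analysis a CMDW enables: comparing the data elements of two EHR forms by Jaccard overlap. The element names are invented for illustration.

    ```python
    # Jaccard similarity between the data-element sets of two forms.
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0

    form_a = {"patient_id", "systolic_bp", "diastolic_bp", "heart_rate"}
    form_b = {"patient_id", "systolic_bp", "weight_kg"}
    print(f"element overlap: {jaccard(form_a, form_b):.2f}")  # 0.33
    ```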

  5. Automatic Extraction of Metadata from Scientific Publications for CRIS Systems

    ERIC Educational Resources Information Center

    Kovacevic, Aleksandar; Ivanovic, Dragan; Milosavljevic, Branko; Konjovic, Zora; Surla, Dusan

    2011-01-01

    Purpose: The aim of this paper is to develop a system for automatic extraction of metadata from scientific papers in PDF format for the information system for monitoring the scientific research activity of the University of Novi Sad (CRIS UNS). Design/methodology/approach: The system is based on machine learning and performs automatic extraction…

  6. Normalized Metadata Generation for Human Retrieval Using Multiple Video Surveillance Cameras.

    PubMed

    Jung, Jaehoon; Yoon, Inhye; Lee, Seungwon; Paik, Joonki

    2016-06-24

    Since it is impossible for surveillance personnel to keep monitoring videos from a multiple camera-based surveillance system, an efficient technique is needed to help recognize important situations by retrieving the metadata of an object-of-interest. In a multiple camera-based surveillance system, an object detected in a camera has a different shape in another camera, which is a critical issue of wide-range, real-time surveillance systems. In order to address the problem, this paper presents an object retrieval method by extracting the normalized metadata of an object-of-interest from multiple, heterogeneous cameras. The proposed metadata generation algorithm consists of three steps: (i) generation of a three-dimensional (3D) human model; (ii) human object-based automatic scene calibration; and (iii) metadata generation. More specifically, an appropriately-generated 3D human model provides the foot-to-head direction information that is used as the input of the automatic calibration of each camera. The normalized object information is used to retrieve an object-of-interest in a wide-range, multiple-camera surveillance system in the form of metadata. Experimental results show that the 3D human model matches the ground truth, and automatic calibration-based normalization of metadata enables a successful retrieval and tracking of a human object in the multiple-camera video surveillance system.

  7. Normalized Metadata Generation for Human Retrieval Using Multiple Video Surveillance Cameras

    PubMed Central

    Jung, Jaehoon; Yoon, Inhye; Lee, Seungwon; Paik, Joonki

    2016-01-01

    Since it is impossible for surveillance personnel to keep monitoring videos from a multiple camera-based surveillance system, an efficient technique is needed to help recognize important situations by retrieving the metadata of an object-of-interest. In a multiple camera-based surveillance system, an object detected in a camera has a different shape in another camera, which is a critical issue of wide-range, real-time surveillance systems. In order to address the problem, this paper presents an object retrieval method by extracting the normalized metadata of an object-of-interest from multiple, heterogeneous cameras. The proposed metadata generation algorithm consists of three steps: (i) generation of a three-dimensional (3D) human model; (ii) human object-based automatic scene calibration; and (iii) metadata generation. More specifically, an appropriately-generated 3D human model provides the foot-to-head direction information that is used as the input of the automatic calibration of each camera. The normalized object information is used to retrieve an object-of-interest in a wide-range, multiple-camera surveillance system in the form of metadata. Experimental results show that the 3D human model matches the ground truth, and automatic calibration-based normalization of metadata enables a successful retrieval and tracking of a human object in the multiple-camera video surveillance system. PMID:27347961

  8. EXIF Custom: Automatic image metadata extraction for Scratchpads and Drupal.

    PubMed

    Baker, Ed

    2013-01-01

    Many institutions and individuals use embedded metadata to aid in the management of their image collections. Many desktop image management solutions such as Adobe Bridge and online tools such as Flickr also make use of embedded metadata to describe, categorise and license images. Until now Scratchpads (a data management system and virtual research environment for biodiversity) have not made use of these metadata, and users have had to manually re-enter this information if they have wanted to display it on their Scratchpad site. The Drupal module described here allows users to map metadata embedded in their images to the associated field in the Scratchpads image form using one or more customised mappings. The module works seamlessly with the bulk image uploader used on Scratchpads and it is therefore possible to upload hundreds of images easily with automatic metadata (EXIF, XMP and IPTC) extraction and mapping.
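
    A small sketch of embedded-metadata extraction with Pillow, mapping selected EXIF tags to form fields. The tag-to-field mapping and file name are illustrative, not the module's actual configuration.

    ```python
    # Extract EXIF tags and map them to (hypothetical) form fields.
    from PIL import Image, ExifTags

    MAPPING = {"Artist": "creator", "DateTime": "capture_date", "Copyright": "license"}

    def extract(path):
        exif = Image.open(path).getexif()
        named = {ExifTags.TAGS.get(tag_id, tag_id): value
                 for tag_id, value in exif.items()}
        return {field: named[tag] for tag, field in MAPPING.items() if tag in named}

    print(extract("specimen_0001.jpg"))  # e.g. {'creator': 'E. Baker', ...}
    ```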

  9. EXIF Custom: Automatic image metadata extraction for Scratchpads and Drupal

    PubMed Central

    2013-01-01

    Many institutions and individuals use embedded metadata to aid in the management of their image collections. Many desktop image management solutions such as Adobe Bridge and online tools such as Flickr also make use of embedded metadata to describe, categorise and license images. Until now Scratchpads (a data management system and virtual research environment for biodiversity) have not made use of these metadata, and users have had to manually re-enter this information if they have wanted to display it on their Scratchpad site. The Drupal module described here allows users to map metadata embedded in their images to the associated field in the Scratchpads image form using one or more customised mappings. The module works seamlessly with the bulk image uploader used on Scratchpads and it is therefore possible to upload hundreds of images easily with automatic metadata (EXIF, XMP and IPTC) extraction and mapping. PMID:24723768

  10. Automated Transformation of CDISC ODM to OpenClinica.

    PubMed

    Gessner, Sophia; Storck, Michael; Hegselmann, Stefan; Dugas, Martin; Soto-Rey, Iñaki

    2017-01-01

    Due to the increasing use of electronic data capture systems for clinical research, the interest in saving resources by automatically generating and reusing case report forms in clinical studies is growing. OpenClinica, an open-source electronic data capture system, enables the reuse of metadata only through its own Excel import template, hampering the reuse of metadata defined in other standard formats. One of these standard formats is the Operational Data Model for metadata, administrative and clinical data in clinical studies. This work suggests a mapping from the Operational Data Model to OpenClinica and describes the implementation of a converter to automatically generate OpenClinica-conformant case report forms based upon metadata in the Operational Data Model.
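
    As a hedged sketch of the conversion's first step, the snippet below reads ItemDefs from a CDISC ODM file and writes them to a CSV table. The ODM 1.3 namespace and the Name/DataType attributes are part of the real standard; the input file name and simplified output columns are assumptions.

    ```python
    # Read ODM ItemDefs as raw material for a case-report-form template.
    import csv
    import xml.etree.ElementTree as ET

    ODM = "{http://www.cdisc.org/ns/odm/v1.3}"

    root = ET.parse("study_metadata.xml").getroot()  # hypothetical ODM export
    with open("items.csv", "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["ITEM_NAME", "DATA_TYPE"])  # simplified template columns
        for item in root.iter(ODM + "ItemDef"):
            writer.writerow([item.get("Name"), item.get("DataType")])
    ```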

  11. Policy enabled information sharing system

    DOEpatents

    Jorgensen, Craig R.; Nelson, Brian D.; Ratheal, Steve W.

    2014-09-02

    A technique for dynamically sharing information includes executing a sharing policy indicating when to share a data object responsive to the occurrence of an event. The data object is created by formatting a data file to be shared with a receiving entity. The data object includes a file data portion and a sharing metadata portion. The data object is encrypted and then automatically transmitted to the receiving entity upon occurrence of the event. The sharing metadata portion includes metadata characterizing the data file and referenced in connection with the sharing policy to determine when to automatically transmit the data object to the receiving entity.
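
    A sketch in the spirit of the patented pattern, not its implementation: a sharing policy, read from the object's sharing-metadata portion, decides whether an event triggers encryption and transmission. The policy field name and the transmit step are assumptions.

    ```python
    # Event-driven, policy-gated sharing of an encrypted data object.
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()  # shared with the receiving entity out of band

    def on_event(event, data_object):
        policy = data_object["sharing_metadata"]["share_on"]  # hypothetical field
        if event not in policy:
            return None                      # policy says: do not share
        ciphertext = Fernet(key).encrypt(data_object["file_data"].encode())
        return ciphertext                    # would be transmitted to the receiver

    obj = {"file_data": "sensor readings ...",
           "sharing_metadata": {"share_on": ["audit"]}}
    print(on_event("audit", obj) is not None)  # True: event matches the policy
    ```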

  12. PS1-41: Just Add Data: Implementing an Event-Based Data Model for Clinical Trial Tracking

    PubMed Central

    Fuller, Sharon; Carrell, David; Pardee, Roy

    2012-01-01

    Background/Aims Clinical research trials often have similar fundamental tracking needs, despite being quite variable in their specific logic and activities. A model tracking database that can be quickly adapted by a variety of studies has the potential to achieve significant efficiencies in database development and maintenance. Methods Over the course of several different clinical trials, we have developed a database model that is highly adaptable to a variety of projects. Rather than hard-coding each specific event that might occur in a trial, along with its logical consequences, this model considers each event and its parameters to be a data record in its own right. Each event may have related variables (metadata) describing its prerequisites, subsequent events due, associated mailings, or events that it overrides. The metadata for each event is stored in the same record with the event name. When changes are made to the study protocol, no structural changes to the database are needed. One has only to add or edit events and their metadata. Changes in the event metadata automatically determine any related logic changes. In addition to streamlining application code, this model simplifies communication between the programmer and other team members. Database requirements can be phrased as changes to the underlying data, rather than to the application code. The project team can review a single report of events and metadata and easily see where changes might be needed. In addition to benefitting from streamlined code, the front-end database application can also implement useful standard features such as automated mail merges and to-do lists. Results The event-based data model has proven itself to be robust, adaptable and user-friendly in a variety of study contexts. We have chosen to implement it as a SQL Server back end and distributed Access front end. Interested readers may request a copy of the Access front end and scripts for creating the back end database. Discussion An event-based database with a consistent, robust set of features has the potential to significantly reduce development time and maintenance expense for clinical trial tracking databases.
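
    A toy sketch of the event-as-data idea: each event row carries its own logic as metadata, so a protocol change is a data edit, not a schema change. The event names and metadata fields are invented.

    ```python
    # Events as data: prerequisites and follow-ups live in the records.
    EVENTS = {
        "consent_received": {"prerequisites": [], "next_due": ["baseline_survey"]},
        "baseline_survey":  {"prerequisites": ["consent_received"], "next_due": ["followup_1"]},
        "followup_1":       {"prerequisites": ["baseline_survey"], "next_due": []},
    }

    def due_events(completed):
        return [name for name, meta in EVENTS.items()
                if name not in completed
                and all(p in completed for p in meta["prerequisites"])]

    print(due_events({"consent_received"}))  # -> ['baseline_survey']
    ```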

  13. iLOG: A Framework for Automatic Annotation of Learning Objects with Empirical Usage Metadata

    ERIC Educational Resources Information Center

    Miller, L. D.; Soh, Leen-Kiat; Samal, Ashok; Nugent, Gwen

    2012-01-01

    Learning objects (LOs) are digital or non-digital entities used for learning, education or training, commonly stored in repositories searchable by their associated metadata. Unfortunately, based on the current standards, such metadata is often missing or incorrectly entered, making search difficult or impossible. In this paper, we investigate…

  14. Utilizing Linked Open Data Sources for Automatic Generation of Semantic Metadata

    NASA Astrophysics Data System (ADS)

    Nummiaho, Antti; Vainikainen, Sari; Melin, Magnus

    In this paper we present an application that can be used to automatically generate semantic metadata for tags given as simple keywords. The application, which we have implemented in the Java programming language, creates the semantic metadata by linking the tags to concepts in different semantic knowledge bases (CrunchBase, DBpedia, Freebase, KOKO, Opencyc, Umbel and/or WordNet). The steps that our application takes in doing so include detecting possible languages, finding spelling suggestions and finding meanings from amongst the proper nouns and common nouns separately. Currently, our application supports English, Finnish and Swedish words, but other languages could be included easily if the required lexical tools (spellcheckers, etc.) are available. The created semantic metadata can be of great use in, e.g., finding and combining similar contents, creating recommendations and targeting advertisements.
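
    As a rough sketch of linking a tag to one of the knowledge bases mentioned, the snippet below runs a label-match SPARQL query against the public DBpedia endpoint. The deliberately simple query is an assumption about approach, not the paper's algorithm.

    ```python
    # Look up DBpedia concepts whose rdfs:label matches a tag.
    import requests

    def lookup(tag, lang="en"):
        query = f"""
          SELECT ?concept WHERE {{
            ?concept rdfs:label "{tag}"@{lang} .
          }} LIMIT 5
        """
        resp = requests.get(
            "https://dbpedia.org/sparql",
            params={"query": query, "format": "application/sparql-results+json"},
            timeout=30,
        )
        bindings = resp.json()["results"]["bindings"]
        return [b["concept"]["value"] for b in bindings]

    print(lookup("Helsinki"))  # e.g. ['http://dbpedia.org/resource/Helsinki', ...]
    ```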

  15. A metadata-driven approach to data repository design.

    PubMed

    Harvey, Matthew J; McLean, Andrew; Rzepa, Henry S

    2017-01-01

    The design and use of a metadata-driven data repository for research data management is described. Metadata is collected automatically during the submission process whenever possible and is registered with DataCite in accordance with their current metadata schema, in exchange for a persistent digital object identifier. Two examples of data preview are illustrated, including the demonstration of a method for integration with commercial software that confers rich domain-specific data analytics without introducing customisation into the repository itself.
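
    A hedged sketch of registering collected metadata with DataCite's REST API in exchange for a DOI. The test endpoint and 10.5072 test prefix are real DataCite conventions; the credentials are placeholders, and the payload shows only a fragment of the schema.

    ```python
    # Register a dataset's metadata with the DataCite test API.
    import requests

    payload = {
        "data": {
            "type": "dois",
            "attributes": {
                "prefix": "10.5072",  # DataCite test prefix
                "titles": [{"title": "NMR spectra of compound 42"}],
                "creators": [{"name": "Doe, Jane"}],
                "publicationYear": 2017,
                "types": {"resourceTypeGeneral": "Dataset"},
            },
        }
    }
    resp = requests.post(
        "https://api.test.datacite.org/dois",
        json=payload,
        auth=("REPOSITORY_ID", "PASSWORD"),  # placeholder credentials
        headers={"Content-Type": "application/vnd.api+json"},
    )
    print(resp.status_code)
    ```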

  16. The STP (Solar-Terrestrial Physics) Semantic Web based on the RSS1.0 and the RDF

    NASA Astrophysics Data System (ADS)

    Kubo, T.; Murata, K. T.; Kimura, E.; Ishikura, S.; Shinohara, I.; Kasaba, Y.; Watari, S.; Matsuoka, D.

    2006-12-01

    In the Solar-Terrestrial Physics (STP) community, it has been pointed out that the circulation and utilization of observation data among researchers are insufficient. To achieve interdisciplinary research, we need to overcome these circulation and utilization problems. Against this background, the authors' group has developed a world-wide database that manages metadata of satellite and ground-based observation data files. Until now, retrieving metadata from the observation data and registering them in the database have been carried out by hand. Our goal is to establish the STP Semantic Web. The Semantic Web provides a common framework that allows a variety of data to be shared and reused across applications, enterprises, and communities. We also expect that secondary information related to observations, such as event information and associated news, will be shared over the networks. The most fundamental issue in establishing it is who generates, manages and provides metadata in the Semantic Web. We developed an automatic metadata collection system for the observation data using RSS (RDF Site Summary) 1.0. RSS 1.0 is one of the XML-based markup languages based on the RDF (Resource Description Framework), designed for syndicating news and the contents of news-like sites. RSS 1.0 is used to describe STP metadata, such as data file name, file server address and observation date. To describe STP metadata beyond the RSS 1.0 vocabulary, we defined original vocabularies for STP resources using RDF Schema. The RDF describes technical terms on the STP along with the Dublin Core Metadata Element Set, which is a standard for cross-domain information resource descriptions. Researchers' information on the STP is described with FOAF, an RDF/XML vocabulary for machine-readable metadata describing people. Using RSS 1.0 as a metadata distribution method, the workflow from retrieving metadata to registering them in the database is automated. This technique has been applied to several database systems, such as the DARTS database system and the NICT Space Weather Report Service. DARTS is a science database managed by ISAS/JAXA in Japan. We succeeded in generating and collecting metadata automatically for CDF (Common Data Format) data, such as Reimei satellite data, provided by DARTS. We also created an RDF service for space weather reports and real-time global MHD simulation 3D data provided by NICT. Our Semantic Web system works as follows: the RSS 1.0 documents generated on the data sites (ISAS and NICT) are automatically collected by a metadata collection agent. The RDF documents are registered, and the agent extracts metadata to store them in Sesame, an open-source RDF database with support for RDF Schema inferencing and querying. The RDF database provides advanced retrieval processing that takes properties and relations into account. Finally, the STP Semantic Web provides automatic processing and high-level search not only for observation data but also for space weather news, physical events, technical terms and researcher information related to the STP.
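
    A small sketch of what an RSS 1.0 (RDF) item carrying observation metadata could look like. The dc: terms are standard Dublin Core; the stp: namespace and its elements are illustrative stand-ins for the project's own RDF Schema vocabulary.

    ```python
    # Emit one RSS 1.0 item describing an observation data file.
    ITEM = """<item rdf:about="{url}"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:stp="http://example.org/stp#">
      <title>{name}</title>
      <link>{url}</link>
      <dc:date>{date}</dc:date>
      <stp:observatory>{site}</stp:observatory>
    </item>"""

    print(ITEM.format(
        name="reimei_20060801.cdf",
        url="http://darts.isas.jaxa.jp/data/reimei_20060801.cdf",
        date="2006-08-01",
        site="Reimei",
    ))
    ```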

  17. Streamlining Metadata and Data Management for Evolving Digital Libraries

    NASA Astrophysics Data System (ADS)

    Clark, D.; Miller, S. P.; Peckman, U.; Smith, J.; Aerni, S.; Helly, J.; Sutton, D.; Chase, A.

    2003-12-01

    What began two years ago as an effort to stabilize the Scripps Institution of Oceanography (SIO) data archives from more than 700 cruises going back 50 years has now become the operational fully-searchable "SIOExplorer" digital library, complete with thousands of historic photographs, images, maps, full text documents, binary data files, and 3D visualization experiences, totaling nearly 2 terabytes of digital content. Coping with data diversity and complexity has proven to be more challenging than dealing with large volumes of digital data. SIOExplorer has been built with scalability in mind, so that the addition of new data types and entire new collections may be accomplished with ease. It is a federated system, currently interoperating with three independent data-publishing authorities, each responsible for their own quality control, metadata specifications, and content selection. The IT architecture implemented at the San Diego Supercomputer Center (SDSC) streamlines the integration of additional projects in other disciplines with a suite of metadata management and collection building tools for "arbitrary digital objects." Metadata are automatically harvested from data files into domain-specific metadata blocks, and mapped into various specification standards as needed. Metadata can be browsed and objects can be viewed onscreen or downloaded for further analysis, with automatic proprietary-hold request management.

  18. Automated software system for checking the structure and format of ACM SIG documents

    NASA Astrophysics Data System (ADS)

    Mirza, Arsalan Rahman; Sah, Melike

    2017-04-01

    Microsoft (MS) Office Word is one of the most commonly used software tools for creating documents. MS Word 2007 and above uses XML to represent the structure of MS Word documents. Metadata about the documents are automatically created using Office Open XML (OOXML) syntax. We develop a new framework, called ADFCS (Automated Document Format Checking System), that takes advantage of the OOXML metadata in order to extract semantic information from MS Office Word documents. In particular, we develop a new ontology for Association for Computing Machinery (ACM) Special Interest Group (SIG) documents for representing the structure and format of these documents by using OWL (Web Ontology Language). Then, the metadata is extracted automatically in RDF (Resource Description Framework) according to this ontology using the developed software. Finally, we generate extensive rules in order to infer whether the documents are formatted according to ACM SIG standards. This paper introduces the ACM SIG ontology, the metadata extraction process, the inference engine, the ADFCS online user interface, the system evaluation, and user study evaluations.
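
    A sketch of the OOXML starting point: a .docx file is a ZIP whose word/document.xml records each paragraph's style name, which is the raw material for format checking. The namespace is OOXML's real WordprocessingML namespace; the file name is hypothetical.

    ```python
    # List paragraph style names from a .docx file's document.xml.
    import zipfile
    import xml.etree.ElementTree as ET

    W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

    with zipfile.ZipFile("paper.docx") as docx:
        root = ET.fromstring(docx.read("word/document.xml"))

    for para in root.iter(W + "p"):
        style = para.find(W + "pPr/" + W + "pStyle")
        if style is not None:
            print(style.get(W + "val"))  # e.g. Heading1, Title
    ```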

  19. CMO: Cruise Metadata Organizer for JAMSTEC Research Cruises

    NASA Astrophysics Data System (ADS)

    Fukuda, K.; Saito, H.; Hanafusa, Y.; Vanroosebeke, A.; Kitayama, T.

    2011-12-01

    JAMSTEC's Data Research Center for Marine-Earth Sciences manages and distributes a wide variety of observational data and samples obtained from JAMSTEC research vessels and deep sea submersibles. Generally, metadata are essential to identify how data and samples were obtained. In JAMSTEC, cruise metadata include cruise information such as cruise ID, name of vessel and research theme, and diving information such as dive number, name of submersible and position of diving point. They are submitted by chief scientists of research cruises in Microsoft Excel® spreadsheet format, and registered into a data management database to confirm receipt of observational data files, cruise summaries, and cruise reports. The cruise metadata are also published via "JAMSTEC Data Site for Research Cruises" within two months after the end of a cruise. Furthermore, these metadata are distributed with observational data, images and samples via several data and sample distribution websites after a publication moratorium period. However, there are two operational issues in the metadata publishing process. One is duplication of effort and asynchronous metadata across multiple distribution websites, because administrators enter metadata into each website manually. The other is that data types and representations of metadata differ across websites. To solve those problems, we have developed a cruise metadata organizer (CMO) which allows cruise metadata to be connected from the data management database to several distribution websites. CMO is comprised of three components: an Extensible Markup Language (XML) database, Enterprise Application Integration (EAI) software, and a web-based interface. The XML database is used because of its flexibility in accommodating metadata changes. Daily differential uptake of metadata from the data management database into the XML database is processed automatically via the EAI software. Some metadata are entered into the XML database through the web-based interface by a metadata editor in CMO as needed. Then daily differential uptake of metadata from the XML database into the databases of several distribution websites is processed automatically using a converter defined by the EAI software. Currently, CMO is available for three distribution websites: "Deep Sea Floor Rock Sample Database GANSEKI", "Marine Biological Sample Database", and "JAMSTEC E-library of Deep-sea Images". CMO is planned to provide "JAMSTEC Data Site for Research Cruises" with metadata in the future.
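
    A toy sketch of the daily differential uptake: pull only records whose modification time is newer than the last synchronisation. Record structure and field names here are stand-ins for the XML database.

    ```python
    # Differential sync: select records modified since the last run.
    from datetime import datetime

    def differential(records, last_sync):
        return [r for r in records if r["modified"] > last_sync]

    catalog = [
        {"cruise_id": "YK11-01", "modified": datetime(2011, 6, 2)},
        {"cruise_id": "KR11-05", "modified": datetime(2011, 8, 20)},
    ]
    print(differential(catalog, datetime(2011, 7, 1)))  # only KR11-05
    ```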

  20. Incorporating clinical metadata with digital image features for automated identification of cutaneous melanoma.

    PubMed

    Liu, Z; Sun, J; Smith, M; Smith, L; Warr, R

    2013-11-01

    Computer-assisted diagnosis (CAD) of malignant melanoma (MM) has been advocated to help clinicians to achieve a more objective and reliable assessment. However, conventional CAD systems examine only the features extracted from digital photographs of lesions. Failure to incorporate patients' personal information constrains the applicability in clinical settings. Our objective was to develop a new CAD system to improve the performance of automatic diagnosis of melanoma, which, for the first time, incorporates digital features of lesions with important patient metadata into a learning process. Thirty-two features were extracted from digital photographs to characterize skin lesions. Patients' personal information, such as age, gender and lesion site, and their combinations, was quantified as metadata. The integration of digital features and metadata was realized through an extended Laplacian eigenmap, a dimensionality-reduction method grouping lesions with similar digital features and metadata into the same classes. The diagnosis reached 82.1% sensitivity and 86.1% specificity when only multidimensional digital features were used, but improved to 95.2% sensitivity and 91.0% specificity after metadata were incorporated appropriately. The proposed system achieves a level of sensitivity comparable with experienced dermatologists aided by conventional dermoscopes. This demonstrates the potential of our method for assisting clinicians in diagnosing melanoma, and the benefit it could provide to patients and hospitals by greatly reducing unnecessary excisions of benign naevi. This paper proposes an enhanced CAD system incorporating clinical metadata into the learning process for automatic classification of melanoma. Results demonstrate that the additional metadata and the mechanism to incorporate them are useful for improving CAD of melanoma. © 2013 British Association of Dermatologists.
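
    An illustrative sketch only, echoing (not reproducing) the extended Laplacian eigenmap: combine an image-feature affinity with metadata agreement before a spectral embedding. The data are random, and the 0.5 weighting is an arbitrary choice for the example.

    ```python
    # Metadata-weighted affinity fed to a spectral embedding.
    import numpy as np
    from sklearn.manifold import SpectralEmbedding
    from sklearn.metrics.pairwise import rbf_kernel

    rng = np.random.default_rng(0)
    features = rng.normal(size=(60, 32))          # 32 digital features per lesion
    site = rng.integers(0, 4, size=60)            # lesion-site metadata, 4 classes

    affinity = rbf_kernel(features)               # similarity of digital features
    agreement = (site[:, None] == site[None, :])  # 1 where metadata matches
    combined = affinity * (0.5 + 0.5 * agreement) # upweight metadata-consistent pairs

    embedding = SpectralEmbedding(n_components=2, affinity="precomputed")
    coords = embedding.fit_transform(combined)
    print(coords.shape)  # (60, 2)
    ```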

  1. Adapt

    NASA Astrophysics Data System (ADS)

    Bargatze, L. F.

    2015-12-01

    Active Data Archive Product Tracking (ADAPT) is a collection of software routines that permits one to generate XML metadata files to describe and register data products in support of the NASA Heliophysics Virtual Observatory VxO effort. ADAPT is also a philosophy. The ADAPT concept is to use any and all available metadata associated with scientific data to produce XML metadata descriptions in a consistent, uniform, and organized fashion to provide blanket access to the full complement of data stored on a targeted data server. In this poster, we present an application of ADAPT to describe all of the data products that are stored by using the Common Data Format (CDF) served out by the CDAWEB and SPDF data servers hosted at the NASA Goddard Space Flight Center. These data servers are the primary repositories for NASA Heliophysics data. For this purpose, the ADAPT routines have been used to generate data resource descriptions by using an XML schema named Space Physics Archive, Search, and Extract (SPASE). SPASE is the designated standard for documenting Heliophysics data products, as adopted by the Heliophysics Data and Model Consortium. The set of SPASE XML resource descriptions produced by ADAPT includes high-level descriptions of numerical data products, display data products, or catalogs and also includes low-level "Granule" descriptions. A SPASE Granule is effectively a universal access metadata resource; a Granule associates an individual data file (e.g. a CDF file) with a "parent" high-level data resource description, assigns a resource identifier to the file, and lists the corresponding access URL(s). The CDAWEB and SPDF file systems were queried to provide the input required by the ADAPT software to create an initial set of SPASE metadata resource descriptions. Then, the CDAWEB and SPDF data repositories were queried subsequently on a nightly basis and the CDF file lists were checked for any changes such as the occurrence of new, modified, or deleted files, or the addition of new or the deletion of old data products. Next, ADAPT routines analyzed the query results and issued updates to the metadata stored in the UCLA CDAWEB and SPDF metadata registries. In this way, the SPASE metadata registries generated by ADAPT can be relied on to provide up-to-date and complete access to Heliophysics CDF data resources on a daily basis.
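
    A hedged sketch of what a SPASE Granule description for one CDF file could look like. The SPASE namespace is the real schema location; the ResourceID scheme, ParentID, and element subset shown are illustrative values, not the registry's.

    ```python
    # Emit a minimal SPASE Granule description for a single CDF file.
    GRANULE = """<Spase xmlns="http://www.spase-group.org/data/schema">
      <Granule>
        <ResourceID>spase://EXAMPLE/Granule/AC_H0_MFI/20040101</ResourceID>
        <ReleaseDate>2004-01-02T00:00:00Z</ReleaseDate>
        <ParentID>spase://EXAMPLE/NumericalData/AC_H0_MFI</ParentID>
        <StartDate>2004-01-01T00:00:00Z</StartDate>
        <StopDate>2004-01-01T23:59:59Z</StopDate>
        <Source>
          <SourceType>Data</SourceType>
          <URL>https://cdaweb.gsfc.nasa.gov/data/ac_h0_mfi_20040101.cdf</URL>
        </Source>
      </Granule>
    </Spase>"""
    print(GRANULE)
    ```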

  2. Semantic Similarity Graphs of Mathematics Word Problems: Can Terminology Detection Help?

    ERIC Educational Resources Information Center

    John, Rogers Jeffrey Leo; Passonneau, Rebecca J.; McTavish, Thomas S.

    2015-01-01

    Curricula often lack metadata to characterize the relatedness of concepts. To investigate automatic methods for generating relatedness metadata for a mathematics curriculum, we first address the task of identifying which terms in the vocabulary from mathematics word problems are associated with the curriculum. High chance-adjusted interannotator…

  3. Automatic Content Recommendation and Aggregation According to SCORM

    ERIC Educational Resources Information Center

    Neves, Daniel Eugênio; Brandão, Wladmir Cardoso; Ishitani, Lucila

    2017-01-01

    Although widely used, the SCORM metadata model for content aggregation is difficult to be used by educators, content developers and instructional designers. Particularly, the identification of contents related with each other, in large repositories, and their aggregation using metadata as defined in SCORM, has been demanding efforts of computer…

  4. Assessing Public Metabolomics Metadata, Towards Improving Quality.

    PubMed

    Ferreira, João D; Inácio, Bruno; Salek, Reza M; Couto, Francisco M

    2017-12-13

    Public resources need to be appropriately annotated with metadata in order to make them discoverable, reproducible and traceable, further enabling them to be interoperable or integrated with other datasets. While data-sharing policies exist to promote the annotation process by data owners, these guidelines are still largely ignored. In this manuscript, we analyse automatic measures of metadata quality, and suggest their application as a means to encourage data owners to increase the metadata quality of their resources and submissions, thereby contributing to higher quality data, improved data sharing, and the overall accountability of scientific publications. We analyse these metadata quality measures in the context of a real-world repository of metabolomics data (i.e. MetaboLights), including a manual validation of the measures, and an analysis of their evolution over time. Our findings suggest that the proposed measures can be used to mimic a manual assessment of metadata quality.
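
    A minimal sketch of one plausible automatic quality measure in this spirit: the fraction of recommended fields a submission actually fills in. The field list is made up, and real measures would be richer (e.g. controlled-vocabulary checks).

    ```python
    # Completeness score: fraction of recommended metadata fields filled.
    RECOMMENDED = ["organism", "instrument", "sample_type", "contact", "protocol"]

    def completeness(record):
        filled = sum(1 for f in RECOMMENDED if str(record.get(f, "")).strip())
        return filled / len(RECOMMENDED)

    submission = {"organism": "Homo sapiens", "instrument": "LC-MS", "contact": ""}
    print(f"completeness: {completeness(submission):.0%}")  # 40%
    ```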

  5. HDF-EOS Web Server

    NASA Technical Reports Server (NTRS)

    Ullman, Richard; Bane, Bob; Yang, Jingli

    2008-01-01

    A shell script has been written as a means of automatically making HDF-EOS-formatted data sets available via the World Wide Web. ("HDF-EOS" and variants thereof are defined in the first of the two immediately preceding articles.) The shell script chains together software tools developed by the Data Usability Group at Goddard Space Flight Center to perform the following actions: extract metadata in Object Definition Language (ODL) from an HDF-EOS file; convert the metadata from ODL to Extensible Markup Language (XML); reformat the XML metadata into human-readable Hypertext Markup Language (HTML); publish the HTML metadata and the original HDF-EOS file to a Web server and an Open-source Project for a Network Data Access Protocol (OPeNDAP) server computer; and reformat the XML metadata and submit the resulting file to the EOS Clearinghouse, which is a Web-based metadata clearinghouse that facilitates searching for, and exchange of, Earth-Science data.
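
    A toy sketch of the very first step only: reading flat "NAME = value" pairs from an ODL metadata block of the kind found in HDF-EOS core metadata. Real ODL parsers also handle GROUP/OBJECT nesting, which is omitted here, and the sample block is invented.

    ```python
    # Pull simple NAME = value pairs out of an ODL snippet.
    odl = """GROUP = INVENTORYMETADATA
      SHORTNAME = "MOD021KM"
      VERSIONID = 5
    END_GROUP = INVENTORYMETADATA
    END"""

    pairs = {}
    for line in odl.splitlines():
        if "=" in line and not line.strip().startswith(("GROUP", "OBJECT", "END")):
            name, value = (part.strip() for part in line.split("=", 1))
            pairs[name] = value.strip('"')
    print(pairs)  # {'SHORTNAME': 'MOD021KM', 'VERSIONID': '5'}
    ```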

  6. Simplifying the Reuse and Interoperability of Geoscience Data Sets and Models with Semantic Metadata that is Human-Readable and Machine-actionable

    NASA Astrophysics Data System (ADS)

    Peckham, S. D.

    2017-12-01

    Standardized, deep descriptions of digital resources (e.g. data sets, computational models, software tools and publications) make it possible to develop user-friendly software systems that assist scientists with the discovery and appropriate use of these resources. Semantic metadata makes it possible for machines to take actions on behalf of humans, such as automatically identifying the resources needed to solve a given problem, retrieving them and then automatically connecting them (despite their heterogeneity) into a functioning workflow. Standardized model metadata also helps model users to understand the important details that underpin computational models and to compare the capabilities of different models. These details include simplifying assumptions on the physics, governing equations and the numerical methods used to solve them, discretization of space (the grid) and time (the time-stepping scheme), state variables (input or output), and model configuration parameters. This kind of metadata provides a "deep description" of a computational model that goes well beyond other types of metadata (e.g. author, purpose, scientific domain, programming language, digital rights, provenance, execution) and captures the science that underpins a model. A carefully constructed, unambiguous, rules-based schema that addresses this problem, called the Geoscience Standard Names ontology, will be presented; it utilizes Semantic Web best practices and technologies. It has also been designed to work across science domains and to be readable by both humans and machines.
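
    A small sketch of what machine-actionable variable metadata buys you: two models can be paired automatically when one's outputs cover the other's inputs. The standard names below are in the spirit of such an ontology, not quoted from it.

    ```python
    # Automatic model coupling via matching standard variable names.
    MODELS = {
        "snowmelt": {"inputs": {"atmosphere_air__temperature"},
                     "outputs": {"snowpack__melt_volume_flux"}},
        "runoff":   {"inputs": {"snowpack__melt_volume_flux"},
                     "outputs": {"channel_water__volume_flow_rate"}},
    }

    def can_feed(producer, consumer):
        return MODELS[consumer]["inputs"] <= MODELS[producer]["outputs"]

    print(can_feed("snowmelt", "runoff"))  # True: names match, so they can couple
    ```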

  7. Passenger baggage object database (PBOD)

    NASA Astrophysics Data System (ADS)

    Gittinger, Jaxon M.; Suknot, April N.; Jimenez, Edward S.; Spaulding, Terry W.; Wenrich, Steve A.

    2018-04-01

    Detection of anomalies of interest in x-ray images is an ever-evolving problem that requires the rapid development of automatic detection algorithms. Automatic detection algorithms are developed using machine learning techniques, which require developers to obtain the x-ray machine that was used to create the training images and to compile all associated metadata for those images by hand. The Passenger Baggage Object Database (PBOD) and data acquisition application were designed and developed for acquiring and persisting 2-D and 3-D x-ray image data and associated metadata. PBOD was specifically created to capture simulated airline passenger "stream of commerce" luggage data, but could be applied to other areas of x-ray imaging to utilize machine-learning methods.

  8. ASDC Collaborations and Processes to Ensure Quality Metadata and Consistent Data Availability

    NASA Astrophysics Data System (ADS)

    Trapasso, T. J.

    2017-12-01

    With the introduction of new tools, faster computing, and less expensive storage, increased volumes of data are expected to be managed with existing or fewer resources. Metadata management is becoming a heightened challenge from the increase in data volume, resulting in more metadata records needing to be curated for each product. To address metadata availability and completeness, NASA ESDIS has taken significant strides with the creation of the Unified Metadata Model (UMM) and Common Metadata Repository (CMR). The UMM helps address hurdles arising from the increasing number of metadata dialects, and the CMR provides a primary repository for metadata so that required metadata fields can be served through a growing number of tools and services. However, metadata quality remains an issue, as metadata is not always evident to the end user. In response to these challenges, the NASA Atmospheric Science Data Center (ASDC) created the Collaboratory for quAlity Metadata Preservation (CAMP) and defined the Product Lifecycle Process (PLP) to work congruently. CAMP is unique in that it provides science team members with a UI to directly supply metadata that is complete, compliant, and accurate for their data products. This replaces back-and-forth communication that often results in misinterpreted metadata. Upon review by ASDC staff, metadata is submitted to CMR for broader distribution through Earthdata. Further, approval of science team metadata in CAMP automatically triggers the ASDC PLP workflow to ensure appropriate services are applied throughout the product lifecycle. This presentation will review the design elements of CAMP and PLP as well as demonstrate interfaces to each. It will show the benefits that CAMP and PLP provide to the ASDC that could potentially benefit additional NASA Earth Science Data and Information System (ESDIS) Distributed Active Archive Centers (DAACs).

  9. Integrating Semantic Information in Metadata Descriptions for a Geoscience-wide Resource Inventory.

    NASA Astrophysics Data System (ADS)

    Zaslavsky, I.; Richard, S. M.; Gupta, A.; Valentine, D.; Whitenack, T.; Ozyurt, I. B.; Grethe, J. S.; Schachne, A.

    2016-12-01

    Integrating semantic information into legacy metadata catalogs is a challenging issue and so far has been mostly done on a limited scale. We present the experience of CINERGI (Community Inventory of Earthcube Resources for Geoscience Interoperability), an NSF Earthcube Building Block project, in creating a large cross-disciplinary catalog of geoscience information resources to enable cross-domain discovery. The project developed a pipeline for automatically augmenting resource metadata, in particular generating keywords that describe metadata documents harvested from multiple geoscience information repositories or contributed by geoscientists through various channels including surveys and domain resource inventories. The pipeline examines available metadata descriptions using text parsing, vocabulary management and semantic annotation and graph navigation services of GeoSciGraph. GeoSciGraph, in turn, relies on a large cross-domain ontology of geoscience terms, which bridges several independently developed ontologies or taxonomies including SWEET, ENVO, YAGO, GeoSciML, GCMD, SWO, and CHEBI. The ontology content enables automatic extraction of keywords reflecting science domains, equipment used, geospatial features, measured properties, methods, processes, etc. We specifically focus on issues of cross-domain geoscience ontology creation, resolving several types of semantic conflicts among component ontologies or vocabularies, and constructing and managing facets for improved data discovery and navigation. The ontology and keyword generation rules are iteratively improved as pipeline results are presented to data managers for selective manual curation via the CINERGI Annotator user interface. We present lessons learned from applying the CINERGI metadata augmentation pipeline to a number of federal agency and academic data registries, in the context of several use cases that require data discovery and integration across multiple earth science data catalogs of varying quality and completeness. The inventory is accessible at http://cinergi.sdsc.edu, and the CINERGI project web page is http://earthcube.org/group/cinergi

  10. samiDB: A Prototype Data Archive for Big Science Exploration

    NASA Astrophysics Data System (ADS)

    Konstantopoulos, I. S.; Green, A. W.; Cortese, L.; Foster, C.; Scott, N.

    2015-04-01

    samiDB is an archive, database, and query engine to serve the spectra, spectral hypercubes, and high-level science products that make up the SAMI Galaxy Survey. Based on the versatile Hierarchical Data Format (HDF5), samiDB does not depend on relational database structures and hence lightens the setup and maintenance load imposed on science teams by metadata tables. The code, written in Python, covers the ingestion, querying, and exporting of data as well as the automatic setup of an HTML schema browser. samiDB serves as a maintenance-light data archive for Big Science and can be adopted and adapted by science teams that lack the means to hire professional archivists to set up the data back end for their projects.
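
    A toy sketch of the HDF5-as-archive idea: hierarchy instead of relational tables, with queryable metadata held as attributes. All group, dataset, and attribute names here are invented.

    ```python
    # Build a tiny hierarchical archive, then "query" it by walking attributes.
    import h5py
    import numpy as np

    with h5py.File("survey.h5", "w") as archive:
        galaxy = archive.create_group("v1/galaxies/GAL0001")
        cube = galaxy.create_dataset("blue_cube", data=np.zeros((5, 5, 16)))
        cube.attrs["redshift"] = 0.052
        cube.attrs["grating"] = "580V"

    def report(name, obj):
        # the "query": filter nodes on a metadata attribute
        if obj.attrs.get("redshift", 99.0) < 0.1:
            print(name)

    with h5py.File("survey.h5", "r") as archive:
        archive.visititems(report)  # -> v1/galaxies/GAL0001/blue_cube
    ```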

  11. Enriching the trustworthiness of health-related web pages.

    PubMed

    Gaudinat, Arnaud; Cruchet, Sarah; Boyer, Celia; Chrawdhry, Pravir

    2011-06-01

    We present an experimental mechanism for enriching web content with quality metadata. This mechanism is based on a simple and well-known initiative in the field of the health-related web, the HONcode. The Resource Description Framework (RDF) format and the Dublin Core Metadata Element Set were used to formalize these metadata. The model of trust proposed is based on a quality model for health-related web pages that has been tested in practice over a period of thirteen years. Our model has been explored in the context of a project to develop a research tool that automatically detects the occurrence of quality criteria in health-related web pages.
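
    A short sketch of formalizing quality metadata in RDF with Dublin Core, in the spirit of the approach described. The honcode-like namespace, criterion property, and page URI are placeholders.

    ```python
    # Attach Dublin Core and quality-criterion triples to a web page URI.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DC

    HON = Namespace("http://example.org/honcode#")  # placeholder vocabulary
    page = URIRef("https://example-health-site.org/article/42")

    g = Graph()
    g.add((page, DC.title, Literal("Managing seasonal allergies")))
    g.add((page, HON.criterion, Literal("authority")))  # detected quality criterion
    print(g.serialize(format="turtle"))
    ```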

  12. Enhancing the MeSH thesaurus to retrieve French online health resources in a quality-controlled gateway.

    PubMed

    Douyère, Magaly; Soualmia, Lina F; Névéol, Aurélie; Rogozan, Alexandrina; Dahamna, Badisse; Leroy, Jean-Philippe; Thirion, Benoît; Darmoni, Stefan J

    2004-12-01

    The amount of health information available on the Internet is considerable. In this context, several health gateways have been developed. Among them, CISMeF (Catalogue and Index of Health Resources in French) was designed to catalogue and index health resources in French. The goal of this article is to describe the various enhancements to the MeSH thesaurus developed by the CISMeF team to adapt this terminology to the broader field of health Internet resources instead of scientific articles for the MEDLINE bibliographic database. CISMeF uses two standard tools for organizing information: the MeSH thesaurus and several metadata element sets, in particular the Dublin Core metadata format. The heterogeneity of Internet health resources led the CISMeF team to enhance the MeSH thesaurus with the introduction of two new concepts, respectively, resource types and metaterms. CISMeF resource types are a generalization of the publication types of MEDLINE. A resource type describes the nature of the resource and MeSH keyword/qualifier pairs describe the subject of the resource. A metaterm is generally a medical specialty or a biological science, which has semantic links with one or more MeSH keywords, qualifiers and resource types. The CISMeF terminology is exploited for several tasks: resource indexing performed manually, resource categorization performed automatically, visualization and navigation through the concept hierarchies, and information retrieval using the Doc'CISMeF search engine. The CISMeF health gateway uses several MeSH thesaurus enhancements to optimize information retrieval, hierarchy navigation and automatic indexing.
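
    A toy sketch of metaterm-based query expansion: a metaterm (here, a medical specialty) fans out to its semantically linked MeSH keywords. The links shown are illustrative, not CISMeF's actual mappings.

    ```python
    # Expand a query through metaterm-to-keyword links.
    METATERMS = {
        "cardiology": {"heart diseases", "electrocardiography", "heart"},
    }

    def expand(query_terms):
        expanded = set(query_terms)
        for term in query_terms:
            expanded |= METATERMS.get(term, set())
        return expanded

    print(expand({"cardiology"}))  # the metaterm plus its linked MeSH keywords
    ```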

  13. Metadata Wizard: an easy-to-use tool for creating FGDC-CSDGM metadata for geospatial datasets in ESRI ArcGIS Desktop

    USGS Publications Warehouse

    Ignizio, Drew A.; O'Donnell, Michael S.; Talbert, Colin B.

    2014-01-01

    Creating compliant metadata for scientific data products is mandated for all federal Geographic Information Systems professionals and is a best practice for members of the geospatial data community. However, the complexity of the Federal Geographic Data Committee’s Content Standards for Digital Geospatial Metadata, the limited availability of easy-to-use tools, and recent changes in the ESRI software environment continue to make metadata creation a challenge. Staff at the U.S. Geological Survey Fort Collins Science Center have developed a Python toolbox for ESRI ArcDesktop to facilitate a semi-automated workflow to create and update metadata records in ESRI’s 10.x software. The U.S. Geological Survey Metadata Wizard tool automatically populates several metadata elements: the spatial reference, spatial extent, geospatial presentation format, vector feature count or raster column/row count, native system/processing environment, and the metadata creation date. Once the software auto-populates these elements, users can easily add attribute definitions and other relevant information in a simple Graphical User Interface. The tool, which offers a simple design free of esoteric metadata language, has the potential to save many government and non-government organizations a significant amount of time and costs by facilitating the development of metadata compliant with the Federal Geographic Data Committee’s Content Standards for Digital Geospatial Metadata for ESRI software users. A working version of the tool is now available for ESRI ArcDesktop, version 10.0, 10.1, and 10.2 (downloadable at http://www.sciencebase.gov/metadatawizard).
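
    A minimal sketch of the auto-population step for one element: writing a dataset's computed extent into CSDGM's bounding-coordinate elements (spdom/bounding/westbc etc., which are real CSDGM tags). The coordinate values are examples.

    ```python
    # Auto-populate the CSDGM spatial-domain (spdom) element from an extent.
    import xml.etree.ElementTree as ET

    extent = {"west": -109.06, "east": -102.04, "north": 41.0, "south": 37.0}

    spdom = ET.Element("spdom")
    bounding = ET.SubElement(spdom, "bounding")
    for tag, key in [("westbc", "west"), ("eastbc", "east"),
                     ("northbc", "north"), ("southbc", "south")]:
        ET.SubElement(bounding, tag).text = str(extent[key])

    print(ET.tostring(spdom, encoding="unicode"))
    ```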

  14. Proceeding of the ACM/IEEE-CS Joint Conference on Digital Libraries (1st, Roanoke, Virginia, June 24-28, 2001).

    ERIC Educational Resources Information Center

    Association for Computing Machinery, New York, NY.

    Papers in this Proceedings of the ACM/IEEE-CS Joint Conference on Digital Libraries (Roanoke, Virginia, June 24-28, 2001) discuss: automatic genre analysis; text categorization; automated name authority control; automatic event generation; linked active content; designing e-books for legal research; metadata harvesting; mapping the…

  15. Representing Hydrologic Models as HydroShare Resources to Facilitate Model Sharing and Collaboration

    NASA Astrophysics Data System (ADS)

    Castronova, A. M.; Goodall, J. L.; Mbewe, P.

    2013-12-01

    The CUAHSI HydroShare project is a collaborative effort that aims to provide software for sharing data and models within the hydrologic science community. One of the early focuses of this work has been establishing metadata standards for describing models and model-related data as HydroShare resources. By leveraging this metadata definition, a prototype extension has been developed to create model resources that can be shared within the community using the HydroShare system. The extension uses a general model metadata definition to create resource objects, and was designed so that model-specific parsing routines can extract and populate metadata fields from model input and output files. The long term goal is to establish a library of supported models where, for each model, the system has the ability to extract key metadata fields automatically, thereby establishing standardized model metadata that will serve as the foundation for model sharing and collaboration within HydroShare. The Soil and Water Assessment Tool (SWAT) is used to demonstrate this concept through a case study application.
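
    A sketch of the "library of supported models" idea: a registry maps a model type to a routine that pulls metadata from its native files. The SWAT stub reads "value | NAME : description" lines of the kind found in file.cio-style inputs; it is illustrative, not the extension's code.

    ```python
    # Registry of model-specific metadata parsers.
    PARSERS = {}

    def register(model_type):
        def wrap(fn):
            PARSERS[model_type] = fn
            return fn
        return wrap

    @register("SWAT")
    def parse_swat(cio_text):
        fields = {}
        for line in cio_text.splitlines():
            if "|" in line:
                value, comment = line.split("|", 1)
                name = comment.split(":", 1)[0].strip()  # e.g. NBYR
                fields[name] = value.strip()
        return fields

    print(PARSERS["SWAT"]("2012    | NBYR : number of years simulated"))
    ```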

  16. ModelArchiver—A program for facilitating the creation of groundwater model archives

    USGS Publications Warehouse

    Winston, Richard B.

    2018-03-01

    ModelArchiver is a program designed to facilitate the creation of groundwater model archives that meet the requirements of the U.S. Geological Survey (USGS) policy (Office of Groundwater Technical Memorandum 2016.02, https://water.usgs.gov/admin/memo/GW/gw2016.02.pdf, https://water.usgs.gov/ogw/policy/gw-model/). ModelArchiver version 1.0 leads the user step-by-step through the process of creating a USGS groundwater model archive. The user specifies the contents of each of the subdirectories within the archive and provides descriptions of the archive contents. Descriptions of some files can be specified automatically using file extensions. Descriptions also can be specified individually. Those descriptions are added to a readme.txt file provided by the user. ModelArchiver moves the content of the archive to the archive folder and compresses some folders into .zip files. As part of the archive, the modeler must create a metadata file describing the archive. The program has a built-in metadata editor and provides links to websites that can aid in the creation of the metadata. The built-in metadata editor is also available as a stand-alone program named FgdcMetaEditor version 1.0, which also is described in this report. ModelArchiver updates the metadata file provided by the user with descriptions of the files in the archive. An optional archive list file generated automatically by ModelMuse can streamline the creation of archives by identifying input files, output files, model programs, and ancillary files for inclusion in the archive.

  17. EARS : Repositioning data management near data acquisition.

    NASA Astrophysics Data System (ADS)

    Sinquin, Jean-Marc; Sorribas, Jordi; Diviacco, Paolo; Vandenberghe, Thomas; Munoz, Raquel; Garcia, Oscar

    2016-04-01

    The EU FP7 projects Eurofleets and Eurofleets2 are a Europe-wide alliance of marine research centers that aims to share research vessels, to improve information sharing on planned, current and completed cruises and on details of ocean-going research vessels and specialized equipment, and to durably improve the cost-effectiveness of cruises. Within this context, logging information on how, when and where anything happens on board the vessel is crucial for data users at a later stage. This forms an essential step in the process of data quality control, as it can assist in understanding anomalies and unexpected trends recorded in the acquired data sets. In this way the completeness of the metadata is improved, as it is recorded accurately at the origin of the measurement. The collection of this crucial information has been done in very different ways, using different procedures, formats and pieces of software, across the European Research Fleet. At the time the Eurofleets project started, every institution and country had adopted different strategies and approaches, which complicated the task of users who need to log general-purpose information and events on board whenever they access a different platform, losing the opportunity to produce this valuable metadata on board. Among the many goals of the Eurofleets project, a very important task is the development of "event log software" called EARS (Eurofleets Automatic Reporting System) that enables scientists and operators to record what happens during a survey. EARS will allow users to fill, in a standardized way, the gap that currently exists in metadata description, which only very seldom links data with its history. Events generated automatically by acquisition instruments will also be handled, enhancing the granularity and precision of the event annotation. The adoption of a common procedure to log survey events and a common terminology to describe them is crucial to providing a friendly and successful on-board metadata creation procedure for the whole European Fleet. The possibility of automatically reporting metadata and general-purpose data will simplify the work of scientists and data managers with regard to data transmission. Improved accuracy and completeness of metadata are expected when events are recorded at acquisition time. This will also enhance multiple usages of the data, as it allows verification against the different requirements existing in different disciplines.

  18. Using RDF and Git to Realize a Collaborative Metadata Repository.

    PubMed

    Stöhr, Mark R; Majeed, Raphael W; Günther, Andreas

    2018-01-01

    The German Center for Lung Research (DZL) is a research network with the aim of researching respiratory diseases. The participating study sites' register data differ in terms of software and coding systems as well as data field coverage. To perform meaningful consortium-wide queries through a single interface, a uniform conceptual structure covering the DZL common data elements is required. No single existing terminology includes all our concepts. Potential candidates such as LOINC and SNOMED only cover specific subject areas or are not granular enough for our needs. To achieve a broadly accepted and complete ontology, we developed a platform for collaborative metadata management. The DZL data management group formulated detailed requirements regarding the metadata repository and the user interfaces for metadata editing. Our solution builds upon existing standard technologies allowing us to meet those requirements. Its key parts are RDF and the distributed version control system Git. We developed a software system to publish updated metadata automatically and immediately after performing validation tests for completeness and consistency.
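
    A minimal sketch of a publish step in this style: parse the edited RDF (standing in for the completeness and consistency tests) and, only if it loads cleanly, commit it with Git. The file path and commit-message convention are assumptions.

    ```python
    # Validate an RDF file, then commit it to the repository.
    import subprocess
    from rdflib import Graph

    def publish(ttl_path):
        Graph().parse(ttl_path, format="turtle")  # raises on malformed RDF
        subprocess.run(["git", "add", ttl_path], check=True)
        subprocess.run(["git", "commit", "-m", f"update {ttl_path}"], check=True)

    publish("dzl_dataelements.ttl")  # hypothetical metadata file
    ```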

  19. Hybrid Multiagent System for Automatic Object Learning Classification

    NASA Astrophysics Data System (ADS)

    Gil, Ana; de La Prieta, Fernando; López, Vivian F.

    The rapid evolution within the context of e-learning is closely linked to international efforts on the standardization of learning object metadata, which provides learners in a web-based educational system with ubiquitous access to multiple distributed repositories. This article presents a hybrid agent-based architecture that enables the recovery of learning objects tagged in Learning Object Metadata (LOM) and provides individualized help with selecting learning materials to make the most suitable choice among many alternatives.

  20. An Observation Knowledgebase for Hinode Data

    NASA Astrophysics Data System (ADS)

    Hurlburt, Neal E.; Freeland, S.; Green, S.; Schiff, D.; Seguin, R.; Slater, G.; Cirtain, J.

    2007-05-01

    We have developed a standards-based system for the Solar Optical and X-Ray Telescopes on the Hinode orbiting solar observatory which can serve as part of a developing Heliophysics informatics system. Our goal is to make the scientific data acquired by Hinode more accessible and useful to scientists by allowing them to do reasoning and flexible searches on observation metadata and to ask higher-level questions of the system than previously allowed. The Hinode Observation Knowledgebase relates the intentions and goals of the observation planners (as-planned metadata) with actual observational data (as-run metadata), along with connections to related models, data products and identified features (follow-up metadata) through a citation system. Summaries of the data (both as image thumbnails and short "film strips") serve to guide researchers to the observations appropriate for their research, and these are linked directly to the data catalog for easy extraction and delivery. The semantic information of the observation (field of view, wavelength, type of observable, average cadence, etc.) is captured through simple user interfaces and encoded using the VOEvent XML standard (with the addition of some solar-related extensions). These interfaces merge metadata acquired automatically during both the mission planning and data analysis (see Seguin et al. 2007 at this meeting) phases with that obtained directly from the planner/analyst and send them to be incorporated into the knowledgebase. The resulting information is automatically rendered into standard categories based on planned and recent observations, as well as by popularity and recommendations from the science team. The observations are also directly searchable through both web-based searches and direct calls to the API, and observation details can be rendered as RSS, iTunes and Google Earth interfaces. The resulting system provides a useful tool for researchers and can act as a demonstration for larger, more complex systems.
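
    VOEvent is a real XML standard with a What/Param structure, but the solar extensions mentioned are not detailed in the abstract; the fragment below is a rough, hypothetical sketch of how observation semantics (field of view, wavelength, cadence) might be captured as VOEvent-style parameters:

    ```python
    # Hypothetical sketch of encoding observation metadata as VOEvent-style
    # XML; parameter names and the identifier are illustrative, not the
    # actual Hinode schema.
    import xml.etree.ElementTree as ET

    voevent = ET.Element("voe:VOEvent", {
        "xmlns:voe": "http://www.ivoa.net/xml/VOEvent/v1.1",
        "ivorn": "ivo://example/hinode#plan-2007-001",   # made-up identifier
        "role": "observation",
    })
    what = ET.SubElement(voevent, "What")
    for name, value, unit in [
        ("field_of_view", "2048x1024", "arcsec"),
        ("wavelength", "430.5", "nm"),
        ("average_cadence", "60", "s"),
        ("observable", "intensity", ""),
    ]:
        ET.SubElement(what, "Param", {"name": name, "value": value, "unit": unit})

    print(ET.tostring(voevent, encoding="unicode"))
    ```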

  1. Managing biomedical image metadata for search and retrieval of similar images.

    PubMed

    Korenblum, Daniel; Rubin, Daniel; Napel, Sandy; Rodriguez, Cesar; Beaulieu, Chris

    2011-08-01

    Radiology images are generally disconnected from the metadata describing their contents, such as imaging observations ("semantic" metadata), which are usually described in text reports that are not directly linked to the images. We developed a system, the Biomedical Image Metadata Manager (BIMM), to (1) address the problem of managing biomedical image metadata and (2) facilitate the retrieval of similar images using semantic feature metadata. Our approach allows radiologists, researchers, and students to take advantage of the vast and growing repositories of medical image data by explicitly linking images to their associated metadata in a relational database that is globally accessible through a Web application. BIMM receives input in the form of standards-based metadata files via a Web service and parses and stores the metadata in a relational database, allowing efficient data query and maintenance capabilities. Upon querying BIMM for images, 2D regions of interest (ROIs) stored as metadata are automatically rendered onto preview images included in search results. The system's "match observations" function retrieves images with similar ROIs based on specific semantic features describing imaging observation characteristics (IOCs). We demonstrate that the system, using IOCs alone, can accurately retrieve images with diagnoses matching the query images, and we evaluate its performance on a set of annotated liver lesion images. BIMM has several potential applications, e.g., computer-aided detection and diagnosis, content-based image retrieval, automating medical analysis protocols, and gathering population statistics like disease prevalence. The system provides a framework for decision support systems, potentially improving their diagnostic accuracy and selection of appropriate therapies.

  2. Hierarchical video summarization based on context clustering

    NASA Astrophysics Data System (ADS)

    Tseng, Belle L.; Smith, John R.

    2003-11-01

    A personalized video summary is dynamically generated in our video personalization and summarization system based on user preference and usage environment. The three-tier personalization system adopts the server-middleware-client architecture in order to maintain, select, adapt, and deliver rich media content to the user. The server stores the content sources along with their corresponding MPEG-7 metadata descriptions. In this paper, the metadata includes visual semantic annotations and automatic speech transcriptions. Our personalization and summarization engine in the middleware selects the optimal set of desired video segments by matching shot annotations and sentence transcripts with user preferences. Besides finding the desired contents, the objective is to present a coherent summary. There are diverse methods for creating summaries, and we focus on the challenges of generating a hierarchical video summary based on context information. In our summarization algorithm, three inputs are used to generate the hierarchical video summary output. These inputs are (1) MPEG-7 metadata descriptions of the contents in the server, (2) user preference and usage environment declarations from the user client, and (3) context information including MPEG-7 controlled term list and classification scheme. In a video sequence, descriptions and relevance scores are assigned to each shot. Based on these shot descriptions, context clustering is performed to collect consecutively similar shots to correspond to hierarchical scene representations. The context clustering is based on the available context information, and may be derived from domain knowledge or rules engines. Finally, the selection of structured video segments to generate the hierarchical summary efficiently balances between scene representation and shot selection.
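
    The paper's exact clustering criterion is not given in the abstract; a minimal sketch of the context-clustering idea, grouping consecutive shots whose annotation similarity exceeds a threshold into scene-level clusters and assuming shots carry keyword sets, could look like this:

    ```python
    # Minimal sketch of context clustering: merge consecutive shots whose
    # annotations are similar into scene clusters. The Jaccard measure and
    # the threshold are assumptions; the paper's criterion may differ.

    def jaccard(a: set, b: set) -> float:
        return len(a & b) / len(a | b) if a | b else 0.0

    def cluster_shots(shots: list[set], threshold: float = 0.3) -> list[list[int]]:
        """Group consecutive shots by annotation similarity; returns index clusters."""
        clusters = [[0]]
        for i in range(1, len(shots)):
            if jaccard(shots[i], shots[i - 1]) >= threshold:
                clusters[-1].append(i)      # same scene as the previous shot
            else:
                clusters.append([i])        # start a new scene cluster
        return clusters

    shots = [{"anchor", "studio"}, {"anchor", "desk"},
             {"soccer", "field"}, {"soccer", "goal"}]
    print(cluster_shots(shots))   # -> [[0, 1], [2, 3]]
    ```

    Each resulting cluster corresponds to a scene-level node in the hierarchical summary, from which representative shots can then be selected.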

  3. Automatic Inference of Cryptographic Key Length Based on Analysis of Proof Tightness

    DTIC Science & Technology

    2016-06-01

    within an attack tree structure, then expand attack tree methodology to include cryptographic reductions. We then provide the algorithms for...maintaining and automatically reasoning about these expanded attack trees. We provide a software tool that utilizes machine-readable proof and attack metadata...and the attack tree methodology to provide rapid and precise answers regarding security parameters and effective security. This eliminates the need

  4. OntoStudyEdit: a new approach for ontology-based representation and management of metadata in clinical and epidemiological research.

    PubMed

    Uciteli, Alexandr; Herre, Heinrich

    2015-01-01

    The specification of metadata in clinical and epidemiological study projects involves significant expense, and the validity and quality of the collected data depend heavily on a precise and semantically correct representation of their metadata. In the various research organizations that plan and coordinate studies, the required metadata are specified differently, depending on many conditions, e.g., on the study management software used. The latter does not always meet the needs of a particular research organization, e.g., with respect to the relevant metadata attributes and structuring possibilities. The objective of the research set forth in this paper is the development of a new approach for ontology-based representation and management of metadata. The basic features of this approach are demonstrated by the software tool OntoStudyEdit (OSE). The OSE is designed and developed according to the three-ontology method, a method for developing software based on the interactions of three different kinds of ontologies: a task ontology, a domain ontology and a top-level ontology. The OSE can easily be adapted to different requirements, and it supports an ontologically founded representation and efficient management of metadata. The metadata specifications can be imported from various sources, edited with the OSE, and exported to several formats used, e.g., by different study management software. Advantages of this approach are the adaptability of the OSE by integrating suitable domain ontologies, the ontological specification of mappings between the import/export formats and the domain ontology, the specification of the study metadata in a uniform manner and its reuse in different research projects, and an intuitive data entry for non-expert users.

  5. MetaRNA-Seq: An Interactive Tool to Browse and Annotate Metadata from RNA-Seq Studies.

    PubMed

    Kumar, Pankaj; Halama, Anna; Hayat, Shahina; Billing, Anja M; Gupta, Manish; Yousri, Noha A; Smith, Gregory M; Suhre, Karsten

    2015-01-01

    The number of RNA-Seq studies has grown in recent years. The design of RNA-Seq studies varies from very simple (e.g., two-condition case-control) to very complicated (e.g., time series involving multiple samples at each time point with separate drug treatments). Most of these publicly available RNA-Seq studies are deposited in NCBI databases, but their metadata are scattered throughout four different databases: the Sequence Read Archive (SRA), BioSample, BioProject, and the Gene Expression Omnibus (GEO). Although the NCBI web interface is able to provide all of the metadata information, it often requires significant effort to retrieve study- or project-level information by traversing multiple hyperlinks and pages. Moreover, project- and study-level metadata lack manual or automatic curation by categories, such as disease type, time series, case-control, or replicate type, which are vital to comprehending any RNA-Seq study. Here we describe "MetaRNA-Seq," a new tool for interactively browsing, searching, and annotating RNA-Seq metadata with the capability of semiautomatic curation at the study level.

  6. Metadata to Support Data Warehouse Evolution

    NASA Astrophysics Data System (ADS)

    Solodovnikova, Darja

    The focus of this chapter is the metadata necessary to support data warehouse evolution. We present a data warehouse framework that is able to track the evolution process and adapt data warehouse schemata and data extraction, transformation, and loading (ETL) processes. We discuss a significant part of the framework: the metadata repository, which stores information about the data warehouse, its logical and physical schemata, and their versions. We propose a physical implementation of a multiversion data warehouse in a relational DBMS. For each modification of a data warehouse schema, we outline the changes that need to be made to the repository metadata and in the database.
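
    The chapter's repository design is not reproduced in this summary; purely as an illustration of the idea, assuming a relational DBMS, a minimal version-tracking metadata repository might record schema versions and their elements like this:

    ```python
    # Rough sketch of a metadata repository tracking data warehouse schema
    # versions in a relational DBMS; the table layout is an assumption, not
    # the chapter's actual design.
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE schema_version (
        version_id   INTEGER PRIMARY KEY,
        valid_from   TEXT NOT NULL,            -- date the version became active
        comment      TEXT
    );
    CREATE TABLE schema_element (
        element_id   INTEGER PRIMARY KEY,
        version_id   INTEGER REFERENCES schema_version(version_id),
        kind         TEXT CHECK (kind IN ('dimension', 'fact', 'attribute')),
        name         TEXT NOT NULL
    );
    """)

    # Register version 1, then an evolved version 2 that renames a dimension.
    con.execute("INSERT INTO schema_version VALUES (1, '2010-01-01', 'initial')")
    con.execute("INSERT INTO schema_element VALUES (1, 1, 'dimension', 'Customer')")
    con.execute("INSERT INTO schema_version VALUES (2, '2011-06-01', 'rename Customer')")
    con.execute("INSERT INTO schema_element VALUES (2, 2, 'dimension', 'Client')")

    for row in con.execute("""
        SELECT v.version_id, e.name FROM schema_version v
        JOIN schema_element e USING (version_id) ORDER BY v.version_id"""):
        print(row)   # -> (1, 'Customer') then (2, 'Client')
    ```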

  7. OAI and NASA's Scientific and Technical Information.

    ERIC Educational Resources Information Center

    Nelson, Michael L.; Rocker, JoAnne; Harrison, Terry L.

    2003-01-01

    Details NASA's (National Aeronautics and Space Administration, USA) involvement in defining and testing the Open Archives Initiative (OAI) Protocol for Metadata Harvesting (OAI-PMH), and its experience with adapting existing NASA distributed-search digital libraries (DLs) to use the OAI-PMH and metadata harvesting. Discusses some new digital…

  8. ClinData Express – A Metadata Driven Clinical Research Data Management System for Secondary Use of Clinical Data

    PubMed Central

    Li, Zuofeng; Wen, Jingran; Zhang, Xiaoyan; Wu, Chunxiao; Li, Zuogao; Liu, Lei

    2012-01-01

    To ease the secondary use of clinical data in clinical research, we introduce a metadata-driven, web-based clinical data management system named ClinData Express. ClinData Express is made up of two parts: 1) m-designer, a standalone program for metadata definition; and 2) a web-based data warehouse system for data management. With ClinData Express, all the researchers need to do is define the metadata and data model in the m-designer; the web interface for data collection and the specific database for data storage are then generated automatically. The standards used in the system and the data export module ensure data reuse. The system has been tested on seven disease data collections in Chinese and one form from dbGaP. Its flexibility gives it great potential for use in clinical research. The system is available at http://code.google.com/p/clindataexpress. PMID:23304327
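
    ClinData Express's internal metadata format is not published in the abstract; the sketch below, with a made-up field-definition structure, illustrates the general pattern of generating a data-entry form and a table schema from the same metadata:

    ```python
    # Illustrative sketch of metadata-driven form and schema generation; the
    # field-definition structure is made up, not ClinData Express's format.
    FIELDS = [
        {"name": "patient_id", "label": "Patient ID", "type": "text",   "required": True},
        {"name": "age",        "label": "Age",        "type": "number", "required": True},
        {"name": "diagnosis",  "label": "Diagnosis",  "type": "text",   "required": False},
    ]

    def html_form(fields) -> str:
        rows = [
            f'<label>{f["label"]}<input name="{f["name"]}" type="{f["type"]}"'
            f'{" required" if f["required"] else ""}></label>'
            for f in fields
        ]
        return "<form>\n  " + "\n  ".join(rows) + "\n  <button>Save</button>\n</form>"

    def create_table_sql(table, fields) -> str:
        sql_type = {"text": "TEXT", "number": "REAL"}
        cols = ", ".join(
            f'{f["name"]} {sql_type[f["type"]]}{" NOT NULL" if f["required"] else ""}'
            for f in fields
        )
        return f"CREATE TABLE {table} ({cols});"

    print(html_form(FIELDS))
    print(create_table_sql("study_form", FIELDS))
    ```

    Keeping the form and the storage schema derived from one metadata source is what makes the collected data exportable and reusable, which is the system's stated aim.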

  9. Metadata mapping and reuse in caBIG.

    PubMed

    Kunz, Isaac; Lin, Ming-Chin; Frey, Lewis

    2009-02-05

    This paper proposes that interoperability across biomedical databases can be improved by utilizing a repository of Common Data Elements (CDEs), UML model class-attributes and simple lexical algorithms to facilitate the building of domain models. This is examined in the context of an existing system, the National Cancer Institute (NCI)'s cancer Biomedical Informatics Grid (caBIG). The goal is to demonstrate the deployment of open source tools that can be used to effectively map models and enable the reuse of existing information objects and CDEs in the development of new models for translational research applications. This effort is intended to help developers reuse appropriate CDEs to enable interoperability of their systems when developing within the caBIG framework or other frameworks that use metadata repositories. The Dice (di-grams) and Dynamic algorithms are compared, and both algorithms have similar performance in matching UML model class-attributes to CDE class object-property pairs. With the algorithms used, the baselines for automatically finding the matches are reasonable for the data models examined, which suggests that automatic mapping of UML models and CDEs is feasible within the caBIG framework and potentially any framework that uses a metadata repository. This work opens up the possibility of using mapping algorithms to reduce the cost and time required to map local data models to a reference data model such as those used within caBIG. This effort contributes to facilitating the development of interoperable systems within caBIG as well as other metadata frameworks. Such efforts are critical to address the need to develop systems to handle the enormous amounts of diverse data that can be leveraged from new biomedical methodologies.
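
    The abstract names a Dice coefficient over di-grams; a standard formulation of that measure, which may differ in detail from the paper's implementation, is:

    ```python
    # Dice similarity over character bigrams, a standard formulation of the
    # "Dice (di-grams)" measure the abstract mentions; details may differ
    # from the paper's implementation (here bigram multisets are simplified
    # to sets).
    def bigrams(s: str) -> list[str]:
        s = s.lower()
        return [s[i:i + 2] for i in range(len(s) - 1)]

    def dice(a: str, b: str) -> float:
        """2*|A & B| / (|A| + |B|) over the two bigram sets."""
        ba, bb = set(bigrams(a)), set(bigrams(b))
        if not ba and not bb:
            return 1.0
        return 2 * len(ba & bb) / (len(ba) + len(bb))

    # Matching a UML class-attribute name against candidate CDE names.
    candidates = ["PatientAgeAtDiagnosis", "SpecimenCollectionDate", "PatientGender"]
    query = "patient_age_at_diagnosis"
    print(max(candidates, key=lambda c: dice(query.replace("_", ""), c)))
    ```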

  10. Automated DICOM metadata and volumetric anatomical information extraction for radiation dosimetry

    NASA Astrophysics Data System (ADS)

    Papamichail, D.; Ploussi, A.; Kordolaimi, S.; Karavasilis, E.; Papadimitroulas, P.; Syrgiamiotis, V.; Efstathopoulos, E.

    2015-09-01

    Patient-specific dosimetry calculations based on simulation techniques have as a prerequisite the modeling of the modality system and the creation of voxelized phantoms. This procedure requires knowledge of the scanning parameters and patient information included in a DICOM file, as well as image segmentation. However, the extraction of this information is complicated and time-consuming. The objective of this study was to develop a simple graphical user interface (GUI) to (i) automatically extract metadata from every slice image of a DICOM file in a single query and (ii) interactively specify the regions of interest (ROIs) without explicit access to the radiology information system. The user-friendly application was developed in the Matlab environment. The user can select a series of DICOM files and manage their text and graphical data. The metadata are automatically formatted and presented to the user as a Microsoft Excel file. The volumetric maps are formed by interactively specifying the ROIs and assigning a specific value to every ROI. The result is stored in DICOM format for data and trend analysis. The developed GUI is easy to use, fast, and constitutes a very useful tool for individualized dosimetry. One of the future goals is to incorporate remote access to a PACS server.
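
    The study's tool is a Matlab GUI; as a rough Python analogue of the bulk metadata-extraction step, assuming the pydicom library and CSV output rather than Excel, one could write:

    ```python
    # Rough Python analogue of bulk DICOM metadata extraction (the study's
    # tool is Matlab-based); uses pydicom and writes CSV instead of Excel.
    import csv
    from pathlib import Path
    import pydicom

    TAGS = ["PatientID", "Modality", "SliceThickness", "KVP", "PixelSpacing"]

    def extract(dicom_dir: str, out_csv: str) -> None:
        with open(out_csv, "w", newline="") as fh:
            writer = csv.writer(fh)
            writer.writerow(["file"] + TAGS)
            for path in sorted(Path(dicom_dir).glob("*.dcm")):
                # stop_before_pixels avoids loading image data; headers only.
                ds = pydicom.dcmread(path, stop_before_pixels=True)
                writer.writerow([path.name] + [getattr(ds, t, "") for t in TAGS])

    extract("series01", "series01_metadata.csv")
    ```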

  11. EarthCube Data Discovery Hub: Enhancing, Curating and Finding Data across Multiple Geoscience Data Sources.

    NASA Astrophysics Data System (ADS)

    Zaslavsky, I.; Valentine, D.; Richard, S. M.; Gupta, A.; Meier, O.; Peucker-Ehrenbrink, B.; Hudman, G.; Stocks, K. I.; Hsu, L.; Whitenack, T.; Grethe, J. S.; Ozyurt, I. B.

    2017-12-01

    EarthCube Data Discovery Hub (DDH) is an EarthCube Building Block project using technologies developed in CINERGI (Community Inventory of EarthCube Resources for Geoscience Interoperability) to enable geoscience users to explore a growing portfolio of EarthCube-created and other geoscience-related resources. Over 1 million metadata records are available for discovery through the project portal (cinergi.sdsc.edu). These records are retrieved from data facilities, including federal, state and academic sources, or contributed by geoscientists through workshops, surveys, or other channels. CINERGI metadata augmentation pipeline components 1) provide semantic enhancement based on a large ontology of geoscience terms, using text analytics to generate keywords with references to ontology classes, 2) add spatial extents based on place names found in the metadata record, and 3) add organization identifiers to the metadata. The records are indexed and can be searched via a web portal and standard search APIs. The added metadata content improves discoverability and interoperability of the registered resources. Specifically, the addition of ontology-anchored keywords enables faceted browsing and lets users navigate to datasets related by variables measured, equipment used, science domain, processes described, geospatial features studied, and other dataset characteristics that are generated by the pipeline. DDH also lets data curators access and edit the automatically generated metadata records using the CINERGI metadata editor, accept or reject the enhanced metadata content, and consider it in updating their metadata descriptions. We consider several complex data discovery workflows, in environmental seismology (quantifying sediment and water fluxes using seismic data), marine biology (determining available temperature, location, weather and bleaching characteristics of coral reefs related to measurements in a given coral reef survey), and river geochemistry (discovering observations relevant to geochemical measurements outside the tidal zone, given specific discharge conditions).

  12. Associating uncertainty with datasets using Linked Data and allowing propagation via provenance chains

    NASA Astrophysics Data System (ADS)

    Car, Nicholas; Cox, Simon; Fitch, Peter

    2015-04-01

    With earth-science datasets increasingly being published to enable re-use in projects disassociated from the original data acquisition or generation, there is an urgent need for associated metadata to be connected, in order to guide their application. In particular, provenance traces should support the evaluation of data quality and reliability. However, while standards for describing provenance are emerging (e.g. PROV-O), these do not include the necessary statistical descriptors and confidence assessments. UncertML has a mature conceptual model that may be used to record uncertainty metadata. However, by itself UncertML does not support the representation of uncertainty of multi-part datasets, and provides no direct way of associating the uncertainty information, metadata in relation to a dataset, with dataset objects. We present a method to address both these issues by combining UncertML with PROV-O, and delivering the resulting uncertainty-enriched provenance traces through the Linked Data API. UncertProv extends the PROV-O provenance ontology with an RDF formulation of the UncertML conceptual model elements, adds further elements to support uncertainty representation without a conceptual model, and integrates UncertML through links to documents. The Linked Data API provides a systematic way of navigating from dataset objects to their UncertProv metadata and back again, and its 'views' capability enables access to UncertML and non-UncertML uncertainty metadata representations for a dataset. With this approach, it is possible to access and navigate the uncertainty metadata associated with a published dataset using standard semantic web tools, such as SPARQL queries. Where the uncertainty data follows the UncertML model it can be automatically interpreted and may also support automatic uncertainty propagation. Repositories wishing to enable uncertainty propagation for all datasets must ensure that all elements that are associated with uncertainty (PROV-O Entity and Activity classes) have UncertML elements recorded. This methodology is intentionally flexible to allow uncertainty metadata in many forms, not limited to UncertML. While a more formal representation of uncertainty metadata is desirable (using UncertProv elements to implement the UncertML conceptual model), this will not always be possible, and any uncertainty data stored will be better than none. Since the UncertProv ontology contains a superset of UncertML elements to facilitate the representation of non-UncertML uncertainty data, it could easily be extended to include other formal uncertainty conceptual models, thus allowing non-UncertML propagation calculations.
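
    Concrete URIs for UncertProv are not given in the abstract; a hypothetical SPARQL query (the 'uncert:' namespace and property names are invented for illustration) to retrieve uncertainty statements attached to a dataset's provenance might look like:

    ```python
    # Hypothetical SPARQL over an uncertainty-enriched provenance graph; the
    # 'uncert:' namespace and its property names are invented. PROV-O's
    # namespace is the real W3C one.
    from rdflib import Graph

    g = Graph()
    g.parse("provenance.ttl", format="turtle")   # assumed local provenance trace

    QUERY = """
    PREFIX prov:   <http://www.w3.org/ns/prov#>
    PREFIX uncert: <http://example.org/uncertprov#>

    SELECT ?entity ?dist ?param ?value WHERE {
      ?entity a prov:Entity ;
              uncert:hasUncertainty ?u .
      ?u uncert:distribution ?dist ;
         uncert:parameter ?param ;
         uncert:value ?value .
    }
    """

    for entity, dist, param, value in g.query(QUERY):
        print(entity, dist, param, value)
    ```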

  13. Automatic Generation of Data Types for Classification of Deep Web Sources

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ngu, A H; Buttler, D J; Critchlow, T J

    2005-02-14

    A Service Class Description (SCD) is an effective metadata-based approach for discovering Deep Web sources whose data exhibit regular patterns. However, it is tedious and error-prone to create an SCD manually. Moreover, a manually created SCD does not adapt to the frequent changes of Web sources, and it requires its creator to identify all the possible input and output types of a service a priori; in many domains, it is impossible to exhaustively list all the possible input and output data types of a source in advance. In this paper, we describe machine learning approaches for automatic generation of the data types of an SCD. We propose two different approaches for learning the data types of a class of Web sources: the Brute-Force Learner generates data types that achieve high recall but low precision, while the Clustering-based Learner generates data types with a high precision rate but a lower recall rate. We demonstrate the feasibility of these two learning-based solutions for automatic generation of data types for citation Web sources and present a quantitative evaluation of both.

  14. The XML Metadata Editor of GFZ Data Services

    NASA Astrophysics Data System (ADS)

    Ulbricht, Damian; Elger, Kirsten; Tesei, Telemaco; Trippanera, Daniele

    2017-04-01

    Following the FAIR data principles, research data should be Findable, Accessible, Interoperable and Reusable. Publishing data under these principles requires assigning persistent identifiers to the data and generating rich machine-actionable metadata. To increase interoperability, metadata should use shared vocabularies and crosslink the newly published (meta)data and related material. However, structured metadata formats tend to be complex and are not intended to be generated by individual scientists. Software solutions are needed that support scientists in providing metadata describing their data. To facilitate the data publication activities of 'GFZ Data Services', we programmed an XML metadata editor that assists scientists in creating metadata in different schemata popular in the earth sciences (ISO 19115, DIF, DataCite), while at the same time being usable by and understandable for scientists. Emphasis is placed on removing barriers: in particular, the editor is publicly available on the internet without registration [1], and scientists are not asked to provide information that can be generated automatically (e.g. the URL of a specific licence or the contact information of the metadata distributor). Metadata are stored in browser cookies and a copy can be saved to the local hard disk. To improve usability, form fields are translated into the scientific language, e.g. 'creators' of the DataCite schema are called 'authors'. To assist in filling in the form, we make use of drop-down menus for small vocabulary lists and offer a search facility for large thesauri. Explanations of form fields and definitions of vocabulary terms are provided in pop-up windows, and full documentation is available for download via the help menu. In addition, multiple geospatial references can be entered via an interactive mapping tool, which helps to minimize problems with different conventions for providing latitudes and longitudes. Currently, we are extending the metadata editor so that it can be reused to generate the data discovery and contextual metadata developed by the 'Multi-scale Laboratories' Thematic Core Service of the European Plate Observing System (EPOS-IP). The editor will be used to build a common repository of a large variety of geological and geophysical datasets produced by multidisciplinary laboratories throughout Europe, thus contributing a significant step toward the integration and accessibility of earth science data. This presentation will introduce the metadata editor and show the adjustments made for EPOS-IP. [1] http://dataservices.gfz-potsdam.de/panmetaworks/metaedit
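
    As a flavor of what such an editor must emit, here is a minimal sketch of building a DataCite-style record programmatically; it covers only a few mandatory fields and is not the GFZ editor's actual output:

    ```python
    # Minimal sketch of generating a DataCite-style XML record; only a few
    # mandatory kernel-4 fields are shown, and this is not the GFZ editor's
    # actual output.
    import xml.etree.ElementTree as ET

    NS = "http://datacite.org/schema/kernel-4"
    ET.register_namespace("", NS)

    def datacite_record(doi, author, title, publisher, year):
        res = ET.Element(f"{{{NS}}}resource")
        ident = ET.SubElement(res, f"{{{NS}}}identifier", identifierType="DOI")
        ident.text = doi
        creators = ET.SubElement(res, f"{{{NS}}}creators")
        name = ET.SubElement(ET.SubElement(creators, f"{{{NS}}}creator"),
                             f"{{{NS}}}creatorName")
        name.text = author          # the editor labels this field 'author'
        titles = ET.SubElement(res, f"{{{NS}}}titles")
        ET.SubElement(titles, f"{{{NS}}}title").text = title
        ET.SubElement(res, f"{{{NS}}}publisher").text = publisher
        ET.SubElement(res, f"{{{NS}}}publicationYear").text = str(year)
        return ET.tostring(res, encoding="unicode")

    print(datacite_record("10.5880/EXAMPLE.1", "Doe, Jane",
                          "Example dataset", "GFZ Data Services", 2017))
    ```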

  15. Automated metadata--final project report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schissel, David

    This report summarizes the work of the Automated Metadata, Provenance Cataloging, and Navigable Interfaces: Ensuring the Usefulness of Extreme-Scale Data Project (MPO Project), funded by the United States Department of Energy (DOE) Offices of Advanced Scientific Computing Research and Fusion Energy Sciences. Initially funded for three years starting in 2012, the project was extended for six months with additional funding. The project was a collaboration between scientists at General Atomics, Lawrence Berkeley National Laboratory (LBNL), and the Massachusetts Institute of Technology (MIT). The group leveraged existing computer science technology where possible, and extended or created new capabilities where required. The MPO project successfully created a suite of software tools that a scientific community can use to automatically document their scientific workflows. These tools were integrated into workflows for fusion energy and climate research, illustrating the general applicability of the project's toolkit. Feedback was very positive on the toolkit and on the value of such automatic workflow documentation to the scientific endeavor.

  16. Metadata mapping and reuse in caBIG™

    PubMed Central

    Kunz, Isaac; Lin, Ming-Chin; Frey, Lewis

    2009-01-01

    Background This paper proposes that interoperability across biomedical databases can be improved by utilizing a repository of Common Data Elements (CDEs), UML model class-attributes and simple lexical algorithms to facilitate the building of domain models. This is examined in the context of an existing system, the National Cancer Institute (NCI)'s cancer Biomedical Informatics Grid (caBIG™). The goal is to demonstrate the deployment of open source tools that can be used to effectively map models and enable the reuse of existing information objects and CDEs in the development of new models for translational research applications. This effort is intended to help developers reuse appropriate CDEs to enable interoperability of their systems when developing within the caBIG™ framework or other frameworks that use metadata repositories. Results The Dice (di-grams) and Dynamic algorithms are compared, and both algorithms have similar performance in matching UML model class-attributes to CDE class object-property pairs. With the algorithms used, the baselines for automatically finding the matches are reasonable for the data models examined, which suggests that automatic mapping of UML models and CDEs is feasible within the caBIG™ framework and potentially any framework that uses a metadata repository. Conclusion This work opens up the possibility of using mapping algorithms to reduce the cost and time required to map local data models to a reference data model such as those used within caBIG™. This effort contributes to facilitating the development of interoperable systems within caBIG™ as well as other metadata frameworks. Such efforts are critical to address the need to develop systems to handle the enormous amounts of diverse data that can be leveraged from new biomedical methodologies. PMID:19208192

  17. DAS: A Data Management System for Instrument Tests and Operations

    NASA Astrophysics Data System (ADS)

    Frailis, M.; Sartor, S.; Zacchei, A.; Lodi, M.; Cirami, R.; Pasian, F.; Trifoglio, M.; Bulgarelli, A.; Gianotti, F.; Franceschi, E.; Nicastro, L.; Conforti, V.; Zoli, A.; Smart, R.; Morbidelli, R.; Dadina, M.

    2014-05-01

    The Data Access System (DAS) is a data management software system providing a reusable solution for the storage of data acquired both from telescopes and from auxiliary data sources during the instrument development phases and operations. It is part of the Customizable Instrument WorkStation system (CIWS-FW), a framework for the storage, processing and quick-look analysis of data acquired from scientific instruments. The DAS provides a data access layer mainly targeted at software applications: quick-look displays, pre-processing pipelines and scientific workflows. It is logically organized in three main components: an intuitive and compact Data Definition Language (DAS DDL) in XML format for specifying user-defined data types; an Application Programming Interface (DAS API) that automatically adds classes and methods supporting the DDL data types and provides an object-oriented query language; and a data management component, which maps the metadata of the DDL data types into a relational Database Management System (DBMS) and stores the data in a shared (network) file system. With the DAS DDL, developers define the data model for a particular project, specifying for each data type the metadata attributes, the data format and layout (if applicable), and named references to related or aggregated data types. Together with the DDL user-defined data types, the DAS API acts as the only interface to store, query and retrieve the metadata and data in the DAS system, providing both an abstract interface and a data-model-specific one in C, C++ and Python. The mapping of metadata to the back-end database is automatic and supports several relational DBMSs, including MySQL, Oracle and PostgreSQL.

  18. Automatic Conversion of Metadata from the Study of Health in Pomerania to ODM.

    PubMed

    Hegselmann, Stefan; Gessner, Sophia; Neuhaus, Philipp; Henke, Jörg; Schmidt, Carsten Oliver; Dugas, Martin

    2017-01-01

    Electronic collection and high-quality analysis of medical data are expected to have great potential to improve patient care and medical research. However, the integration of data from different stakeholders poses a crucial problem. The exchange and reuse of medical data models, as well as annotations with unique semantic identifiers, have been proposed as a solution. Our objective was to convert metadata from the Study of Health in Pomerania to the standardized CDISC ODM format. The structure of the two data formats was analyzed, and a mapping was suggested and implemented. The metadata from the Study of Health in Pomerania were successfully converted to ODM, and all relevant information was included in the resulting forms. Three sample forms were evaluated in depth, which demonstrates the feasibility of this conversion. Hundreds of data entry forms with more than 15,000 items can be converted into a standardized format, with some limitations, e.g. regarding logical constraints. This enables the integration of the Study of Health in Pomerania metadata into various systems, facilitating implementation and reuse at different study sites.
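
    The paper's mapping rules are not reproduced in the abstract; as a rough illustration, converting source field descriptions (structure invented here) into CDISC ODM ItemDef elements might look like the following:

    ```python
    # Rough illustration of emitting CDISC ODM ItemDef elements from source
    # metadata; the source field structure is invented, and a real conversion
    # must also handle forms, item groups, code lists and logical constraints.
    import xml.etree.ElementTree as ET

    SOURCE_FIELDS = [
        {"id": "ship_age", "label": "Age at examination", "type": "integer"},
        {"id": "ship_bmi", "label": "Body mass index",    "type": "float"},
    ]

    odm = ET.Element("ODM", FileOID="SHiP-export", ODMVersion="1.3")
    study = ET.SubElement(odm, "Study", OID="S.SHIP")
    mdv = ET.SubElement(study, "MetaDataVersion", OID="MDV.1", Name="SHIP metadata")

    for field in SOURCE_FIELDS:
        item = ET.SubElement(mdv, "ItemDef",
                             OID=f"I.{field['id']}",
                             Name=field["label"],
                             DataType=field["type"])
        q = ET.SubElement(ET.SubElement(item, "Question"), "TranslatedText")
        q.set("xml:lang", "en")
        q.text = field["label"]

    print(ET.tostring(odm, encoding="unicode"))
    ```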

  19. Metadata Mapper: a web service for mapping data between independent visual analysis components, guided by perceptual rules

    NASA Astrophysics Data System (ADS)

    Rogowitz, Bernice E.; Matasci, Naim

    2011-03-01

    The explosion of online scientific data from experiments, simulations, and observations has given rise to an avalanche of algorithmic, visualization and imaging methods. There has also been enormous growth in tools that provide interactive interfaces for exploring these data dynamically. Most systems, however, do not support the real-time exploration of patterns and relationships across tools and do not provide guidance on which colors, colormaps or visual metaphors will be most effective. In this paper, we introduce a general architecture for sharing metadata between applications and a "Metadata Mapper" component that allows the analyst to decide how metadata from one component should be represented in another, guided by perceptual rules. This system is designed to support "brushing" [1], in which highlighting a region of interest in one application automatically highlights corresponding values in another, allowing the scientist to develop insights from multiple sources. Our work builds on the component-based iPlant Cyberinfrastructure [2] and provides a general approach to supporting interactive exploration across independent visualization and visual analysis components.

  20. Using phrases and document metadata to improve topic modeling of clinical reports.

    PubMed

    Speier, William; Ong, Michael K; Arnold, Corey W

    2016-06-01

    Probabilistic topic models provide an unsupervised method for analyzing unstructured text, and they have the potential to be integrated into clinical automatic summarization systems. Clinical documents are accompanied by metadata from a patient's medical history and frequently contain multiword concepts that can be valuable for accurately interpreting the included text. While existing methods have attempted to address these problems individually, we present a unified model for free-text clinical documents that integrates contextual patient- and document-level data and discovers multi-word concepts. In the proposed model, phrases are represented by chained n-grams, and a Dirichlet hyper-parameter is weighted by both document-level and patient-level context. This method and three other latent Dirichlet allocation models were fit to a large collection of clinical reports. Examples of resulting topics demonstrate the results of the new model, and the quality of the representations is evaluated using empirical log likelihood. The proposed model was able to create informative prior probabilities based on patient and document information, and it captured phrases that represented various clinical concepts. The representation using the proposed model had a significantly higher empirical log likelihood than the compared methods. Integrating document metadata and capturing phrases in clinical text greatly improves the topic representation of clinical documents. The resulting clinically informative topics may effectively serve as the basis for an automatic summarization system for clinical reports. Copyright © 2016 Elsevier Inc. All rights reserved.
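
    The model's phrase mechanism (chained n-grams) is only named in the abstract; a minimal pre-processing sketch in its spirit, chaining adjacent tokens whose co-occurrence count clears a threshold (the scoring rule here is an assumption), could be:

    ```python
    # Minimal sketch of phrase chaining for topic-model input: adjacent tokens
    # that co-occur often are merged into a single token. The scoring rule and
    # threshold are assumptions; the paper's chained n-gram model differs.
    from collections import Counter

    def chain_phrases(docs: list[list[str]], min_count: int = 2) -> list[list[str]]:
        pair_counts = Counter(
            (doc[i], doc[i + 1]) for doc in docs for i in range(len(doc) - 1)
        )
        frequent = {p for p, c in pair_counts.items() if c >= min_count}
        out = []
        for doc in docs:
            merged, i = [], 0
            while i < len(doc):
                if i + 1 < len(doc) and (doc[i], doc[i + 1]) in frequent:
                    merged.append(doc[i] + "_" + doc[i + 1])   # e.g. heart_failure
                    i += 2
                else:
                    merged.append(doc[i])
                    i += 1
            out.append(merged)
        return out

    docs = [["acute", "heart", "failure"], ["chronic", "heart", "failure"],
            ["heart", "failure", "admission"]]
    print(chain_phrases(docs))   # merges the frequent pair into "heart_failure"
    ```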

  1. An Interactive, Web-Based Approach to Metadata Authoring

    NASA Technical Reports Server (NTRS)

    Pollack, Janine; Wharton, Stephen W. (Technical Monitor)

    2001-01-01

    NASA's Global Change Master Directory (GCMD) serves a growing number of users by assisting the scientific community in the discovery of and linkage to Earth science data sets and related services. The GCMD holds over 8000 data set descriptions in Directory Interchange Format (DIF) and 200 data service descriptions in Service Entry Resource Format (SERF), encompassing the disciplines of geology, hydrology, oceanography, meteorology, and ecology. Data descriptions also contain geographic coverage information, thus allowing researchers to discover data pertaining to a particular geographic location as well as a subject of interest. The GCMD strives to be the preeminent data locator for worldwide directory-level metadata. In this vein, scientists and data providers must have access to intuitive and efficient metadata authoring tools. Existing GCMD tools are not currently attracting widespread usage. With usage being the prime indicator of utility, it has become apparent that current tools must be improved. As a result, the GCMD has released a new suite of web-based authoring tools that enable a user to create new data and service entries, as well as modify existing data entries. With these tools, a more interactive approach to metadata authoring is taken: they feature a visual "checklist" of data/service fields that automatically updates when a field is completed. In this way, the user can quickly gauge which of the required and optional fields have not been populated. With the release of these tools, the Earth science community will be further assisted in efficiently creating quality data and services metadata. Keywords: metadata, Earth science, metadata authoring tools

  2. Auspice: Automatic Service Planning in Cloud/Grid Environments

    NASA Astrophysics Data System (ADS)

    Chiu, David; Agrawal, Gagan

    Recent scientific advances have fostered a mounting number of services and data sets available for utilization. These resources, though scattered across disparate locations, are often loosely coupled both semantically and operationally. This loosely coupled relationship implies the possibility of linking together operations and data sets to answer queries. This task, generally known as automatic service composition, abstracts the process of complex scientific workflow planning from the user. We have been exploring a metadata-driven approach toward automatic service workflow composition, among other enabling mechanisms, in our system, Auspice: Automatic Service Planning in Cloud/Grid Environments. In this paper, we present a complete overview of our system's unique features and our outlook for future deployment as the Cloud computing paradigm becomes increasingly prominent in enabling scientific computing.

  3. System and method for integrating and accessing multiple data sources within a data warehouse architecture

    DOEpatents

    Musick, Charles R [Castro Valley, CA; Critchlow, Terence [Livermore, CA; Ganesh, Madhaven [San Jose, CA; Slezak, Tom [Livermore, CA; Fidelis, Krzysztof [Brentwood, CA

    2006-12-19

    A system and method is disclosed for integrating and accessing multiple data sources within a data warehouse architecture. The metadata formed by the present method provide a way to declaratively present domain specific knowledge, obtained by analyzing data sources, in a consistent and useable way. Four types of information are represented by the metadata: abstract concepts, databases, transformations and mappings. A mediator generator automatically generates data management computer code based on the metadata. The resulting code defines a translation library and a mediator class. The translation library provides a data representation for domain specific knowledge represented in a data warehouse, including "get" and "set" methods for attributes that call transformation methods and derive a value of an attribute if it is missing. The mediator class defines methods that take "distinguished" high-level objects as input and traverse their data structures and enter information into the data warehouse.
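
    The patent's generated code is not shown in this summary; a toy Python analogue of the described pattern, where a "get" accessor derives a missing attribute value by calling a transformation method, might be:

    ```python
    # Toy analogue of the patent's generated translation-library pattern: a
    # "get" accessor that derives a missing attribute value by calling a
    # transformation method. Class and method names are illustrative only.
    from typing import Optional

    class GeneSequenceRecord:
        def __init__(self, sequence: str, gc_content: Optional[float] = None):
            self._sequence = sequence
            self._gc_content = gc_content      # may be missing in the source DB

        def _derive_gc_content(self) -> float:
            """Transformation method: compute the GC fraction from the sequence."""
            s = self._sequence.upper()
            return (s.count("G") + s.count("C")) / len(s) if s else 0.0

        @property
        def gc_content(self) -> float:
            # Derive and cache the value if the source did not provide it.
            if self._gc_content is None:
                self._gc_content = self._derive_gc_content()
            return self._gc_content

        @gc_content.setter
        def gc_content(self, value: float) -> None:
            self._gc_content = value

    record = GeneSequenceRecord("ATGCGC")
    print(record.gc_content)   # -> 0.666..., derived because none was supplied
    ```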

  4. Pragmatic Metadata Management for Integration into Multiple Spatial Data Infrastructure Systems and Platforms

    NASA Astrophysics Data System (ADS)

    Benedict, K. K.; Scott, S.

    2013-12-01

    While there has been convergence towards a limited number of standards for representing knowledge (metadata) about geospatial (and other) data objects and collections, there exist a variety of community conventions around the specific use of those standards within specific data discovery and access systems. This combination of limited (but multiple) standards and conventions creates a challenge for system developers that aspire to participate in multiple data infrastructures, each of which may use a different combination of standards and conventions. While Extensible Markup Language (XML) is a shared standard for encoding most metadata, traditional direct XML transformations (XSLT) from one standard to another often result in an imperfect transfer of information due to incomplete mapping from one standard's content model to another. This paper presents the work at the University of New Mexico's Earth Data Analysis Center (EDAC) in which a unified data and metadata management system has been developed in support of the storage, discovery and access of heterogeneous data products. This system, the Geographic Storage, Transformation and Retrieval Engine (GSTORE) platform, has adopted a polyglot database model in which a combination of relational and document-based databases is used to store both data and metadata, with some metadata stored in a custom XML schema designed as a superset of the requirements for multiple target metadata standards: ISO 19115-2/19139/19110/19119, FGDC CSDGM (both with and without remote sensing extensions) and Dublin Core. Metadata stored within this schema is complemented by additional service, format and publisher information that is dynamically "injected" into produced metadata documents when they are requested from the system. While mapping from the underlying common metadata schema is relatively straightforward, the generation of valid metadata within each target standard is necessary but not sufficient for integration into multiple data infrastructures, as has been demonstrated through EDAC's testing and deployment of metadata into multiple external systems: Data.gov, the GEOSS Registry, the DataONE network, the DSpace-based institutional repository at UNM, and semantic mediation systems developed as part of the NASA ACCESS ELSeWEB project. Each of these systems requires valid metadata as a first step, but to make the most effective use of the delivered metadata each also has a set of conventions that are specific to the system. This presentation will provide an overview of the underlying metadata management model and the processes and web services that have been developed to automatically generate metadata in a variety of standard formats, and will highlight some of the specific modifications made to the output metadata content to support the different conventions used by the multiple metadata integration endpoints.

  5. Constructing a Cross-Domain Resource Inventory: Key Components and Results of the EarthCube CINERGI Project.

    NASA Astrophysics Data System (ADS)

    Zaslavsky, I.; Richard, S. M.; Malik, T.; Hsu, L.; Gupta, A.; Grethe, J. S.; Valentine, D. W., Jr.; Lehnert, K. A.; Bermudez, L. E.; Ozyurt, I. B.; Whitenack, T.; Schachne, A.; Giliarini, A.

    2015-12-01

    While many geoscience-related repositories and data discovery portals exist, finding information about available resources remains a pervasive problem, especially when searching across multiple domains and catalogs. Inconsistent and incomplete metadata descriptions, disparate access protocols and semantic differences across domains, and troves of unstructured or poorly structured information which is hard to discover and use are major hindrances toward discovery, while metadata compilation and curation remain manual and time-consuming. We report on methodology, main results and lessons learned from an ongoing effort to develop a geoscience-wide catalog of information resources, with consistent metadata descriptions, traceable provenance, and automated metadata enhancement. Developing such a catalog is the central goal of CINERGI (Community Inventory of EarthCube Resources for Geoscience Interoperability), an EarthCube building block project (earthcube.org/group/cinergi). The key novel technical contributions of the projects include: a) development of a metadata enhancement pipeline and a set of document enhancers to automatically improve various aspects of metadata descriptions, including keyword assignment and definition of spatial extents; b) Community Resource Viewers: online applications for crowdsourcing community resource registry development, curation and search, and channeling metadata to the unified CINERGI inventory, c) metadata provenance, validation and annotation services, d) user interfaces for advanced resource discovery; and e) geoscience-wide ontology and machine learning to support automated semantic tagging and faceted search across domains. We demonstrate these CINERGI components in three types of user scenarios: (1) improving existing metadata descriptions maintained by government and academic data facilities, (2) supporting work of several EarthCube Research Coordination Network projects in assembling information resources for their domains, and (3) enhancing the inventory and the underlying ontology to address several complicated data discovery use cases in hydrology, geochemistry, sedimentology, and critical zone science. Support from the US National Science Foundation under award ICER-1343816 is gratefully acknowledged.

  6. The ground truth about metadata and community detection in networks.

    PubMed

    Peel, Leto; Larremore, Daniel B; Clauset, Aaron

    2017-05-01

    Across many scientific domains, there is a common need to automatically extract a simplified view or coarse-graining of how a complex system's components interact. This general task is called community detection in networks and is analogous to searching for clusters in independent vector data. It is common to evaluate the performance of community detection algorithms by their ability to find so-called ground truth communities. This works well in synthetic networks with planted communities because these networks' links are formed explicitly based on those known communities. However, there are no planted communities in real-world networks. Instead, it is standard practice to treat some observed discrete-valued node attributes, or metadata, as ground truth. We show that metadata are not the same as ground truth and that treating them as such induces severe theoretical and practical problems. We prove that no algorithm can uniquely solve community detection, and we prove a general No Free Lunch theorem for community detection, which implies that there can be no algorithm that is optimal for all possible community detection tasks. However, community detection remains a powerful tool and node metadata still have value, so a careful exploration of their relationship with network structure can yield insights of genuine worth. We illustrate this point by introducing two statistical techniques that can quantify the relationship between metadata and community structure for a broad class of models. We demonstrate these techniques using both synthetic and real-world networks, and for multiple types of metadata and community structures.

  7. A System for Automated Extraction of Metadata from Scanned Documents using Layout Recognition and String Pattern Search Models.

    PubMed

    Misra, Dharitri; Chen, Siyuan; Thoma, George R

    2009-01-01

    One of the most expensive aspects of archiving digital documents is the manual acquisition of context-sensitive metadata useful for the subsequent discovery of, and access to, the archived items. For certain types of textual documents, such as journal articles, pamphlets, and official government records, where the metadata is contained within the body of the documents, a cost-effective method is to identify and extract the metadata in an automated way, applying machine learning and string pattern search techniques. At the U.S. National Library of Medicine (NLM) we have developed an automated metadata extraction (AME) system that employs layout classification and recognition models together with a metadata pattern search model for a text corpus with structured or semi-structured information. A combination of a Support Vector Machine and a Hidden Markov Model is used to create the layout recognition models from a training set of the corpus, following which a rule-based metadata search model is used to extract the embedded metadata by analyzing the string patterns within and surrounding each field in the recognized layouts. In this paper, we describe the design of our AME system, with a focus on the metadata search model. We present extraction results for a historic collection from the Food and Drug Administration and outline how the system may be adapted for similar collections. Finally, we discuss some ongoing enhancements to our AME system.
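
    The NLM system's features and training data are not given in the abstract; purely as an illustration of the SVM layout-classification step (the HMM component is not covered), a small scikit-learn classifier over made-up page features could be trained like this:

    ```python
    # Illustration only: an SVM layout classifier over made-up page features
    # (whitespace ratio, number of fonts, number of columns). The NLM system's
    # actual features, and its HMM component, are not reproduced here.
    from sklearn.svm import SVC

    # Toy training set: each row is (whitespace_ratio, n_fonts, n_columns).
    X = [
        [0.55, 3, 2], [0.52, 4, 2], [0.58, 3, 2],   # journal-article layouts
        [0.30, 1, 1], [0.28, 2, 1], [0.33, 1, 1],   # government-record layouts
    ]
    y = ["article", "article", "article", "record", "record", "record"]

    clf = SVC(kernel="rbf", gamma="scale").fit(X, y)

    # Classify an unseen scanned page; a pattern-based metadata search model
    # specific to the predicted layout would then be applied.
    print(clf.predict([[0.50, 3, 2]]))   # -> ['article']
    ```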

  8. Semantic metadata application for information resources systematization in water spectroscopy

    NASA Astrophysics Data System (ADS)

    Fazliev, A.

    2009-04-01

    The information and knowledge layers of information-computational system for water spectroscopy are described. Semantic metadata for all the tasks of domain information model that are the basis of the layers have been studied. The principle of semantic metadata determination and mechanisms of the usage during information systematization in molecular spectroscopy has been revealed. The software developed for the work with semantic metadata is described as well. Formation of domain model in the framework of Semantic Web is based on the use of explicit specification of its conceptualization or, in other words, its ontologies. Formation of conceptualization for molecular spectroscopy was described in Refs. 1, 2. In these works two chains of task are selected for zeroth approximation for knowledge domain description. These are direct tasks chain and inverse tasks chain. Solution schemes of these tasks defined approximation of data layer for knowledge domain conceptualization. Spectroscopy tasks solutions properties lead to a step-by-step extension of molecular spectroscopy conceptualization. Information layer of information system corresponds to this extension. An advantage of molecular spectroscopy model designed in a form of tasks chain is actualized in the fact that one can explicitly define data and metadata at each step of solution of these molecular spectroscopy chain tasks. Metadata structure (tasks solutions properties) in knowledge domain also has form of a chain in which input data and metadata of the previous task become metadata of the following tasks. The term metadata is used in its narrow sense: metadata are the properties of spectroscopy tasks solutions. Semantic metadata represented with the help of OWL 3 are formed automatically and they are individuals of classes (A-box). Unification of T-box and A-box is an ontology that can be processed with the help of inference engine. In this work we analyzed the formation of individuals of molecular spectroscopy applied ontologies as well as the software used for their creation by means of OWL DL language. The results of this work are presented in a form of an information layer and a knowledge layer in W@DIS information system 4. 1 FORMATION OF INDIVIDUALS OF WATER SPECTROSCOPY APPLIED ONTOLOGY Applied tasks ontology contains explicit description of input an output data of physical tasks solved in two chains of molecular spectroscopy tasks. Besides physical concepts, related to spectroscopy tasks solutions, an information source, which is a key concept of knowledge domain information model, is also used. Each solution of knowledge domain task is linked to the information source which contains a reference on published task solution, molecule and task solution properties. Each information source allows us to identify a certain knowledge domain task solution contained in the information system. Water spectroscopy applied ontology classes are formed on the basis of molecular spectroscopy concepts taxonomy. They are defined by constrains on properties of the selected conceptualization. Extension of applied ontology in W@DIS information system is actualized according to two scenarios. Individuals (ontology facts or axioms) formation is actualized during the task solution upload in the information system. Ontology user operation that implies molecular spectroscopy taxonomy and individuals is performed solely by the user. For this purpose Protege ontology editor was used. 
For the formation, processing and visualization of knowledge domain tasks individuals a software was designed and implemented. Method of individual formation determines the sequence of steps of created ontology individuals' generation. Tasks solutions properties (metadata) have qualitative and quantitative values. Qualitative metadata are regarded as metadata describing qualitative side of a task such as solution method or other information that can be explicitly specified by object properties of OWL DL language. Quantitative metadata are metadata that describe quantitative properties of task solution such as minimal and maximal data value or other information that can be explicitly obtained by programmed algorithmic operations. These metadata are related to DatatypeProperty properties of OWL specification language Quantitative metadata can be obtained automatically during data upload into information system. Since ObjectProperty values are objects, processing of qualitative metadata requires logical constraints. In case of the task solved in W@DIS ICS qualitative metadata can be formed automatically (for example in spectral functions calculation task). The used methods of translation of qualitative metadata into quantitative is characterized as roughened representation of knowledge in knowledge domain. The existence of two ways of data obtainment is a key moment in the formation of applied ontology of molecular spectroscopy task. experimental method (metadata for experimental data contain description of equipment, experiment conditions and so on) on the initial stage and inverse task solution on the following stages; calculation method (metadata for calculation data are closely related to the metadata used for the description of physical and mathematical models of molecular spectroscopy) 2 SOFTWARE FOR ONTOLOGY OPERATION Data collection in water spectroscopy information system is organized in a form of workflow that contains such operations as information source creation, entry of bibliographic data on publications, formation of uploaded data schema an so on. Metadata are generated in information source as well. Two methods are used for their formation: automatic metadata generation and manual metadata generation (performed by user). Software implementation of support of actions related to metadata formation is performed by META+ module. Functions of META+ module can be divided into two groups. The first groups contains the functions necessary to software developer while the second one the functions necessary to a user of the information system. META+ module functions necessary to the developer are: 1. creation of taxonomy (T-boxes) of applied ontology classes of knowledge domain tasks; 2. creation of instances of task classes; 3. creation of data schemes of tasks in a form of an XML-pattern and based on XML-syntax. XML-pattern is developed for instances generator and created according to certain rules imposed on software generator implementation. 4. implementation of metadata values calculation algorithms; 5. creation of a request interface and additional knowledge processing function for the solution of these task; 6. unification of the created functions and interfaces into one information system The following sequence is universal for the generation of task classes' individuals that form chains. Special interfaces for user operations management are designed for software developer in META+ module. 
There are means for qualitative metadata values updating during data reuploading to information source. The list of functions necessary to end user contains: - data sets visualization and editing, taking into account their metadata, e.g.: display of unique number of bands in transitions for a certain data source; - export of OWL/RDF models from information system to the environment in XML-syntax; - visualization of instances of classes of applied ontology tasks on molecular spectroscopy; - import of OWL/RDF models into the information system and their integration with domain vocabulary; - formation of additional knowledge of knowledge domain for the construction of ontological instances of task classes using GTML-formats and their processing; - formation of additional knowledge in knowledge domain for the construction of instances of task classes, using software algorithm for data sets processing; - function of semantic search implementation using an interface that formulates questions in a form of related triplets in order for getting an adequate answer. 3 STRUCTURE OF META+ MODULE META+ software module that provides the above functions contains the following components: - a knowledge base that stores semantic metadata and taxonomies of information system; - software libraries POWL and RAP 5 created by third-party developer and providing access to ontological storage; - function classes and libraries that form the core of the module and perform the tasks of formation, storage and visualization of classes instances; - configuration files and module patterns that allow one to adjust and organize operation of different functional blocks; META+ module also contains scripts and patterns implemented according to the rules of W@DIS information system development environment. - scripts for interaction with environment by means of the software core of information system. These scripts provide organizing web-oriented interactive communication; - patterns for the formation of functionality visualization realized by the scripts Software core of scientific information-computational system W@DIS is created with the help of MVC (Model - View - Controller) design pattern that allows us to separate logic of application from its representation. It realizes the interaction of three logical components, actualizing interactivity with the environment via Web and performing its preprocessing. Functions of «Controller» logical component are realized with the help of scripts designed according to the rules imposed by software core of the information system. Each script represents a definite object-oriented class with obligatory class method of script initiation called "start". Functions of actualization of domain application operation results representation (i.e. "View" component) are sets of HTML-patterns that allow one to visualize the results of domain applications operation with the help of additional constructions processed by software core of the system. Besides the interaction with the software core of the scientific information system this module also deals with configuration files of software core and its database. Such organization of work provides closer integration with software core and deeper and more adequate connection in operating system support. 4 CONCLUSION In this work the problems of semantic metadata creation in information system oriented on information representation in the area of molecular spectroscopy have been discussed. 
4 CONCLUSION

In this work the problems of creating semantic metadata in an information system oriented toward representing information in the area of molecular spectroscopy have been discussed. The method of forming semantic metadata and functions has been described, together with the realization and structure of the META+ module. The architecture of the META+ module is closely tied to the existing software of the "Molecular spectroscopy" scientific information system. The module is realized using modern approaches to the development of Web-oriented applications and uses the existing application interfaces. The developed software allows us to:
- perform automatic metadata annotation of calculated task solutions directly in the information system;
- perform automatic metadata annotation of task solutions when results computed outside the information system are uploaded, forming an instance of the solved task on the basis of the input data;
- use ontological instances of task solutions to identify data in the viewing, comparison and search tasks solved by the information system;
- export applied-task ontologies for use by external tools;
- solve the semantic search task according to a pattern, using a question-answer interface.

5 ACKNOWLEDGEMENT

The authors are grateful to RFBR for financial support of the development of the distributed information system for molecular spectroscopy.

REFERENCES

1. A. D. Bykov, A. Z. Fazliev, N. N. Filippov, A. V. Kozodoev, A. I. Privezentsev, L. N. Sinitsa, M. V. Tonkov and M. Yu. Tretyakov. Distributed information system on atmospheric spectroscopy. Geophysical Research Abstracts, SRef-ID: 1607-7962/gra/EGU2007-A-01906, 2007, v. 9, p. 01906.
2. A. I. Privezentsev, A. Z. Fazliev. Applied task ontology for molecular spectroscopy information resources systematization. Proceedings of the 9th Russian scientific conference "Electronic libraries: advanced methods and technologies, electronic collections" (RCDL'2007), Pereslavl-Zalesskii, 2007, part 1, pp. 201-210.
3. OWL Web Ontology Language Semantics and Abstract Syntax, W3C Recommendation 10 February 2004, http://www.w3.org/TR/2004/REC-owl-semantics-20040210/
4. W@DIS information system, http://wadis.saga.iao.ru
5. RAP library, http://www4.wiwiss.fu-berlin.de/bizer/rdfapi/

  9. A document centric metadata registration tool constructing earth environmental data infrastructure

    NASA Astrophysics Data System (ADS)

    Ichino, M.; Kinutani, H.; Ono, M.; Shimizu, T.; Yoshikawa, M.; Masuda, K.; Fukuda, K.; Kawamoto, H.

    2009-12-01

DIAS (Data Integration and Analysis System) is one of the GEOSS activities in Japan. It is also a leading part of the GEOSS task with the same name defined in the GEOSS Ten Year Implementation Plan. The main mission of DIAS is to construct a data infrastructure that can effectively integrate earth environmental data such as observation data, numerical model outputs, and socio-economic data provided from the fields of climate, water cycle, ecosystem, ocean, biodiversity and agriculture. Some of DIAS's data products are available at http://www.jamstec.go.jp/e/medid/dias. Most earth environmental data commonly have spatial and temporal attributes such as the covering geographic scope or the creation date. Metadata standards including these common attributes are published by the ISO technical committee for geographic information (ISO/TC 211) as the specifications ISO 19115:2003 and ISO 19139:2007. Accordingly, DIAS metadata are based on the ISO/TC 211 metadata standards. From the viewpoint of data users, metadata are useful not only for data retrieval and analysis but also for interoperability and information sharing among experts, beginners and nonprofessionals. From the viewpoint of data providers, however, two problems were pointed out after discussions. One is that data providers prefer to minimize the extra work and time spent creating metadata. The other is that data providers want to manage and publish documents that explain their data sets more comprehensively. To solve these problems, we have been developing a document-centric metadata registration tool. The features of our tool are that the generated documents are available instantly and that there is no extra cost for data providers to generate metadata. The tool is also developed as a Web application, so it demands no additional software from data providers as long as they have a web browser. The interface of the tool provides the section titles of the documents and, by filling out the content of each section, the documents for the data sets are automatically published in PDF and HTML format. Furthermore, a metadata XML file compliant with ISO 19115 and ISO 19139 is created at the same moment. The generated metadata are managed in the metadata database of the DIAS project, and will be used in various ISO 19139 compliant metadata management tools, such as GeoNetwork.
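As a rough illustration of the tool's final step (names, fields and element nesting simplified and hypothetical; a real ISO 19139 record is far richer and order-sensitive), a metadata stub could be derived from the filled-in document sections like this:

```python
# Sketch: deriving an ISO 19139-style metadata stub from document sections
# (illustrative only; element nesting is abbreviated and order-sensitive
# mandatory blocks are omitted).
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)

sections = {  # content a data provider typed into the document form
    "Title": "Surface air temperature over Japan, 1990-2008",
    "Abstract": "Gridded daily means derived from station observations.",
}

md = ET.Element(f"{{{GMD}}}MD_Metadata")
ident = ET.SubElement(md, f"{{{GMD}}}identificationInfo")
data_id = ET.SubElement(ident, f"{{{GMD}}}MD_DataIdentification")
for field, xml_tag in (("Title", "title"), ("Abstract", "abstract")):
    wrapper = ET.SubElement(data_id, f"{{{GMD}}}{xml_tag}")
    ET.SubElement(wrapper, f"{{{GCO}}}CharacterString").text = sections[field]

print(ET.tostring(md, encoding="unicode"))
```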

  10. The CMIP5 Model Documentation Questionnaire: Development of a Metadata Retrieval System for the METAFOR Common Information Model

    NASA Astrophysics Data System (ADS)

    Pascoe, Charlotte; Lawrence, Bryan; Moine, Marie-Pierre; Ford, Rupert; Devine, Gerry

    2010-05-01

The EU METAFOR Project (http://metaforclimate.eu) has created a web-based model documentation questionnaire to collect metadata from the modelling groups that are running simulations in support of the Coupled Model Intercomparison Project 5 (CMIP5). The CMIP5 model documentation questionnaire will retrieve information about the details of the models used, how the simulations were carried out, how the simulations conformed to the CMIP5 experiment requirements, and details of the hardware used to perform the simulations. The metadata collected by the CMIP5 questionnaire will allow CMIP5 data to be compared in a scientifically meaningful way. This paper describes the life-cycle of the CMIP5 questionnaire development, which starts with relatively unstructured input from domain specialists and ends with formal XML documents that comply with the METAFOR Common Information Model (CIM). Each development step is associated with a specific tool: (1) mind maps are used to capture information requirements from domain experts and build a controlled vocabulary; (2) a Python parser processes the XML files generated by the mind maps; (3) Django (Python) is used to generate the dynamic structure and content of the web-based questionnaire from the processed XML and the METAFOR CIM; (4) Python parsers ensure that information entered into the CMIP5 questionnaire is output as CIM-compliant XML; (5) CIM-compliant output allows automatic information capture tools to harvest questionnaire content into databases such as the Earth System Grid (ESG) metadata catalogue. This paper will focus on how Django (Python) and XML input files are used to generate the structure and content of the CMIP5 questionnaire. It will also address how the choice of development tools listed above provided a framework that enabled working scientists (who would never ordinarily interact with UML and XML) to be part of the iterative development process and to ensure that the CMIP5 model documentation questionnaire reflects what scientists want to know about the models. Keywords: metadata, CMIP5, automatic information capture, tool development
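Step (3) above, generating form structure from processed input rather than from hand-written classes, can be sketched with Django's standard dynamic-form idiom (toy vocabulary; not the METAFOR code base):

```python
# Sketch: generating a questionnaire form dynamically from a parsed controlled
# vocabulary (hypothetical data; not the METAFOR code). Requires Django;
# minimal settings are configured before any form field is defined.
import django
from django.conf import settings

if not settings.configured:
    settings.configure(USE_I18N=False)
    django.setup()

from django import forms

# Field specs as they might emerge from the mind-map XML after parsing.
vocabulary = [
    {"name": "model_name", "label": "Model name", "choices": None},
    {"name": "advection_scheme", "label": "Advection scheme",
     "choices": ["semi-Lagrangian", "Eulerian", "flux-form"]},
]

fields = {}
for spec in vocabulary:
    if spec["choices"]:
        fields[spec["name"]] = forms.ChoiceField(
            label=spec["label"],
            choices=[(c, c) for c in spec["choices"]])
    else:
        fields[spec["name"]] = forms.CharField(label=spec["label"])

# type() builds a Form subclass at runtime, mirroring how the questionnaire's
# structure is driven by processed XML rather than hand-written classes.
QuestionnaireForm = type("QuestionnaireForm", (forms.Form,), fields)
print(list(QuestionnaireForm.base_fields))
```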

  11. Old document image segmentation using the autocorrelation function and multiresolution analysis

    NASA Astrophysics Data System (ADS)

    Mehri, Maroua; Gomez-Krämer, Petra; Héroux, Pierre; Mullot, Rémy

    2013-01-01

Recent progress in the digitization of heterogeneous collections of ancient documents has rekindled new challenges in information retrieval in digital libraries and document layout analysis. Therefore, in order to control the quality of historical document image digitization and to meet the need for a characterization of their content using intermediate-level metadata (between image and document structure), we propose a fast automatic layout segmentation of old document images based on five descriptors. Those descriptors, based on the autocorrelation function, are obtained by multiresolution analysis and used afterwards in a specific clustering method. The method proposed in this article has the advantage that it is performed without any hypothesis on the document structure, either about the document model (physical structure) or the typographical parameters (logical structure). It is also parameter-free, since it automatically adapts to the image content. In this paper, firstly, we detail our proposal to characterize the content of old documents by extracting autocorrelation features in the different areas of a page and at several resolutions. Then, we show that it is possible to automatically find the homogeneous regions defined by similar indices of autocorrelation, without knowledge of the number of clusters, using adapted hierarchical ascendant classification and consensus clustering approaches. To assess our method, we apply our algorithm to 316 old document images spanning six centuries (1200-1900) of French history, in order to demonstrate the performance of our proposal in terms of segmentation and characterization of heterogeneous corpus content. Moreover, we define a new evaluation metric, the homogeneity measure, which aims at evaluating the segmentation and characterization accuracy of our methodology. We find a mean homogeneity accuracy of 85%. These results help to represent a document by a hierarchy of layout structure and content, and to define one or more signatures for each page, on the basis of a hierarchical representation of homogeneous blocks and their topology.
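The core descriptor idea, autocorrelation of page blocks computed at several resolutions, can be sketched as follows (a toy scalar feature per block; the paper's five descriptors and clustering stage are more elaborate):

```python
# Sketch: FFT-based autocorrelation of image blocks at several resolutions
# (illustrative of the descriptor idea only; not the authors' five descriptors).
import numpy as np

def autocorrelation(block):
    """2-D autocorrelation via the Wiener-Khinchin theorem."""
    f = np.fft.fft2(block - block.mean())
    ac = np.fft.ifft2(np.abs(f) ** 2).real
    ac /= ac[0, 0] + 1e-12  # normalize by the zero-lag energy
    return np.fft.fftshift(ac)

def multiresolution_features(page, levels=3, block=64):
    feats = []
    img = page.astype(float)
    for _ in range(levels):
        for i in range(0, img.shape[0] - block + 1, block):
            for j in range(0, img.shape[1] - block + 1, block):
                ac = autocorrelation(img[i:i + block, j:j + block])
                feats.append(ac.std())  # crude scalar descriptor per block
        img = img[::2, ::2]  # next (coarser) resolution
    return np.array(feats)

page = np.random.rand(256, 256)  # stand-in for a binarized page image
print(multiresolution_features(page).shape)
```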

  12. The MPO system for automatic workflow documentation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Abla, G.; Coviello, E. N.; Flanagan, S. M.

Data from large-scale experiments and extreme-scale computing are expensive to produce and may be used for critical applications. However, it is not the mere existence of data that is important, but our ability to make use of it. Experience has shown that when metadata are better organized and more complete, the underlying data become more useful. Traditionally, capturing the steps of scientific workflows and metadata was the role of the lab notebook, but the digital era has resulted instead in the fragmentation of data, processing, and annotation. This article presents the Metadata, Provenance, and Ontology (MPO) System, the software that can automate the documentation of scientific workflows and associated information. Based on recorded metadata, it provides explicit information about the relationships among the elements of workflows in notebook form, augmented with directed acyclic graphs. A set of web-based graphical navigation tools and an Application Programming Interface (API) have been created for searching and browsing, as well as programmatically accessing, the workflows and data. We describe the MPO concepts and its software architecture. We also report the current status of the software as well as the initial deployment experience.

  13. The MPO system for automatic workflow documentation

    DOE PAGES

    Abla, G.; Coviello, E. N.; Flanagan, S. M.; ...

    2016-04-18

Data from large-scale experiments and extreme-scale computing are expensive to produce and may be used for critical applications. However, it is not the mere existence of data that is important, but our ability to make use of it. Experience has shown that when metadata are better organized and more complete, the underlying data become more useful. Traditionally, capturing the steps of scientific workflows and metadata was the role of the lab notebook, but the digital era has resulted instead in the fragmentation of data, processing, and annotation. This article presents the Metadata, Provenance, and Ontology (MPO) System, the software that can automate the documentation of scientific workflows and associated information. Based on recorded metadata, it provides explicit information about the relationships among the elements of workflows in notebook form, augmented with directed acyclic graphs. A set of web-based graphical navigation tools and an Application Programming Interface (API) have been created for searching and browsing, as well as programmatically accessing, the workflows and data. We describe the MPO concepts and its software architecture. We also report the current status of the software as well as the initial deployment experience.
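The notebook-plus-DAG view of a workflow that MPO records can be sketched with an ordinary graph library (hypothetical node schema; not the MPO API):

```python
# Sketch: recording a workflow as a directed acyclic graph of data objects and
# processing steps, in the spirit of MPO's notebook-plus-DAG view
# (hypothetical schema; not the MPO API).
import networkx as nx

g = nx.DiGraph()
g.add_node("shot_1234.h5", kind="data", uri="mds://shot_1234")
g.add_node("filter_v2", kind="activity", code_version="2.0.1")
g.add_node("spectrogram.png", kind="data")
g.add_edge("shot_1234.h5", "filter_v2")     # input  -> activity
g.add_edge("filter_v2", "spectrogram.png")  # activity -> output

assert nx.is_directed_acyclic_graph(g)
# Provenance query: everything the plot was derived from.
print(nx.ancestors(g, "spectrogram.png"))
```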

  14. A metadata approach for clinical data management in translational genomics studies in breast cancer.

    PubMed

    Papatheodorou, Irene; Crichton, Charles; Morris, Lorna; Maccallum, Peter; Davies, Jim; Brenton, James D; Caldas, Carlos

    2009-11-30

    In molecular profiling studies of cancer patients, experimental and clinical data are combined in order to understand the clinical heterogeneity of the disease: clinical information for each subject needs to be linked to tumour samples, macromolecules extracted, and experimental results. This may involve the integration of clinical data sets from several different sources: these data sets may employ different data definitions and some may be incomplete. In this work we employ semantic web techniques developed within the CancerGrid project, in particular the use of metadata elements and logic-based inference to annotate heterogeneous clinical information, integrate and query it. We show how this integration can be achieved automatically, following the declaration of appropriate metadata elements for each clinical data set; we demonstrate the practicality of this approach through application to experimental results and clinical data from five hospitals in the UK and Canada, undertaken as part of the METABRIC project (Molecular Taxonomy of Breast Cancer International Consortium). We describe a metadata approach for managing similarities and differences in clinical datasets in a standardized way that uses Common Data Elements (CDEs). We apply and evaluate the approach by integrating the five different clinical datasets of METABRIC.
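The CDE idea can be sketched in a few lines: each site declares, per local column, which Common Data Element it instantiates and how its codes decode, after which records become directly poolable (all identifiers hypothetical):

```python
# Sketch: reconciling two hospitals' fields by declaring, for each local
# column, the Common Data Element it instantiates (CDE ids hypothetical).
CDE_GRADE = "CDE:0042"  # "histological grade", controlled values 1-3

mappings = {
    "hospital_a": {"column": "tumour_grade", "cde": CDE_GRADE,
                   "decode": {"I": 1, "II": 2, "III": 3}},
    "hospital_b": {"column": "grade_hist", "cde": CDE_GRADE,
                   "decode": {1: 1, 2: 2, 3: 3}},
}

def to_cde(site, record):
    """Translate one local record into CDE-keyed, harmonized values."""
    m = mappings[site]
    return {m["cde"]: m["decode"][record[m["column"]]]}

print(to_cde("hospital_a", {"tumour_grade": "II"}))
print(to_cde("hospital_b", {"grade_hist": 3}))
```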

  15. A System for Automated Extraction of Metadata from Scanned Documents using Layout Recognition and String Pattern Search Models

    PubMed Central

    Misra, Dharitri; Chen, Siyuan; Thoma, George R.

    2010-01-01

    One of the most expensive aspects of archiving digital documents is the manual acquisition of context-sensitive metadata useful for the subsequent discovery of, and access to, the archived items. For certain types of textual documents, such as journal articles, pamphlets, official government records, etc., where the metadata is contained within the body of the documents, a cost effective method is to identify and extract the metadata in an automated way, applying machine learning and string pattern search techniques. At the U. S. National Library of Medicine (NLM) we have developed an automated metadata extraction (AME) system that employs layout classification and recognition models with a metadata pattern search model for a text corpus with structured or semi-structured information. A combination of Support Vector Machine and Hidden Markov Model is used to create the layout recognition models from a training set of the corpus, following which a rule-based metadata search model is used to extract the embedded metadata by analyzing the string patterns within and surrounding each field in the recognized layouts. In this paper, we describe the design of our AME system, with focus on the metadata search model. We present the extraction results for a historic collection from the Food and Drug Administration, and outline how the system may be adapted for similar collections. Finally, we discuss some ongoing enhancements to our AME system. PMID:21179386
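The two-stage design, layout recognition followed by layout-specific string-pattern search, can be sketched as follows (toy features and patterns; the NLM system combines SVM and HMM models trained on real layouts):

```python
# Sketch of the two-stage idea: a trained classifier assigns a scanned page to
# a known layout, then a layout-specific pattern extracts a field (toy
# features and patterns; not the NLM models).
import re
from sklearn.svm import SVC

# toy training data: (text density, number of horizontal rules) -> layout id
X = [[0.82, 1], [0.79, 2], [0.31, 6], [0.28, 5]]
y = ["article", "article", "form", "form"]
clf = SVC(kernel="linear").fit(X, y)

patterns = {  # string-pattern search model per recognized layout
    "article": re.compile(r"Received[:\s]+(\w+ \d{1,2}, \d{4})"),
    "form":    re.compile(r"Date of issue[:\s]+(\d{2}/\d{2}/\d{4})"),
}

page_features, page_text = [0.80, 1], "Received: March 3, 1952 ..."
layout = clf.predict([page_features])[0]
match = patterns[layout].search(page_text)
print(layout, match.group(1) if match else None)
```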

  16. Earthquake and failure forecasting in real-time: A Forecasting Model Testing Centre

    NASA Astrophysics Data System (ADS)

    Filgueira, Rosa; Atkinson, Malcolm; Bell, Andrew; Main, Ian; Boon, Steven; Meredith, Philip

    2013-04-01

Across Europe there are a large number of rock deformation laboratories, each of which runs many experiments. Similarly, there are a large number of theoretical rock physicists who develop constitutive and computational models both for rock deformation and for changes in geophysical properties. Here we consider how to open up opportunities for sharing experimental data in a way that is integrated with multiple-hypothesis testing. We present a prototype for a new forecasting model testing centre based on e-infrastructures for capturing and sharing data and models to accelerate Rock Physics (RP) research. This proposal is triggered by our work on data assimilation in the NERC EFFORT (Earthquake and Failure Forecasting in Real Time) project, using data provided by the NERC CREEP 2 experimental project as a test case. EFFORT is a multi-disciplinary collaboration between geoscientists, rock physicists and computer scientists. Brittle failure of the crust is likely to play a key role in controlling the timing of a range of geophysical hazards, such as volcanic eruptions, yet the predictability of brittle failure is unknown. Our aim is to provide a facility for developing and testing models to forecast brittle failure in experimental and natural data. Model testing is performed in real-time, verifiably prospective mode, in order to avoid the selection biases that are possible in retrospective analyses. The project will ultimately quantify the predictability of brittle failure, and how this predictability scales from simple, controlled laboratory conditions to the complex, uncontrolled real world. Experimental data are collected from controlled laboratory experiments, including the UCL laboratory and the CREEP 2 project, which will undertake experiments in a deep-sea laboratory. We illustrate the properties of the prototype testing centre by streaming and analysing realistically noisy synthetic data, as an aid to generating and improving testing methodologies under imperfect conditions. The forecasting model testing centre uses a repository to hold all the data and models and a catalogue to hold all the corresponding metadata. It supports:
- Data transfer. Upload of experimental data: we have developed the FAST (Flexible Automated Streaming Transfer) tool to upload data from RP laboratories to the repository. FAST sets up the data-transfer requirements and selects the transfer protocol automatically; metadata are automatically created and stored.
- Web data access. Creation of synthetic data: users can choose a generator and supply parameters, and the synthetic data are automatically stored with the corresponding metadata. Selection of data and models: the metadata can be searched using criteria designed for RP; the metadata of each dataset (synthetic or from a laboratory) and of each model are well described in their respective catalogues, accessible via the web portal. Upload of models: a model can be uploaded and stored with associated metadata, providing an opportunity to share models; the web portal solicits and creates metadata describing each model. Model runs and visualization: selected data and a model can be submitted to a high-performance computing resource with the technical details hidden; results are displayed in accelerated time and stored, allowing retrieval, inspection and aggregation.
The forecasting model testing centre proposed here could be integrated into EPOS. Its expected benefits are improved understanding of brittle-failure prediction and its scalability to natural phenomena, accelerated and extensive testing and rapid sharing of insights, increased impact and visibility of RP and geoscience research, and resources for education and training. A key challenge is to agree on the framework for sharing RP data and models; our work is a provocative first step.
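For the "realistically noisy synthetic data" used to exercise the testing centre, one common empirical form for a precursory signal is an event rate accelerating as an inverse power law toward the failure time; a toy generator (not the project's actual generators) might look like this:

```python
# Toy generator of "realistically noisy" synthetic precursory data: an
# accelerating event rate following an inverse power law toward failure time
# tf (one common empirical form; the testing centre's generators may differ).
import numpy as np

rng = np.random.default_rng(7)
tf, p, k = 100.0, 1.0, 50.0          # failure time, exponent, scale
t = np.arange(0.0, 99.0, 1.0)
rate = k / (tf - t) ** p             # mean acoustic-emission rate
counts = rng.poisson(rate)           # counting noise
drift = 0.5 * np.sin(t / 10.0)       # slow instrumental drift
observed = counts + rng.normal(0.0, 1.0, t.size) + drift

# A forecasting model would be scored on its prospective estimate of tf
# from `observed`, streamed one sample at a time.
print(observed[:5])
```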

  17. Creating preservation metadata from XML-metadata profiles

    NASA Astrophysics Data System (ADS)

    Ulbricht, Damian; Bertelmann, Roland; Gebauer, Petra; Hasler, Tim; Klump, Jens; Kirchner, Ingo; Peters-Kottig, Wolfgang; Mettig, Nora; Rusch, Beate

    2014-05-01

Registration of dataset DOIs at DataCite makes research data citable and comes with the obligation to keep the data accessible in the future. In addition, many universities and research institutions measure data that are unique and not repeatable, such as the data produced by an observational network, and they want to keep these data for future generations. In consequence, such data should be ingested into preservation systems that automatically take care of file format changes. Open source preservation software developed along the definitions of the ISO OAIS reference model is available, but during ingest of data and metadata there are still problems to be solved. File format validation is difficult because format validators are not only remarkably slow; owing to the variety of file formats, different validators also return conflicting identification profiles for identical data, and these conflicts are hard to resolve. Preservation systems also have a deficit in the support of custom metadata. Furthermore, data producers are sometimes not aware that quality metadata are a key issue for the re-use of data. In the project EWIG, a university institute and a research institute work together with Zuse-Institute Berlin, which acts as an infrastructure facility, to develop exemplary workflows for ingesting research data into OAIS-compliant archives, with emphasis on the geosciences. The Institute for Meteorology provides time-series data from an urban monitoring network, whereas GFZ Potsdam delivers file-based data from research projects. To identify problems in existing preservation workflows, the technical work is complemented by interviews with data practitioners. Policies for handling data and metadata are developed. Furthermore, university teaching material is created to raise future scientists' awareness of research data management. As a testbed for ingest workflows the digital preservation system Archivematica [1] is used. During the ingest process, metadata is generated that is compliant with the Metadata Encoding and Transmission Standard (METS). To find datasets in future portals and to make use of the data in one's own scientific work, proper selection of discovery metadata and application metadata is very important. Some XML metadata profiles are not suitable for preservation because version changes are very fast and make it nearly impossible to automate the migration. For other XML metadata profiles, schema definitions are changed after publication of the profile, or the schema definitions become inaccessible, which might cause problems during validation of the metadata inside the preservation system [2]. Some metadata profiles are not used widely enough and might not even exist in the future. Eventually, discovery and application metadata have to be embedded into the mdWrap subtree of the METS XML. [1] http://www.archivematica.org [2] http://dx.doi.org/10.2218/ijdc.v7i1.215
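The final step, embedding discovery metadata into the mdWrap subtree of a METS document, can be sketched as follows (minimal; a valid METS file requires further mandatory structure):

```python
# Sketch: embedding discovery metadata into the mdWrap/xmlData subtree of a
# METS document (minimal; a valid METS file needs further mandatory structure).
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS)

mets = ET.Element(f"{{{METS}}}mets")
dmd = ET.SubElement(mets, f"{{{METS}}}dmdSec", ID="DMD1")
wrap = ET.SubElement(dmd, f"{{{METS}}}mdWrap", MDTYPE="OTHER",
                     OTHERMDTYPE="ISO19139")
xml_data = ET.SubElement(wrap, f"{{{METS}}}xmlData")

# The discovery metadata profile chosen by the data producer goes inside
# xmlData (a bare placeholder element here).
discovery = ET.SubElement(xml_data, "title")
discovery.text = "Urban monitoring network, Berlin, 2010-2013"

print(ET.tostring(mets, encoding="unicode"))
```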

  18. SciFlo: Semantically-Enabled Grid Workflow for Collaborative Science

    NASA Astrophysics Data System (ADS)

    Yunck, T.; Wilson, B. D.; Raskin, R.; Manipon, G.

    2005-12-01

    SciFlo is a system for Scientific Knowledge Creation on the Grid using a Semantically-Enabled Dataflow Execution Environment. SciFlo leverages Simple Object Access Protocol (SOAP) Web Services and the Grid Computing standards (WS-* standards and the Globus Alliance toolkits), and enables scientists to do multi-instrument Earth Science by assembling reusable SOAP Services, native executables, local command-line scripts, and python codes into a distributed computing flow (a graph of operators). SciFlo's XML dataflow documents can be a mixture of concrete operators (fully bound operations) and abstract template operators (late binding via semantic lookup). All data objects and operators can be both simply typed (simple and complex types in XML schema) and semantically typed using controlled vocabularies (linked to OWL ontologies such as SWEET). By exploiting ontology-enhanced search and inference, one can discover (and automatically invoke) Web Services and operators that have been semantically labeled as performing the desired transformation, and adapt a particular invocation to the proper interface (number, types, and meaning of inputs and outputs). The SciFlo client & server engines optimize the execution of such distributed data flows and allow the user to transparently find and use datasets and operators without worrying about the actual location of the Grid resources. The scientist injects a distributed computation into the Grid by simply filling out an HTML form or directly authoring the underlying XML dataflow document, and results are returned directly to the scientist's desktop. A Visual Programming tool is also being developed, but it is not required. Once an analysis has been specified for a granule or day of data, it can be easily repeated with different control parameters and over months or years of data. SciFlo uses and preserves semantics, and also generates and infers new semantic annotations. Specifically, the SciFlo engine uses semantic metadata to understand (infer) what it is doing and potentially improve the data flow; preserves semantics by saving links to the semantics of (metadata describing) the input datasets, related datasets, and the data transformations (algorithms) used to generate downstream products; generates new metadata by allowing the user to add semantic annotations to the generated data products (or simply accept automatically generated provenance annotations); and infers new semantic metadata by understanding and applying logic to the semantics of the data and the transformations performed. Much ontology development still needs to be done but, nevertheless, SciFlo documents provide a substrate for using and preserving more semantics as ontologies develop. We will give a live demonstration of the growing SciFlo network using an example dataflow in which atmospheric temperature and water vapor profiles from three Earth Observing System (EOS) instruments are retrieved using SOAP (geo-location query & data access) services, co-registered, and visually & statistically compared on demand (see http://sciflo.jpl.nasa.gov for more information).

  19. The ground truth about metadata and community detection in networks

    PubMed Central

    Peel, Leto; Larremore, Daniel B.; Clauset, Aaron

    2017-01-01

    Across many scientific domains, there is a common need to automatically extract a simplified view or coarse-graining of how a complex system’s components interact. This general task is called community detection in networks and is analogous to searching for clusters in independent vector data. It is common to evaluate the performance of community detection algorithms by their ability to find so-called ground truth communities. This works well in synthetic networks with planted communities because these networks’ links are formed explicitly based on those known communities. However, there are no planted communities in real-world networks. Instead, it is standard practice to treat some observed discrete-valued node attributes, or metadata, as ground truth. We show that metadata are not the same as ground truth and that treating them as such induces severe theoretical and practical problems. We prove that no algorithm can uniquely solve community detection, and we prove a general No Free Lunch theorem for community detection, which implies that there can be no algorithm that is optimal for all possible community detection tasks. However, community detection remains a powerful tool and node metadata still have value, so a careful exploration of their relationship with network structure can yield insights of genuine worth. We illustrate this point by introducing two statistical techniques that can quantify the relationship between metadata and community structure for a broad class of models. We demonstrate these techniques using both synthetic and real-world networks, and for multiple types of metadata and community structures. PMID:28508065
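As one simple diagnostic of the metadata-community relationship (not the paper's two specific techniques), the agreement between metadata labels and detected communities can be quantified with normalized mutual information:

```python
# Illustration: quantifying (dis)agreement between node metadata and detected
# communities with normalized mutual information (a common diagnostic; the
# paper itself introduces more refined statistical tests).
from sklearn.metrics import normalized_mutual_info_score

metadata    = ["a", "a", "a", "b", "b", "b", "b", "a"]  # observed attribute
communities = [0, 0, 0, 0, 1, 1, 1, 1]                  # algorithm output

nmi = normalized_mutual_info_score(metadata, communities)
print(f"NMI = {nmi:.2f}")  # 1.0 would mean perfect correspondence
```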

  20. Adaptive pseudolinear compensators of dynamic characteristics of automatic control systems

    NASA Astrophysics Data System (ADS)

    Skorospeshkin, M. V.; Sukhodoev, M. S.; Timoshenko, E. A.; Lenskiy, F. V.

    2016-04-01

    Adaptive pseudolinear gain and phase compensators of dynamic characteristics of automatic control systems are suggested. The automatic control system performance with adaptive compensators has been explored. The efficiency of pseudolinear adaptive compensators in the automatic control systems with time-varying parameters has been demonstrated.

  1. Collaborative Sharing of Multidimensional Space-time Data Using HydroShare

    NASA Astrophysics Data System (ADS)

    Gan, T.; Tarboton, D. G.; Horsburgh, J. S.; Dash, P. K.; Idaszak, R.; Yi, H.; Blanton, B.

    2015-12-01

HydroShare is a collaborative environment being developed for sharing hydrological data and models. It includes the capability to upload data in many formats as resources that can be shared. The HydroShare data model for resources uses a specific format for the representation of each type of data and specifies metadata common to all resource types as well as metadata unique to specific resource types. The Network Common Data Form (NetCDF) was chosen as the format for multidimensional space-time data in HydroShare. NetCDF is widely used in hydrological and other geoscience modeling because it contains self-describing metadata and supports the creation of array-oriented datasets that may include three spatial dimensions, a time dimension and other user-defined dimensions. For example, NetCDF may be used to represent precipitation or surface air temperature fields that have two dimensions in space and one dimension in time. This presentation will illustrate how NetCDF files are used in HydroShare. When a NetCDF file is loaded into HydroShare, header information is extracted using the "ncdump" utility. Python functions developed for the Django web framework, on which HydroShare is based, extract science metadata present in the NetCDF file, saving the user from having to enter it. Where the file follows the Climate and Forecast (CF) convention and the Attribute Convention for Dataset Discovery (ACDD) standard, metadata is thus automatically populated. Users also have the ability to add metadata to the resource that may not have been present in the original NetCDF file. HydroShare's metadata editing functionality then writes this science metadata back into the NetCDF file to maintain consistency between the science metadata in HydroShare and the metadata in the NetCDF file. This further helps researchers easily add metadata information following the CF and ACDD conventions. Additional data inspection and subsetting functions were developed, taking advantage of Python and command-line libraries for working with NetCDF files. We describe the design and implementation of these features and illustrate how NetCDF files from a modeling application may be curated in HydroShare and thus enhance reproducibility of the associated research. We also discuss future development planned for multidimensional space-time data in HydroShare.
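The round trip described above, harvesting CF/ACDD attributes on upload and writing curated metadata back into the file, can be sketched with the netCDF4 package (file name and attribute values illustrative):

```python
# Sketch: the kind of round trip HydroShare performs - read CF/ACDD global
# attributes from a NetCDF file, then write curated metadata back.
# Requires the netCDF4 package and an existing, writable file.
from netCDF4 import Dataset

with Dataset("precip.nc", "r+") as ds:
    # Harvest whatever CF/ACDD attributes are already present.
    existing = {name: getattr(ds, name) for name in ds.ncattrs()}
    print(existing.get("title"), existing.get("summary"))

    # Write science metadata entered by the user back into the file so the
    # file and the catalogue stay consistent.
    ds.title = "Hourly precipitation fields, Logan River watershed"
    ds.summary = "Gauge-adjusted radar precipitation, 2010-2014."
    ds.keywords = "precipitation, NetCDF, CF"
```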

  2. An algebra for spatio-temporal information generation

    NASA Astrophysics Data System (ADS)

    Pebesma, Edzer; Scheider, Simon; Gräler, Benedikt; Stasch, Christoph; Hinz, Matthias

    2016-04-01

    When we accept the premises of James Frew's laws of metadata (Frew's first law: scientists don't write metadata; Frew's second law: any scientist can be forced to write bad metadata), but also assume that scientists try to maximise the impact of their research findings, can we develop our information infrastructures such that useful metadata is generated automatically? Currently, sharing of data and software to completely reproduce research findings is becoming standard, e.g. in the Journal of Statistical Software [1]. The reproduction (e.g. R) scripts however convey correct syntax, but still limited semantics. We propose [2] a new, platform-neutral way to algebraically describe how data is generated, e.g. by observation, and how data is derived, e.g. by processing observations. It starts with forming functions composed of four reference system types (space, time, quality, entity), which express for instance continuity of objects over time, and continuity of fields over space and time. Data, which is discrete by definition, is generated by evaluating such functions at discrete space and time instances, or by evaluating a convolution (aggregation) over them. Derived data is obtained by inputting data to data derivation functions, which for instance interpolate, estimate, aggregate, or convert fields into objects and vice versa. As opposed to the traditional when, where and what semantics of data sets, our algebra focuses on describing how a data set was generated. We argue that it can be used to discover data sets that were derived from a particular source x, or derived by a particular procedure y. It may also form the basis for inferring meaningfulness of derivation procedures [3]. Current research focuses on automatically generating provenance documentation from R scripts. [1] http://www.jstatsoft.org/ (open access) [2] http://www.meaningfulspatialstatistics.org has the full paper (in review) [3] Stasch, C., S. Scheider, E. Pebesma, W. Kuhn, 2014. Meaningful Spatial Prediction and Aggregation. Environmental Modelling & Software, 51, 149-165 (open access)
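The flavour of the algebra can be sketched in code: a field is a function over space and time, observation is discrete evaluation, and derivation is aggregation over the generated data (toy stand-ins for the four reference-system types):

```python
# Sketch of the algebra's flavour: a field is a function over (space, time);
# data are generated by evaluating it at discrete instances, and derived data
# by further functions such as temporal aggregation (toy stand-ins only).
from statistics import mean

def temperature_field(x, y, t):       # continuous field: space x time -> quality
    return 10.0 + 0.1 * x - 0.05 * y + 2.0 * (t % 24 > 6)

def observe(field, locations, times):  # generation: discrete evaluation
    return {(loc, t): field(*loc, t) for loc in locations for t in times}

def daily_mean(obs):                   # derivation: aggregation over time
    return mean(obs.values())

data = observe(temperature_field, [(0.0, 0.0), (5.0, 2.0)], range(24))
print(round(daily_mean(data), 2))
```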

  3. Metadata Repository for Improved Data Sharing and Reuse Based on HL7 FHIR.

    PubMed

    Ulrich, Hannes; Kock, Ann-Kristin; Duhm-Harbeck, Petra; Habermann, Jens K; Ingenerf, Josef

    2016-01-01

Unreconciled data structures and formats are a common obstacle to the urgently required sharing and reuse of data within healthcare and medical research. Within the North German Tumor Bank of Colorectal Cancer, clinical and sample data, based on a harmonized data set, are collected and can be pooled using a hospital-integrated Research Data Management System supporting biobank and study management. Adding further partners who do not use the core data set requires manual adaptation and mapping of data elements. To address this manual intervention, and focusing on the reuse of heterogeneous healthcare instance data (value level) and data elements (metadata level), a metadata repository has been developed. The metadata repository is an ISO 11179-3 conformant server application built for annotating and mediating data elements. The implemented architecture includes the translation of metadata information about data elements into the FHIR standard, using the FHIR DataElement resource with the ISO 11179 Data Element Extensions. The FHIR-based processing allows the exchange of data elements with clinical and research IT systems as well as with other metadata systems. With increasingly annotated and harmonized data elements, data quality and integration can be improved, successfully enabling data analytics and decision support.
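Schematically, an annotated data element rendered as a FHIR DataElement resource and pushed to a server might look like this (abridged STU3-style resource; the identifiers and server URL are hypothetical):

```python
# Sketch: an annotated data element rendered as a FHIR STU3 DataElement
# resource (abridged; identifiers and the server URL are hypothetical).
import json
import urllib.request

data_element = {
    "resourceType": "DataElement",
    "id": "tumour-grade",
    "status": "active",
    "element": [{
        "path": "tumourGrade",
        "definition": "Histological grade of the colorectal tumour",
        "type": [{"code": "integer"}],
    }],
}

req = urllib.request.Request(
    "http://mdr.example.org/fhir/DataElement/tumour-grade",
    data=json.dumps(data_element).encode(),
    headers={"Content-Type": "application/fhir+json"},
    method="PUT",  # FHIR update-as-create
)
# urllib.request.urlopen(req)  # uncomment against a real FHIR endpoint
print(json.dumps(data_element, indent=2))
```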

  4. A standard for measuring metadata quality in spectral libraries

    NASA Astrophysics Data System (ADS)

    Rasaiah, B.; Jones, S. D.; Bellman, C.

    2013-12-01

There is an urgent need within the international remote sensing community to establish a metadata standard for field spectroscopy that ensures high-quality, interoperable metadata sets that can be archived and shared efficiently within Earth observation data sharing systems. Metadata are an important component in the cataloguing and analysis of in situ spectroscopy datasets because of their central role in identifying and quantifying the quality and reliability of spectral data and the products derived from them. This paper presents approaches to measuring metadata completeness and quality in spectral libraries to determine the reliability, interoperability, and re-usability of a dataset. Explored are quality parameters that meet the unique requirements of in situ spectroscopy datasets across many campaigns. Examined are the challenges presented by ensuring that data creators, owners, and data users maintain a high level of data integrity throughout the lifecycle of a dataset. Issues such as field measurement methods, instrument calibration, and data representativeness are investigated. The proposed metadata standard incorporates expert recommendations that include metadata protocols critical to all campaigns, and those that are restricted to campaigns for specific target measurements. The implications of semantics and syntax for a robust and flexible metadata standard are also considered. Approaches towards an operational and logistically viable implementation of a quality standard are discussed. This paper also proposes a way forward for adapting and enhancing current geospatial metadata standards to the unique requirements of field spectroscopy metadata quality. Index terms: [0430] BIOGEOSCIENCES / Computational methods and data processing; [0480] BIOGEOSCIENCES / Remote sensing; [1904] INFORMATICS / Community standards; [1912] INFORMATICS / Data management, preservation, rescue; [1926] INFORMATICS / Geospatial; [1930] INFORMATICS / Data and information governance; [1946] INFORMATICS / Metadata; [1952] INFORMATICS / Modeling; [1976] INFORMATICS / Software tools and services; [9810] GENERAL OR MISCELLANEOUS / New fields

  5. A Comparison of Educational Statistics and Data Mining Approaches to Identify Characteristics That Impact Online Learning

    ERIC Educational Resources Information Center

    Miller, L. Dee; Soh, Leen-Kiat; Samal, Ashok; Kupzyk, Kevin; Nugent, Gwen

    2015-01-01

    Learning objects (LOs) are important online resources for both learners and instructors and usage for LOs is growing. Automatic LO tracking collects large amounts of metadata about individual students as well as data aggregated across courses, learning objects, and other demographic characteristics (e.g. gender). The challenge becomes identifying…

  6. Handling Metadata in a Neurophysiology Laboratory

    PubMed Central

    Zehl, Lyuba; Jaillet, Florent; Stoewer, Adrian; Grewe, Jan; Sobolev, Andrey; Wachtler, Thomas; Brochier, Thomas G.; Riehle, Alexa; Denker, Michael; Grün, Sonja

    2016-01-01

    To date, non-reproducibility of neurophysiological research is a matter of intense discussion in the scientific community. A crucial component to enhance reproducibility is to comprehensively collect and store metadata, that is, all information about the experiment, the data, and the applied preprocessing steps on the data, such that they can be accessed and shared in a consistent and simple manner. However, the complexity of experiments, the highly specialized analysis workflows and a lack of knowledge on how to make use of supporting software tools often overburden researchers to perform such a detailed documentation. For this reason, the collected metadata are often incomplete, incomprehensible for outsiders or ambiguous. Based on our research experience in dealing with diverse datasets, we here provide conceptual and technical guidance to overcome the challenges associated with the collection, organization, and storage of metadata in a neurophysiology laboratory. Through the concrete example of managing the metadata of a complex experiment that yields multi-channel recordings from monkeys performing a behavioral motor task, we practically demonstrate the implementation of these approaches and solutions with the intention that they may be generalized to other projects. Moreover, we detail five use cases that demonstrate the resulting benefits of constructing a well-organized metadata collection when processing or analyzing the recorded data, in particular when these are shared between laboratories in a modern scientific collaboration. Finally, we suggest an adaptable workflow to accumulate, structure and store metadata from different sources using, by way of example, the odML metadata framework. PMID:27486397

  7. Handling Metadata in a Neurophysiology Laboratory.

    PubMed

    Zehl, Lyuba; Jaillet, Florent; Stoewer, Adrian; Grewe, Jan; Sobolev, Andrey; Wachtler, Thomas; Brochier, Thomas G; Riehle, Alexa; Denker, Michael; Grün, Sonja

    2016-01-01

    To date, non-reproducibility of neurophysiological research is a matter of intense discussion in the scientific community. A crucial component to enhance reproducibility is to comprehensively collect and store metadata, that is, all information about the experiment, the data, and the applied preprocessing steps on the data, such that they can be accessed and shared in a consistent and simple manner. However, the complexity of experiments, the highly specialized analysis workflows and a lack of knowledge on how to make use of supporting software tools often overburden researchers to perform such a detailed documentation. For this reason, the collected metadata are often incomplete, incomprehensible for outsiders or ambiguous. Based on our research experience in dealing with diverse datasets, we here provide conceptual and technical guidance to overcome the challenges associated with the collection, organization, and storage of metadata in a neurophysiology laboratory. Through the concrete example of managing the metadata of a complex experiment that yields multi-channel recordings from monkeys performing a behavioral motor task, we practically demonstrate the implementation of these approaches and solutions with the intention that they may be generalized to other projects. Moreover, we detail five use cases that demonstrate the resulting benefits of constructing a well-organized metadata collection when processing or analyzing the recorded data, in particular when these are shared between laboratories in a modern scientific collaboration. Finally, we suggest an adaptable workflow to accumulate, structure and store metadata from different sources using, by way of example, the odML metadata framework.

  8. The TDR: A Repository for Long Term Storage of Geophysical Data and Metadata

    NASA Astrophysics Data System (ADS)

    Wilson, A.; Baltzer, T.; Caron, J.

    2006-12-01

    For many years Unidata has provided easy, low cost data access to universities and research labs. Historically Unidata technology provided access to data in near real time. In recent years Unidata has additionally turned to providing middleware to serve longer term data and associated metadata via its THREDDS technology, the most recent offering being the THREDDS Data Server (TDS). The TDS provides middleware for metadata access and management, OPeNDAP data access, and integration with the Unidata Integrated Data Viewer (IDV), among other benefits. The TDS was designed to support rolling archives of data, that is, data that exist only for a relatively short, predefined time window. Now we are creating an addition to the TDS, called the THREDDS Data Repository (TDR), which allows users to store and retrieve data and other objects for an arbitrarily long time period. Data in the TDR can also be served by the TDS. The TDR performs important functions of locating storage for the data, moving the data to and from the repository, assigning unique identifiers, and generating metadata. The TDR framework supports pluggable components that allow tailoring an implementation for a particular application. The Linked Environments for Atmospheric Discovery (LEAD) project provides an excellent use case for the TDR. LEAD is a multi-institutional Large Information Technology Research project funded by the National Science Foundation (NSF). The goal of LEAD is to create a framework based on Grid and Web Services to support mesoscale meteorology research and education. This includes capabilities such as launching forecast models, mining data for meteorological phenomena, and dynamic workflows that are automatically reconfigurable in response to changing weather. LEAD presents unique challenges in managing and storing large data volumes from real-time observational systems as well as data that are dynamically created during the execution of adaptive workflows. For example, in order to support storage of many large data products, the LEAD implementation of the TDR will provide a variety of data movement options, including gridftp. It will have a web service interface and will be callable programmatically as well as via interactive user requests. Future plans include the use of a mass storage device to provide robust long term storage. This talk will present the current state of the TDR effort.

  9. mzML2ISA & nmrML2ISA: generating enriched ISA-Tab metadata files from metabolomics XML data

    PubMed Central

    Larralde, Martin; Lawson, Thomas N.; Weber, Ralf J. M.; Moreno, Pablo; Haug, Kenneth; Rocca-Serra, Philippe; Viant, Mark R.; Steinbeck, Christoph; Salek, Reza M.

    2017-01-01

Summary: Submission to the MetaboLights repository for metabolomics data currently places the burden of reporting instrument and acquisition parameters in ISA-Tab format on users, who have to do it manually, a process that is time consuming and prone to user input error. Since the large majority of these parameters are embedded in instrument raw data files, an opportunity exists to capture this metadata more accurately. Here we report a set of Python packages that can automatically generate ISA-Tab metadata file stubs from raw XML metabolomics data files. The parsing packages are separated into mzML2ISA (encompassing mzML and imzML formats) and nmrML2ISA (nmrML format only). Overall, the use of mzML2ISA & nmrML2ISA reduces the time needed to capture metadata substantially (capturing 90% of metadata on assay and sample levels), is much less prone to user input errors, improves compliance with minimum information reporting guidelines and facilitates more finely grained data exploration and querying of datasets. Availability and Implementation: mzML2ISA & nmrML2ISA are available under version 3 of the GNU General Public Licence at https://github.com/ISA-tools. Documentation is available from http://2isa.readthedocs.io/en/latest/. Contact: reza.salek@ebi.ac.uk or isatools@googlegroups.com. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:28402395

  10. mzML2ISA & nmrML2ISA: generating enriched ISA-Tab metadata files from metabolomics XML data.

    PubMed

    Larralde, Martin; Lawson, Thomas N; Weber, Ralf J M; Moreno, Pablo; Haug, Kenneth; Rocca-Serra, Philippe; Viant, Mark R; Steinbeck, Christoph; Salek, Reza M

    2017-08-15

    Submission to the MetaboLights repository for metabolomics data currently places the burden of reporting instrument and acquisition parameters in ISA-Tab format on users, who have to do it manually, a process that is time consuming and prone to user input error. Since the large majority of these parameters are embedded in instrument raw data files, an opportunity exists to capture this metadata more accurately. Here we report a set of Python packages that can automatically generate ISA-Tab metadata file stubs from raw XML metabolomics data files. The parsing packages are separated into mzML2ISA (encompassing mzML and imzML formats) and nmrML2ISA (nmrML format only). Overall, the use of mzML2ISA & nmrML2ISA reduces the time needed to capture metadata substantially (capturing 90% of metadata on assay and sample levels), is much less prone to user input errors, improves compliance with minimum information reporting guidelines and facilitates more finely grained data exploration and querying of datasets. mzML2ISA & nmrML2ISA are available under version 3 of the GNU General Public Licence at https://github.com/ISA-tools. Documentation is available from http://2isa.readthedocs.io/en/latest/. reza.salek@ebi.ac.uk or isatools@googlegroups.com. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
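The opportunity the packages exploit, that acquisition parameters sit as cvParam elements inside the XML of an mzML file, can be sketched directly (file path hypothetical; mzML2ISA itself additionally maps the parameters onto ISA-Tab fields):

```python
# Sketch of the underlying opportunity: instrument parameters live as cvParam
# elements inside mzML (an XML format), so they can be harvested directly
# (simplified; mzML2ISA maps them onto ISA-Tab fields).
import xml.etree.ElementTree as ET

MZML_NS = "http://psi.hupo.org/ms/mzml"
tree = ET.parse("sample.mzML")  # path hypothetical

params = {
    cv.get("name"): cv.get("value")
    for cv in tree.getroot().iter(f"{{{MZML_NS}}}cvParam")
}
print(params)
```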

  11. docBUILDER - Building Your Useful Metadata for Earth Science Data and Services.

    NASA Astrophysics Data System (ADS)

    Weir, H. M.; Pollack, J.; Olsen, L. M.; Major, G. R.

    2005-12-01

    The docBUILDER tool, created by NASA's Global Change Master Directory (GCMD), assists the scientific community in efficiently creating quality data and services metadata. Metadata authors are asked to complete five required fields to ensure enough information is provided for users to discover the data and related services they seek. After the metadata record is submitted to the GCMD, it is reviewed for semantic and syntactic consistency. Currently, two versions are available - a Web-based tool accessible with most browsers (docBUILDERweb) and a stand-alone desktop application (docBUILDERsolo). The Web version is available through the GCMD website, at http://gcmd.nasa.gov/User/authoring.html. This version has been updated and now offers: personalized templates to ease entering similar information for multiple data sets/services; automatic population of Data Center/Service Provider URLs based on the selected center/provider; three-color support to indicate required, recommended, and optional fields; an editable text window containing the XML record, to allow for quick editing; and improved overall performance and presentation. The docBUILDERsolo version offers the ability to create metadata records on a computer wherever you are. Except for installation and the occasional update of keywords, data/service providers are not required to have an Internet connection. This freedom will allow users with portable computers (Windows, Mac, and Linux) to create records in field campaigns, whether in Antarctica or the Australian Outback. This version also offers a spell-checker, in addition to all of the features found in the Web version.

  12. A rapid prototyping/artificial intelligence approach to space station-era information management and access

    NASA Technical Reports Server (NTRS)

    Carnahan, Richard S., Jr.; Corey, Stephen M.; Snow, John B.

    1989-01-01

    Applications of rapid prototyping and Artificial Intelligence techniques to problems associated with Space Station-era information management systems are described. In particular, the work is centered on issues related to: (1) intelligent man-machine interfaces applied to scientific data user support, and (2) the requirement that intelligent information management systems (IIMS) be able to efficiently process metadata updates concerning types of data handled. The advanced IIMS represents functional capabilities driven almost entirely by the needs of potential users. Space Station-era scientific data projected to be generated is likely to be significantly greater than data currently processed and analyzed. Information about scientific data must be presented clearly, concisely, and with support features to allow users at all levels of expertise efficient and cost-effective data access. Additionally, mechanisms for allowing more efficient IIMS metadata update processes must be addressed. The work reported covers the following IIMS design aspects: IIMS data and metadata modeling, including the automatic updating of IIMS-contained metadata, IIMS user-system interface considerations, including significant problems associated with remote access, user profiles, and on-line tutorial capabilities, and development of an IIMS query and browse facility, including the capability to deal with spatial information. A working prototype has been developed and is being enhanced.

  13. Java Library for Input and Output of Image Data and Metadata

    NASA Technical Reports Server (NTRS)

    Deen, Robert; Levoe, Steven

    2003-01-01

A Java-language library supports input and output (I/O) of image data and metadata (label data) in the format of the Video Image Communication and Retrieval (VICAR) image-processing software and in several similar formats, including a subset of the Planetary Data System (PDS) image file format. The library does the following: It provides a low-level direct-access layer, enabling an application subprogram to read and write specific image files, lines, or pixels, and to manipulate metadata directly. Two coding/decoding subprograms ("codecs" for short) based on the Java Advanced Imaging (JAI) software provide access to VICAR and PDS images in a file-format-independent manner. The VICAR and PDS codecs enable any program that conforms to the specification of the JAI codec to use VICAR or PDS images automatically, without specific knowledge of the VICAR or PDS format. The library also includes Image I/O plug-in subprograms for the VICAR and PDS formats. Application programs that conform to the Image I/O specification of Java version 1.4 can utilize any image format for which such a plug-in subprogram exists, without specific knowledge of the format itself. Like the aforementioned codecs, the VICAR and PDS Image I/O plug-in subprograms support reading and writing of metadata.

  14. MPEG-7-based description infrastructure for an audiovisual content analysis and retrieval system

    NASA Astrophysics Data System (ADS)

    Bailer, Werner; Schallauer, Peter; Hausenblas, Michael; Thallinger, Georg

    2005-01-01

We present a case study of establishing a description infrastructure for an audiovisual content-analysis and retrieval system. The description infrastructure consists of an internal metadata model and access tools for using it. Based on an analysis of requirements, we selected, out of a set of candidates, MPEG-7 as the basis of our metadata model. The openness and generality of MPEG-7 allow it to be used in a broad range of applications, but they increase complexity and hinder interoperability. Profiling has been proposed as a solution, with the focus on selecting and constraining description tools. Semantic constraints are currently described only in textual form. Conformance in terms of semantics thus cannot be evaluated automatically, and mappings between different profiles can only be defined manually. As a solution, we propose an approach to formalize the semantic constraints of an MPEG-7 profile using a formal vocabulary expressed in OWL, which allows automated processing of semantic constraints. We have defined the Detailed Audiovisual Profile as the profile to be used in our metadata model, and we show how some of the semantic constraints of this profile can be formulated using ontologies. To work practically with the metadata model, we have implemented an MPEG-7 library and a client/server document access infrastructure.

  15. OpenFlow arbitrated programmable network channels for managing quantum metadata

    DOE PAGES

    Dasari, Venkat R.; Humble, Travis S.

    2016-10-10

Quantum networks must classically exchange complex metadata between devices in order to carry out protocols such as teleportation, super-dense coding, and quantum key distribution. Demonstrating the integration of these new communication methods with existing network protocols, channels, and data forwarding mechanisms remains an open challenge. Software-defined networking (SDN) offers robust and flexible strategies for managing diverse network devices and uses. We adapt the principles of SDN to the deployment of quantum networks, which are composed of unique devices that operate according to the laws of quantum mechanics. We show how quantum metadata can be managed within a software-defined network using the OpenFlow protocol, and we describe how OpenFlow management of classical optical channels is compatible with emerging quantum communication protocols. We next give an example specification of the metadata needed to manage and control quantum physical layer (QPHY) behavior, and we extend the OpenFlow interface to accommodate this quantum metadata. We conclude by discussing near-term experimental efforts that can realize SDN's principles for quantum communication.

  16. OpenFlow arbitrated programmable network channels for managing quantum metadata

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dasari, Venkat R.; Humble, Travis S.

Quantum networks must classically exchange complex metadata between devices in order to carry out protocols such as teleportation, super-dense coding, and quantum key distribution. Demonstrating the integration of these new communication methods with existing network protocols, channels, and data forwarding mechanisms remains an open challenge. Software-defined networking (SDN) offers robust and flexible strategies for managing diverse network devices and uses. We adapt the principles of SDN to the deployment of quantum networks, which are composed of unique devices that operate according to the laws of quantum mechanics. We show how quantum metadata can be managed within a software-defined network using the OpenFlow protocol, and we describe how OpenFlow management of classical optical channels is compatible with emerging quantum communication protocols. We next give an example specification of the metadata needed to manage and control quantum physical layer (QPHY) behavior, and we extend the OpenFlow interface to accommodate this quantum metadata. We conclude by discussing near-term experimental efforts that can realize SDN's principles for quantum communication.

  17. A Generic Metadata Editor Supporting System Using Drupal CMS

    NASA Astrophysics Data System (ADS)

    Pan, J.; Banks, N. G.; Leggott, M.

    2011-12-01

    Metadata handling is a key factor in preserving and reusing scientific data. In recent years, standardized structural metadata has become widely used in the Geoscience communities. However, many different standards exist in the Geosciences, such as the current version of the Federal Geographic Data Committee's Content Standard for Digital Geospatial Metadata (FGDC CSDGM), the Ecological Markup Language (EML), the Geography Markup Language (GML), and the emerging ISO 19115 and related standards. In addition, there are many different subsets within the Geoscience subdomain, such as the Biological Profile of the FGDC CSDGM, or for geopolitical regions, such as the European Profile or the North American Profile of the ISO standards. It is therefore desirable to have a software foundation that supports metadata creation and editing for multiple standards and profiles without reinventing the wheel. We have developed a generic, flexible software system to do just that: to facilitate support for multiple metadata standards and profiles. The software consists of a set of modules for the Drupal Content Management System (CMS), with minimal dependencies on other Drupal modules. Using the system's metadata functions involves two steps. First, an administrator uses the system to design a user form based on an XML schema and its instances. The form definition is named and stored in the Drupal database as an XML blob. Second, users in an editor role use the persisted XML definition to render an actual metadata entry form for creating or editing a metadata record. Behind the scenes, the form-definition XML is transformed into a PHP array, which is then rendered via the Drupal Form API, as sketched below. When the form is submitted, the posted values are used to modify a metadata record, and Drupal hooks can be used to perform custom processing on the record before and after submission. It is trivial to store the metadata record as an actual XML file or in a storage/archive system. We are working on adding features to help editor users, such as auto-completion, pre-population of forms, partial saving, and automatic schema validation. In this presentation we will demonstrate a few sample editors, including an FGDC editor and a bare-bones editor for ISO 19115/19139. We will also demonstrate the use of templates during the definition phase, with support for export and import functions, and cover form pre-population and input validation. These modules are available as open-source software from the Islandora software foundation, as a component of a larger Drupal-based data archive system. They can be installed as a stand-alone system or plugged into other existing metadata platforms.
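
    The definition-to-form transformation at the heart of this design can be sketched in a few lines. The following Python fragment (standing in for the module's PHP, with invented element names and a toy schema) turns a persisted form-definition XML blob into the nested-array structure that the Drupal Form API expects:

        import xml.etree.ElementTree as ET

        # A miniature form definition of the kind the module might persist
        # (element and attribute names here are invented for illustration).
        FORM_XML = """
        <form id="fgdc_minimal">
          <field name="title" type="textfield" label="Dataset Title" required="true"/>
          <field name="abstract" type="textarea" label="Abstract" required="false"/>
        </form>
        """

        def form_definition_to_array(xml_text):
            """Transform a form-definition XML blob into a nested dict,
            analogous to the PHP array handed to the Drupal Form API."""
            root = ET.fromstring(xml_text)
            form = {}
            for field in root.findall("field"):
                form[field.get("name")] = {
                    "#type": field.get("type"),
                    "#title": field.get("label"),
                    "#required": field.get("required") == "true",
                }
            return form

        print(form_definition_to_array(FORM_XML))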

  18. Improving Metadata Compliance for Earth Science Data Records

    NASA Astrophysics Data System (ADS)

    Armstrong, E. M.; Chang, O.; Foster, D.

    2014-12-01

    One of the recurring challenges of creating earth science data records is to ensure a consistent level of metadata compliance at the granule level, where important details of contents, provenance, producer, and data references are necessary to obtain a sufficient level of understanding. These details are important not just for individual data consumers but also for autonomous software systems. Two of the most popular metadata standards at the granule level are the Climate and Forecast (CF) Metadata Conventions and the Attribute Conventions for Dataset Discovery (ACDD). Many data producers have implemented one or both of these models, including the Group for High Resolution Sea Surface Temperature (GHRSST) for their global SST products and the Ocean Biology Processing Group for NASA ocean color and SST products. While both the CF and ACDD models contain various levels of metadata richness, the actual "required" attributes are quite small in number. Metadata at the granule level becomes much more useful when recommended or optional attributes are implemented that document spatial and temporal ranges, lineage and provenance, sources, keywords, references, etc. In this presentation we report on a new open source tool to check the compliance of netCDF and HDF5 granules with the CF and ACDD metadata models. The tool, written in Python, was originally implemented to support metadata compliance for netCDF records as part of NOAA's Integrated Ocean Observing System. It outputs standardized scoring for metadata compliance with both CF and ACDD, produces an objective summary weight, and can be run against remote records via OPeNDAP calls. Originally a command-line tool, it has been extended to provide a user-friendly web interface. Reports on metadata testing are grouped in hierarchies that make it easier to track flaws and inconsistencies in the record. We have also extended it to support explicit metadata structures and semantic syntax for the GHRSST project, which can be easily adapted to other satellite missions as well. Overall, we hope this tool will provide the community with a useful mechanism to improve metadata quality and consistency at the granule level by providing objective scoring and assessment, as well as encourage data producers to improve metadata quality and quantity.
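
    The published tool is considerably richer, but a minimal sketch of the kind of test it performs might look like the following Python fragment, which scores a granule against a subset of ACDD global attributes (the file name is a placeholder):

        from netCDF4 import Dataset

        # ACDD "highly recommended" global attributes, plus a few recommended
        # ones documenting spatial/temporal extent (a subset, for illustration).
        HIGHLY_RECOMMENDED = ("title", "summary", "keywords")
        RECOMMENDED = ("creator_name", "license",
                       "time_coverage_start", "time_coverage_end",
                       "geospatial_lat_min", "geospatial_lat_max")

        def score_acdd(path):
            """Return a simple (present, total) compliance score for a granule."""
            nc = Dataset(path)
            checked = HIGHLY_RECOMMENDED + RECOMMENDED
            present = [a for a in checked if hasattr(nc, a)]
            missing = [a for a in HIGHLY_RECOMMENDED if not hasattr(nc, a)]
            nc.close()
            return len(present), len(checked), missing

        present, total, missing = score_acdd("granule.nc")  # placeholder file
        print(f"ACDD attributes present: {present}/{total}; "
              f"missing highly recommended: {missing or 'none'}")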

  19. Automatic textual annotation of video news based on semantic visual object extraction

    NASA Astrophysics Data System (ADS)

    Boujemaa, Nozha; Fleuret, Francois; Gouet, Valerie; Sahbi, Hichem

    2003-12-01

    In this paper, we present our work on the automatic generation of textual metadata based on visual content analysis of video news. We present two methods for semantic object detection and recognition from a cross-modal image-text thesaurus. These thesauri represent a supervised association between models and semantic labels. This paper is concerned with two semantic objects: faces and TV logos. In the first part, we present our work on efficient face detection and recognition with automatic name generation. This method also allows us to suggest textual annotations of shots through close-up estimation. In the second part, we automatically detect and recognize the different TV logos present in incoming news from different TV channels. This work was done jointly with the French TV channel TF1 within the "MediaWorks" project, which consists of a hybrid text-image indexing and retrieval platform for video news.
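
    The paper trains its own detectors from the cross-modal thesaurus; purely as a stand-in showing the detection-to-annotation flow, the sketch below uses OpenCV's stock Haar cascade to detect faces in a frame and emit textual metadata, including a crude close-up estimate (thresholds and file names are illustrative):

        import cv2

        # Haar cascade shipped with OpenCV; parameters are typical defaults.
        cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

        def annotate_frame(frame_bgr, shot_id):
            """Detect faces and emit simple textual metadata for one frame."""
            gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
            faces = cascade.detectMultiScale(gray, scaleFactor=1.1,
                                             minNeighbors=5)
            h, _ = gray.shape
            annotations = []
            for (x, y, fw, fh) in faces:
                # A face filling much of the frame height suggests a close-up.
                shot_type = "close-up" if fh / h > 0.4 else "medium/long shot"
                annotations.append({"shot": shot_id, "object": "face",
                                    "bbox": (int(x), int(y), int(fw), int(fh)),
                                    "shot_type": shot_type})
            return annotations

        frame = cv2.imread("newscast_frame.png")   # placeholder input frame
        print(annotate_frame(frame, shot_id=42))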

  20. Migration of the ATLAS Metadata Interface (AMI) to Web 2.0 and cloud

    NASA Astrophysics Data System (ADS)

    Odier, J.; Albrand, S.; Fulachier, J.; Lambert, F.

    2015-12-01

    The ATLAS Metadata Interface (AMI), a mature application in existence for more than 10 years, is currently being adapted to some recently available technologies. The web interfaces, which previously manipulated XML documents using XSL transformations, are being migrated to Asynchronous JavaScript (AJAX). Web development is considerably simplified by the introduction of a framework based on jQuery and Twitter Bootstrap. Finally, the AMI services are being migrated to an OpenStack cloud infrastructure.

  1. Towards a semantic web of paleoclimatology

    NASA Astrophysics Data System (ADS)

    Emile-Geay, J.; Eshleman, J. A.

    2012-12-01

    The paleoclimate record is information-rich, yet significant technical barriers currently stand in the way of using it to automatically answer scientific questions. Here we make the case for a universal format to structure paleoclimate data. A simple example demonstrates the scientific utility of such a self-contained way of organizing coral data and metadata in the Matlab language. This example is generalized to a universal ontology that may form the backbone of an open-source, open-access and crowd-sourced paleoclimate database. Its key attributes are: 1. Parsability: the format is self-contained (hence machine-readable), and would therefore enable a semantic web of paleoclimate information. 2. Universality: the format is platform-independent (readable on all computers and operating systems) and language-independent (readable in major programming languages). 3. Extensibility: the format requires a minimum set of fields to appropriately define a paleoclimate record, but allows the database to grow organically as more records are added or, equally important, as more metadata are added to existing records. 4. Citability: the format enables the automatic citation of peer-reviewed articles as well as data citations whenever a data record is used for analysis, making due recognition of scientific work an automatic part and foundational principle of paleoclimate data analysis. 5. Ergonomy: the format will be easy to use, update and manage. This structure is designed to enable semantic searches, and is expected to help accelerate discovery in all workflows where paleoclimate data are used. Practical steps towards the implementation of such a system at the community level are then discussed. (Figure caption: preliminary ontology describing relationships between the data and metadata fields of the Nurhati et al. [2011] climate record. Several fields are viewed as instances of larger classes (ProxyClass, Site, Reference), which would allow computers to perform operations on all records within a specific class, e.g. if the measurement type is δ18O, if the proxy class is 'Tree Ring Width', or if the resolution is less than 3 months. All records in such a database would be bound to each other by similar links, allowing machines to automatically process any query involving existing information. Such a design would also allow growth, by adding records and/or additional information about each record.)
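
    A minimal sketch of such a self-contained record, expressed here as JSON via Python (field names and values are illustrative, not the paper's final specification), shows how parsability and citability can coexist in one object:

        import json

        # A minimal self-contained coral record; all fields are illustrative.
        record = {
            "proxyClass": "Coral d18O",
            "site": {"name": "ExampleReef", "lat": 5.87, "lon": -162.13},
            "chronology": {"units": "year CE", "values": [1998, 1999, 2000]},
            "data": {"variable": "d18O", "units": "permil",
                     "values": [-5.21, -5.02, -5.34]},
            "reference": {"doi": "10.xxxx/placeholder",
                          "citation": "Author et al. (2011)"},
        }

        # Parsability: the record round-trips through a universal, language-
        # independent serialization; the embedded reference enables automatic
        # citation whenever the record is used in an analysis.
        parsed = json.loads(json.dumps(record))
        print(parsed["reference"]["citation"])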

  2. Real-time high-level video understanding using data warehouse

    NASA Astrophysics Data System (ADS)

    Lienard, Bruno; Desurmont, Xavier; Barrie, Bertrand; Delaigle, Jean-Francois

    2006-02-01

    High-level video content analysis, such as video surveillance, is often limited by the computational aspects of automatic image understanding: it requires huge computing resources for reasoning processes like categorization, and huge amounts of data to represent knowledge of objects, scenarios and other models. This article explains how to design and develop a "near real-time adaptive image datamart", used first as a decision-support system for vision algorithms and then as a mass storage system. Using the RDF specification as the storage format for vision algorithm metadata, we can optimize data warehouse concepts for video analysis, add processes able to adapt the current model, and pre-process data to speed up queries. In this way, when new data are sent from a sensor to the data warehouse for long-term storage, using remote procedure calls embedded in object-oriented interfaces to simplify queries, they are processed and the in-memory data model is updated. After some processing, possible interpretations of these data can be returned to the sensor. To demonstrate this new approach, we present typical scenarios applied to this architecture, such as people tracking and event detection in a multi-camera network. Finally, we show how this system becomes a high-semantic data container for external data mining.
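
    A small sketch of the storage idea, using Python's rdflib with an invented vocabulary, shows how a vision algorithm's metadata can be written as RDF triples and later answered as a high-level query:

        from rdflib import Graph, Namespace, Literal, RDF
        from rdflib.namespace import XSD

        EX = Namespace("http://example.org/vision#")   # illustrative vocabulary
        g = Graph()

        # Store one tracking event produced by a vision algorithm as triples.
        event = EX["event-001"]
        g.add((event, RDF.type, EX.PersonDetected))
        g.add((event, EX.camera, Literal("cam-3")))
        g.add((event, EX.frame, Literal(1842, datatype=XSD.integer)))
        g.add((event, EX.confidence, Literal(0.91, datatype=XSD.double)))

        # The warehouse can later answer high-level queries over the events.
        q = """
        SELECT ?e ?frame WHERE {
          ?e a <http://example.org/vision#PersonDetected> ;
             <http://example.org/vision#confidence> ?c ;
             <http://example.org/vision#frame> ?frame .
          FILTER (?c > 0.9)
        }
        """
        for row in g.query(q):
            print(row.e, row.frame)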

  3. Textural-Contextual Labeling and Metadata Generation for Remote Sensing Applications

    NASA Technical Reports Server (NTRS)

    Kiang, Richard K.

    1999-01-01

    Despite the extensive research and the advent of several new information technologies in the last three decades, machine labeling of ground categories using remotely sensed data has not become a routine process. A considerable amount of human intervention is needed to achieve an acceptable level of labeling accuracy. A number of fundamental reasons may explain why machine labeling has not become automatic. In addition, there may be shortcomings in the methodology for labeling ground categories. The spatial information of a pixel, whether textural or contextual, relates a pixel to its surroundings. This information should be utilized to improve the performance of machine labeling of ground categories. Landsat-4 Thematic Mapper (TM) data taken in July 1982 over an area in the vicinity of Washington, D.C. are used in this study. On-line texture extraction by neural networks may not be the most efficient way to incorporate textural information into the labeling process. Texture features are therefore pre-computed from co-occurrence matrices and then combined with a pixel's spectral and contextual information as the input to a neural network. The improvement in labeling accuracy with spatial information included is significant. The prospect of automatic generation of metadata consisting of ground categories, textural and contextual information is discussed.
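
    A sketch of the pre-computation step described above, using scikit-image's gray-level co-occurrence routines on a toy window (parameter choices are illustrative, not the study's), might look like this:

        import numpy as np
        from skimage.feature import graycomatrix, graycoprops

        def texture_features(window):
            """Pre-compute co-occurrence texture features for one pixel window,
            to be concatenated with spectral and contextual values as the
            input vector for a neural network."""
            glcm = graycomatrix(window, distances=[1],
                                angles=[0, np.pi / 2], levels=256,
                                symmetric=True, normed=True)
            return np.hstack([graycoprops(glcm, prop).ravel()
                              for prop in ("contrast", "homogeneity", "energy")])

        # Toy 8-bit window standing in for one TM band neighborhood.
        rng = np.random.default_rng(0)
        window = rng.integers(0, 256, size=(9, 9), dtype=np.uint8)
        features = texture_features(window)
        print(features.shape, features)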

  4. Automated sea floor extraction from underwater video

    NASA Astrophysics Data System (ADS)

    Kelly, Lauren; Rahmes, Mark; Stiver, James; McCluskey, Mike

    2016-05-01

    Ocean floor mapping using video is a method to simply and cost-effectively record large areas of the seafloor. Obtaining visual and elevation models has noteworthy applications in search and recovery missions. Hazards to navigation are abundant and pose a significant threat to the safety, effectiveness, and speed of naval operations and commercial vessels. This project's objective was to develop a workflow to automatically extract metadata from marine video and create optical and elevation surface mosaics. Three developments made this possible. First, optical character recognition (OCR) by means of two-dimensional correlation, using a known character set, allowed metadata to be captured from the image files. Second, exploiting the image metadata (i.e., latitude, longitude, heading, camera angle, and depth readings) allowed the location and orientation of each image frame in the mosaic to be determined, with image registration improving the accuracy of the mosaicking. Finally, overlapping data allowed height information to be determined: a disparity map was created using the parallax from overlapping viewpoints of a given area, and the relative height data were used to create a three-dimensional, textured elevation map.
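
    The OCR step rests on plain template correlation. A minimal sketch, assuming a known glyph template and a synthetic metadata strip, locates the glyph by normalized two-dimensional correlation:

        import numpy as np

        def best_match(strip, template):
            """Locate a known character glyph in a burned-in metadata strip via
            normalized 2-D correlation; returns (column, score)."""
            th, tw = template.shape
            t = (template - template.mean()) / (template.std() + 1e-9)
            best = (-1, -np.inf)
            for x in range(strip.shape[1] - tw + 1):
                patch = strip[:th, x:x + tw]
                p = (patch - patch.mean()) / (patch.std() + 1e-9)
                score = float((p * t).mean())   # mean product of z-scores
                if score > best[1]:
                    best = (x, score)
            return best

        # Toy data: a strip containing the template at column 12.
        rng = np.random.default_rng(1)
        strip = rng.random((12, 64))
        glyph = rng.random((12, 8))
        strip[:, 12:20] = glyph
        print(best_match(strip, glyph))   # expect column 12, score near 1.0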

  5. Sequencing Data Discovery and Integration for Earth System Science with MetaSeek

    NASA Astrophysics Data System (ADS)

    Hoarfrost, A.; Brown, N.; Arnosti, C.

    2017-12-01

    Microbial communities play a central role in biogeochemical cycles. Sequencing data resources from environmental sources have grown exponentially in recent years and represent a singular opportunity to investigate microbial interactions with Earth system processes. Carrying out such meta-analyses depends on our ability to discover and curate sequencing data into large-scale integrated datasets. However, such integration efforts are currently challenging and time-consuming, with sequencing data scattered across multiple repositories and metadata that are not easily or comprehensively searchable. MetaSeek is a sequencing data discovery tool that integrates sequencing metadata from all the major data repositories, allowing the user to search and filter datasets in a lightweight application with an intuitive, easy-to-use web-based interface. Users can save and share curated datasets, while other users can browse these data integrations or use them as a jumping-off point for their own curation. Missing and/or erroneous metadata are inferred automatically where possible; where this is not possible, users are prompted to contribute to the improvement of the sequencing metadata pool by correcting and amending metadata errors. Once an integrated dataset has been curated, users can follow simple instructions to download their raw data and quickly begin their investigations. In addition to the online interface, the MetaSeek database is easily queryable via an open API, further enabling users and facilitating integrations of MetaSeek with other data curation tools. This tool lowers the barriers to curation and integration of environmental sequencing data, clearing the path toward illuminating ecosystem-scale interactions between biological and abiotic processes.

  6. Cyberinfrastructure for Atmospheric Discovery

    NASA Astrophysics Data System (ADS)

    Wilhelmson, R.; Moore, C. W.

    2004-12-01

    Each year across the United States, floods, tornadoes, hail, strong winds, lightning, hurricanes, and winter storms cause hundreds of deaths, routinely disrupt transportation and commerce, and result in billions of dollars in annual economic losses. MEAD and LEAD are two recent efforts aimed at developing the cyberinfrastructure for studying and forecasting these events through collection, integration, and analysis of observational data coupled with numerical simulation, data mining, and visualization. MEAD (Modeling Environment for Atmospheric Discovery) has been funded for two years as an NCSA (National Center for Supercomputing Applications) Alliance Expedition. The goal of this expedition has been the development and adaptation of cyberinfrastructure that will enable research simulations, data mining, machine learning and visualization of hurricanes and storms utilizing high-performance computing environments including the TeraGrid. Portal, grid, and web infrastructure are being tested that will enable the launching of hundreds of individual WRF (Weather Research and Forecasting) simulations. In a similar way, multiple Regional Ocean Modeling System (ROMS) or WRF/ROMS simulations can be carried out. Metadata and the resulting large volumes of data will then be made available for further study and for educational purposes using analysis, mining, and visualization services. Initial coupling of the ROMS and WRF codes has been completed, and parallel I/O is being implemented for these models. Management of these activities (services) is enabled through Grid workflow technologies (e.g. OGCE). LEAD (Linked Environments for Atmospheric Discovery) is a recently funded 5-year, large NSF ITR grant involving 9 institutions that are developing a comprehensive national cyberinfrastructure for mesoscale meteorology, particularly one that can interoperate with others being developed. LEAD is addressing the fundamental information technology (IT) research challenges needed to create an integrated, scalable framework for identifying, accessing, preparing, assimilating, predicting, managing, analyzing, mining, and visualizing a broad array of meteorological data and model output, independent of format and physical location. A transforming element of LEAD is Workflow Orchestration for On-Demand, Real-Time, Dynamically-Adaptive Systems (WOORDS), which allows the use of analysis tools, forecast models, and data repositories as dynamically adaptive, on-demand, Grid-enabled systems that can a) change configuration rapidly and automatically in response to weather; b) continually be steered by new data; c) respond to decision-driven inputs from users; d) initiate other processes automatically; and e) steer remote observing technologies to optimize data collection for the problem at hand. Although LEAD efforts are primarily directed at mesoscale meteorology, the IT services being developed have general applicability to other geoscience and environmental sciences. Integration of traditional and new data sources is a crucial component of LEAD for data analysis and assimilation, for integration and ensemble mining of data from sets of simulations, and for comparing results to observational data. As part of the integration effort, LEAD is creating a myLEAD metadata catalog service: a personal metacatalog that extends the Globus MCS system and is built on top of the OGSA-DAI system developed at the National e-Science Centre in Edinburgh, Scotland.

  7. The Materials Data Facility: Data Services to Advance Materials Science Research

    NASA Astrophysics Data System (ADS)

    Blaiszik, B.; Chard, K.; Pruyne, J.; Ananthakrishnan, R.; Tuecke, S.; Foster, I.

    2016-08-01

    With increasingly strict data management requirements from funding agencies and institutions, an expanding focus on the challenges of research replicability, and growing data sizes and heterogeneity, new data needs are emerging in the materials community. The Materials Data Facility (MDF) operates two cloud-hosted services, data publication and data discovery, with features that promote open data sharing, self-service data publication and curation, and data reuse, layered with powerful data discovery tools. The data publication service simplifies the process of copying data to a secure storage location, assigning the data a citable persistent identifier, and recording custom (e.g., material-, technique-, or instrument-specific) and automatically extracted metadata in a registry, while the data discovery service will provide advanced search capabilities (e.g., faceting, free-text range querying, and full-text search) against the registered data and metadata. The MDF services empower individual researchers, research projects, and institutions to (I) publish research datasets, regardless of size, from local storage, institutional data stores, or cloud storage, without involvement of third-party publishers; (II) build, share, and enforce extensible, domain-specific custom metadata schemas; (III) interact with published data and metadata via representational state transfer (REST) application program interfaces (APIs) to facilitate automation, analysis, and feedback; and (IV) access a data discovery model that allows researchers to search, interrogate, and eventually build on existing published data. We describe MDF's design, current status, and future plans.

  8. Current Development at the Southern California Earthquake Data Center (SCEDC)

    NASA Astrophysics Data System (ADS)

    Appel, V. L.; Clayton, R. W.

    2005-12-01

    Over the past year, the SCEDC has completed or is near completion of three featured projects. Station Information System (SIS) development: the SIS will provide users with an interface to complete and accurate station metadata for all current and historic data at the SCEDC. The goal of this project is to develop a system that can interact with a single database source to enter, update and retrieve station metadata easily and efficiently. The system will provide accurate station/channel information for active stations to the SCSN real-time processing system, as well as station/channel information for stations that have parametric data at the SCEDC, i.e., for users retrieving data via STP. Additionally, the SIS will supply the information required to generate dataless SEED and COSMOS V0 volumes, and will allow stations to be added to the system with a minimal but incomplete set of information, using predefined defaults that can be easily updated as more information becomes available. Finally, the system will facilitate statewide metadata exchange for real-time processing and provide a common approach to CISN historic station metadata. Moment tensor solutions: the SCEDC is currently archiving and delivering moment magnitudes and moment tensor solutions (MTS) produced by the SCSN in real time, along with post-processed solutions for events spanning back to 1999. The automatic MTS runs on all local events with magnitudes > 3.0 and all regional events > 3.5. The distributed solution automatically creates links from all USGS Simpson Maps to a text e-mail summary of the solution, creates a .gif image of the solution, and updates the moment tensor database tables at the SCEDC. Searchable scanned waveforms site: the Caltech Seismological Laboratory has made available 12,223 scanned images of pre-digital analog recordings of major earthquakes recorded in Southern California between 1962 and 1992 at http://www.data.scec.org/research/scans/. The SCEDC has developed a searchable web interface that allows users to search the available files, select multiple files for download, and then retrieve a zipped file containing the results. Scanned images of paper records for M>3.5 southern California earthquakes and several significant teleseisms are available for download via the SCEDC through this search tool.

  9. The French initiative for scientific cores virtual curating : a user-oriented integrated approach

    NASA Astrophysics Data System (ADS)

    Pignol, Cécile; Godinho, Elodie; Galabertier, Bruno; Caillo, Arnaud; Bernardet, Karim; Augustin, Laurent; Crouzet, Christian; Billy, Isabelle; Teste, Gregory; Moreno, Eva; Tosello, Vanessa; Crosta, Xavier; Chappellaz, Jérome; Calzas, Michel; Rousseau, Denis-Didier; Arnaud, Fabien

    2016-04-01

    Managing scientific data is probably one of the most challenging issues in modern science. The question is made even more sensitive by the need to preserve and manage high-value, fragile geological samples: cores. Large international scientific programs, such as IODP or ICDP, are leading an intense effort to solve this problem and propose detailed, high-standard workflows and dataflows through core handling and curating. However, most results derive from rather small-scale research programs in which data and samples are generally managed only locally, when they are managed at all. The national excellence equipment program (Equipex) CLIMCOR aims at developing French facilities for coring and drilling investigations. It covers ice, marine and continental samples alike. As part of this initiative, we began a reflection on core curating and the management of associated coring data. The aim of the project is to conserve all metadata from fieldwork in an integrated cyber-environment, which will evolve toward laboratory-acquired data storage in the near future. To this end, our approach was developed in close relationship with field operators as well as laboratory core curators, in order to propose user-oriented solutions. The national core curating initiative currently offers a single web portal in which all scientific teams can store their field data. For legacy samples, this requires the establishment of dedicated core lists with associated metadata. For forthcoming samples, we propose a mobile application, running under Android, to capture technical and scientific metadata in the field. This application is linked with a unique coring-tools library and is adapted to most coring devices (gravity, drilling, percussion, etc.), including multi-section and multi-hole coring operations. These field data can be uploaded automatically to the national portal, but also referenced through international standards or persistent identifiers (IGSN, ORCID and INSPIRE) and displayed in international portals (currently, NOAA's IMLGS). In this paper, we present the architecture of the integrated system, future perspectives, and the approach we adopted to reach our goals. At our poster, we will also present one of the three mobile applications, dedicated in particular to continental drilling operations.

  10. NCPP's Use of Standard Metadata to Promote Open and Transparent Climate Modeling

    NASA Astrophysics Data System (ADS)

    Treshansky, A.; Barsugli, J. J.; Guentchev, G.; Rood, R. B.; DeLuca, C.

    2012-12-01

    The National Climate Predictions and Projections (NCPP) Platform is developing comprehensive regional and local information about the evolving climate to inform decision making and adaptation planning. This includes both creating and providing tools to create metadata about the models and processes used to create its derived data products. NCPP is using the Common Information Model (CIM), an ontology developed by a broad set of international partners in climate research, as its metadata language. This use of a standard ensures interoperability within the climate community as well as access to the ecosystem of tools and services emerging alongside the CIM. The CIM itself is divided into a general-purpose (UML and XML) schema, which structures metadata documents, and a project- or community-specific (XML) Controlled Vocabulary (CV), which constrains the content of metadata documents. NCPP has already modified the CIM schema to accommodate downscaling models, simulations, and experiments, and is currently developing a CV for use by the downscaling community. Incorporating downscaling into the CIM will bring several benefits: easy access to the existing CIM documents describing the CMIP5 models and simulations that are being downscaled; access to software tools that have been developed to search, manipulate, and visualize CIM metadata; and coordination with national and international efforts such as ES-DOC that are working to make climate model descriptions and datasets interoperable. Providing detailed metadata descriptions that include the full provenance of derived data products will contribute to making those data, and the models and processes which generated them, more open and transparent to the user community.

  11. Long-term Science Data Curation Using a Digital Object Model and Open-Source Frameworks

    NASA Astrophysics Data System (ADS)

    Pan, J.; Lenhardt, W.; Wilson, B. E.; Palanisamy, G.; Cook, R. B.

    2010-12-01

    Scientific digital content, including Earth Science observations and model output, has become more heterogeneous in format and more distributed across the Internet. In addition, data and metadata are becoming necessarily linked internally and externally on the Web. As a result, such content has become more difficult for providers to manage and preserve and for users to locate, understand, and consume. Specifically, it is increasingly hard to deliver relevant metadata and data processing lineage information along with the actual content in a consistent manner. Readme files, data quality information, production provenance, and other descriptive metadata are often separated from the data at the storage level as well as in the search and retrieval interfaces available to a user. Critical archival metadata, such as audit trails and integrity checks, are often even more difficult for users to access, if they exist at all. We investigate the use of several open-source software frameworks to address these challenges. We use the Fedora Commons framework and its digital object abstraction as the repository, the Drupal CMS as the user interface, and the Islandora module as the connector from Drupal to the Fedora repository. With the digital object model, metadata describing data content and data provenance can be associated with the content in a formal manner, as can external references and other auxiliary information. Changes to an object are formally audited, and digital contents are versioned and have their checksums automatically computed. Further, relationships among objects are formally expressed with RDF triples. Data replication, recovery, and metadata export are supported with standard protocols, such as OAI-PMH. We provide a tentative comparative analysis of the chosen software stack against the Open Archival Information System (OAIS) reference model, along with our initial results with the existing terrestrial ecology data collections at NASA's ORNL Distributed Active Archive Center for Biogeochemical Dynamics (ORNL DAAC).

  12. Image processing tool for automatic feature recognition and quantification

    DOEpatents

    Chen, Xing; Stoddard, Ryan J.

    2017-05-02

    A system for defining structures within an image is described. The system includes reading an input file, preprocessing the input file while preserving metadata such as scale information, and then detecting features of the input file. In one version, the detection first uses an edge detector, followed by identification of features using a Hough transform. The output of the process is the set of identified elements within the image.
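
    A hedged sketch of the two-stage detection pipeline (edge detection followed by a Hough transform), using OpenCV with illustrative thresholds, a placeholder input file, and a hypothetical scale factor standing in for the preserved metadata:

        import cv2
        import numpy as np

        img = cv2.imread("micrograph.png", cv2.IMREAD_GRAYSCALE)  # placeholder

        # Step 1: edge detection (thresholds would be tuned per image set).
        edges = cv2.Canny(img, threshold1=50, threshold2=150)

        # Step 2: identify linear features with a probabilistic Hough transform
        # (lines are one feature type; the patent covers features generally).
        lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                                minLineLength=30, maxLineGap=5)

        # Quantify: report count and total length of detected line features,
        # scaled by metadata preserved during preprocessing.
        NM_PER_PIXEL = 2.5   # hypothetical scale carried in the image metadata
        if lines is not None:
            lengths = [np.hypot(x2 - x1, y2 - y1)
                       for x1, y1, x2, y2 in lines[:, 0]]
            print(len(lengths), "features,",
                  sum(lengths) * NM_PER_PIXEL, "nm total")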

  13. Automatic publishing ISO 19115 metadata with PanMetaDocs using SensorML information

    NASA Astrophysics Data System (ADS)

    Stender, Vivien; Ulbricht, Damian; Schroeder, Matthias; Klump, Jens

    2014-05-01

    Terrestrial Environmental Observatories (TERENO) is an interdisciplinary and long-term research project spanning an Earth observation network across Germany. It includes four test sites within Germany, from the North German lowlands to the Bavarian Alps, and is operated by six research centers of the Helmholtz Association. The contribution of the participating research centers is organized as regional observatories. A challenge for TERENO and its observatories is to integrate all aspects of data management, data workflows, data modeling and visualization into the design of a monitoring infrastructure. TERENO Northeast is one of the sub-observatories of TERENO and is operated by the German Research Centre for Geosciences (GFZ) in Potsdam. This observatory investigates geoecological processes in the northeastern lowland of Germany by collecting large amounts of environmentally relevant data. The success of long-term projects like TERENO depends on well-organized data management, data exchange between the partners involved, and the availability of the captured data. Data discovery and dissemination are facilitated not only through the data portals of the regional TERENO observatories but also through a common spatial data infrastructure, TEODOOR (TEreno Online Data repOsitORry). TEODOOR bundles the data provided by the web services of the individual observatories and provides tools for data discovery, visualization and data access. The TERENO Northeast data infrastructure integrates data from more than 200 instruments and makes the data available through standard web services. Geographic sensor information and services are described using the ISO 19115 metadata schema. TEODOOR accesses the OGC Sensor Web Enablement (SWE) interfaces offered by the regional observatories. In addition to the SWE interface, TERENO Northeast also publishes data through DataCite. The necessary ISO 19115-compliant metadata are created in an automated process that extracts information from the SWE SensorML documents; the resulting metadata file is stored in the GFZ Potsdam data infrastructure. The publishing workflow for file-based research datasets at GFZ Potsdam is based on the eSciDoc infrastructure, using PanMetaDocs (PMD) as the graphical user interface. PMD is a collaborative, metadata-based data and information exchange platform [1]. Besides SWE, metadata are also syndicated by PMD through an OAI-PMH interface. In addition, metadata from other observatories, projects or sensors in TERENO can be accessed through the TERENO Northeast data portal. [1] http://meetingorganizer.copernicus.org/EGU2012/EGU2012-7058-2.pdf
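
    A minimal sketch of the extraction step, using Python's standard XML library on a radically simplified SensorML fragment (the ISO 19139 skeleton emitted below is illustrative and not schema-valid; the production workflow produces fully compliant records):

        import xml.etree.ElementTree as ET

        SML = "{http://www.opengis.net/sensorML/1.0.1}"

        # Simplified SensorML fragment; real documents are far richer.
        sensorml = """
        <Member xmlns="http://www.opengis.net/sensorML/1.0.1">
          <identifier>soil-moisture-station-042</identifier>
          <description>TDR probe, TERENO Northeast</description>
        </Member>
        """

        root = ET.fromstring(sensorml)
        ident = root.find(f"{SML}identifier").text
        desc = root.find(f"{SML}description").text

        # Map onto a minimal ISO 19139 (gmd) skeleton; only fileIdentifier
        # and an abstract-like string are shown here.
        iso = f"""<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
            xmlns:gco="http://www.isotc211.org/2005/gco">
          <gmd:fileIdentifier>
            <gco:CharacterString>{ident}</gco:CharacterString>
          </gmd:fileIdentifier>
          <!-- illustrative placement; the full schema nests the abstract
               inside gmd:identificationInfo -->
          <gco:CharacterString>{desc}</gco:CharacterString>
        </gmd:MD_Metadata>"""
        print(iso)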

  14. Standards-based curation of a decade-old digital repository dataset of molecular information.

    PubMed

    Harvey, Matthew J; Mason, Nicholas J; McLean, Andrew; Murray-Rust, Peter; Rzepa, Henry S; Stewart, James J P

    2015-01-01

    The curation of 158,122 molecular geometries derived from the NCI set of reference molecules, together with associated properties computed using the MOPAC semi-empirical quantum mechanical method and originally deposited in 2005 into the Cambridge DSpace repository as a data collection, is reported. The procedures involved in the curation included annotation of the original data using new MOPAC methods, updating the syntax of the CML documents used to express the data to ensure schema conformance, and adding new metadata describing the entries, together with an XML schema transformation to map the metadata schema to that used by the DataCite organisation. We have adopted a granularity model in which a DataCite persistent identifier (DOI) is created for each individual molecule, enabling data discovery and data metrics at this level using DataCite tools. We recommend that the future research data management (RDM) of the scientific and chemical data components associated with journal articles (the "supporting information") be conducted in a manner that facilitates automatic periodic curation.

  15. In situ data analytics and indexing of protein trajectories.

    PubMed

    Johnston, Travis; Zhang, Boyu; Liwo, Adam; Crivelli, Silvia; Taufer, Michela

    2017-06-15

    The transition toward exascale computing will be accompanied by a performance dichotomy: computational peak performance will rapidly increase, while I/O performance will either grow slowly or be completely stagnant. Essentially, the rate at which data are generated will grow much faster than the rate at which data can be read from and written to disk. Molecular dynamics (MD) simulations will soon face this I/O problem on the next generation of supercomputers. This article targets MD simulations at the exascale and proposes a novel technique for in situ data analysis and indexing of MD trajectories. Our technique maps individual trajectories' substructures (i.e., α-helices, β-strands) to metadata, frame by frame. The metadata capture the conformational properties of the substructures. The ensemble of metadata can be used for automatic, strategic analysis within a trajectory or across trajectories, without manually identifying those portions of trajectories in which critical changes take place. We demonstrate our technique's effectiveness by applying it to 26.3k helices and 31.2k strands from 9917 PDB proteins and by providing three empirical case studies. © 2017 Wiley Periodicals, Inc.
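
    A sketch of the indexing idea on toy data (secondary-structure strings standing in for in situ analysis of frames still in memory; the paper's actual metadata are richer):

        from collections import defaultdict

        # Per-frame secondary-structure assignments for one trajectory
        # (H = helix, E = strand, C = coil); toy data for illustration.
        frames = ["CHHHHECCC", "CHHHHEECC", "CCHHHEECC", "CCCHHEECC"]

        def substructures(ss):
            """Extract (type, start, length) runs of helix/strand from one frame."""
            runs, start = [], 0
            for i in range(1, len(ss) + 1):
                if i == len(ss) or ss[i] != ss[start]:
                    if ss[start] in "HE":
                        runs.append((ss[start], start, i - start))
                    start = i
            return runs

        # The metadata index: substructure records keyed by frame number.
        index = defaultdict(list)
        for fnum, ss in enumerate(frames):
            index[fnum] = substructures(ss)

        # Strategic analysis without rereading raw trajectories: find where
        # the helix shortens (a candidate conformational event).
        helix_len = [next(l for t, s, l in index[f] if t == "H")
                     for f in sorted(index)]
        print(helix_len)   # [4, 4, 3, 2] -> shortening begins at frame 2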

  16. Echo™ User Manual

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harvey, Dustin Yewell

    Echo™ is a MATLAB-based software package designed for robust and scalable analysis of complex data workflows. An alternative to tedious, error-prone conventional processes, Echo is based on three transformative principles for data analysis: self-describing data, name-based indexing, and dynamic resource allocation. The software takes an object-oriented approach to data analysis, intimately connecting measurement data with associated metadata. Echo operations in an analysis workflow automatically track and merge metadata and computation parameters to provide a complete history of the process used to generate final results, while automated figure and report generation tools eliminate the potential to mislabel those results. History reporting and visualization methods provide straightforward auditability of analysis processes. Furthermore, name-based indexing on metadata greatly improves code readability for analyst collaboration and reduces opportunities for errors to occur. Echo efficiently manages large data sets using a framework that seamlessly allocates resources such that only the computations necessary to produce a given result are executed. Echo provides a versatile and extensible framework, allowing advanced users to add their own tools and data classes tailored to their specific needs. Applying these transformative principles and powerful features, Echo greatly improves analyst efficiency and the quality of results in many application areas.

  17. Font adaptive word indexing of modern printed documents.

    PubMed

    Marinai, Simone; Marino, Emanuele; Soda, Giovanni

    2006-08-01

    We propose an approach for the word-level indexing of modern printed documents which are difficult to recognize using current OCR engines. By means of word-level indexing, it is possible to retrieve the position of words in a document, enabling queries involving proximity of terms. Web search engines implement this kind of indexing, allowing users to retrieve Web pages on the basis of their textual content. Nowadays, digital libraries hold collections of digitized documents that can be retrieved either by browsing the document images or by relying on appropriate metadata assembled by domain experts. Word-indexing tools would therefore increase access to these collections. The proposed system is designed to index homogeneous document collections by automatically adapting to different languages and font styles without relying on OCR engines for character recognition. The approach is based on three main ideas: the use of Self-Organizing Maps (SOM) to perform unsupervised character clustering, the definition of a suitable vector-based word representation whose size depends on the word's aspect ratio, and the run-time alignment of the query word with indexed words to deal with broken and touching characters. The most appropriate applications are for processing modern printed documents (17th to 19th centuries) where current OCR engines are less accurate. Our experimental analysis addresses six data sets containing documents ranging from books of the 17th century to contemporary journals.

  18. Large-scale atlas of microarray data reveals the distinct expression landscape of different tissues in Arabidopsis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    He, Fei; Maslov, Sergei; Yoo, Shinjae

    Transcriptome datasets from thousands of samples of the model plant Arabidopsis thaliana have been collectively generated by multiple individual labs. Although integration and meta-analysis of these samples has become routine in the plant research community, it is often hampered by the lack of metadata or by differences in annotation styles between labs. In this study, we carefully selected and integrated 6,057 Arabidopsis microarray expression samples from 304 experiments deposited to NCBI GEO. Metadata such as tissue type, growth condition, and developmental stage were manually curated for each sample. We then studied the global expression landscape of the integrated dataset and found that samples of the same tissue tend to be more similar to each other than to samples of other tissues, even under different growth conditions or at different developmental stages. Root has the most distinct transcriptome compared to aerial tissues, but the transcriptome of cultured root is more similar to those of aerial tissues, as the cultured samples have lost their cellular identity. Using a simple computational classification method, we showed that the tissue type of a sample can be successfully predicted based on its expression profile, opening the door to automatic metadata extraction and facilitating re-use of plant transcriptome data. As a proof of principle, we applied our automated annotation pipeline to 708 RNA-seq samples from public repositories and verified the accuracy of our predictions against the samples' metadata provided by the authors.
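
    One plausible realization of such a classifier, sketched on synthetic data with scikit-learn (the paper's own method and features may differ):

        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.neighbors import KNeighborsClassifier

        # Toy stand-in for the integrated expression matrix: rows are samples,
        # columns are genes; labels are curated tissue types.
        rng = np.random.default_rng(0)
        n_per_tissue, n_genes = 50, 200
        tissues = ["root", "leaf", "flower"]
        X = np.vstack([rng.normal(loc=i, scale=1.0,
                                  size=(n_per_tissue, n_genes))
                       for i, _ in enumerate(tissues)])
        y = np.repeat(tissues, n_per_tissue)

        # A "simple computational classification method": nearest neighbors
        # on expression profiles, evaluated by cross-validation.
        clf = KNeighborsClassifier(n_neighbors=5)
        scores = cross_val_score(clf, X, y, cv=5)
        print("tissue prediction accuracy: %.2f +/- %.2f"
              % (scores.mean(), scores.std()))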

  19. Large-scale atlas of microarray data reveals the distinct expression landscape of different tissues in Arabidopsis

    DOE PAGES

    He, Fei; Maslov, Sergei; Yoo, Shinjae; ...

    2016-05-25

    Transcriptome datasets from thousands of samples of the model plant Arabidopsis thaliana have been collectively generated by multiple individual labs. Although integration and meta-analysis of these samples has become routine in the plant research community, it is often hampered by the lack of metadata or by differences in annotation styles between labs. In this study, we carefully selected and integrated 6,057 Arabidopsis microarray expression samples from 304 experiments deposited to NCBI GEO. Metadata such as tissue type, growth condition, and developmental stage were manually curated for each sample. We then studied the global expression landscape of the integrated dataset and found that samples of the same tissue tend to be more similar to each other than to samples of other tissues, even under different growth conditions or at different developmental stages. Root has the most distinct transcriptome compared to aerial tissues, but the transcriptome of cultured root is more similar to those of aerial tissues, as the cultured samples have lost their cellular identity. Using a simple computational classification method, we showed that the tissue type of a sample can be successfully predicted based on its expression profile, opening the door to automatic metadata extraction and facilitating re-use of plant transcriptome data. As a proof of principle, we applied our automated annotation pipeline to 708 RNA-seq samples from public repositories and verified the accuracy of our predictions against the samples' metadata provided by the authors.

  20. A Framework for Collaborative Review of Candidate Events in High Data Rate Streams: the V-Fastr Experiment as a Case Study

    NASA Astrophysics Data System (ADS)

    Hart, Andrew F.; Cinquini, Luca; Khudikyan, Shakeh E.; Thompson, David R.; Mattmann, Chris A.; Wagstaff, Kiri; Lazio, Joseph; Jones, Dayton

    2015-01-01

    “Fast radio transients” are defined here as bright millisecond pulses of radio-frequency energy. These short-duration pulses can be produced by known objects such as pulsars or potentially by more exotic objects such as evaporating black holes. The identification and verification of such an event would be of great scientific value. This is one major goal of the Very Long Baseline Array (VLBA) Fast Transient Experiment (V-FASTR), a software-based detection system installed at the VLBA. V-FASTR uses a “commensal” (piggy-back) approach, analyzing all array data continually during routine VLBA observations and identifying candidate fast transient events. Raw data can be stored from a buffer memory, which enables a comprehensive off-line analysis. This is invaluable for validating the astrophysical origin of any detection. Candidates discovered by the automatic system must be reviewed each day by analysts to identify any promising signals that warrant a more in-depth investigation. To support the timely analysis of fast transient detection candidates by V-FASTR scientists, we have developed a metadata-driven, collaborative candidate review framework. The framework consists of a software pipeline for metadata processing composed of both open source software components and project-specific code written expressly to extract and catalog metadata from the incoming V-FASTR data products, and a web-based data portal that facilitates browsing and inspection of the available metadata for candidate events extracted from the VLBA radio data.

  1. Effective use of metadata in the integration and analysis of multi-dimensional optical data

    NASA Astrophysics Data System (ADS)

    Pastorello, G. Z.; Gamon, J. A.

    2012-12-01

    Data discovery and integration relies on adequate metadata. However, creating and maintaining metadata is time consuming and often poorly addressed or avoided altogether, leading to problems in later data analysis and exchange. This is particularly true for research fields in which metadata standards do not yet exist or are under development, or within smaller research groups without enough resources. Vegetation monitoring using in-situ and remote optical sensing is an example of such a domain. In this area, data are inherently multi-dimensional, with spatial, temporal and spectral dimensions usually being well characterized. Other equally important aspects, however, might be inadequately translated into metadata. Examples include equipment specifications and calibrations, field/lab notes and field/lab protocols (e.g., sampling regimen, spectral calibration, atmospheric correction, sensor view angle, illumination angle), data processing choices (e.g., methods for gap filling, filtering and aggregation of data), quality assurance, and documentation of data sources, ownership and licensing. Each of these aspects can be important as metadata for search and discovery, but they can also be used as key data fields in their own right. If each of these aspects is also understood as an "extra dimension," it is possible to take advantage of them to simplify the data acquisition, integration, analysis, visualization and exchange cycle. Simple examples include selecting data sets of interest early in the integration process (e.g., only data collected according to a specific field sampling protocol) or applying appropriate data processing operations to different parts of a data set (e.g., adaptive processing for data collected under different sky conditions). More interesting scenarios involve guided navigation and visualization of data sets based on these extra dimensions, as well as partitioning data sets to highlight relevant subsets to be made available for exchange. The DAX (Data Acquisition to eXchange) Web-based tool uses a flexible metadata representation model and takes advantage of multi-dimensional data structures to translate metadata types into data dimensions, effectively reshaping data sets according to available metadata. With that, metadata is tightly integrated into the acquisition-to-exchange cycle, allowing for more focused exploration of data sets while also increasing the value of, and incentives for, keeping good metadata. The tool is being developed and tested with optical data collected in different settings, including laboratory, field, airborne, and satellite platforms.

  2. Evolution of the architecture of the ATLAS Metadata Interface (AMI)

    NASA Astrophysics Data System (ADS)

    Odier, J.; Aidel, O.; Albrand, S.; Fulachier, J.; Lambert, F.

    2015-12-01

    The ATLAS Metadata Interface (AMI) is now a mature application. Over the years, the number of users and the number of provided functions have dramatically increased. It is necessary to adapt the hardware infrastructure in a seamless way so that the quality of service remains high. We describe the evolution of AMI from its beginnings, served by a single MySQL backend database server, to its current state: a cluster of virtual machines at the French Tier-1, an Oracle database at Lyon with complementary replication to the Oracle DB at CERN, and an AMI backup server.

  3. Roogle: an information retrieval engine for clinical data warehouse.

    PubMed

    Cuggia, Marc; Garcelon, Nicolas; Campillo-Gimenez, Boris; Bernicot, Thomas; Laurent, Jean-François; Garin, Etienne; Happe, André; Duvauferrier, Régis

    2011-01-01

    A large amount of relevant information is contained in the reports stored in electronic patient records and their associated metadata. Roogle is a project aiming to develop information retrieval engines adapted to these reports and designed for clinicians. The system consists of a data warehouse (full-text reports and structured data) imported from two different hospital information systems. Information retrieval is performed using metadata-based semantic search and full-text search methods (as in Google). Applications include biomarker identification in a translational approach, searching for specific cases, constitution of cohorts, professional practice evaluation, and quality control assessment.

  4. Software for minimalistic data management in large camera trap studies

    PubMed Central

    Krishnappa, Yathin S.; Turner, Wendy C.

    2014-01-01

    The use of camera traps is now widespread and their importance in wildlife studies well understood. Camera trap studies can produce millions of photographs, and there is a need for software to help manage these photographs efficiently. In this paper, we describe a software system that was built to successfully manage a large behavioral camera trap study that produced more than a million photographs. We describe the software architecture and the design decisions that shaped the evolution of the program over the study's three-year period. The software system has the ability to automatically extract metadata from images and to add customized metadata to the images in a standardized format. The software system can be installed as a standalone application on popular operating systems. It is minimalistic, scalable and extendable, so that it can be used by small teams or individual researchers for a broad variety of camera trap studies. PMID:25110471
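
    The automatic extraction step typically reads the EXIF block that the camera writes into each image; a minimal sketch with Pillow (the file name is a placeholder):

        from PIL import Image
        from PIL.ExifTags import TAGS

        def extract_metadata(path):
            """Read EXIF metadata from a camera-trap photograph and return
            the fields most useful for indexing (timestamp, camera model)."""
            exif = Image.open(path).getexif()
            named = {TAGS.get(tag_id, tag_id): value
                     for tag_id, value in exif.items()}
            return {"timestamp": named.get("DateTime"),
                    "camera": named.get("Model"),
                    "all": named}

        meta = extract_metadata("trap_site12_0001.jpg")   # placeholder filename
        print(meta["timestamp"], meta["camera"])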

  5. Video personalization for usage environment

    NASA Astrophysics Data System (ADS)

    Tseng, Belle L.; Lin, Ching-Yung; Smith, John R.

    2002-07-01

    A video personalization and summarization system is designed and implemented incorporating the usage environment to dynamically generate a personalized video summary. The personalization system adopts a three-tier server-middleware-client architecture in order to select, adapt, and deliver rich media content to the user. The server stores the content sources along with their corresponding MPEG-7 metadata descriptions. Our semantic metadata is provided through the use of the VideoAnnEx MPEG-7 Video Annotation Tool. When the user initiates a request for content, the client communicates the MPEG-21 usage environment description along with the user query to the middleware. The middleware is powered by the personalization engine and the content adaptation engine. Our personalization engine includes the VideoSue Summarization on Usage Environment engine, which selects the optimal set of desired contents according to user preferences. Afterwards, the adaptation engine performs the required transformations and compositions of the selected contents for the specific usage environment using our VideoEd Editing and Composition Tool. Finally, two personalization and summarization systems are demonstrated, one for the IBM WebSphere Portal Server and one for pervasive PDA devices.

  6. OAI and NASA's Scientific and Technical Information

    NASA Technical Reports Server (NTRS)

    Nelson, Michael L.; Rocker, JoAnne; Harrison, Terry L.

    2002-01-01

    The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is an evolving protocol and philosophy regarding interoperability for digital libraries (DLs). Previously, "distributed searching" models were popular for DL interoperability. However, experience has shown distributed searching systems across large numbers of DLs to be difficult to maintain in an Internet environment. The OAI-PMH is a move away from distributed searching, focusing on the arguably simpler model of "metadata harvesting". We detail NASA's involvement in defining and testing the OAI-PMH and our experience to date with adapting existing NASA distributed-searching DLs (such as the NASA Technical Report Server) to use the OAI-PMH and metadata harvesting. We discuss some of the entirely new DL projects that the OAI-PMH has made possible, such as the Technical Report Interchange project. We explain the strategic importance of the OAI-PMH to the mission of NASA's Scientific and Technical Information Program.
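
    A minimal harvesting client is short enough to sketch in full; the fragment below issues a standard ListRecords request and walks the returned Dublin Core records (the base URL is a placeholder, while the verb, parameter, and namespaces are those of the OAI-PMH specification):

        import requests
        import xml.etree.ElementTree as ET

        OAI = "{http://www.openarchives.org/OAI/2.0/}"
        DC = "{http://purl.org/dc/elements/1.1/}"

        # Any OAI-PMH endpoint works here; this base URL is a placeholder.
        BASE_URL = "https://example.org/oai"

        resp = requests.get(BASE_URL, params={
            "verb": "ListRecords",
            "metadataPrefix": "oai_dc",   # Dublin Core, mandatory for all repos
        })
        root = ET.fromstring(resp.content)

        # Walk harvested records and print identifier plus title.
        for record in root.iter(f"{OAI}record"):
            header = record.find(f"{OAI}header")
            ident = header.find(f"{OAI}identifier").text
            title = record.find(f".//{DC}title")
            print(ident, "-", title.text if title is not None else "(no title)")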

  7. Software Implements a Space-Mission File-Transfer Protocol

    NASA Technical Reports Server (NTRS)

    Rundstrom, Kathleen; Ho, Son Q.; Levesque, Michael; Sanders, Felicia; Burleigh, Scott; Veregge, John

    2004-01-01

    CFDP is a computer program that implements the CCSDS (Consultative Committee for Space Data Systems) File Delivery Protocol, which is an international standard for automatic, reliable transfers of files of data between locations on Earth and in outer space. CFDP administers concurrent file transfers in both directions, delivery of data out of transmission order, reliable and unreliable transmission modes, and automatic retransmission of lost or corrupted data by use of one or more of several lost-segment-detection modes. The program also implements several data-integrity measures, including file checksums and optional cyclic redundancy checks for each protocol data unit. The metadata accompanying each file can include messages to users' application programs and commands for operating on remote file systems.
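
    The file checksum mentioned above is simple enough to sketch. The following Python fragment computes a CFDP-style checksum as the modulo-2^32 sum of 4-octet words, following the public CCSDS specification (this sketch is not the flight code):

        def cfdp_checksum(data: bytes) -> int:
            """Compute a CFDP-style file checksum: the modulo-2**32 sum of
            the file contents taken as big-endian 4-octet words, with the
            final short word zero-padded (per the public CCSDS 727.0 spec;
            illustrative, not the flight implementation)."""
            total = 0
            for i in range(0, len(data), 4):
                word = data[i:i + 4].ljust(4, b"\x00")  # pad final short word
                total = (total + int.from_bytes(word, "big")) & 0xFFFFFFFF
            return total

        payload = b"file data transferred between Earth and a spacecraft"
        print(hex(cfdp_checksum(payload)))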

  8. Automated metadata, provenance cataloging and navigable interfaces: ensuring the usefulness of extreme-scale data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schissel, David; Greenwald, Martin

    The MPO (Metadata, Provenance, Ontology) Project successfully addressed the goal of improving the usefulness and traceability of scientific data by building a system that can capture and display all steps in the process of creating, analyzing and disseminating those data. Throughout history, scientists have kept handwritten logbooks to track data, hypotheses, assumptions, experimental setup, and computational processes, as well as reflections on observations and issues encountered. Over the last several decades, with the growth of personal computers, handheld devices, and the World Wide Web, the handwritten logbook has begun to be replaced by electronic logbooks. This transition has brought increased capability, such as support for multimedia, hypertext, and fast searching. However, content creation and the capture of metadata (a set of data that describes and gives information about other data) have for the most part remained manual activities, just as with handwritten logbooks. This has led to a fragmentation of data, processing, and annotation that has only accelerated as scientific workflows continue to increase in complexity. From a scientific perspective, it is very important to be able to understand the lineage of any piece of data: who, what, when, how, and why. This is typically referred to as data provenance. The fragmentation discussed previously often means that data provenance is lost. As scientific workflows move to powerful computers and become more complex, tracking all of the steps involved in creating a piece of data becomes even more difficult. It was the goal of the MPO Project to create a system (the MPO System) that allows for automatic provenance and metadata capture in a way that enables easy searching and browsing. This goal needed to be accomplished in a general way so that the system may be used across a broad range of scientific domains, yet allow the addition of vocabulary (ontology) that is domain-specific, as is required for intelligent searching and browsing in a scientific context. Through the creation and deployment of the MPO system, the goals of the project were achieved. An enhanced metadata, provenance, and ontology storage system was created and combined with innovative methodologies for navigating and exploring these data using a web browser, for both experimental and simulation-based scientific research. In addition, the MPO system allows scientists to instrument their existing workflows for automatic metadata and provenance capture; a scientist can continue to use an existing methodology yet easily document the work. Workflows and data provenance can be displayed either graphically or in an electronic notebook format, with support for advanced search features, including search via ontology. The MPO system was successfully used in both climate and magnetic fusion energy research. The software for the MPO system is located at https://github.com/MPO-Group/MPO and is open source, distributed under the Revised BSD License. A demonstration site for the MPO system is open to the public at https://mpo.psfc.mit.edu/. A Docker container release of the command-line client is available for public download using the command docker pull jcwright/mpo-cli at https://hub.docker.com/r/jcwright/mpo-cli.

  9. Scalable Data Mining and Archiving for the Square Kilometre Array

    NASA Astrophysics Data System (ADS)

    Jones, D. L.; Mattmann, C. A.; Hart, A. F.; Lazio, J.; Bennett, T.; Wagstaff, K. L.; Thompson, D. R.; Preston, R.

    2011-12-01

    As the technologies for remote observation improve, the rapid increase in the frequency and fidelity of those observations translates into an avalanche of data that is already beginning to eclipse the resources, both human and technical, of the institutions and facilities charged with managing the information. Common data management tasks, such as cataloging both the data itself and contextual metadata, creating and maintaining a scalable permanent archive, and making data available on demand for research, present significant software engineering challenges when considered at the scales of modern multi-national scientific enterprises such as the upcoming Square Kilometre Array project. The NASA Jet Propulsion Laboratory (JPL), leveraging internal research and technology development funding, has begun to explore ways to address the data archiving and distribution challenges through a number of parallel activities involving collaborations with the EVLA and ALMA teams at the National Radio Astronomy Observatory (NRAO) and members of the Square Kilometre Array South Africa team. To date, we have leveraged the Apache OODT Process Control System framework and its catalog and archive service components, which provide file management, workflow management, and resource management as core web services. A client crawler framework ingests upstream data (e.g., EVLA raw directory output), identifies its MIME type, and automatically extracts relevant metadata, including temporal bounds and job-relevant processing information. A remote content acquisition (push/pull) service is responsible for staging remote content and handing it off to the crawler framework. A science algorithm wrapper (called CAS-PGE) wraps underlying code, including CASApy programs for the EVLA such as continuum imaging and spectral line cube generation, executes the algorithm, and ingests its output (along with relevant extracted metadata). In addition to processing, the Process Control System has been leveraged to provide data curation and automatic ingestion for the MeerKAT/KAT-7 precursor instrument in South Africa, helping to catalog and archive correlator and sensor output from KAT-7 and to make the information available for downstream science analysis. These efforts, supported by the increasing availability of high-quality open source software, represent a concerted effort to seek a cost-conscious methodology for maintaining the integrity of observational data from the upstream instrument to the archive, while ensuring that the data, with its richly annotated catalog of metadata, remains a viable resource for research into the future.
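
    Apache OODT itself is a Java framework, so the following Python sketch is only an illustration of the crawler concept described above: identify each staged file's MIME type and extract basic metadata for cataloging. The staging directory and the use of file timestamps as temporal bounds are assumptions for the example.

        # Illustrative sketch (not Apache OODT, which is Java) of the crawler
        # concept: watch a directory, identify each file's MIME type, and
        # extract basic metadata for cataloging.
        import mimetypes
        import os
        from datetime import datetime, timezone

        def extract_metadata(path):
            """Return a catalog-ready metadata record for one ingested file."""
            mime, _ = mimetypes.guess_type(path)
            stat = os.stat(path)
            return {
                "file": os.path.basename(path),
                "mime_type": mime or "application/octet-stream",
                "size_bytes": stat.st_size,
                # Temporal bounds are approximated here by file timestamps;
                # a real extractor would parse instrument headers instead.
                "modified": datetime.fromtimestamp(stat.st_mtime, timezone.utc).isoformat(),
            }

        def crawl(staging_dir):
            """Yield metadata records for every file in the staging area."""
            for name in sorted(os.listdir(staging_dir)):
                path = os.path.join(staging_dir, name)
                if os.path.isfile(path):
                    yield extract_metadata(path)

        for record in crawl("/data/staging"):  # hypothetical staging directory
            print(record)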

  10. Discovering Physical Samples Through Identifiers, Metadata, and Brokering

    NASA Astrophysics Data System (ADS)

    Arctur, D. K.; Hills, D. J.; Jenkyns, R.

    2015-12-01

    Physical samples, particularly in the geosciences, are key to understanding the Earth system, its history, and its evolution. Our record of the Earth as captured by physical samples is difficult to explain and mine for understanding, due to incomplete, disconnected, and evolving metadata content. This is further complicated by differing ways of classifying, cataloguing, publishing, and searching the metadata, especially when specimens do not fit neatly into a single domain—for example, fossils cross disciplinary boundaries (mineral and biological). Sometimes even the fundamental classification systems evolve, such as the geological time scale, triggering daunting processes to update existing specimen databases. Increasingly, we need to consider ways of leveraging permanent, unique identifiers, as well as advancements in metadata publishing that link digital records with physical samples in a robust, adaptive way. An NSF EarthCube Research Coordination Network (RCN) called the Internet of Samples (iSamples) is now working to bridge the metadata schemas for biological and geological domains. We are leveraging the International Geo Sample Number (IGSN) that provides a versatile system of registering physical samples, and working to harmonize this with the DataCite schema for Digital Object Identifiers (DOI). A brokering approach for linking disparate catalogues and classification systems could help scale discovery and access to the many large collections now being managed (sometimes millions of specimens per collection). This presentation is about our community building efforts, research directions, and insights to date.
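
    As a rough illustration of the IGSN-to-DataCite harmonization mentioned above, the sketch below maps a minimal IGSN-style sample record onto DataCite-style fields. The field names on the IGSN side and all sample values are simplified, hypothetical stand-ins, not the official schemas.

        # Hedged sketch of a schema crosswalk: mapping IGSN-style sample
        # metadata onto DataCite-style DOI fields. Field names are simplified
        # illustrations, not the official IGSN or DataCite schemas.
        def igsn_to_datacite(sample):
            """Map a minimal IGSN-like sample record to DataCite-like metadata."""
            return {
                "identifier": {"identifier": sample["igsn"], "identifierType": "IGSN"},
                "titles": [{"title": sample["name"]}],
                "creators": [{"creatorName": sample["collector"]}],
                "dates": [{"date": sample["collection_date"], "dateType": "Collected"}],
                "resourceType": {"resourceTypeGeneral": "PhysicalObject",
                                 "resourceType": sample["sample_type"]},
            }

        record = igsn_to_datacite({
            "igsn": "IEXXX0001",               # hypothetical IGSN
            "name": "Basalt core, section 2",  # hypothetical sample
            "collector": "Hills, D. J.",
            "collection_date": "2015-06-01",
            "sample_type": "Core section",
        })
        print(record)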

  11. Designing Extensible Data Management for Ocean Observatories, Platforms, and Devices

    NASA Astrophysics Data System (ADS)

    Graybeal, J.; Gomes, K.; McCann, M.; Schlining, B.; Schramm, R.; Wilkin, D.

    2002-12-01

    The Monterey Bay Aquarium Research Institute (MBARI) has been collecting science data for 15 years from all kinds of oceanographic instruments and systems, and is building a next-generation observing system, the MBARI Ocean Observing System (MOOS). To meet the data management requirements of the MOOS, the Institute began developing a flexible, extensible data management solution, the Shore Side Data System (SSDS). This data management system must address a wide variety of oceanographic instruments and data sources, including instruments and platforms of the future. Our data management solution will address all elements of the data management challenge, from ingest (including suitable pre-definition of metadata) through to access and visualization. Key to its success will be ease of use and the automatic incorporation of new data streams and data sets. The data will take many different forms and come from many different types of instruments. Instruments will be designed for fixed locations (as with moorings), changing locations (drifters and AUVs), and cruise-based sampling. Data from airplanes, satellites, models, and external archives must also be considered. Providing an architecture that allows data from these varied sources to be automatically archived and processed, yet readily accessed, is only possible with best practices in metadata definition, software design, and re-use of third-party components. The current status of SSDS development will be presented, including lessons learned from our science users and from previous data management designs.

  12. Facilitating Stewardship of scientific data through standards based workflows

    NASA Astrophysics Data System (ADS)

    Bastrakova, I.; Kemp, C.; Potter, A. K.

    2013-12-01

    There are three main suites of standards that can be used to define the fundamental scientific methodology of data, methods, and results: firstly, metadata standards that enable discovery of the data (ISO 19115); secondly, the Sensor Web Enablement (SWE) suite of standards, which includes the O&M and SensorML standards; and thirdly, ontologies that provide vocabularies to define scientific concepts and the relationships between them. All three types of standards have to be utilised by the practising scientist so that those who ultimately steward the data can ensure that it is preserved, curated, reused, and repurposed. Additional benefits of this approach include transparency of scientific processes from data acquisition to the creation of scientific concepts and models, and provision of context to inform data use. Collecting and recording metadata is the first step in the scientific data flow. The primary role of metadata is to provide details of geographic extent, availability, and a high-level description of data suitable for its initial discovery through common search engines. The SWE suite provides standardised patterns to describe observations and measurements taken for these data, capture detailed information about observational or analytical methods and the instruments used, and define quality determinations. This information standardises browsing capability over discrete data types. The standardised patterns of the SWE standards simplify aggregation of observation and measurement data, enabling scientists to transform disaggregated data into scientific concepts. The first two steps provide a necessary basis for reasoning about concepts of 'pure' science, building relationships between concepts of different domains (linked data), and identifying domain classifications and vocabularies. Geoscience Australia is re-examining its marine data flows, including metadata requirements and business processes, to achieve a clearer link between scientific data acquisition and analysis requirements and effective interoperable data management and delivery. This includes participating in national and international dialogue on the development of standards, embedding data management activities in business processes, and developing scientific staff as effective data stewards. A similar approach is applied to geophysical data. By ensuring that the geophysical datasets at GA strictly follow metadata and industry standards, we are able to implement a provenance-based workflow in which the data is easily discoverable, geophysical processing can be applied to it, and results can be stored. The provenance-based workflow enables metadata records for the results to be produced automatically from the input dataset metadata.
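
    As a concrete glimpse of the first step (ISO 19115 discovery metadata), here is a minimal sketch that emits a fragment of an ISO 19139-style record with Python's standard library. The namespaces are the standard ISO TC211 ones; the identifier and content are illustrative, and a real record would need many more mandatory elements.

        # Minimal sketch of an ISO 19115/19139-style discovery record. The
        # content is illustrative, not a complete, valid ISO 19115 document.
        import xml.etree.ElementTree as ET

        GMD = "http://www.isotc211.org/2005/gmd"
        GCO = "http://www.isotc211.org/2005/gco"
        ET.register_namespace("gmd", GMD)
        ET.register_namespace("gco", GCO)

        def char(parent, tag, text):
            """Append a gmd element wrapping a gco:CharacterString."""
            el = ET.SubElement(parent, f"{{{GMD}}}{tag}")
            cs = ET.SubElement(el, f"{{{GCO}}}CharacterString")
            cs.text = text
            return el

        root = ET.Element(f"{{{GMD}}}MD_Metadata")
        char(root, "fileIdentifier", "urn:example:marine-survey-2013")  # hypothetical ID
        contact = ET.SubElement(root, f"{{{GMD}}}contact")
        party = ET.SubElement(contact, f"{{{GMD}}}CI_ResponsibleParty")
        char(party, "organisationName", "Geoscience Australia")

        print(ET.tostring(root, encoding="unicode"))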

  13. Managing Data, Provenance and Chaos through Standardization and Automation at the Georgia Coastal Ecosystems LTER Site

    NASA Astrophysics Data System (ADS)

    Sheldon, W.

    2013-12-01

    Managing data for a large, multidisciplinary research program such as a Long Term Ecological Research (LTER) site is a significant challenge, but it also presents unique opportunities for data stewardship. LTER research is conducted within multiple organizational frameworks (i.e., a specific LTER site as well as the broader LTER network) and addresses both specific goals defined in an NSF proposal and broader goals of the network; therefore, every LTER data set can be linked to rich contextual information to guide interpretation and comparison. The challenge is how to link the data to this wealth of contextual metadata. At the Georgia Coastal Ecosystems LTER we developed an integrated information management system (GCE-IMS) to manage, archive, and distribute data, metadata, and other research products, as well as to manage project logistics, administration, and governance. This system allows us to store all project information in one place and provide dynamic links through web applications and services, ensuring content is always up to date on the web as well as in data set metadata. The database model supports tracking changes over time in personnel roles, projects, and governance decisions, allowing these databases to serve as canonical sources of project history. Storing project information in a central database has also allowed us to standardize both the formatting and content of critical project information, including personnel names, roles, keywords, place names, attribute names, units, and instrumentation, providing consistency and improving data and metadata comparability. Lookup services for these standard terms also simplify data entry in web and database interfaces. We have also coupled the GCE-IMS to our MATLAB- and Python-based data processing tools (i.e., through database connections) to automate metadata generation and the packaging of tabular and GIS data products for distribution. Data processing history is automatically tracked throughout the data lifecycle, from initial import through quality control, revision, and integration by our data processing system (GCE Data Toolbox for MATLAB), and included in the metadata for versioned data products. This high level of automation and system integration has proven very effective in managing the chaos and scalability of our information management program.
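
    The standardization of attribute names and units described above can be pictured as a lookup service that processing tools consult when generating metadata. The sketch below uses an in-memory SQLite table; the schema and terms are hypothetical stand-ins for the GCE-IMS databases.

        # Sketch of a canonical-terms lookup service: data-processing tools
        # query a central table of standardized attribute names and units.
        # The SQLite schema and terms are hypothetical.
        import sqlite3

        conn = sqlite3.connect(":memory:")
        conn.execute("""CREATE TABLE attributes
                        (raw_name TEXT PRIMARY KEY, canonical TEXT, units TEXT)""")
        conn.executemany("INSERT INTO attributes VALUES (?, ?, ?)", [
            ("temp", "Water_Temperature", "degrees Celsius"),
            ("sal", "Salinity", "PSU"),
        ])

        def standardize(raw_name):
            """Return the canonical name and units for a raw column header."""
            row = conn.execute(
                "SELECT canonical, units FROM attributes WHERE raw_name = ?",
                (raw_name,)).fetchone()
            return row if row else (raw_name, "unknown")

        for column in ["temp", "sal", "ph"]:
            name, units = standardize(column)
            print(f"{column} -> {name} [{units}]")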

  14. A framework for collaborative review of candidate events in high data rate streams: The V-FASTR experiment as a case study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hart, Andrew F.; Cinquini, Luca; Khudikyan, Shakeh E.

    2015-01-01

    “Fast radio transients” are defined here as bright millisecond pulses of radio-frequency energy. These short-duration pulses can be produced by known objects such as pulsars or potentially by more exotic objects such as evaporating black holes. The identification and verification of such an event would be of great scientific value. This is one major goal of the Very Long Baseline Array (VLBA) Fast Transient Experiment (V-FASTR), a software-based detection system installed at the VLBA. V-FASTR uses a “commensal” (piggy-back) approach, analyzing all array data continually during routine VLBA observations and identifying candidate fast transient events. Raw data can be stored from a buffer memory, which enables a comprehensive off-line analysis. This is invaluable for validating the astrophysical origin of any detection. Candidates discovered by the automatic system must be reviewed each day by analysts to identify any promising signals that warrant a more in-depth investigation. To support the timely analysis of fast transient detection candidates by V-FASTR scientists, we have developed a metadata-driven, collaborative candidate review framework. The framework consists of a software pipeline for metadata processing composed of both open source software components and project-specific code written expressly to extract and catalog metadata from the incoming V-FASTR data products, and a web-based data portal that facilitates browsing and inspection of the available metadata for candidate events extracted from the VLBA radio data.

  15. Integration of external metadata into the Earth System Grid Federation (ESGF)

    NASA Astrophysics Data System (ADS)

    Berger, Katharina; Levavasseur, Guillaume; Stockhause, Martina; Lautenschlager, Michael

    2015-04-01

    International projects with high data volumes usually disseminate their data in a federated data infrastructure, e.g. the Earth System Grid Federation (ESGF). The ESGF aims to make geographically distributed data seamlessly discoverable and accessible. Additional data-related information is currently collected and stored in separate repositories by each data provider. This scattered but useful information is unavailable, or only partly available, to ESGF users. Examples of such additional information systems are ES-DOC/metafor for model and simulation information, IPSL's versioning information, CHARMe for user annotations, DKRZ's quality information, and data citation information. The ESGF Quality Control working team (esgf-qcwt) aims to integrate these valuable pieces of additional information into the ESGF in order to make them available to users and data archive managers by (i) integrating external information into the ESGF portal, (ii) integrating links to external information objects into the ESGF metadata index, e.g. through the use of PIDs (Persistent IDentifiers), and (iii) automating the collection of external information during the ESGF data publication process. For the sixth phase of CMIP (the Coupled Model Intercomparison Project), the ESGF metadata index is to be enriched with additional information on data citation, file version, etc. This information will support users directly and can be automatically exploited by higher-level services (human and machine readability).
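
    Step (iii) above implies stamping identifiers into the data at publication time so they can later be harvested into the metadata index. A minimal sketch, assuming the netCDF4 package and illustrative attribute names and handle prefix (CMIP6, for example, carries a tracking_id global attribute), follows:

        # Hedged sketch: attach PID and citation metadata to a NetCDF file's
        # global attributes at publication time. File name, citation URL, and
        # the handle prefix are illustrative.
        import uuid
        from netCDF4 import Dataset

        def stamp_pid(path, citation_url):
            """Write PID and citation attributes into an existing NetCDF file."""
            with Dataset(path, "a") as ds:
                ds.setncattr("tracking_id", f"hdl:21.14100/{uuid.uuid4()}")  # illustrative prefix
                ds.setncattr("further_info_url", citation_url)

        stamp_pid("tas_Amon_example.nc",                      # hypothetical file
                  "https://example.org/cmip6/citation/1234")  # hypothetical landing page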

  16. Simple, Script-Based Science Processing Archive

    NASA Technical Reports Server (NTRS)

    Lynnes, Christopher; Hegde, Mahabaleshwara; Barth, C. Wrandle

    2007-01-01

    The Simple, Scalable, Script-based Science Processing (S4P) Archive (S4PA) is a disk-based archival system for remote sensing data. It is based on the data-driven framework of S4P and is used for data transfer, data preprocessing, metadata generation, data archiving, and data distribution. New data are automatically detected by the system. S4P provides services such as data access control, data subscription, metadata publication, data replication, and data recovery. It comprises scripts that control the data flow. The system detects the availability of data on an FTP (File Transfer Protocol) server, initiates data transfer, preprocesses data if necessary, and archives it on readily available disk drives with FTP and HTTP (Hypertext Transfer Protocol) access, allowing instantaneous data access. There are options for plug-ins for data preprocessing before storage. Publication of metadata to external applications such as the Earth Observing System Clearinghouse (ECHO) is also supported. S4PA includes a graphical user interface for monitoring system operation and a tool for deploying the system. To ensure reliability, S4P continuously checks stored data for integrity; further reliability is provided by tape backups of disks, made once a disk partition is full and closed. The system is designed for low maintenance, requiring minimal operator oversight.
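
    The FTP-polling behavior described above can be sketched in a few lines. This is an illustration of the data-driven pattern, not the actual S4PA scripts, and the host, directory, and archive paths are hypothetical.

        # Minimal sketch of data-driven archiving: poll an FTP server and
        # fetch anything not yet in the local archive. Endpoints are
        # hypothetical; the real S4PA adds subscriptions, preprocessing
        # plug-ins, and integrity checking.
        import os
        from ftplib import FTP

        def poll_and_fetch(host, remote_dir, archive_dir):
            """Download files from the FTP server that are not yet archived."""
            os.makedirs(archive_dir, exist_ok=True)
            with FTP(host) as ftp:
                ftp.login()                      # anonymous login
                ftp.cwd(remote_dir)
                for name in ftp.nlst():
                    local = os.path.join(archive_dir, name)
                    if os.path.exists(local):    # already archived
                        continue
                    with open(local, "wb") as f:
                        ftp.retrbinary(f"RETR {name}", f.write)
                    print(f"archived {name}")

        poll_and_fetch("ftp.example.org", "/incoming", "/archive/data")  # hypothetical endpoints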

  17. Java-Library for the Access, Storage and Editing of Calibration Metadata of Optical Sensors

    NASA Astrophysics Data System (ADS)

    Firlej, M.; Kresse, W.

    2016-06-01

    The standardization of the calibration of optical sensors in photogrammetry and remote sensing has been discussed for more than a decade. Projects of the German DGPF and the European EuroSDR led to the abstract International Technical Specification ISO/TS 19159-1:2014 "Calibration and validation of remote sensing imagery sensors and data - Part 1: Optical sensors". This article presents the first software interface providing read and write access to all metadata elements standardized in the ISO/TS 19159-1. The interface is based on an XML Schema that was automatically derived by ShapeChange from the UML model of the Specification. The software interface serves two use cases. First, the more than 300 standardized metadata elements are stored individually according to the XML Schema. Second, camera manufacturers use many administrative data that are not part of the ISO/TS 19159-1. The new software interface provides a mechanism for the input, storage, editing, and output of both types of data. Finally, an output channel towards a typical calibration protocol is provided. The interface is written in Java. The article also addresses observations made when analysing the ISO/TS 19159-1 and compiles a list of proposals for maturing the document, i.e., for an updated version of the Specification.

  18. FIR: An Effective Scheme for Extracting Useful Metadata from Social Media.

    PubMed

    Chen, Long-Sheng; Lin, Zue-Cheng; Chang, Jing-Rong

    2015-11-01

    Recently, the use of social media for health information exchange has been expanding among patients, physicians, and other health care professionals. In medical areas, social media allows non-experts to access, interpret, and generate medical information for their own care and the care of others. Researchers have paid much attention to social media in medical education, patient-pharmacist communication, adverse drug reaction detection, the impact of social media on medicine and healthcare, and so on. However, relatively few papers discuss how to effectively extract useful knowledge from the huge volume of textual comments in social media. Therefore, this study proposes a Fuzzy adaptive resonance theory network based Information Retrieval (FIR) scheme that combines a Fuzzy adaptive resonance theory (ART) network, Latent Semantic Indexing (LSI), and association rule (AR) discovery to extract knowledge from social media. In the FIR scheme, a Fuzzy ART network is first employed to segment comments. Next, for each customer segment, the LSI technique is used to retrieve important keywords. Then, in order to make the extracted keywords understandable, association rule mining is applied to organize the extracted keywords into metadata. The extracted voices of customers are then transformed into design needs using Quality Function Deployment (QFD) for further decision making. Unlike conventional information retrieval techniques, which extract too many keywords to convey the key points, the FIR scheme can extract understandable metadata from social media.
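
    The LSI step of the FIR scheme can be illustrated with scikit-learn: build a TF-IDF term-document matrix, take a truncated SVD, then read off the highest-weighted terms per latent topic. The comments below are toy examples, and the Fuzzy ART segmentation and association-rule steps are omitted.

        # Illustrative sketch of the LSI keyword-retrieval step only.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.decomposition import TruncatedSVD

        comments = [
            "the tablets are hard to swallow",
            "hard to swallow and bitter taste",
            "packaging is difficult to open",
            "child-proof packaging too difficult for elderly patients",
        ]

        vectorizer = TfidfVectorizer(stop_words="english")
        X = vectorizer.fit_transform(comments)

        # Latent Semantic Indexing = truncated SVD of the term-document matrix.
        lsi = TruncatedSVD(n_components=2, random_state=0).fit(X)
        terms = vectorizer.get_feature_names_out()

        for i, component in enumerate(lsi.components_):
            top = component.argsort()[::-1][:3]  # three highest-weighted terms
            print(f"topic {i}:", [terms[j] for j in top])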

  19. Scientific Platform as a Service - Tools and solutions for efficient access to and analysis of oceanographic data

    NASA Astrophysics Data System (ADS)

    Vines, Aleksander; Hansen, Morten W.; Korosov, Anton

    2017-04-01

    Existing international and Norwegian infrastructure projects, e.g., NorDataNet, NMDC and NORMAP, provide open data access through the OPeNDAP protocol following the conventions for CF (Climate and Forecast) metadata, designed to promote the processing and sharing of files created with the NetCDF application programming interface (API). This approach is now also being implemented in the Norwegian Sentinel Data Hub (satellittdata.no) to provide satellite EO data to the user community. Simultaneously with providing simplified and unified data access, these projects also seek to use and establish common standards for use and discovery metadata. This in turn allows the development of standardized tools for data search and (subset) streaming over the internet to perform the actual scientific analysis. A combination of software tools, which we call a Scientific Platform as a Service (SPaaS), will take advantage of these opportunities to harmonize and streamline the search, retrieval, and analysis of integrated satellite and auxiliary observations of the oceans in a seamless system. The SPaaS is a cloud solution for the integration of analysis tools with scientific datasets via an API. The core part of the SPaaS is a distributed metadata catalog storing granular metadata that describes the structure, location, and content of available satellite, model, and in situ datasets. The analysis tools include software for visualization (also online), interactive in-depth analysis, and server-based processing chains. The API conveys search requests between system nodes (i.e., interactive and server tools) and provides easy access to the metadata catalog, data repositories, and the tools. The SPaaS components are integrated in virtual machines, whose provisioning and deployment are automated using existing state-of-the-art open-source tools (e.g., Vagrant, Ansible, Docker). The open-source code for the scientific tools and virtual machine configurations is under version control at https://github.com/nansencenter/ and is coupled to an online continuous integration system (e.g., Travis CI).
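
    The kind of unified access the SPaaS builds on can be sketched with xarray: open a CF-compliant dataset over OPeNDAP and stream only a subset. The URL and variable name below are hypothetical.

        # Sketch of OPeNDAP subset streaming; requires xarray with a netCDF4
        # backend. Endpoint and variable are hypothetical.
        import xarray as xr

        url = "https://opendap.example.org/thredds/dodsC/sst_daily.nc"  # hypothetical endpoint
        ds = xr.open_dataset(url)  # no full download; data are fetched lazily

        # Subset in space and time before any bytes of the field are transferred.
        subset = ds["sst"].sel(lat=slice(55, 75), lon=slice(-10, 30),
                               time="2016-06-15")
        print(subset.mean().values)  # only the subset is streamed and reduced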

  20. The Digital Sample: Metadata, Unique Identification, and Links to Data and Publications

    NASA Astrophysics Data System (ADS)

    Lehnert, K. A.; Vinayagamoorthy, S.; Djapic, B.; Klump, J.

    2006-12-01

    A significant part of digital data in the Geosciences refers to physical samples of Earth materials, from igneous rocks to sediment cores to water or gas samples. The application and long-term utility of these sample-based data in research is critically dependent on (a) the availability of information (metadata) about the samples, such as geographical location and time of sampling or sampling method, (b) links between the different data types available for individual samples that are dispersed in the literature and in digital data repositories, and (c) access to the samples themselves. Major obstacles include incomplete documentation of samples in publications, the use of ambiguous sample names, and the lack of a central catalog that allows one to find a sample's archiving location. The International Geo Sample Number (IGSN), managed by the System for Earth Sample Registration (SESAR), provides solutions for these problems. The IGSN is a unique persistent identifier for samples and other GeoObjects that can be obtained by submitting sample metadata to SESAR (www.geosamples.org). If data in a publication are referenced to an IGSN (rather than an ambiguous sample name), sample metadata can readily be extracted from the SESAR database, which is evolving into a Global Sample Catalog that also allows users to locate the owner or curator of a sample. Use of the IGSN in digital data systems allows linkages to be built between distributed data. SESAR is contributing to the development of sample metadata standards. SESAR will integrate the IGSN into persistent, resolvable identifiers based on the handle.net service to advance direct linkages between the digital representation of samples in SESAR (sample profiles) and their related data in the literature and in web-accessible digital data repositories. Technologies outlined by Klump et al. (this session), such as the automatic creation of ontologies by text-mining applications, will be explored for harvesting identifiers of publications and datasets that contain information about a specific sample in order to establish comprehensive data profiles for samples.

  1. The DKIST Data Center: Meeting the Data Challenges for Next-Generation, Ground-Based Solar Physics

    NASA Astrophysics Data System (ADS)

    Davey, A. R.; Reardon, K.; Berukoff, S. J.; Hays, T.; Spiess, D.; Watson, F. T.; Wiant, S.

    2016-12-01

    The Daniel K. Inouye Solar Telescope (DKIST) is under construction on the summit of Haleakalā in Maui and scheduled to start science operations in 2020. The DKIST design includes a four-meter primary mirror coupled to an adaptive optics system, and a flexible instrumentation suite capable of delivering high-resolution optical and infrared observations of the solar chromosphere, photosphere, and corona. Through investigator-driven science proposals, the facility will generate an average of 8 TB of data daily, comprising millions of images and hundreds of millions of metadata elements. The DKIST Data Center is responsible for the long-term curation and calibration of data received from the DKIST, and for distributing it to the user community for scientific use. Two key elements necessary to meet the inherent big-data challenge are the development of flexible public/private cloud computing and coupled relational and non-relational data storage mechanisms. We discuss how this infrastructure is being designed to meet the significant expectations of automatic and manual calibration of ground-based solar physics data, and to maximize the data's utility through efficient, long-term data management practices implemented with prudent process definition and technology exploitation.

  2. The AtChem On-line model and Electronic Laboratory Notebook (ELN): A free community modelling tool with provenance capture

    NASA Astrophysics Data System (ADS)

    Young, J. C.; Boronska, K.; Martin, C. J.; Rickard, A. R.; Vázquez Moreno, M.; Pilling, M. J.; Haji, M. H.; Dew, P. M.; Lau, L. M.; Jimack, P. K.

    2010-12-01

    AtChem On-line is a simple-to-use zero-dimensional box modelling toolkit, developed for use by laboratory, field, and chamber scientists. Any set of chemical reactions can be simulated, in particular the whole Master Chemical Mechanism (MCM) or any subset of it. Parameters and initial data can be provided through a self-explanatory web form, and the resulting model is compiled and run on a dedicated server. The core part of the toolkit, providing a robust solver for thousands of chemical reactions, is written in Fortran and uses the SUNDIALS CVODE libraries. Chemical systems can be constrained at multiple, user-determined timescales; this has enabled studies of radical chemistry at one-minute timescales. AtChem On-line is free to use and requires no installation - a web browser, a text editor, and any compression software are all the user needs. CPU and storage are provided by the server (input and output data are saved indefinitely). An off-line version is also being developed, which will provide batch processing, an advanced graphical user interface, and post-processing tools, for example Rate of Production Analysis (ROPA) and chain-length analysis. The source code is freely available for advanced users wishing to adapt and run the program locally. Data management, dissemination, and archiving are essential in all areas of science. In order to do this in an efficient and transparent way, there is a critical need to capture high-quality metadata/provenance for modelling activities. An Electronic Laboratory Notebook (ELN) has been developed in parallel with AtChem On-line as part of the EC EUROCHAMP2 project. In order to use controlled chamber experiments to evaluate the MCM, we need to be able to archive, track, and search information on all associated chamber model runs, so that they can be used in subsequent mechanism development. It would therefore be extremely useful if experiment and model metadata/provenance could be easily and automatically stored electronically. Archiving metadata/provenance via an ELN makes it easier to write a paper or thesis, and makes it easier for mechanism developers, evaluators, and peer reviewers to search for appropriate experimental and modelling results and conclusions. The development of an ELN in the context of mechanism evaluation/development using large experimental chamber datasets is presented.
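
    AtChem's solver is Fortran built on SUNDIALS CVODE, but the essence of a zero-dimensional box model can be sketched with SciPy's stiff integrator. The two-reaction mechanism (A -> B -> C) and the rate constants below are made up for illustration.

        # Toy zero-dimensional box model: integrate a small mechanism with a
        # stiff ODE solver (BDF), in the spirit of (but not equivalent to)
        # AtChem's CVODE-based solver. Rate constants are hypothetical.
        import numpy as np
        from scipy.integrate import solve_ivp

        k1, k2 = 1e-2, 5e-4  # hypothetical rate constants (s^-1)

        def mechanism(t, c):
            """Right-hand side: d[A]/dt, d[B]/dt, d[C]/dt for A -> B -> C."""
            a, b, _ = c
            return [-k1 * a, k1 * a - k2 * b, k2 * b]

        c0 = [1.0, 0.0, 0.0]  # initial concentrations (arbitrary units)
        sol = solve_ivp(mechanism, t_span=(0, 3600), y0=c0, method="BDF",
                        t_eval=np.linspace(0, 3600, 7))

        for t, (a, b, c) in zip(sol.t, sol.y.T):
            print(f"t={t:6.0f}s  A={a:.3f}  B={b:.3f}  C={c:.3f}")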

  3. Expanding Access to NCAR's Digital Assets: Towards a Unified Scientific Data Management System

    NASA Astrophysics Data System (ADS)

    Stott, D.

    2016-12-01

    In 2014 the National Center for Atmospheric Research (NCAR) Directorate created the Data Stewardship Engineering Team (DSET) to plan and implement the strategic vision of an integrated front door for data discovery and access across the organization, including all laboratories, the library, and UCAR Community Programs. DSET is focused on improving the quality of users' experiences in finding and using NCAR's digital assets. This effort also supports new policies included in federal mandates, NSF requirements, and journal publication rules. An initial survey with 97 respondents identified 68 persons responsible for more than 3 petabytes of data. An inventory, using the Data Asset Framework produced by the UK Digital Curation Centre as a starting point, identified asset types that included files and metadata, publications, images, and software (visualization, analysis, and model codes). User story sessions with representatives from each lab identified and ranked desired features for a unified Scientific Data Management System (SDMS). A process that began with an organization-wide assessment of metadata by the HDF Group, followed by meetings with the labs to identify key documentation concepts, culminated in the development of an NCAR metadata dialect that leverages the DataCite and ISO 19115 standards. The tasks ahead are to build out the SDMS and populate it with rich, standardized metadata. Software packages have been prototyped and are currently being tested and reviewed by DSET members. Key challenges for DSET include both technical and non-technical issues. First, the status quo with regard to how assets are managed varies widely across the organization: there are differences in file format standards, technologies, and discipline-specific vocabularies. Metadata diversity is another real challenge; the types of metadata, the standards used, and the capacity to create new metadata vary across the organization. Significant effort is required to develop tools to create new standard metadata across the organization, adapt and integrate current digital assets, and establish consistent data management practices going forward. To be successful, best practices must be infused into daily activities. This poster will highlight the processes, lessons learned, and current status of the DSET effort at NCAR.

  4. Documenting Climate Models and Their Simulations

    DOE PAGES

    Guilyardi, Eric; Balaji, V.; Lawrence, Bryan; ...

    2013-05-01

    The results of climate models are of increasing and widespread importance. No longer is climate model output of sole interest to climate scientists and researchers in the climate change impacts and adaptation fields. Now nonspecialists such as government officials, policy makers, and the general public all have an increasing need to access climate model output and understand its implications. For this host of users, accurate and complete metadata (i.e., information about how and why the data were produced) is required to document the climate modeling results. We describe a pilot community initiative to collect and make available documentation of climate models and their simulations. In an initial application, a metadata repository is being established to provide information of this kind for a major internationally coordinated modeling activity known as CMIP5 (Coupled Model Intercomparison Project, Phase 5). We expect that for a wide range of stakeholders, this and similar community-managed metadata repositories will spur development of analysis tools that facilitate discovery and exploitation of Earth system simulations.

  5. Validation of crowdsourced automatic rain gauge measurements in Amsterdam

    NASA Astrophysics Data System (ADS)

    de Vos, Lotte; Leijnse, Hidde; Overeem, Aart; Uijlenhoet, Remko

    2016-04-01

    The increasing number of privately owned weather stations and the facilitating role of the internet in making these data publicly available have led to several online platforms that collect and visualize crowdsourced weather data. This has resulted in ever-growing, freely available datasets of weather measurements generated by amateur weather enthusiasts. Because of the lack of quality control and the frequent absence of metadata, these measurements are often considered unreliable. Given the often large variability of weather variables in space and time, and the generally low number of official weather stations, this growing quantity of crowdsourced data may become an important additional source of information. Amateur weather observations have become more frequent over the past decade as weather stations have become more user-friendly and affordable. The variables measured by these weather stations are temperature, pressure, and dew point, and in some cases wind and rainfall. Meteorological data from crowdsourced automatic weather stations in cities have primarily been used to examine the urban heat island effect. Thus far, these studies have focused on comparing the crowdsourced station temperature measurements with a nearby WMO-standard weather station, which is often located in a rural area or on the outskirts of a city and is generally not representative of the city center. Here, instead of temperature, the stations' rainfall measurements are examined. This research focuses on the combined ability of a large number of privately owned weather stations in an urban setting to correctly monitor rainfall. A set of 64 automatic weather stations distributed over Amsterdam (The Netherlands), each with at least 3 months of precipitation measurements during one year, is evaluated. Precipitation measurements from the stations are compared to a merged radar-gauge precipitation product. Disregarding sudden jumps in station-measured precipitation, the accumulated rainfall over time at most stations showed an underestimation of rainfall compared to the accumulated values in the corresponding radar pixel of the reference. Special consideration is given to the identification of faulty measurements without the need to obtain additional metadata, such as setup and surroundings. This validation will show the potential of crowdsourced automatic weather stations for future urban rainfall monitoring.

  6. Efforts to integrate CMIP metadata and standards into NOAA-GFDL's climate model workflow

    NASA Astrophysics Data System (ADS)

    Blanton, C.; Lee, M.; Mason, E. E.; Radhakrishnan, A.

    2017-12-01

    Modeling centers participating in CMIP6 run model simulations, publish requested model output (conforming to community data standards), and document their models and simulations using ES-DOC. GFDL developed workflow software implementing some best practices to meet these metadata and documentation requirements. The CMIP6 Data Request defines the variables that should be archived for each experiment and specifies their spatial and temporal structure. We used the Data Request's dreqPy python library to write GFDL model configuration files as an alternative to hand-crafted tables. There was also a largely successful effort to standardize variable names within the model to reduce the additional overhead of translating "GFDL to CMOR" variables at a later stage in the pipeline. The ES-DOC ecosystem provides tools and standards to create, publish, and view various types of community-defined CIM documents, most notably model and simulation documents. Although ES-DOC will automatically create simulation documents during publishing by harvesting NetCDF global attributes, the information must first be collected, stored, and placed in the NetCDF files by the workflow. We propose to develop a GUI to collect the simulation document precursors. In addition, a new MIP for CMIP6, CPMIP (a comparison of the computational performance of climate models), is documented using machine and performance CIM documents. We used ES-DOC's pyesdoc python library to automatically create these machine and performance documents. We hope that these and similar efforts will become permanent features of the GFDL workflow, facilitating future participation in CMIP-like activities.
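
    A hedged sketch of querying the CMIP6 Data Request with dreqPy, as in the configuration-file generation described above. This assumes dreqPy's documented loadDreq() entry point; collection and attribute names may differ between Data Request versions.

        # Hedged sketch of dreqPy usage; exact attribute names are version-
        # dependent assumptions.
        from dreqPy import dreq

        dq = dreq.loadDreq()  # load the bundled Data Request XML

        # List a few requested variables with their units.
        for var in dq.coll['var'].items[:5]:
            print(var.label, getattr(var, 'units', '?'))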

  7. Service architecture challenges in building the KNMI Data Centre

    NASA Astrophysics Data System (ADS)

    Som de Cerff, Wim; van de Vegte, John; Plieger, Maarten; de Vreede, Ernst; Sluiter, Raymond; Willem Noteboom, Jan; van der Neut, Ian; Verhoef, Hans; van Versendaal, Robert; van Binnendijk, Martin; Kalle, Henk; Knopper, Arthur; Calis, Gijs; Ha, Siu Siu; van Moosel, Wim; Klein Ikkink, Henk-Jan; Tosun, Tuncay

    2013-04-01

    One of the objectives of KNMI is to act as a national data centre for weather, climate, and seismological data. KNMI has curated data for many years; however, important scientific data are not well accessible, and new technologies have become available to improve the current infrastructure. A data curation programme was therefore initiated with two main goals: to set up a Satellite Data Platform (SDP) and a KNMI Data Centre (KDC). Besides curation, KDC provides data access and a storage and retrieval portal for KNMI data. The first requirements were gathered in 2010, the main architecture was sketched in 2011, and KDC was implemented in 2012; it is available at http://data.knmi.nl. KDC was built with the data providers involved, with a key challenge being that 'adding a dataset should be as simple as creating an HTML page'. This is enabled by a three-step process, in which the data provider is responsible for the first two steps: 1. Provide dataset metadata: an easy-to-use web interface for providing metadata, with automated validation. Metadata consist of an ISO 19115 profile (matching INSPIRE and WMO requirements) and additional technical metadata regarding the data structure and access rights to the data. The interface hides certain metadata fields, which are filled in by KDC automatically. 2. Provide data: once the metadata have been entered, an upload location for the dataset is provided; scripts for pushing large datasets are also available. 3. Process and publish: once files are uploaded, they are processed for metadata (e.g., geolocation, time, version) and made available in KDC. The data are put into the archive and made available using the in-house developed Virtual File System, which provides a persistent virtual path to the data. For the end user, KDC provides a web interface with search filters on keywords, geolocation, and time. Data can be downloaded using HTTP or FTP, and downloads can be scripted. Users can register to gain access to restricted datasets. The architecture combines open source software components (e.g., GeoNetwork, Magnolia, MongoDB, MySQL) with in-house built software (ADAGUC, NADC) and newly developed software. Challenges faced and solved include: how to deal with the different file formats used at KNMI (e.g., NetCDF, GRIB, BUFR, ASCII); how to deal with the different metadata profiles while hiding their complexity from the user; how to incorporate the existing archives; and how to make KDC a node in several networks (WMO WIS, INSPIRE, Open Data). In the presentation/poster we describe what has been done for each of these challenges and how it is implemented in KDC.

  8. Integrated workflows for spiking neuronal network simulations

    PubMed Central

    Antolík, Ján; Davison, Andrew P.

    2013-01-01

    The increasing availability of computational resources is enabling more detailed, realistic modeling in computational neuroscience, resulting in a shift toward more heterogeneous models of neuronal circuits, and employment of complex experimental protocols. This poses a challenge for existing tool chains, as the set of tools involved in a typical modeler's workflow is expanding concomitantly, with growing complexity in the metadata flowing between them. For many parts of the workflow, a range of tools is available; however, numerous areas lack dedicated tools, while integration of existing tools is limited. This forces modelers to either handle the workflow manually, leading to errors, or to write substantial amounts of code to automate parts of the workflow, in both cases reducing their productivity. To address these issues, we have developed Mozaik: a workflow system for spiking neuronal network simulations written in Python. Mozaik integrates model, experiment and stimulation specification, simulation execution, data storage, data analysis and visualization into a single automated workflow, ensuring that all relevant metadata are available to all workflow components. It is based on several existing tools, including PyNN, Neo, and Matplotlib. It offers a declarative way to specify models and recording configurations using hierarchically organized configuration files. Mozaik automatically records all data together with all relevant metadata about the experimental context, allowing automation of the analysis and visualization stages. Mozaik has a modular architecture, and the existing modules are designed to be extensible with minimal programming effort. Mozaik increases the productivity of running virtual experiments on highly structured neuronal networks by automating the entire experimental cycle, while increasing the reliability of modeling studies by relieving the user from manual handling of the flow of metadata between the individual workflow stages. PMID:24368902

  9. New Solutions for Enabling Discovery of User-Centric Virtual Data Products in NASA's Common Metadata Repository

    NASA Astrophysics Data System (ADS)

    Pilone, D.; Gilman, J.; Baynes, K.; Shum, D.

    2015-12-01

    This talk introduces a new NASA Earth Observing System Data and Information System (EOSDIS) capability to automatically generate and maintain derived Virtual Product information, allowing DAACs and Data Providers to create tailored and more discoverable variations of their products. After this talk the audience will be aware of the new EOSDIS Virtual Product capability, applications of it, and how to take advantage of it. Much of the data made available in EOSDIS is organized for generation and archival rather than for discovery and use. The EOSDIS Common Metadata Repository (CMR) is launching a new capability providing automated generation and maintenance of user-oriented Virtual Product information. DAACs can easily surface variations on established data products tailored to specific use cases and users, leveraging DAAC-exposed services such as custom ordering or access services like OPeNDAP for on-demand product generation and distribution. Virtual Data Products enjoy support for spatial and temporal information, keyword discovery, and association with imagery, and are fully discoverable by tools such as NASA Earthdata Search, Worldview, and Reverb. Virtual Product generation has applicability across many use cases:
    - describing derived products such as Surface Kinetic Temperature information (AST_08) from source products (ASTER L1A);
    - providing streamlined access to data products (e.g. AIRS) containing many (>800) data variables covering an enormous variety of physical measurements;
    - attaching additional EOSDIS offerings such as Visual Metadata, external services, and documentation metadata;
    - publishing alternate formats for a product (e.g. netCDF for HDF products) with the actual conversion happening on request;
    - publishing granules to be modified by on-the-fly services, like GES-DISC's Data Quality Screening Service;
    - publishing "bundled" products where granules from one product correspond to granules from one or more other related products.
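
    Discovery against the CMR can be exercised through its public search API. The sketch below queries the operational collections endpoint with the requests package; the keyword and page size are illustrative.

        # Sketch of a CMR collection search via its public REST API.
        import requests

        resp = requests.get(
            "https://cmr.earthdata.nasa.gov/search/collections.json",
            params={"keyword": "surface kinetic temperature", "page_size": 5},
            timeout=30,
        )
        resp.raise_for_status()

        # The JSON response carries an Atom-style feed of matching collections.
        for entry in resp.json()["feed"]["entry"]:
            print(entry["short_name"], "-", entry["title"])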

  10. Integrated workflows for spiking neuronal network simulations.

    PubMed

    Antolík, Ján; Davison, Andrew P

    2013-01-01

    The increasing availability of computational resources is enabling more detailed, realistic modeling in computational neuroscience, resulting in a shift toward more heterogeneous models of neuronal circuits, and employment of complex experimental protocols. This poses a challenge for existing tool chains, as the set of tools involved in a typical modeler's workflow is expanding concomitantly, with growing complexity in the metadata flowing between them. For many parts of the workflow, a range of tools is available; however, numerous areas lack dedicated tools, while integration of existing tools is limited. This forces modelers to either handle the workflow manually, leading to errors, or to write substantial amounts of code to automate parts of the workflow, in both cases reducing their productivity. To address these issues, we have developed Mozaik: a workflow system for spiking neuronal network simulations written in Python. Mozaik integrates model, experiment and stimulation specification, simulation execution, data storage, data analysis and visualization into a single automated workflow, ensuring that all relevant metadata are available to all workflow components. It is based on several existing tools, including PyNN, Neo, and Matplotlib. It offers a declarative way to specify models and recording configurations using hierarchically organized configuration files. Mozaik automatically records all data together with all relevant metadata about the experimental context, allowing automation of the analysis and visualization stages. Mozaik has a modular architecture, and the existing modules are designed to be extensible with minimal programming effort. Mozaik increases the productivity of running virtual experiments on highly structured neuronal networks by automating the entire experimental cycle, while increasing the reliability of modeling studies by relieving the user from manual handling of the flow of metadata between the individual workflow stages.

  11. The Materials Data Facility: Data Services to Advance Materials Science Research

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Blaiszik, B.; Chard, K.; Pruyne, J.

    2016-07-06

    With increasingly strict data management requirements from funding agencies and institutions, expanding focus on the challenges of research replicability, and growing data sizes and heterogeneity, new data needs are emerging in the materials community. The Materials Data Facility (MDF) operates two cloud-hosted services, data publication and data discovery, with features to promote open data sharing, self-service data publication and curation, and data reuse, layered with powerful data discovery tools. The data publication service simplifies the process of copying data to a secure storage location, assigning the data a citable persistent identifier, and recording custom (e.g., material-, technique-, or instrument-specific) and automatically-extracted metadata in a registry, while the data discovery service will provide advanced search capabilities (e.g., faceting, free-text range querying, and full-text search) against the registered data and metadata. The MDF services empower individual researchers, research projects, and institutions to (I) publish research datasets, regardless of size, from local storage, institutional data stores, or cloud storage, without the involvement of third-party publishers; (II) build, share, and enforce extensible domain-specific custom metadata schemas; (III) interact with published data and metadata via representational state transfer (REST) application program interfaces (APIs) to facilitate automation, analysis, and feedback; and (IV) access a data discovery model that allows researchers to search, interrogate, and eventually build on existing published data. We describe MDF's design, current status, and future plans.
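
    Point (II) above, enforcing extensible domain-specific metadata schemas, can be pictured with JSON Schema validation. The sketch below uses the jsonschema package with a made-up schema for an X-ray diffraction record; it does not use MDF's actual services or schemas.

        # Sketch of enforcing a domain-specific metadata schema before
        # publication. The XRD schema and record are hypothetical.
        import jsonschema

        xrd_schema = {
            "type": "object",
            "required": ["title", "material", "wavelength_angstrom"],
            "properties": {
                "title": {"type": "string"},
                "material": {"type": "string"},
                "wavelength_angstrom": {"type": "number", "exclusiveMinimum": 0},
            },
        }

        record = {
            "title": "XRD scan of annealed sample 7",
            "material": "TiO2",
            "wavelength_angstrom": 1.5406,
        }

        jsonschema.validate(instance=record, schema=xrd_schema)  # raises on failure
        print("record conforms to the XRD metadata schema")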

  12. XML at the ADC: Steps to a Next Generation Data Archive

    NASA Astrophysics Data System (ADS)

    Shaya, E.; Blackwell, J.; Gass, J.; Oliversen, N.; Schneider, G.; Thomas, B.; Cheung, C.; White, R. A.

    1999-05-01

    The eXtensible Markup Language (XML) is a document markup language that allows users to specify their own tags, to create hierarchical structures to qualify their data, and to support automatic checking of documents for structural validity. It is being intensively supported by nearly every major corporate software developer. Under funding from a NASA AISRP proposal, the Astronomical Data Center (ADC, http://adc.gsfc.nasa.gov) is developing an infrastructure for the importation, enhancement, and distribution of data and metadata using XML as the document markup language. We discuss the preliminary Document Type Definition (DTD, at http://adc.gsfc.nasa.gov/xml), which specifies the elements and their attributes in our metadata documents. This attempts to define both the metadata of an astronomical catalog and the 'header' information of an astronomical table. In addition, we give an overview of the planned flow of data through automated pipelines from authors and journal presses into our XML archive, and retrieval through the web via the XML-QL query language and eXtensible Style Language (XSL) scripts. When completed, the catalogs and journal tables at the ADC will be tightly hyperlinked to enhance data discovery. In addition, one will be able to search on fragmentary information. For instance, one could query for a table by entering that the second author is so-and-so or that the third author is at such-and-such an institution.
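
    The structural validity checking that motivates XML here can be demonstrated with lxml: validate a document against a DTD. The inline DTD and document below are toy examples, not the ADC's actual DTD.

        # Sketch of DTD-based structural validation with lxml. The DTD and
        # document are inline toy examples.
        from io import StringIO
        from lxml import etree

        dtd = etree.DTD(StringIO("""
        <!ELEMENT catalog (title, author+)>
        <!ELEMENT title (#PCDATA)>
        <!ELEMENT author (#PCDATA)>
        """))

        doc = etree.parse(StringIO("""
        <catalog>
          <title>Bright Star Catalogue</title>
          <author>Hoffleit, D.</author>
        </catalog>
        """))

        if dtd.validate(doc):
            print("document is structurally valid")
        else:
            print(dtd.error_log.filter_from_errors())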

  13. Semantic-Aware Components and Services of ActiveMath

    ERIC Educational Resources Information Center

    Melis, Erica; Goguadze, Giorgi; Homik, Martin; Libbrecht, Paul; Ullrich, Carsten; Winterstein, Stefan

    2006-01-01

    ActiveMath is a complex web-based adaptive learning environment with a number of components and interactive learning tools. The basis for handling semantics of learning content is provided by its semantic (mathematics) content markup, which is additionally annotated with educational metadata. Several components, tools and external services can…

  14. Life+ EnvEurope DEIMS - improving access to long-term ecosystem monitoring data in Europe

    NASA Astrophysics Data System (ADS)

    Kliment, Tomas; Peterseil, Johannes; Oggioni, Alessandro; Pugnetti, Alessandra; Blankman, David

    2013-04-01

    Long-term ecological research (LTER) studies aim at detecting environmental changes and analysing their drivers. In this respect, LTER-Europe provides a network of about 450 sites and platforms. However, data on various types of ecosystems and at a broad geographical scale are still not easily available. Managing data resulting from long-term observations is therefore one of the important tasks, not only for an LTER site itself but also at the network level, and exchanging and sharing the information within a wider community is a crucial objective in the coming years. Due to the fragmented nature of long-term ecological research and monitoring in Europe, and also on the global scale, information management has to face several challenges: distributed data sources, heterogeneous data models, heterogeneous data management solutions, and the complex domain of ecosystem monitoring with regard to the resulting data. The Life+ EnvEurope project (2010-2013) provides a case study for a workflow using data from the distributed network of LTER-Europe sites. In order to enhance discovery, evaluation, and access to data, the EnvEurope Drupal Ecological Information Management System (DEIMS) has been developed, based on the first official release of the Drupal metadata editor developed by US LTER. EnvEurope DEIMS consists of three main components: 1) Metadata editor: a web-based client interface to manage metadata for three information resource types - datasets, persons, and research sites. A metadata model describing datasets, based on the Ecological Metadata Language (EML), was developed within the initial phase of the project, and a crosswalk to the INSPIRE metadata model was implemented to align with ongoing European activities. The person and research-site metadata models defined within LTER-Europe were adapted for the project's needs. The three metadata models are interconnected within the system to provide an easy way for users to navigate among related resources. 2) Discovery client: provides several search profiles for datasets, persons, research sites, and external resources commonly used in the domain (e.g., Catalogue of Life), based on several search patterns ranging from simple full-text search and glossary browsing to categorized faceted search. 3) Geo-viewer: a map client that portrays boundaries and centroids of the research sites as Web Map Service (WMS) layers. Each layer provides a link to both the Metadata editor and the Discovery client in order to create or discover metadata describing the data collected within the individual research site. Sharing of the dataset metadata beyond DEIMS is ensured in two ways: XML export of individual metadata records according to the EML schema for inclusion in the international DataONE network, and periodic harvesting of metadata into a GeoNetwork catalogue, thus providing a catalogue service for the web (CSW) that can be invoked by remote clients. The final version of DEIMS will be a pilot implementation of the information system for LTER-Europe, which should establish a common information management framework within the European ecosystem research domain and provide valuable environmental information to other European information infrastructures such as SEIS, Copernicus, and INSPIRE.

  15. Building Petascale Cyberinfrastructure and Science Support for Solar Physics: Approach of the DKIST Data Center

    NASA Astrophysics Data System (ADS)

    Berukoff, Steven; Reardon, Kevin; Hays, Tony; Spiess, DJ; Watson, Fraser

    2015-08-01

    When construction is complete in 2019, the Daniel K. Inouye Solar Telescope will be the most capable large-aperture, high-resolution, multi-instrument solar physics facility in the world. The telescope is designed as a four-meter off-axis Gregorian, with a rotating Coude laboratory designed to simultaneously house and support five first-light imaging and spectropolarimetric instruments. At the current design, the facility and its instruments will generate data volumes of 5 PB and produce 10^8 images and 10^7-10^9 metadata elements annually. These data will not only forge new understanding of solar phenomena at high resolution, but also enhance participation in solar physics and further grow a small but vibrant international community. The DKIST Data Center is being designed to store, curate, and process this flood of information, while augmenting its value by associating science data and metadata with their acquisition and processing provenance. In early Operations, the Data Center will produce, by autonomous, semi-automatic, and manual means, quality-controlled and -assured calibrated data sets, closely linked to facility and instrument performance during the Operations lifecycle. These data sets will be made available to the community openly and freely, and software and algorithms will be made available through community repositories like GitHub for further collaboration and improvement. We discuss the current design and approach of the DKIST Data Center, describing the development cycle, early technology analysis and prototyping, and the roadmap ahead. In this budget-conscious era, a key design criterion is elasticity: the ability of the built system to adapt to changing work volumes, work types, and the shifting scientific landscape without undue cost or operational impact. We discuss our deeply iterative development approach, the underappreciated challenges of calibrating ground-based solar data, the crucial integration of the Data Center within the larger Operations lifecycle, and how software and hardware support, intelligently deployed, will enable high-caliber solar physics research and community growth over the DKIST's 40-year lifespan.

  16. A Metadata Management Framework for Collaborative Review of Science Data Products

    NASA Astrophysics Data System (ADS)

    Hart, A. F.; Cinquini, L.; Mattmann, C. A.; Thompson, D. R.; Wagstaff, K.; Zimdars, P. A.; Jones, D. L.; Lazio, J.; Preston, R. A.

    2012-12-01

    Data volumes generated by modern scientific instruments often preclude archiving the complete observational record. To compensate, science teams have developed a variety of "triage" techniques for identifying data of potential scientific interest and marking it for prioritized processing or permanent storage. This may involve multiple stages of filtering with both automated and manual components operating at different timescales. A promising approach exploits a fast, fully automated first stage followed by a more reliable offline manual review of candidate events. This hybrid approach permits a 24-hour rapid real-time response while also preserving the high accuracy of manual review. To support this type of second-level validation effort, we have developed a metadata-driven framework for the collaborative review of candidate data products. The framework consists of a metadata processing pipeline and a browser-based user interface that together provide a configurable mechanism for reviewing data products via the web, and capturing the full stack of associated metadata in a robust, searchable archive. Our system heavily leverages software from the Apache Object Oriented Data Technology (OODT) project, an open source data integration framework that facilitates the construction of scalable data systems and places a heavy emphasis on the utilization of metadata to coordinate processing activities. OODT provides a suite of core data management components for file management and metadata cataloging that form the foundation for this effort. The system has been deployed at JPL in support of the V-FASTR experiment [1], a software-based radio transient detection experiment that operates commensally at the Very Long Baseline Array (VLBA), and has a science team that is geographically distributed across several countries. Daily review of automatically flagged data is a shared responsibility for the team, and is essential to keep the project within its resource constraints. We describe the development of the platform using open source software, and discuss our experience deploying the system operationally. [1] R. B. Wayth, W. F. Brisken, A. T. Deller, W. A. Majid, D. R. Thompson, S. J. Tingay, and K. L. Wagstaff, "V-FASTR: The VLBA Fast Radio Transients Experiment," The Astrophysical Journal, vol. 735, no. 2, p. 97, 2011. Acknowledgement: This effort was supported by the Jet Propulsion Laboratory, managed by the California Institute of Technology under a contract with the National Aeronautics and Space Administration.
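
    Apache OODT's file manager and metadata catalog are Java components; purely as an analogy of the pattern the record describes - crawl candidate products, extract per-file metadata, and persist it in a searchable catalog for reviewers - a Python sketch might look as follows (the directory layout, sidecar convention and field names are invented):

        import json
        import sqlite3
        from pathlib import Path

        db = sqlite3.connect("catalog.db")
        db.execute("CREATE TABLE IF NOT EXISTS products (name TEXT, key TEXT, value TEXT)")

        def extract_metadata(path: Path) -> dict:
            """Toy extractor: pull metadata from a sidecar JSON file, if present."""
            sidecar = path.with_suffix(".met.json")
            meta = json.loads(sidecar.read_text()) if sidecar.exists() else {}
            meta.setdefault("FileSize", path.stat().st_size)
            return meta

        def ingest(directory: str) -> None:
            # Crawl the staging area and catalog each candidate product.
            for path in Path(directory).glob("*.dat"):
                for key, value in extract_metadata(path).items():
                    db.execute("INSERT INTO products VALUES (?, ?, ?)",
                               (path.name, key, str(value)))
            db.commit()

        ingest("staging")   # placeholder staging directory
        # Reviewers could then query flagged candidates, e.g. by a status field.
        rows = db.execute(
            "SELECT name FROM products WHERE key='ReviewStatus' AND value='flagged'")
        print([r[0] for r in rows])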

  17. Targeted exploration and analysis of large cross-platform human transcriptomic compendia

    PubMed Central

    Zhu, Qian; Wong, Aaron K; Krishnan, Arjun; Aure, Miriam R; Tadych, Alicja; Zhang, Ran; Corney, David C; Greene, Casey S; Bongo, Lars A; Kristensen, Vessela N; Charikar, Moses; Li, Kai; Troyanskaya, Olga G.

    2016-01-01

    We present SEEK (http://seek.princeton.edu), a query-based search engine across very large transcriptomic data collections, including thousands of human data sets from almost 50 microarray and next-generation sequencing platforms. SEEK uses a novel query-level cross-validation-based algorithm to automatically prioritize data sets relevant to the query and a robust search approach to identify query-coregulated genes, pathways, and processes. SEEK provides cross-platform handling, multi-gene query search, iterative metadata-based search refinement, and extensive visualization-based analysis options. PMID:25581801
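
    SEEK's dataset-prioritization idea - score each dataset by how well it recovers held-out query genes from the rest of the query - can be caricatured in a few lines. This is not the published SEEK algorithm; the correlation-based score and the rank normalization are assumptions for illustration only:

        import numpy as np

        def dataset_score(expr: np.ndarray, query_idx: list[int]) -> float:
            """Leave-one-out score: rank each held-out query gene by its mean
            correlation with the remaining query genes; higher is better."""
            corr = np.corrcoef(expr)        # genes x genes correlation in this dataset
            n = expr.shape[0]
            ranks = []
            for held_out in query_idx:
                rest = [g for g in query_idx if g != held_out]
                scores = corr[:, rest].mean(axis=1)   # similarity of each gene to the query
                order = np.argsort(-scores)           # descending
                rank = int(np.where(order == held_out)[0][0])
                ranks.append(1.0 - rank / n)          # normalized: 1.0 = top-ranked
            return float(np.mean(ranks))

        # Toy usage: 100 genes x 20 samples, query = genes 0-4.
        rng = np.random.default_rng(0)
        expr = rng.normal(size=(100, 20))
        print(dataset_score(expr, query_idx=[0, 1, 2, 3, 4]))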

  18. AMModels: An R package for storing models, data, and metadata to facilitate adaptive management

    PubMed Central

    Katz, Jonathan E.

    2018-01-01

    Agencies are increasingly called upon to implement their natural resource management programs within an adaptive management (AM) framework. This article provides the background and motivation for the R package, AMModels. AMModels was developed under R version 3.2.2. The overall goal of AMModels is simple: To codify knowledge in the form of models and to store it, along with models generated from numerous analyses and datasets that may come our way, so that it can be used or recalled in the future. AMModels facilitates this process by storing all models and datasets in a single object that can be saved to an .RData file and routinely augmented to track changes in knowledge through time. Through this process, AMModels allows the capture, development, sharing, and use of knowledge that may help organizations achieve their mission. While AMModels was designed to facilitate adaptive management, its utility is far more general. Many R packages exist for creating and summarizing models, but to our knowledge, AMModels is the only package dedicated not to the mechanics of analysis but to organizing analysis inputs, analysis outputs, and preserving descriptive metadata. We anticipate that this package will assist users hoping to preserve the key elements of an analysis so they may be more confidently revisited at a later date. PMID:29489825

  19. AMModels: An R package for storing models, data, and metadata to facilitate adaptive management.

    PubMed

    Donovan, Therese M; Katz, Jonathan E

    2018-01-01

    Agencies are increasingly called upon to implement their natural resource management programs within an adaptive management (AM) framework. This article provides the background and motivation for the R package, AMModels. AMModels was developed under R version 3.2.2. The overall goal of AMModels is simple: To codify knowledge in the form of models and to store it, along with models generated from numerous analyses and datasets that may come our way, so that it can be used or recalled in the future. AMModels facilitates this process by storing all models and datasets in a single object that can be saved to an .RData file and routinely augmented to track changes in knowledge through time. Through this process, AMModels allows the capture, development, sharing, and use of knowledge that may help organizations achieve their mission. While AMModels was designed to facilitate adaptive management, its utility is far more general. Many R packages exist for creating and summarizing models, but to our knowledge, AMModels is the only package dedicated not to the mechanics of analysis but to organizing analysis inputs, analysis outputs, and preserving descriptive metadata. We anticipate that this package will assist users hoping to preserve the key elements of an analysis so they may be more confidently revisited at a later date.
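
    AMModels itself is an R package saving to .RData files; as a language-neutral illustration of the same pattern - one serializable object bundling models, datasets and descriptive metadata, augmented over time - here is a hypothetical Python analogue (class and field names are invented, not the AMModels API):

        import datetime
        import pickle
        from dataclasses import dataclass, field

        @dataclass
        class AMModelLib:
            """Container bundling models, datasets, and per-entry metadata."""
            models: dict = field(default_factory=dict)
            data: dict = field(default_factory=dict)
            info: dict = field(default_factory=dict)

            def add_model(self, name, model, **metadata):
                metadata.setdefault("added", datetime.date.today().isoformat())
                self.models[name] = model
                self.info[name] = metadata

        lib = AMModelLib()
        lib.add_model("occupancy_2018", {"psi": 0.42}, species="wood thrush",
                      description="Occupancy estimate from 2018 point counts")

        # Persist the whole library (analogous to saving an .RData file) for later reuse.
        with open("amlib.pkl", "wb") as f:
            pickle.dump(lib, f)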

  20. An asynchronous traversal engine for graph-based rich metadata management

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dai, Dong; Carns, Philip; Ross, Robert B.

    Rich metadata in high-performance computing (HPC) systems contains extended information about users, jobs, data files, and their relationships. Property graphs are a promising data model to represent heterogeneous rich metadata flexibly. Specifically, a property graph can use vertices to represent different entities and edges to record the relationships between vertices with unique annotations. The high-volume HPC use case, with millions of entities and relationships, naturally requires an out-of-core distributed property graph database, which must support live updates (to ingest production information in real time), low-latency point queries (for frequent metadata operations such as permission checking), and large-scale traversals (for provenance data mining). Among these needs, large-scale property graph traversals are particularly challenging for distributed graph storage systems. Most existing graph systems implement a "level synchronous" breadth-first search algorithm that relies on global synchronization in each traversal step. This performs well in many problem domains; but a rich metadata management system is characterized by imbalanced graphs, long traversal lengths, and concurrent workloads, each of which has the potential to introduce or exacerbate stragglers (i.e., abnormally slow steps or servers in a graph traversal) that lead to low overall throughput for synchronous traversal algorithms. Previous research indicated that the straggler problem can be mitigated by using asynchronous traversal algorithms, and many graph-processing frameworks have successfully demonstrated this approach. Such systems require the graph to be loaded into a separate batch-processing framework instead of being iteratively accessed, however. In this work, we investigate a general asynchronous graph traversal engine that can operate atop a rich metadata graph in its native format. We outline a traversal-aware query language and key optimizations (traversal-affiliate caching and execution merging) necessary for efficient performance. We further explore the effect of different graph partitioning strategies on the traversal performance for both synchronous and asynchronous traversal engines. Our experiments show that the asynchronous graph traversal engine is more efficient than its synchronous counterpart in the case of HPC rich metadata processing, where more servers are involved and larger traversals are needed. Furthermore, the asynchronous traversal engine is more adaptive to different graph partitioning strategies.

  1. An asynchronous traversal engine for graph-based rich metadata management

    DOE PAGES

    Dai, Dong; Carns, Philip; Ross, Robert B.; ...

    2016-06-23

    Rich metadata in high-performance computing (HPC) systems contains extended information about users, jobs, data files, and their relationships. Property graphs are a promising data model to represent heterogeneous rich metadata flexibly. Specifically, a property graph can use vertices to represent different entities and edges to record the relationships between vertices with unique annotations. The high-volume HPC use case, with millions of entities and relationships, naturally requires an out-of-core distributed property graph database, which must support live updates (to ingest production information in real time), low-latency point queries (for frequent metadata operations such as permission checking), and large-scale traversals (for provenance data mining). Among these needs, large-scale property graph traversals are particularly challenging for distributed graph storage systems. Most existing graph systems implement a "level synchronous" breadth-first search algorithm that relies on global synchronization in each traversal step. This performs well in many problem domains; but a rich metadata management system is characterized by imbalanced graphs, long traversal lengths, and concurrent workloads, each of which has the potential to introduce or exacerbate stragglers (i.e., abnormally slow steps or servers in a graph traversal) that lead to low overall throughput for synchronous traversal algorithms. Previous research indicated that the straggler problem can be mitigated by using asynchronous traversal algorithms, and many graph-processing frameworks have successfully demonstrated this approach. Such systems require the graph to be loaded into a separate batch-processing framework instead of being iteratively accessed, however. In this work, we investigate a general asynchronous graph traversal engine that can operate atop a rich metadata graph in its native format. We outline a traversal-aware query language and key optimizations (traversal-affiliate caching and execution merging) necessary for efficient performance. We further explore the effect of different graph partitioning strategies on the traversal performance for both synchronous and asynchronous traversal engines. Our experiments show that the asynchronous graph traversal engine is more efficient than its synchronous counterpart in the case of HPC rich metadata processing, where more servers are involved and larger traversals are needed. Furthermore, the asynchronous traversal engine is more adaptive to different graph partitioning strategies.
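
    To make the synchronous/asynchronous distinction concrete, the generic sketch below contrasts with a level-synchronous BFS: there is no per-level barrier, so an idle worker simply pulls the next available vertex and a slow vertex never stalls a whole frontier. This is a toy traversal, not the engine described in the record:

        import queue
        import threading

        graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

        def async_traverse(start: str, workers: int = 4) -> set:
            """Asynchronous traversal: any idle worker processes whatever
            vertex is available next; no per-level synchronization barrier."""
            visited, lock = set(), threading.Lock()
            work: queue.Queue = queue.Queue()
            work.put(start)

            def worker():
                while True:
                    try:
                        v = work.get(timeout=0.1)   # drain until idle
                    except queue.Empty:
                        return
                    with lock:
                        if v in visited:
                            continue
                        visited.add(v)
                    for nbr in graph[v]:
                        work.put(nbr)

            threads = [threading.Thread(target=worker) for _ in range(workers)]
            for t in threads:
                t.start()
            for t in threads:
                t.join()
            return visited

        print(async_traverse("a"))   # {'a', 'b', 'c', 'd'}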

  2. Unified Science Information Model for SoilSCAPE using the Mercury Metadata Search System

    NASA Astrophysics Data System (ADS)

    Devarakonda, Ranjeet; Lu, Kefa; Palanisamy, Giri; Cook, Robert; Santhana Vannan, Suresh; Moghaddam, Mahta; Clewley, Dan; Silva, Agnelo; Akbar, Ruzbeh

    2013-12-01

    SoilSCAPE (Soil moisture Sensing Controller And oPtimal Estimator) introduces a new concept for a smart wireless sensor web technology for optimal measurements of surface-to-depth profiles of soil moisture using in-situ sensors. The objective is to enable a guided and adaptive sampling strategy for the in-situ sensor network to meet the measurement validation objectives of spaceborne soil moisture sensors such as the Soil Moisture Active Passive (SMAP) mission. This work is being carried out at the University of Michigan, the Massachusetts Institute of Technology, the University of Southern California, and Oak Ridge National Laboratory. At Oak Ridge National Laboratory we are using the Mercury metadata search system [1] to build a Unified Information System for the SoilSCAPE project. This unified portal comprises three key pieces: distributed search/discovery; data collections and integration; and data dissemination. Mercury, federally funded software for metadata harvesting, indexing, and searching, is used for this module. Soil moisture data sources identified as part of this activity, such as SoilSCAPE and FLUXNET (in-situ sensors), AirMOSS (airborne retrieval), and SMAP (spaceborne retrieval), are being indexed and maintained by Mercury. Mercury will be the central repository of data sources for cal/val for soil moisture studies and will provide a mechanism to identify additional data sources. Relevant metadata from existing inventories such as the ORNL DAAC, USGS Clearinghouse, ARM, NASA ECHO, and GCMD will be brought into this soil-moisture data search/discovery module. The SoilSCAPE [2] metadata records will also be published in broader metadata repositories such as GCMD and data.gov. Mercury can be configured to provide a single portal to soil moisture information contained in disparate data management systems located anywhere on the Internet. Mercury is able to extract metadata systematically from HTML pages or XML files using a variety of methods including OAI-PMH [3]. The Mercury search interface then allows users to perform simple, fielded, spatial and temporal searches across a central harmonized index of metadata. Mercury supports various metadata standards including FGDC, ISO-19115, DIF, Dublin-Core, Darwin-Core, and EML. This poster describes in detail how Mercury implements the Unified Science Information Model for soil moisture data. References: [1] Devarakonda R., et al. Mercury: reusable metadata management, data discovery and access system. Earth Science Informatics (2010), 3(1): 87-94. [2] Devarakonda R., et al. Daymet: Single Pixel Data Extraction Tool. http://daymet.ornl.gov/singlepixel.html (2012). Last accessed 10-01-2013. [3] Devarakonda R., et al. Data sharing and retrieval using OAI-PMH. Earth Science Informatics (2011), 4(1): 1-5.
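
    Harvesting records over OAI-PMH, as Mercury does, can be prototyped with an off-the-shelf client; the sketch below uses the third-party Sickle library (one possible choice, not part of Mercury) against a placeholder endpoint:

        from sickle import Sickle   # third-party OAI-PMH client: pip install sickle

        # Placeholder endpoint; a real deployment would point at a provider's OAI-PMH service.
        harvester = Sickle("https://example.org/oai")

        # Harvest Dublin Core records and inspect the fields to be indexed.
        for record in harvester.ListRecords(metadataPrefix="oai_dc", ignore_deleted=True):
            meta = record.metadata              # dict of lists, e.g. {'title': [...]}
            print(meta.get("title"), meta.get("date"))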

  3. Automatic Coregistration and orthorectification (ACRO) and subsequent mosaicing of NASA high-resolution imagery over the Mars MC11 quadrangle, using HRSC as a baseline

    NASA Astrophysics Data System (ADS)

    Sidiropoulos, Panagiotis; Muller, Jan-Peter; Watson, Gillian; Michael, Gregory; Walter, Sebastian

    2018-02-01

    This work presents the coregistered, orthorectified and mosaiced high-resolution products of the MC11 quadrangle of Mars, which have been processed using novel, fully automatic techniques. We discuss the development of a pipeline that achieves fully automatic and parameter-independent geometric alignment of high-resolution planetary images, starting from raw input images in NASA PDS format and following all required steps to produce a coregistered GeoTIFF image, a corresponding footprint and useful metadata. Additionally, we describe the development of a radiometric calibration technique that post-processes coregistered images to make them radiometrically consistent. Finally, we present a batch-mode application of the developed techniques over the MC11 quadrangle to validate their potential, as well as to generate end products, which are released to the planetary science community, thus assisting in the analysis of static and dynamic features on Mars. This case study is a step towards the full automation of signal processing tasks that are essential to increase the usability of planetary data but currently require extensive human effort.
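
    Feature-based coregistration of the kind automated here is commonly built from keypoint matching plus a robust geometric fit; the OpenCV sketch below (ORB features, RANSAC homography) is a generic illustration with placeholder file names, not the authors' pipeline:

        import cv2
        import numpy as np

        base = cv2.imread("hrsc_baseline.png", cv2.IMREAD_GRAYSCALE)   # placeholder files
        target = cv2.imread("ctx_raw.png", cv2.IMREAD_GRAYSCALE)

        # Detect and match local features between the baseline and the unregistered image.
        orb = cv2.ORB_create(5000)
        kp1, des1 = orb.detectAndCompute(base, None)
        kp2, des2 = orb.detectAndCompute(target, None)
        matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des2, des1)
        matches = sorted(matches, key=lambda m: m.distance)[:500]

        # Estimate a homography with RANSAC and warp the target onto the baseline grid.
        src = np.float32([kp2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        registered = cv2.warpPerspective(target, H, base.shape[::-1])
        cv2.imwrite("ctx_coregistered.png", registered)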

  4. Automatic identification of comparative effectiveness research from Medline citations to support clinicians’ treatment information needs

    PubMed Central

    Zhang, Mingyuan; Fiol, Guilherme Del; Grout, Randall W.; Jonnalagadda, Siddhartha; Medlin, Richard; Mishra, Rashmi; Weir, Charlene; Liu, Hongfang; Mostafa, Javed; Fiszman, Marcelo

    2014-01-01

    Online knowledge resources such as Medline can address most clinicians' patient care information needs. Yet, significant barriers, notably lack of time, limit the use of these sources at the point of care. The most common information needs raised by clinicians are treatment-related. Comparative effectiveness studies allow clinicians to consider multiple treatment alternatives for a particular problem. Still, solutions are needed to enable efficient and effective consumption of comparative effectiveness research at the point of care. Objective: Design and assess an algorithm for automatically identifying comparative effectiveness studies and extracting the interventions investigated in these studies. Methods: The algorithm combines semantic natural language processing, Medline citation metadata, and machine learning techniques. We assessed the algorithm in a case study of treatment alternatives for depression. Results: Both precision and recall for identifying comparative studies were 0.83. A total of 86% of the interventions extracted perfectly or partially matched the gold standard. Conclusion: Overall, the algorithm achieved reasonable performance. The method provides building blocks for the automatic summarization of comparative effectiveness research to inform point-of-care decision-making. PMID:23920677
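
    The paper combines semantic NLP, citation metadata and machine learning; as a hedged illustration of the machine-learning component alone (toy data and features, not the authors' feature set), a minimal citation classifier could be:

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        # Toy labeled citations: 1 = comparative effectiveness study, 0 = other.
        texts = [
            "randomized trial comparing sertraline versus cognitive therapy for depression",
            "case report of a rare dermatological condition",
            "fluoxetine compared with placebo and exercise in older adults",
            "review of hospital administrative workflows",
        ]
        labels = [1, 0, 1, 0]

        clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
        clf.fit(texts, labels)
        print(clf.predict(["trial comparing venlafaxine versus placebo"]))   # expect [1]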

  5. A Folksonomy-Based Lightweight Resource Annotation Metadata Schema for Personalized Hypermedia Learning Resource Delivery

    ERIC Educational Resources Information Center

    Lau, Simon Boung-Yew; Lee, Chien-Sing; Singh, Yashwant Prasad

    2015-01-01

    With the proliferation of social Web applications, users can now collaboratively author, share and access hypermedia learning resources, contributing to richer learning experiences outside formal education. These resources may or may not be educational. However, they can be harnessed for educational purposes by adapting and personalizing them to…

  6. The LOM Approach--A CALL for Concern?

    ERIC Educational Resources Information Center

    Armitage, Nicholas; Bowerman, Chris

    2005-01-01

    The LOM (Learning Object Metadata) approach to courseware design seems to be driven by a desire to increase access to education as well as use technology to enable a higher staff-student ratio than is currently possible. The LOM standard involves the use of standard metadata descriptions of content and adaptive content engines to deliver the…

  7. The LOM Approach -- A CALL for Concern?

    ERIC Educational Resources Information Center

    Armitage, Nicholas; Bowerman, Chris

    2005-01-01

    The LOM (Learning Object Metadata) approach to courseware design seems to be driven by a desire to increase access to education as well as use technology to enable a higher staff-student ratio than is currently possible. The LOM standard involves the use of standard metadata descriptions of content and adaptive content engines to deliver the…

  8. A "Simple Query Interface" Adapter for the Discovery and Exchange of Learning Resources

    ERIC Educational Resources Information Center

    Massart, David

    2006-01-01

    Developed as part of CEN/ISSS Workshop on Learning Technology efforts to improve interoperability between learning resource repositories, the Simple Query Interface (SQI) is an Application Program Interface (API) for querying heterogeneous repositories of learning resource metadata. In the context of the ProLearn Network of Excellence, SQI is used…

  9. Context-Adaptive Learning Designs by Using Semantic Web Services

    ERIC Educational Resources Information Center

    Dietze, Stefan; Gugliotta, Alessio; Domingue, John

    2007-01-01

    IMS Learning Design (IMS-LD) is a promising technology aimed at supporting learning processes. IMS-LD packages contain the learning process metadata as well as the learning resources. However, the allocation of resources--whether data or services--within the learning design is done manually at design-time on the basis of the subjective appraisals…

  10. DataUp 2.0: Improving On a Tool For Helping Researchers Archive, Manage, and Share Their Tabular Data

    NASA Astrophysics Data System (ADS)

    Strasser, C.; Borda, S.; Cruse, P.; Kunze, J.

    2013-12-01

    There are many barriers to data management and sharing among earth and environmental scientists; among the most significant are a lack of knowledge about best practices for data management, metadata standards, or appropriate data repositories for archiving and sharing data. Last year we developed an open source web application, DataUp, to help researchers overcome these barriers. DataUp helps scientists to (1) determine whether their file is CSV compatible, (2) generate metadata in a standard format, (3) retrieve an identifier to facilitate data citation, and (4) deposit their data into a repository. With funding from the NSF via a supplemental grant to the DataONE project, we are working to improve upon DataUp. Our main goal for DataUp 2.0 is to ensure organizations and repositories are able to adopt and adapt DataUp to meet their unique needs, including connecting to analytical tools, adding new metadata schema, and expanding the list of connected data repositories. DataUp is a collaborative project between the California Digital Library, DataONE, the San Diego Supercomputing Center, and Microsoft Research Connections.
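
    The first DataUp step - checking whether a file is CSV-compatible - reduces to a handful of structural tests; the sketch below shows that kind of check with placeholder rules, far simpler than DataUp's actual best-practice checks:

        import csv

        def check_csv_compatible(path: str) -> list[str]:
            """Return a list of problems that would block clean CSV interpretation."""
            problems = []
            with open(path, newline="", encoding="utf-8", errors="replace") as f:
                rows = list(csv.reader(f))
            if not rows:
                return ["file is empty"]
            header = rows[0]
            if len(set(header)) != len(header):
                problems.append("duplicate column names in header")
            widths = {len(r) for r in rows if r}
            if len(widths) > 1:
                problems.append(f"inconsistent column counts: {sorted(widths)}")
            return problems

        print(check_csv_compatible("field_measurements.csv"))   # placeholder file name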

  11. Estimating pediatric entrance skin dose from digital radiography examination using DICOM metadata: A quality assurance tool.

    PubMed

    Brady, S L; Kaufman, R A

    2015-05-01

    To develop an automated methodology to estimate patient examination dose in digital radiography (DR) imaging using DICOM metadata as a quality assurance (QA) tool. Patient examination and demographical information were gathered from metadata analysis of DICOM header data. The x-ray system radiation output (i.e., air KERMA) was characterized for all filter combinations used for patient examinations. Average patient thicknesses were measured for head, chest, abdomen, knees, and hands using volumetric images from CT. Backscatter factors (BSFs) were calculated from examination kVp. Patient entrance skin air KERMA (ESAK) was calculated by (1) looking up examination technique factors taken from DICOM header metadata (i.e., kVp and mA s) to derive an air KERMA (k_air) value based on an x-ray characteristic radiation output curve; (2) scaling k_air with a BSF value; and (3) correcting k_air for patient thickness. Finally, patient entrance skin dose (ESD) was calculated by multiplying a mass-energy attenuation coefficient ratio by ESAK. Patient ESD calculations were computed for common DR examinations at our institution: dual view chest, anteroposterior (AP) abdomen, lateral (LAT) skull, dual view knee, and bone age (left hand only) examinations. ESD was calculated for a total of 3794 patients; mean age was 11 ± 8 yr (range: 2 months to 55 yr). The mean ESD range was 0.19-0.42 mGy for dual view chest, 0.28-1.2 mGy for AP abdomen, 0.18-0.65 mGy for LAT view skull, 0.15-0.63 mGy for dual view knee, and 0.10-0.12 mGy for bone age (left hand) examinations. A methodology combining DICOM header metadata and basic x-ray tube characterization curves was demonstrated. In a regulatory era where patient dose reporting has become increasingly in demand, this methodology will allow a knowledgeable user the means to establish an automatable dose reporting program for DR and perform patient dose related QA testing for digital x-ray imaging.

  12. Estimating pediatric entrance skin dose from digital radiography examination using DICOM metadata: A quality assurance tool

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Brady, S. L., E-mail: samuel.brady@stjude.org; Kaufman, R. A., E-mail: robert.kaufman@stjude.org

    Purpose: To develop an automated methodology to estimate patient examination dose in digital radiography (DR) imaging using DICOM metadata as a quality assurance (QA) tool. Methods: Patient examination and demographical information were gathered from metadata analysis of DICOM header data. The x-ray system radiation output (i.e., air KERMA) was characterized for all filter combinations used for patient examinations. Average patient thicknesses were measured for head, chest, abdomen, knees, and hands using volumetric images from CT. Backscatter factors (BSFs) were calculated from examination kVp. Patient entrance skin air KERMA (ESAK) was calculated by (1) looking up examination technique factors taken from DICOM header metadata (i.e., kVp and mA s) to derive an air KERMA (k_air) value based on an x-ray characteristic radiation output curve; (2) scaling k_air with a BSF value; and (3) correcting k_air for patient thickness. Finally, patient entrance skin dose (ESD) was calculated by multiplying a mass-energy attenuation coefficient ratio by ESAK. Patient ESD calculations were computed for common DR examinations at our institution: dual view chest, anteroposterior (AP) abdomen, lateral (LAT) skull, dual view knee, and bone age (left hand only) examinations. Results: ESD was calculated for a total of 3794 patients; mean age was 11 ± 8 yr (range: 2 months to 55 yr). The mean ESD range was 0.19-0.42 mGy for dual view chest, 0.28-1.2 mGy for AP abdomen, 0.18-0.65 mGy for LAT view skull, 0.15-0.63 mGy for dual view knee, and 0.10-0.12 mGy for bone age (left hand) examinations. Conclusions: A methodology combining DICOM header metadata and basic x-ray tube characterization curves was demonstrated. In a regulatory era where patient dose reporting has become increasingly in demand, this methodology will allow a knowledgeable user the means to establish an automatable dose reporting program for DR and perform patient dose related QA testing for digital x-ray imaging.
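
    The calculation chain described in both records - characteristic output curve to air KERMA, backscatter scaling, thickness/distance correction, then conversion to dose - can be sketched numerically. Every coefficient below is a made-up placeholder, not the paper's calibration data:

        def esak_mgy(kvp: float, mas: float, sid_cm: float = 100.0,
                     patient_thickness_cm: float = 15.0) -> float:
            """Entrance skin air KERMA (mGy) from a toy characteristic output curve."""
            k_air_per_mas = 1.1e-5 * kvp ** 2   # placeholder curve: quadratic in kVp
            k_air = k_air_per_mas * mas
            bsf = 1.35                          # placeholder backscatter factor
            # Inverse-square correction from the detector plane to the entrance skin plane.
            skin_distance = sid_cm - patient_thickness_cm
            return k_air * bsf * (sid_cm / skin_distance) ** 2

        def esd_mgy(kvp: float, mas: float) -> float:
            mu_en_ratio = 1.06   # placeholder tissue-to-air mass-energy absorption ratio
            return mu_en_ratio * esak_mgy(kvp, mas)

        print(f"{esd_mgy(kvp=70, mas=2.0):.3f} mGy")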

  13. MaNIDA: Integration of marine expedition information, data and publications: Data Portal of German Marine Research

    NASA Astrophysics Data System (ADS)

    Koppe, Roland; Scientific MaNIDA-Team

    2013-04-01

    The Marine Network for Integrated Data Access (MaNIDA) aims to build a sustainable e-infrastructure to support discovery and re-use of marine data from distinct data providers in Germany (see related abstracts in session ESSI 1.2). In order to give users integrated access to and retrieval of expedition or cruise metadata, data, services and publications, as well as relationships among the various objects, we are developing (web) applications based on state-of-the-art technologies: the Data Portal of German Marine Research. Since the German network of distributed content providers has distinct objectives and mandates for storing digital objects (e.g. long-term data preservation, near-real-time data, publication repositories), we have to cope with metadata that are heterogeneous in syntax and semantics, data types and formats, as well as access solutions. We have defined a set of core metadata elements which are common to our content providers and therefore useful for discovery and for building relationships among objects. Existing catalogues for various types of vocabularies are used to assure the mapping to community-wide terms. We distinguish between expedition metadata and continuously harvestable metadata objects from distinct data providers. • Existing expedition metadata from distinct sources are integrated and validated in order to create an expedition metadata catalogue which is used as the authoritative source for expedition-related content. The web application allows browsing by e.g. research vessel and date, exploring expeditions and research gaps by tracklines, and viewing expedition details (begin/end, ports, platforms, chief scientists, events, etc.). Expedition-related objects from harvesting are also dynamically associated with expedition information and presented to the user. We will additionally provide web services for detailed expedition information. • Other harvestable content is separated into four categories: archived data and data products, near-real-time data, publications and reports. Reports are a special case of publication, describing cruise planning, cruise reports or popular reports on expeditions, and are orthogonal to e.g. peer-reviewed articles. Each object's metadata contains at least: identifier(s) e.g. doi/hdl, title, author(s), date, expedition(s), platform(s) e.g. research vessel Polarstern. Furthermore, project(s), parameter(s), device(s) and e.g. geographic coverage are of interest. An international gazetteer resolves geographic coverage to region names, which are annotated to the object metadata. Information is presented homogeneously to the user, independent of the underlying format, but adaptable to specific disciplines, e.g. bathymetry. Data access and dissemination information is also available to the user as data download links or web services (e.g. WFS, WMS). Based on relationship metadata we dynamically build graphs of objects to support the user in finding potentially relevant associated objects. Technically, metadata is based on ISO/OGC standards or provider specifications. Metadata is harvested via OAI-PMH or OGC CSW and indexed with Apache Lucene. This enables powerful full-text search, geographic and temporal search, as well as faceting. In this presentation we illustrate the architecture and the current implementation of our integrated approach.

  14. Automatic video shot boundary detection using k-means clustering and improved adaptive dual threshold comparison

    NASA Astrophysics Data System (ADS)

    Sa, Qila; Wang, Zhihui

    2018-03-01

    At present, content-based video retrieval (CBVR) is the mainstream video retrieval method, using features of the video itself to perform automatic identification and retrieval. This method involves a key technology, i.e. shot segmentation. In this paper, a method of automatic video shot boundary detection with K-means clustering and improved adaptive dual-threshold comparison is proposed. First, the visual features of every frame are extracted and divided into two categories using the K-means clustering algorithm, namely frames with significant change and frames with no significant change. Then, on the classification results, the improved adaptive dual-threshold comparison method is used to determine the abrupt as well as gradual shot boundaries. Finally, an automatic video shot boundary detection system is achieved.
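
    A minimal sketch of the two-stage idea follows, with assumed details (frame-difference features, scikit-learn KMeans, and simple mean/standard-deviation thresholds) that stand in for the paper's exact formulation:

        import numpy as np
        from sklearn.cluster import KMeans

        # Toy per-frame feature: change magnitude between consecutive frames
        # (random stand-ins here for features of a real decoded video).
        rng = np.random.default_rng(1)
        frame_diffs = np.abs(rng.normal(0.1, 0.05, size=500))
        frame_diffs[[120, 340]] = 0.9        # simulate two abrupt cuts

        # Stage 1: cluster frames into "significant change" vs "no significant change".
        labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
            frame_diffs.reshape(-1, 1))
        change_cluster = int(np.argmax(
            [frame_diffs[labels == k].mean() for k in (0, 1)]))

        # Stage 2: adaptive dual thresholds derived from the data distribution.
        t_high = frame_diffs.mean() + 3 * frame_diffs.std()   # abrupt boundary
        t_low = frame_diffs.mean() + 1 * frame_diffs.std()    # candidate gradual boundary
        for i in np.where(labels == change_cluster)[0]:
            if frame_diffs[i] >= t_high:
                print(f"abrupt cut at frame {i}")
            elif frame_diffs[i] >= t_low:
                print(f"possible gradual transition near frame {i}")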

  15. The Automatic Integration of Folksonomies with Taxonomies Using Non-axiomatic Logic

    NASA Astrophysics Data System (ADS)

    Geldart, Joe; Cummins, Stephen

    Cooperative tagging systems such as folksonomies are powerful tools when used to annotate information resources. The inherent power of folksonomies is in their ability to allow casual users to easily contribute ad hoc, yet meaningful, resource metadata without any specialist training. Older folksonomies have begun to degrade due to their lack of internal structure and the use of many low-quality tags. This chapter describes a remedy for some of the problems associated with folksonomies. We introduce a method of automatic integration and inference of the relationships between tags and resources in a folksonomy using non-axiomatic logic. We test this method on the CiteULike corpus of tags by comparing precision and recall between it and standard keyword search. Our results show that non-axiomatic reasoning is a promising technique for integrating tagging systems with more structured knowledge representations.

  16. Allelic variation in Salmonella: an underappreciated driver of adaptation and virulence

    PubMed Central

    Yue, Min; Schifferli, Dieter M.

    2014-01-01

    Salmonella enterica causes substantial morbidity and mortality in humans and animals. Infection and intestinal colonization by S. enterica require virulence factors that mediate bacterial binding and invasion of enterocytes and innate immune cells. Some S. enterica colonization factors and their alleles are host restricted, suggesting a potential role in regulation of host specificity. Recent data also suggest that colonization factors promote horizontal gene transfer of antimicrobial resistance genes by increasing the local density of Salmonella in colonized intestines. Although a profusion of genes are involved in Salmonella pathogenesis, the relative importance of their allelic variation has only been studied intensely in the type 1 fimbrial adhesin FimH. Although other Salmonella virulence factors demonstrate allelic variation, their association with specific metadata (e.g., host species, disease or carrier state, time and geographic place of isolation, antibiotic resistance profile, etc.) remains to be interrogated. To date, genome-wide association studies (GWAS) in bacteriology have been limited by the paucity of relevant metadata. In addition, due to the many variables amid metadata categories, a very large number of strains must be assessed to attain statistically significant results. However, targeted approaches in which genes of interest (e.g., virulence factors) are specifically sequenced alleviates the time-consuming and costly statistical GWAS analysis and increases statistical power, as larger numbers of strains can be screened for non-synonymous single nucleotide polymorphisms (SNPs) that are associated with available metadata. Congruence of specific allelic variants with specific metadata from strains that have a relevant clinical and epidemiological history will help to prioritize functional wet-lab and animal studies aimed at determining cause-effect relationships. Such an approach should be applicable to other pathogens that are being collected in well-curated repositories. PMID:24454310

  17. Method and system for spatial data input, manipulation and distribution via an adaptive wireless transceiver

    NASA Technical Reports Server (NTRS)

    Wang, Ray (Inventor)

    2009-01-01

    A method and system for spatial data input, manipulation and distribution via an adaptive wireless transceiver. The method and system include a wireless transceiver for automatically and adaptively controlling wireless transmissions using a Waveform-DNA method. The wireless transceiver can operate simultaneously over both short and long distances. The wireless transceiver is automatically adaptive, and wireless devices can send and receive wireless digital and analog data from various sources rapidly, in real time, via available networks and network services.

  18. Using Multi-threading for the Automatic Load Balancing of 2D Adaptive Finite Element Meshes

    NASA Technical Reports Server (NTRS)

    Heber, Gerd; Biswas, Rupak; Thulasiraman, Parimala; Gao, Guang R.; Saini, Subhash (Technical Monitor)

    1998-01-01

    In this paper, we present a multi-threaded approach for the automatic load balancing of adaptive finite element (FE) meshes. The platform of our choice is the EARTH multi-threaded system, which offers sufficient capabilities to tackle this problem. We implement the adaptation phase of FE applications on triangular meshes and exploit the EARTH token mechanism to automatically balance the resulting irregular and highly nonuniform workload. We discuss the results of our experiments on EARTH-SP2, an implementation of EARTH on the IBM SP2, with different load-balancing strategies that are built into the runtime system.

  19. MEMOPS: data modelling and automatic code generation.

    PubMed

    Fogh, Rasmus H; Boucher, Wayne; Ionides, John M C; Vranken, Wim F; Stevens, Tim J; Laue, Ernest D

    2010-03-25

    In recent years the amount of biological data has exploded to the point where much useful information can only be extracted by complex computational analyses. Such analyses are greatly facilitated by metadata standards, both in terms of the ability to compare data originating from different sources, and in terms of exchanging data in standard forms, e.g. when running processes on a distributed computing infrastructure. However, standards thrive on stability whereas science tends to constantly move, with new methods being developed and old ones modified. Therefore maintaining both metadata standards, and all the code that is required to make them useful, is a non-trivial problem. Memops is a framework that uses an abstract definition of the metadata (described in UML) to generate internal data structures and subroutine libraries for data access (application programming interfaces--APIs--currently in Python, C and Java) and data storage (in XML files or databases). For the individual project these libraries obviate the need for writing code for input parsing, validity checking or output. Memops also ensures that the code is always internally consistent, massively reducing the need for code reorganisation. Across a scientific domain a Memops-supported data model makes it easier to support complex standards that can capture all the data produced in a scientific area, share them among all programs in a complex software pipeline, and carry them forward to deposition in an archive. The principles behind the Memops generation code will be presented, along with example applications in Nuclear Magnetic Resonance (NMR) spectroscopy and structural biology.
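
    The core Memops idea - generate validity-checked data-access code from an abstract model rather than writing it by hand - can be miniaturized as follows. The spec format and generated class are toys, unrelated to Memops' UML input or its Python/C/Java APIs:

        # Toy model spec: attribute name -> (type, required). Hypothetical, not Memops' UML.
        MODEL = {"Shift": {"value": (float, True), "error": (float, False)}}

        def generate(name: str, attrs: dict) -> str:
            """Emit Python source for a class with validity-checked attributes."""
            args = ", ".join(f"{a}=None" for a in attrs)
            lines = [f"class {name}:", f"    def __init__(self, {args}):"]
            for attr, (typ, required) in attrs.items():
                if required:
                    lines.append(f"        if {attr} is None:")
                    lines.append(f"            raise ValueError('{attr} is required')")
                lines.append(f"        if {attr} is not None and "
                             f"not isinstance({attr}, {typ.__name__}):")
                lines.append(f"            raise TypeError('{attr} must be {typ.__name__}')")
                lines.append(f"        self.{attr} = {attr}")
            return "\n".join(lines) + "\n"

        source = generate("Shift", MODEL["Shift"])
        namespace: dict = {}
        exec(source, namespace)              # compile the generated API
        s = namespace["Shift"](value=8.25)   # validity checking comes for free
        print(source)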

  20. Building Interoperable Learning Objects Using Reduced Learning Object Metadata

    ERIC Educational Resources Information Center

    Saleh, Mostafa S.

    2005-01-01

    The new e-learning generation depends on Semantic Web technology to produce learning objects. As the production of these components is very costly, they should be produced and registered once, and reused and adapted in the same context or in other contexts as often as possible. To produce those components, developers should use learning standards…

  1. Visualizing and Validating Metadata Traceability within the CDISC Standards.

    PubMed

    Hume, Sam; Sarnikar, Surendra; Becnel, Lauren; Bennett, Dorine

    2017-01-01

    The Food & Drug Administration has begun requiring that electronic submissions of regulated clinical studies utilize the Clinical Data Interchange Standards Consortium (CDISC) data standards. Within regulated clinical research, traceability is a requirement and indicates that the analysis results can be traced back to the original source data. Current solutions for clinical research data traceability are limited in terms of querying, validation and visualization capabilities. This paper describes (1) the development of metadata models to support computable traceability and traceability visualizations that are compatible with industry data standards for the regulated clinical research domain, (2) adaptation of graph traversal algorithms to make them capable of identifying traceability gaps and validating traceability across the clinical research data lifecycle, and (3) development of a traceability query capability for retrieval and visualization of traceability information.

  2. Visualizing and Validating Metadata Traceability within the CDISC Standards

    PubMed Central

    Hume, Sam; Sarnikar, Surendra; Becnel, Lauren; Bennett, Dorine

    2017-01-01

    The Food & Drug Administration has begun requiring that electronic submissions of regulated clinical studies utilize the Clinical Data Interchange Standards Consortium (CDISC) data standards. Within regulated clinical research, traceability is a requirement and indicates that the analysis results can be traced back to the original source data. Current solutions for clinical research data traceability are limited in terms of querying, validation and visualization capabilities. This paper describes (1) the development of metadata models to support computable traceability and traceability visualizations that are compatible with industry data standards for the regulated clinical research domain, (2) adaptation of graph traversal algorithms to make them capable of identifying traceability gaps and validating traceability across the clinical research data lifecycle, and (3) development of a traceability query capability for retrieval and visualization of traceability information. PMID:28815125
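
    The gap-detection part of (2) amounts to walking the derivation graph from each analysis result back toward source data and flagging dead ends; a generic sketch (invented variable names, not CDISC-standard metadata) could be:

        # Derivation edges: derived item -> items it was computed from (toy example).
        trace = {
            "ADSL.AGEGR1": ["ADSL.AGE"],
            "ADSL.AGE": ["DM.BRTHDTC", "DM.RFSTDTC"],
            "ADSL.BMIBL": [],               # missing derivation: a traceability gap
            "DM.BRTHDTC": [],
            "DM.RFSTDTC": [],
        }
        SOURCES = {"DM.BRTHDTC", "DM.RFSTDTC"}   # collected (source) data items

        def trace_gaps(item: str, path=()) -> list:
            """Depth-first walk; report any path that ends before reaching source data."""
            if item in SOURCES:
                return []
            parents = trace.get(item, [])
            if not parents:
                return [list(path) + [item]]     # dead end: traceability gap
            gaps = []
            for p in parents:
                gaps += trace_gaps(p, path + (item,))
            return gaps

        for result in ("ADSL.AGEGR1", "ADSL.BMIBL"):
            print(result, "->", trace_gaps(result) or "fully traceable")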

  3. Scaffolding and Integrated Assessment in Computer Assisted Learning (CAL) for Children with Learning Disabilities

    ERIC Educational Resources Information Center

    Beale, Ivan L.

    2005-01-01

    Computer assisted learning (CAL) can involve a computerised intelligent learning environment, defined as an environment capable of automatically, dynamically and continuously adapting to the learning context. One aspect of this adaptive capability involves automatic adjustment of instructional procedures in response to each learner's performance,…

  4. NASA Reverb: Standards-Driven Earth Science Data and Service Discovery

    NASA Astrophysics Data System (ADS)

    Cechini, M. F.; Mitchell, A.; Pilone, D.

    2011-12-01

    NASA's Earth Observing System Data and Information System (EOSDIS) is a core capability in NASA's Earth Science Data Systems Program. NASA's EOS ClearingHOuse (ECHO) is a metadata catalog for the EOSDIS, providing a centralized catalog of data products and a registry of related data services. Working closely with the EOSDIS community, the ECHO team identified a need to develop the next-generation EOS data and service discovery tool. This development effort relied on the following principles: + Metadata-Driven User Interface - Users should be presented with data and service discovery capabilities based on dynamic processing of metadata describing the targeted data. + Integrated Data & Service Discovery - Users should be able to discover data and associated data services that facilitate their research objectives. + Leverage Common Standards - Users should be able to discover and invoke services that utilize common interface standards. Metadata plays a vital role in facilitating data discovery and access. As data providers enhance their metadata, more advanced search capabilities become available, enriching a user's search experience. Maturing metadata formats such as ISO 19115 provide the necessary depth of metadata that facilitates advanced data discovery capabilities. Data discovery and access is not limited to simply the retrieval of data granules, but is growing into the more complex discovery of data services. These services include, but are not limited to, services facilitating additional data discovery, subsetting, reformatting, and re-projecting. The discovery and invocation of these data services are made significantly simpler through the use of consistent and interoperable standards. By adopting a common standard, standard-specific adapters can be developed to communicate with multiple services implementing that protocol. The emergence of metadata standards such as ISO 19119 plays a role in service discovery similar to that of ISO 19115 in data discovery. After a yearlong design, development, and testing process, the ECHO team successfully released "Reverb - The Next Generation Earth Science Discovery Tool." Reverb relies heavily on the information contained in dataset and granule metadata, such as ISO 19115, to provide a dynamic experience to users based on search facet values extracted from science metadata. Such an approach allows users to perform cross-dataset correlation and searches, discovering additional data that they may not previously have been aware of. In addition to data discovery, Reverb users may discover services associated with their data of interest. When services utilize supported standards and/or protocols, Reverb can facilitate the invocation of both synchronous and asynchronous data processing services. This greatly enhances a user's ability to discover data of interest and accomplish their research goals. Extrapolating from the current movement towards interoperable standards and the increase in available services, data service invocation and chaining will become a natural part of data discovery. Reverb is one example of a discovery tool that provides a mechanism for transforming the earth science data discovery paradigm.

  5. AMModels: An R package for storing models, data, and metadata to facilitate adaptive management

    USGS Publications Warehouse

    Donovan, Therese M.; Katz, Jonathan

    2018-01-01

    Agencies are increasingly called upon to implement their natural resource management programs within an adaptive management (AM) framework. This article provides the background and motivation for the R package, AMModels. AMModels was developed under R version 3.2.2. The overall goal of AMModels is simple: To codify knowledge in the form of models and to store it, along with models generated from numerous analyses and datasets that may come our way, so that it can be used or recalled in the future. AMModels facilitates this process by storing all models and datasets in a single object that can be saved to an .RData file and routinely augmented to track changes in knowledge through time. Through this process, AMModels allows the capture, development, sharing, and use of knowledge that may help organizations achieve their mission. While AMModels was designed to facilitate adaptive management, its utility is far more general. Many R packages exist for creating and summarizing models, but to our knowledge, AMModels is the only package dedicated not to the mechanics of analysis but to organizing analysis inputs, analysis outputs, and preserving descriptive metadata. We anticipate that this package will assist users hoping to preserve the key elements of an analysis so they may be more confidently revisited at a later date.

  6. NDSI products system based on Hadoop platform

    NASA Astrophysics Data System (ADS)

    Zhou, Yan; Jiang, He; Yang, Xiaoxia; Geng, Erhui

    2015-12-01

    Snow is the solid state of water on Earth and plays an important role in human life. Satellite remote sensing is significant for snow extraction, with the advantages of periodicity, macro-scale coverage, comprehensiveness, objectivity and timeliness. With the continuous development of remote sensing technology, remote sensing data are acquired from multiple platforms, multiple sensors and multiple viewing angles, and the demand for compute-intensive processing of these data is increasing. However, current production systems for remote sensing products mostly work in a serial mode, are aimed at professional remote sensing researchers, and rarely achieve automatic or semi-automatic production. Facing massive remote sensing data, the traditional serial production mode, with its low efficiency, can no longer meet the requirement of processing mass data in a timely and efficient manner. In order to effectively improve the production efficiency of NDSI (Normalized Difference Snow Index) products and meet this demand, this paper builds an NDSI product production system based on the Hadoop platform; the system mainly includes a remote sensing image management module, an NDSI production module, and a system service module. The main research contents and results are: (1) Remote sensing image management module: includes image import and image metadata management. Mass IRS base images and NDSI product images (the output of production tasks) are imported into the HDFS file system; at the same time, the corresponding orbit row/column numbers, maximum/minimum longitude and latitude, product date, HDFS storage path, Hadoop task ID (for NDSI products) and other metadata are read, thumbnails are created, a unique ID is assigned to each record, and the records are imported into the base/product image metadata database. (2) NDSI production module: includes index calculation as well as production task submission and monitoring. HDF images related to a production task are read as byte streams and parsed into Product form using the Beam library; the MapReduce distributed framework executes production tasks while their status is monitored; when a production task completes, the remote sensing image management module is called to store the NDSI products. (3) System service module: includes image search and NDSI product download. Given image metadata attributes described in JSON format, the sequence IDs of matching images in the HDFS file system are returned; for a given MapReduce task ID, the NDSI products output by that task are packaged into a ZIP file and a download link is returned. (4) System evaluation: massive remote sensing data were downloaded and processed with the system to generate NDSI products for performance testing; the results show that the system has high extensibility, strong fault tolerance and fast production speed, and the image processing results have high accuracy.
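
    The index calculation at the heart of the production module is simple; assuming the usual NDSI definition over green and shortwave-infrared reflectances (the arrays below are synthetic stand-ins for a decoded HDF scene), a per-scene map step might be:

        import numpy as np

        def ndsi(green: np.ndarray, swir: np.ndarray) -> np.ndarray:
            """NDSI = (green - SWIR) / (green + SWIR), masked where undefined."""
            denom = green + swir
            with np.errstate(divide="ignore", invalid="ignore"):
                return np.where(denom != 0, (green - swir) / denom, np.nan)

        # Synthetic reflectance bands standing in for a real scene.
        rng = np.random.default_rng(42)
        green = rng.uniform(0.0, 1.0, size=(256, 256))
        swir = rng.uniform(0.0, 1.0, size=(256, 256))

        snow_mask = ndsi(green, swir) > 0.4    # commonly used snow-detection threshold
        print(f"snow pixels: {snow_mask.sum()}")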

  7. Overview of long-term field experiments in Germany - metadata visualization

    NASA Astrophysics Data System (ADS)

    Muqit Zoarder, Md Abdul; Heinrich, Uwe; Svoboda, Nikolai; Grosse, Meike; Hierold, Wilfried

    2017-04-01

    BonaRes ("soil as a sustainable resource for the bioeconomy") is conducting to collect data and metadata of agricultural long-term field experiments (LTFE) of Germany. It is funded by the German Federal Ministry of Education and Research (BMBF) under the umbrella of the National Research Strategy BioEconomy 2030. BonaRes consists of ten interdisciplinary research project consortia and the 'BonaRes - Centre for Soil Research'. BonaRes Data Centre is responsible for collecting all LTFE data and regarding metadata into an enterprise database upon higher level of security and visualization of the data and metadata through data portal. In the frame of the BonaRes project, we are compiling an overview of long-term field experiments in Germany that is based on a literature review, the results of the online survey and direct contacts with LTFE operators. Information about research topic, contact person, website, experiment setup and analyzed parameters are collected. Based on the collected LTFE data, an enterprise geodatabase is developed and a GIS-based web-information system about LTFE in Germany is also settled. Various aspects of the LTFE, like experiment type, land-use type, agricultural category and duration of experiment, are presented in thematic maps. This information system is dynamically linked to the database, which means changes in the data directly affect the presentation. An easy data searching option using LTFE name, -location or -operators and the dynamic layer selection ensure a user-friendly web application. Dispersion and visualization of the overlapping LTFE points on the overview map are also challenging and we make it automatized at very zoom level which is also a consistent part of this application. The application provides both, spatial location and meta-information of LTFEs, which is backed-up by an enterprise geodatabase, GIS server for hosting map services and Java script API for web application development.

  8. Improving data management and dissemination in web based information systems by semantic enrichment of descriptive data aspects

    NASA Astrophysics Data System (ADS)

    Gebhardt, Steffen; Wehrmann, Thilo; Klinger, Verena; Schettler, Ingo; Huth, Juliane; Künzer, Claudia; Dech, Stefan

    2010-10-01

    The German-Vietnamese water-related information system for the Mekong Delta (WISDOM) project supports business processes in Integrated Water Resources Management in Vietnam. Multiple disciplines bring together Earth observation and ground-based observation themes, such as environmental monitoring, water management, demographics, economy, information technology, and infrastructural systems. This paper introduces the components of the web-based WISDOM system, including the data, logic and presentation tiers. It focuses on the data models upon which the database management system is built, including techniques for tagging or linking metadata with the stored information. The model also uses ordered groupings of spatial, thematic and temporal reference objects to semantically tag datasets and thereby enable fast data retrieval, such as finding all data in a specific administrative unit belonging to a specific theme. The PostgreSQL database employs a spatial extension; this object-relational database was chosen over a purely relational one so that spatial objects can be tagged to tabular data, improving the retrieval of census and observational data at regional, provincial, and local levels. While the spatial database hinders processing raster data, a "work-around" was built into WISDOM to permit efficient management of both raster and vector data. The data model also incorporates styling aspects of the spatial datasets through styled layer descriptors (SLD) and web map service (WMS) layer specifications, allowing retrieval of rendered maps. Metadata elements of the spatial data are based on the ISO 19115 standard. XML-structured information of the SLD and metadata is stored in an XML database. The data models and the data management system are robust for managing the large quantity of spatial objects, sensor observations, and census and document data. The operational WISDOM information system prototype contains modules for data management, automatic data integration, and web services for data retrieval, analysis, and distribution. The graphical user interfaces facilitate metadata cataloguing, data warehousing, web sensor data analysis and thematic mapping.

  9. Study on application of adaptive fuzzy control and neural network in the automatic leveling system

    NASA Astrophysics Data System (ADS)

    Xu, Xiping; Zhao, Zizhao; Lan, Weiyong; Sha, Lei; Qian, Cheng

    2015-04-01

    This paper discusses the application of adaptive fuzzy control and the neural network BP algorithm in a large-platform automatic leveling control system. The purpose is to develop a measurement system with fast platform leveling, so that the leveling system installed on the measurement platform can reach level quickly during precision measurement work and improve the efficiency of precision measurement. The paper focuses on the analysis of the automatic leveling system based on a fuzzy controller, using a method that combines a fuzzy controller with a BP neural network, in which the BP algorithm refines the empirical rules, to construct an adaptive fuzzy control system. Meanwhile, the learning rate of the BP algorithm is adjusted at run time to accelerate convergence. The simulation results show that the proposed control method can effectively improve the leveling precision of the automatic leveling system and shorten the leveling time.

  10. Disentangling Complexity in Bayesian Automatic Adaptive Quadrature

    NASA Astrophysics Data System (ADS)

    Adam, Gheorghe; Adam, Sanda

    2018-02-01

    The paper describes a Bayesian automatic adaptive quadrature (BAAQ) solution for numerical integration which is simultaneously robust, reliable, and efficient. Detailed discussion is provided of three main factors which contribute to the enhancement of these features: (1) refinement of the m-panel automatic adaptive scheme through the use of integration-domain-length-scale-adapted quadrature sums; (2) fast early problem complexity assessment - enables the non-transitive choice among three execution paths: (i) immediate termination (exceptional cases); (ii) pessimistic - involves time and resource consuming Bayesian inference resulting in radical reformulation of the problem to be solved; (iii) optimistic - asks exclusively for subrange subdivision by bisection; (3) use of the weaker accuracy target from the two possible ones (the input accuracy specifications and the intrinsic integrand properties respectively) - results in maximum possible solution accuracy under minimum possible computing time.
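
    As a minimal illustration of the optimistic execution path - subrange subdivision by bisection until an accuracy target is met - here is a textbook recursive adaptive Simpson scheme, not the BAAQ algorithm itself:

        import math

        def adaptive_simpson(f, a: float, b: float, eps: float = 1e-9) -> float:
            """Recursive adaptive Simpson quadrature with bisection refinement."""
            def simpson(lo, hi):
                mid = (lo + hi) / 2
                return (hi - lo) / 6 * (f(lo) + 4 * f(mid) + f(hi))

            def recurse(lo, hi, whole, eps):
                mid = (lo + hi) / 2
                s_left, s_right = simpson(lo, mid), simpson(mid, hi)
                # Richardson-style error estimate: compare refined vs coarse panels.
                if abs(s_left + s_right - whole) <= 15 * eps:
                    return s_left + s_right + (s_left + s_right - whole) / 15
                return (recurse(lo, mid, s_left, eps / 2) +
                        recurse(mid, hi, s_right, eps / 2))

            return recurse(a, b, simpson(a, b), eps)

        # Example: the integral of sin on [0, pi] is exactly 2.
        print(adaptive_simpson(math.sin, 0.0, math.pi))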

  11. TR32DB - Management of Research Data in a Collaborative, Interdisciplinary Research Project

    NASA Astrophysics Data System (ADS)

    Curdt, Constanze; Hoffmeister, Dirk; Waldhoff, Guido; Lang, Ulrich; Bareth, Georg

    2015-04-01

    The management of research data in a well-structured and documented manner is essential in collaborative, interdisciplinary research environments (e.g. across various institutions). Consequently, the set-up and use of a research data management (RDM) system such as a data repository or project database is necessary. These systems should accompany and support scientists during the entire research life cycle (e.g. data collection, documentation, storage, archiving, sharing, publishing) and operate across disciplines in interdisciplinary research projects. The challenges and problems of RDM are well known. A user-friendly, well-documented, sustainable RDM system is therefore essential, as are user support and further assistance. In the framework of the Transregio Collaborative Research Centre 32 'Patterns in Soil-Vegetation-Atmosphere Systems: Monitoring, Modelling, and Data Assimilation' (CRC/TR32), funded by the German Research Foundation (DFG), an RDM system was self-designed and implemented. The CRC/TR32 project database (TR32DB, www.tr32db.de) has been operating online since early 2008. The TR32DB handles all data created by the project participants, who come from several institutions (e.g. the Universities of Cologne, Bonn and Aachen, and the Research Centre Jülich) and research fields (e.g. soil and plant sciences, hydrology, geography, geophysics, meteorology, remote sensing). Very heterogeneous research data are covered, resulting from field measurement campaigns, meteorological monitoring, remote sensing, laboratory studies and modelling approaches. Furthermore, outcomes such as publications, conference contributions, PhD reports and corresponding images are included. The TR32DB was set up in cooperation with the Regional Computing Centre of the University of Cologne (RRZK) and is hosted in its hardware environment. The TR32DB system architecture is composed of three main components: (i) file-based data storage including backup, (ii) database-based storage for administrative data and metadata, and (iii) a web interface for user access. The TR32DB offers the common features of RDM systems: data storage, entry of corresponding metadata through a user-friendly input wizard, search and download of data depending on user permissions, and secure internal exchange of data. In addition, a Digital Object Identifier (DOI) can be allocated to specific datasets, and several web mapping components are supported (e.g. Web-GIS and map search). The centrepiece of the TR32DB is the self-designed and implemented CRC/TR32-specific metadata schema, which enables the documentation of all involved, heterogeneous data with accurate, interoperable metadata. The TR32DB Metadata Schema is set up in a multi-level approach and supports several metadata standards and schemas (e.g. Dublin Core, ISO 19115, INSPIRE, DataCite). Furthermore, metadata properties focused on the CRC/TR32 background (e.g. CRC/TR32-specific keywords) and on the supported data types are added. Mandatory, optional and automatic metadata properties are specified. Overall, the TR32DB is designed and implemented according to the needs of the CRC/TR32 (e.g. a huge amount of heterogeneous data) and the demands of the DFG (e.g. cooperation with a computing centre). The application of a self-designed, project-specific, interoperable metadata schema enables the accurate documentation of all CRC/TR32 data. The implementation of the TR32DB in the hardware environment of the RRZK ensures access to the data after the end of the CRC/TR32 funding in 2018.

  12. Metadata-assisted nonuniform atmospheric scattering model of image haze removal for medium-altitude unmanned aerial vehicle

    NASA Astrophysics Data System (ADS)

    Liu, Chunlei; Ding, Wenrui; Li, Hongguang; Li, Jiankun

    2017-09-01

    Haze removal is a nontrivial task for medium-altitude unmanned aerial vehicle (UAV) image processing because of the effects of light absorption and scattering. The challenges are attributed mainly to image distortion and detail blur during the long-distance, large-scale imaging process. In our work, a metadata-assisted nonuniform atmospheric scattering model is proposed to deal with these problems for medium-altitude UAVs. First, to better describe the real atmosphere, we propose a nonuniform atmospheric scattering model based on the aerosol distribution, which directly benefits image distortion correction. Second, considering the characteristics of long-distance imaging, we calculate the depth map, an essential clue for the model, on the basis of the UAV metadata. An accurate depth map reduces color distortion compared with the depth of field obtained by existing methods based on priors or assumptions. Furthermore, we use an adaptive median filter to address the problem of fuzzy details caused by the global airlight value. Experimental results on both real flight and synthetic images demonstrate that our proposed method outperforms four existing haze removal methods.
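
    As background, the following Python sketch inverts the standard (uniform) atmospheric scattering model I = J*t + A*(1 - t), with transmission t = exp(-beta*d) derived from a depth map; the paper's contribution is precisely to let the scattering vary with the aerosol distribution, which this toy version does not do, and all values below are synthetic:

    import numpy as np

    def dehaze(I, depth, A=0.9, beta=0.8, t_min=0.1):
        """Invert the scattering model to recover the scene radiance J."""
        t = np.exp(-beta * depth)          # per-pixel transmission
        t = np.maximum(t, t_min)           # avoid amplifying noise where t ~ 0
        return (I - A) / t + A

    I = np.clip(np.random.rand(4, 4) * 0.3 + 0.6, 0, 1)  # hazy toy image
    depth = np.full((4, 4), 2.0)                          # depth map (km)
    print(dehaze(I, depth))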

  13. Distributed metadata servers for cluster file systems using shared low latency persistent key-value metadata store

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bent, John M.; Faibish, Sorin; Pedone, Jr., James M.

    A cluster file system is provided having a plurality of distributed metadata servers with shared access to one or more shared low latency persistent key-value metadata stores. A metadata server comprises an abstract storage interface comprising a software interface module that communicates with at least one shared persistent key-value metadata store providing a key-value interface for persistent storage of key-value metadata. The software interface module provides the key-value metadata to the at least one shared persistent key-value metadata store in a key-value format. The shared persistent key-value metadata store is accessed by a plurality of metadata servers. A metadata request can be processed by a given metadata server independently of other metadata servers in the cluster file system. A distributed metadata storage environment is also disclosed that comprises a plurality of metadata servers having an abstract storage interface to at least one shared persistent key-value metadata store.
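
    A conceptual Python sketch of the arrangement claimed above (names hypothetical and greatly simplified, not the patented implementation): several metadata servers share one key-value store behind an abstract storage interface, so any server can serve any request independently:

    class KeyValueStore:
        """Stand-in for a shared low-latency persistent key-value store."""
        def __init__(self):
            self._kv = {}
        def put(self, key, value): self._kv[key] = value
        def get(self, key): return self._kv.get(key)

    class MetadataServer:
        def __init__(self, name, shared_store):
            self.name = name
            self.store = shared_store      # abstract storage interface

        def set_attr(self, path, attrs):
            self.store.put(("meta", path), attrs)

        def lookup(self, path):
            return self.store.get(("meta", path))

    shared = KeyValueStore()
    mds_a, mds_b = MetadataServer("a", shared), MetadataServer("b", shared)
    mds_a.set_attr("/data/run42", {"size": 1024, "owner": "alice"})
    print(mds_b.lookup("/data/run42"))   # visible through any metadata server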

  14. Irregular and adaptive sampling for automatic geophysic measure systems

    NASA Astrophysics Data System (ADS)

    Avagnina, Davide; Lo Presti, Letizia; Mulassano, Paolo

    2000-07-01

    In this paper a sampling method based on an irregular and adaptive strategy is described. It can be used as an automatic guide for rovers designed to explore terrestrial and planetary environments. Starting from the hypothesis that an exploratory vehicle is equipped with a payload able to acquire measurements of quantities of interest, the method detects objects of interest from the measured points and realizes an adaptive sampling that describes those objects in detail while sampling the uninteresting background only coarsely.

  15. The Gemini Recipe System: A Dynamic Workflow for Automated Data Reduction

    NASA Astrophysics Data System (ADS)

    Labrie, K.; Hirst, P.; Allen, C.

    2011-07-01

    Gemini's next generation data reduction software suite aims to offer greater automation of the data reduction process without compromising the flexibility required by science programs using advanced or unusual observing strategies. The Recipe System is central to our new data reduction software. Developed in Python, it facilitates near-real time processing for data quality assessment, and both on- and off-line science quality processing. The Recipe System can be run as a standalone application or as the data processing core of an automatic pipeline. Building on concepts that originated in ORAC-DR, a data reduction process is defined in a Recipe written in a science (as opposed to computer) oriented language, and consists of a sequence of data reduction steps called Primitives. The Primitives are written in Python and can be launched from the PyRAF user interface by users wishing for more hands-on optimization of the data reduction process. The fact that the same processing Primitives can be run within both the pipeline context and interactively in a PyRAF session is an important strength of the Recipe System. The Recipe System offers dynamic flow control allowing for decisions regarding processing and calibration to be made automatically, based on the pixel and the metadata properties of the dataset at the stage in processing where the decision is being made, and the context in which the processing is being carried out. Processing history and provenance recording are provided by the AstroData middleware, which also offers header abstraction and data type recognition to facilitate the development of instrument-agnostic processing routines. All observatory or instrument specific definitions are isolated from the core of the AstroData system and distributed in external configuration packages that define a lexicon including classifications, uniform metadata elements, and transformations.
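
    The recipe/primitive pattern can be sketched in a few lines of Python (a toy rendering, not Gemini's code): a recipe is an ordered list of primitive names, and flow control may branch on the dataset's metadata at the point where the decision is made:

    def bias_correct(ds): ds["steps"].append("bias_correct"); return ds
    def flat_field(ds):   ds["steps"].append("flat_field");   return ds
    def stack(ds):        ds["steps"].append("stack");        return ds

    PRIMITIVES = {"biasCorrect": bias_correct, "flatField": flat_field,
                  "stack": stack}

    def run_recipe(recipe, dataset):
        for name in recipe:
            # Dynamic flow control: skip stacking for single-frame datasets.
            if name == "stack" and dataset["meta"]["nframes"] < 2:
                continue
            dataset = PRIMITIVES[name](dataset)
        return dataset

    ds = {"meta": {"nframes": 1}, "steps": []}
    print(run_recipe(["biasCorrect", "flatField", "stack"], ds)["steps"])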

  16. Earth Science Datacasting v2.0

    NASA Technical Reports Server (NTRS)

    Bingham, Andrew W.; Deen, Robert G.; Hussey, Kevin J.; Stough, Timothy M.; McCleese, Sean W.; Toole, Nicholas T.

    2012-01-01

    The Datacasting software, which consists of a server and a client, has been developed as part of the Earth Science (ES) Datacasting project. The goal of ES Datacasting is to provide scientists the ability to automatically and continuously download Earth science data that meets a precise, predefined need, and then to instantaneously visualize it on a local computer. This is achieved by applying the concept of podcasting to deliver science data over the Internet using RSS (Really Simple Syndication) XML feeds. By extending the RSS specification, scientists can filter a feed and only download the files that are required for a particular application (for example, only files that contain information about a particular event, such as a hurricane or flood). The extension also provides the ability for the client to understand the format of the data and visualize the information locally. The server part enables a data provider to create and serve basic Datacasting (RSS-based) feeds. The user can subscribe to any number of feeds, view the information related to each item contained within a feed (including browse pre-made images), manually download files associated with items, and place these files in a local store. The client-server architecture enables users to: a) Subscribe and interpret multiple Datacasting feeds (same look and feel as a typical mail client), b) Maintain a list of all items within each feed, c) Enable filtering on the lists based on different metadata attributes contained within the feed (list will reference only data files of interest), d) Visualize the reference data and associated metadata, e) Download files referenced within the list, and f) Automatically download files as new items become available.
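
    A minimal Python sketch of the client-side filtering idea (the extension element names are hypothetical, not the actual Datacasting schema): parse the feed, keep only items whose metadata matches the filter, and download their enclosures:

    import xml.etree.ElementTree as ET

    FEED = """<rss version="2.0" xmlns:dc="http://example.org/datacast">
      <channel>
        <item><title>granule-1</title><dc:event>hurricane</dc:event>
          <enclosure url="http://example.org/g1.hdf"/></item>
        <item><title>granule-2</title><dc:event>flood</dc:event>
          <enclosure url="http://example.org/g2.hdf"/></item>
      </channel>
    </rss>"""

    NS = {"dc": "http://example.org/datacast"}
    root = ET.fromstring(FEED)
    for item in root.iter("item"):
        if item.findtext("dc:event", namespaces=NS) == "hurricane":
            url = item.find("enclosure").attrib["url"]
            print("would download:", url)  # e.g. urllib.request.urlretrieve(url)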

  17. Building an Internet of Samples: The Australian Contribution

    NASA Astrophysics Data System (ADS)

    Wyborn, Lesley; Klump, Jens; Bastrakova, Irina; Devaraju, Anusuriya; McInnes, Brent; Cox, Simon; Karssies, Linda; Martin, Julia; Ross, Shawn; Morrissey, John; Fraser, Ryan

    2017-04-01

    Physical samples are often the ground truth for research reported in the scientific literature across multiple domains. They are collected by many different entities (individual researchers, laboratories, government agencies, mining companies, citizens, museums, etc.). Samples must be curated over the long term, both to ensure that their existence is known and to allow any data derived from them through laboratory and field tests to be linked to the physical samples. For example, unique identifiers that link ground truth data back to the original sample help calibrate large volumes of remotely sensed data. Access to catalogues of reliably identified samples from several collections promotes collaboration across all Earth Science disciplines. It also increases the cost effectiveness of research by reducing the need to re-collect samples in the field. The assignment of web identifiers to the digital representations of these physical objects allows us to link to data, literature, investigators and institutions, thus creating an "Internet of Samples". An Australian implementation of the "Internet of Samples" is using the IGSN (International Geo Sample Number, http://igsn.github.io) to identify samples in a globally unique and persistent way. IGSN was developed in the solid earth science community and is recommended for sample identification by the Coalition for Publishing Data in the Earth and Space Sciences (COPDESS). IGSN is interoperable with other persistent identifier systems such as DataCite. Furthermore, the basic IGSN description metadata schema is compatible with existing schemas such as OGC Observations and Measurements (O&M) and the DataCite Metadata Schema, which makes crosswalks to other metadata schemas easy. IGSN metadata is disseminated through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), allowing it to be aggregated in other applications such as portals (e.g. the Australian IGSN catalogue http://igsn2.csiro.au), and it is available in more than one format. The software for IGSN web services is based on components developed for DataCite and adapted to the specific requirements of IGSN. This cooperation in open source development ensures sustainable implementation and faster turnaround times for updates. IGSN, in particular in its Australian implementation, is characterised by a federated approach to system architecture and organisational governance, giving it the necessary flexibility to adapt to particular local practices within multiple domains whilst maintaining an overarching international standard. The three current IGSN allocation agents in Australia (Geoscience Australia, CSIRO and Curtin University) represent different sectors. Through funding from the Australian Research Data Services Program they have combined to develop a common web portal that allows discovery of physical samples and sample collections at a national level. International governance then ensures that we can link to an international community while acting locally to keep the services offered relevant to the needs of Australian researchers. This flexibility aids the integration of new disciplines into a global community of a physical samples information network.
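
    A minimal OAI-PMH harvesting sketch in Python, in the spirit of the IGSN dissemination described above; the endpoint URL is illustrative (not a real allocation-agent address) and oai_dc is used as the universally supported metadata prefix:

    import urllib.request
    import xml.etree.ElementTree as ET

    BASE = "http://example.org/oai"   # hypothetical IGSN OAI-PMH endpoint
    url = BASE + "?verb=ListRecords&metadataPrefix=oai_dc"

    with urllib.request.urlopen(url) as resp:
        tree = ET.parse(resp)

    OAI = "{http://www.openarchives.org/OAI/2.0/}"
    for rec in tree.iter(OAI + "record"):
        ident = rec.findtext(OAI + "header/" + OAI + "identifier")
        print("harvested sample record:", ident)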

  18. SeaDataNet - Pan-European infrastructure for marine and ocean data management: Unified access to distributed data sets

    NASA Astrophysics Data System (ADS)

    Schaap, D. M. A.; Maudire, G.

    2009-04-01

    SeaDataNet is an Integrated research Infrastructure Initiative (I3) in EU FP6 (2006-2011) to provide a data management system adapted both to the fragmented observation system and to the users' need for integrated access to data, metadata, products and services. SeaDataNet therefore ensures the long-term archiving of the large number of multidisciplinary data (e.g. temperature, salinity, current, sea level, and chemical, physical and biological properties) collected by the many different sensors installed on board research vessels, satellites and the various platforms of the marine observing system. The SeaDataNet project started in 2006 but builds upon earlier data management infrastructure projects, undertaken over a period of 20 years by an expanding network of oceanographic data centres from the countries around all European seas. Its predecessor project, Sea-Search, had a strict focus on metadata. SeaDataNet maintains significant interest in the further development of the metadata infrastructure, but its primary objective is the provision of easy data access and generic data products. SeaDataNet is a distributed infrastructure that provides transnational access to marine data, metadata, products and services through 40 interconnected Trans National Data Access Platforms (TAP) from 35 countries around the Black Sea, Mediterranean, North East Atlantic, North Sea, Baltic and Arctic regions. These include National Oceanographic Data Centres (NODCs) and Satellite Data Centres. Furthermore, the SeaDataNet consortium comprises a number of expert modelling centres, SMEs expert in IT, and 3 international bodies (ICES, IOC and JRC).
    Planning: The SeaDataNet project is delivering and operating the infrastructure in 3 versions:
    Version 0: maintenance and further development of the metadata systems developed by the Sea-Search project, plus the development of a new metadata system for indexing and accessing individual data objects managed by the SeaDataNet data centres, known as the Common Data Index (CDI) V0 system.
    Version 1: harmonisation and upgrading of the metadatabases through adoption of the ISO 19115 metadata standard, and provision of transparent data access and download services from all partner data centres through upgrading of the Common Data Index and deployment of a data object delivery service.
    Version 2: addition of data product services and OGC-compliant viewing services, and further virtualisation of data access.
    SeaDataNet Version 0: The SeaDataNet portal has been set up at http://www.seadatanet.org and provides a platform for all SeaDataNet services and standards as well as background information about the project and its partners. It includes discovery services via the following catalogues: CSR - Cruise Summary Reports of research vessels; EDIOS - locations and details of monitoring stations and networks/programmes; EDMED - high-level inventory of marine environmental data sets collected and managed by research institutes and organisations; EDMERP - marine environmental research projects; EDMO - marine organisations. These catalogues are interrelated, where possible, to facilitate cross searching and context searching, and they connect to the Common Data Index (CDI). The CDI gives detailed insight into the datasets available at partner databases and paves the way to direct online data access or direct online requests for data access and delivery.
    The CDI V0 metadatabase contains more than 340,000 individual data entries from 36 CDI partners in 29 countries across Europe, covering a broad scope and range of data held by these organisations. For purposes of standardisation and international exchange the ISO 19115 metadata standard has been adopted; the CDI format is defined as a dedicated subset of this standard. A CDI XML format supports the exchange between CDI partners and the central CDI manager, and ensures interoperability with other systems and networks. CDI XML entries are generated by participating data centres directly from their databases, and CDI partners can use dedicated SeaDataNet tools to generate CDI XML files automatically.
    Approach for SeaDataNet V1 and V2: The approach for SeaDataNet V1 and V2, which is in line with the INSPIRE Directive, comprises the following services:
    Discovery services = metadata directories
    Security services = Authentication, Authorization & Accounting (AAA)
    Delivery services = data access and downloading of datasets
    Viewing services = visualisation of metadata, data and data products
    Product services = generic and standard products
    Monitoring services = statistics on usage and performance of the system
    Maintenance services = updating of metadata by SeaDataNet partners
    The services will be operated over a distributed network of interconnected data centres accessed through a central portal. In addition to service access, the portal will provide information on data management standards, tools and protocols. The architecture has been designed to provide a coherent system based on V1 services, whilst leaving the pathway open for later extension with V2 services. For the implementation, a range of technical components has been defined. Some are already operational, with the remainder in the final stages of development and testing. These make use of recent web technologies, and also comprise Java components, to provide multi-platform support and syntactic interoperability. To facilitate sharing of resources and interoperability, SeaDataNet has adopted SOAP Web Service technology. The SeaDataNet architecture and components have been designed to handle all kinds of oceanographic and marine environmental data, including both in-situ measurements and remote sensing observations. The V1 technical development is complete, and the V1 system is now being implemented and adopted by all participating data centres in SeaDataNet.
    Interoperability: Interoperability is the key to the success of a distributed data management system, and it is achieved in SeaDataNet V1 by:
    using common quality control protocols and a common flag scale;
    using controlled vocabularies from a single source, developed under international content governance;
    adopting the ISO 19115 metadata standard for all metadata directories;
    providing XML validation services for quality control of metadata maintenance, including field content verification based on Schematron;
    providing standard metadata entry tools;
    using harmonised data transport formats (NetCDF, ODV ASCII and MedAtlas ASCII) for data set delivery;
    adopting OGC standards for mapping and viewing services;
    using SOAP Web Services in the SeaDataNet architecture.
    SeaDataNet V1 Delivery Services: An important objective of the V1 system is to provide transparent access to the distributed data sets via a unique user interface at the SeaDataNet portal and download service. In the SeaDataNet V1 architecture the Common Data Index (CDI) V1 provides the link between discovery and delivery.
    The CDI user interface enables users to gain detailed insight into the availability and geographical distribution of marine data archived at the connected data centres, and it provides the means for downloading data sets in common formats via a transaction mechanism. The SeaDataNet portal gives registered users access to these distributed data sets via the CDI V1 directory and a shopping basket mechanism. This allows registered users to locate data of interest and submit their data requests. The requests are forwarded automatically from the portal to the relevant SeaDataNet data centres. This process is controlled via the Request Status Manager (RSM) Web Service at the portal and a Download Manager (DM) Java software module implemented at each of the data centres. The RSM also enables registered users to regularly check the status of their requests and to download data sets once access has been granted. Data centres can follow all transactions for their data sets online and can handle requests that require their consent. The actual delivery of data sets takes place between the user and the selected data centre. The CDI V1 system is now being populated by all participating data centres in SeaDataNet, thereby phasing out CDI V0.
    SeaDataNet Partners: IFREMER (France), MARIS (Netherlands), HCMR/HNODC (Greece), ULg (Belgium), OGS (Italy), NERC/BODC (UK), BSH/DOD (Germany), SMHI (Sweden), IEO (Spain), RIHMI/WDC (Russia), IOC (International), ENEA (Italy), INGV (Italy), METU (Turkey), CLS (France), AWI (Germany), IMR (Norway), NERI (Denmark), ICES (International), EC-DG JRC (International), MI (Ireland), IHPT (Portugal), RIKZ (Netherlands), RBINS/MUMM (Belgium), VLIZ (Belgium), MRI (Iceland), FIMR (Finland), IMGW (Poland), MSI (Estonia), IAE/UL (Latvia), CMR (Lithuania), SIO/RAS (Russia), MHI/DMIST (Ukraine), IO/BAS (Bulgaria), NIMRD (Romania), TSU (Georgia), INRH (Morocco), IOF (Croatia), PUT (Albania), NIB (Slovenia), UoM (Malta), OC/UCY (Cyprus), IOLR (Israel), NCSR/NCMS (Lebanon), CNR-ISAC (Italy), ISMAL (Algeria), INSTM (Tunisia)

  19. An Open Catalog for Supernova Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Guillochon, James; Parrent, Jerod; Kelley, Luke Zoltan

    We present the Open Supernova Catalog, an online collection of observations and metadata for presently 36,000+ supernovae and related candidates. The catalog is freely available on the web (https://sne.space), with its main interface having been designed to be a user-friendly, rapidly searchable table accessible on desktop and mobile devices. In addition to the primary catalog table containing supernova metadata, an individual page is generated for each supernova, which displays its available metadata, light curves, and spectra spanning X-ray to radio frequencies. The data presented in the catalog is automatically rebuilt on a daily basis and is constructed by parsing several dozen sources, including the data presented in the supernova literature and from secondary sources such as other web-based catalogs. Individual supernova data is stored in the hierarchical, human- and machine-readable JSON format, with the entirety of each supernova’s data being contained within a single JSON file bearing its name. The setup we present here, which is based on open-source software maintained via git repositories hosted on github, enables anyone to download the entirety of the supernova data set to their home computer in minutes, and to make contributions of their own data back to the catalog via git. As the supernova data set continues to grow, especially in the upcoming era of all-sky synoptic telescopes, which will increase the total number of events by orders of magnitude, we hope that the catalog we have designed will be a valuable tool for the community to analyze both historical and contemporary supernovae.

  20. An Open Catalog for Supernova Data

    NASA Astrophysics Data System (ADS)

    Guillochon, James; Parrent, Jerod; Kelley, Luke Zoltan; Margutti, Raffaella

    2017-01-01

    We present the Open Supernova Catalog, an online collection of observations and metadata for presently 36,000+ supernovae and related candidates. The catalog is freely available on the web (https://sne.space), with its main interface having been designed to be a user-friendly, rapidly searchable table accessible on desktop and mobile devices. In addition to the primary catalog table containing supernova metadata, an individual page is generated for each supernova, which displays its available metadata, light curves, and spectra spanning X-ray to radio frequencies. The data presented in the catalog is automatically rebuilt on a daily basis and is constructed by parsing several dozen sources, including the data presented in the supernova literature and from secondary sources such as other web-based catalogs. Individual supernova data is stored in the hierarchical, human- and machine-readable JSON format, with the entirety of each supernova’s data being contained within a single JSON file bearing its name. The setup we present here, which is based on open-source software maintained via git repositories hosted on github, enables anyone to download the entirety of the supernova data set to their home computer in minutes, and to make contributions of their own data back to the catalog via git. As the supernova data set continues to grow, especially in the upcoming era of all-sky synoptic telescopes, which will increase the total number of events by orders of magnitude, we hope that the catalog we have designed will be a valuable tool for the community to analyze both historical and contemporary supernovae.

  1. Leveling up: enabling diverse users to locate and effectively use unfamiliar data sets through NCAR's Research Data Archive

    NASA Astrophysics Data System (ADS)

    Peng, G. S.

    2016-12-01

    Research necessarily expands upon the volume and variety of data used in prior work. Increasingly, investigators look outside their primary areas of expertise for data to incorporate into their research. Locating and using the data that they need, which may be described in terminology from other fields of science or be encoded in unfamiliar data formats, present often insurmountable barriers for potential users. As a data provider of a diverse collection of over 600 atmospheric and oceanic data sets (DS) (http://rda.ucar.edu), we seek to reduce or remove those barriers. Serving a broadening and increasing user base with fixed and finite resources requires automation. Our software harvests metadata descriptors about the data from the data files themselves. Data curators/subject matter experts augment the machine-generated metadata as needed. Metadata powers our data search tools. Users may search for data in a myriad of ways ranging from free text queries to GCMD keywords to faceted searches capable of narrowing down selections by specific criteria. Users are offered customized lists of DSs fitting their criteria with links to DS main information pages that provide detailed information about each DS. Where appropriate, they link to the NCAR Climate Data Guide for expert guidance about strengths and weaknesses of that particular DS. Once users find the data sets they need, we provide modular lessons for common data tasks. The lessons may be data tool install guides, data recipes, blog posts, or short YouTube videos. Rather than overloading users with reams of information, we provide targeted lessons when the user is most receptive, e.g. when they want to use data in an unfamiliar format. We add new material when we discover common points of confusion. Each educational resource is tagged with DS ID numbers so that they are automatically linked with the relevant DSs. How can data providers leverage the work of other data providers? Can a common tagging scheme for data education materials help us automatically share our data lessons? Research is at the frontier of knowledge. Questions from users seeking to create new uses for old data will not always be answerable with canned responses. Humans will remain in the loop. But we need automation and cross-center cooperation to reduce the areas spanned by "edge cases."

  2. Log-less metadata management on metadata server for parallel file systems.

    PubMed

    Liao, Jianwei; Xiao, Guoqiang; Peng, Xiaoning

    2014-01-01

    This paper presents a novel metadata management mechanism on the metadata server (MDS) for parallel and distributed file systems. In this technique, the client file system backs up the metadata requests it has sent and that the metadata server has handled, so that the MDS does not need to log metadata changes to nonvolatile storage to achieve a highly available metadata service, and metadata processing performance improves. Because the client file system backs up certain sent metadata requests in its memory, the overhead of handling these backup requests is much smaller than the overhead incurred by a metadata server that adopts logging or journaling to provide a highly available metadata service. The experimental results show that this newly proposed mechanism can significantly improve the speed of metadata processing and deliver better I/O data throughput than conventional metadata management schemes, that is, logging or journaling on the MDS. Moreover, complete metadata recovery can be achieved by replaying the backup logs cached by all involved clients when the metadata server has crashed or otherwise become non-operational.
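
    A conceptual Python sketch of the log-less scheme (greatly simplified, with no real RPC, request ordering or acknowledgement protocol): clients keep handled requests in memory, and a restarted MDS rebuilds its state by replaying them:

    class Client:
        def __init__(self, mds):
            self.mds = mds
            self.backup = []               # in-memory copy of sent requests

        def send(self, op, path, value=None):
            self.backup.append((op, path, value))
            self.mds.apply(op, path, value)

    class MDS:
        def __init__(self): self.meta = {}     # volatile only, no journal
        def apply(self, op, path, value):
            if op == "create": self.meta[path] = value
            elif op == "remove": self.meta.pop(path, None)

        def recover(self, clients):
            self.meta = {}                 # simulate a crash losing all state
            for c in clients:              # rebuild by replaying client backups
                for op, path, value in c.backup:
                    self.apply(op, path, value)

    mds = MDS(); c = Client(mds)
    c.send("create", "/f1", {"mode": 0o644})
    mds.recover([c])
    print(mds.meta)   # {'/f1': {'mode': 420}}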

  3. Log-Less Metadata Management on Metadata Server for Parallel File Systems

    PubMed Central

    Xiao, Guoqiang; Peng, Xiaoning

    2014-01-01

    This paper presents a novel metadata management mechanism on the metadata server (MDS) for parallel and distributed file systems. In this technique, the client file system backs up the metadata requests it has sent and that the metadata server has handled, so that the MDS does not need to log metadata changes to nonvolatile storage to achieve a highly available metadata service, and metadata processing performance improves. Because the client file system backs up certain sent metadata requests in its memory, the overhead of handling these backup requests is much smaller than the overhead incurred by a metadata server that adopts logging or journaling to provide a highly available metadata service. The experimental results show that this newly proposed mechanism can significantly improve the speed of metadata processing and deliver better I/O data throughput than conventional metadata management schemes, that is, logging or journaling on the MDS. Moreover, complete metadata recovery can be achieved by replaying the backup logs cached by all involved clients when the metadata server has crashed or otherwise become non-operational. PMID:24892093

  4. Quality Metadata Management for Geospatial Scientific Workflows: from Retrieving to Assessing with Online Tools

    NASA Astrophysics Data System (ADS)

    Leibovici, D. G.; Pourabdollah, A.; Jackson, M.

    2011-12-01

    Experts and decision-makers use or develop models to monitor global and local changes of the environment. Their activities require the combination of data and processing services in a flow of operations and spatial data computations: a geospatial scientific workflow. The seamless ability to generate, re-use and modify a geospatial scientific workflow is an important requirement, but the quality of the outcomes is equally important [1]. Metadata attached to the data and processes, and particularly metadata on their quality, is essential to assess the reliability of the scientific model that a workflow represents [2]. Management tools that deal with qualitative and quantitative measures of the quality associated with a workflow are therefore required by modellers. To ensure interoperability, ISO and OGC standards [3] are to be adopted, allowing one, for example, to define metadata profiles and to retrieve them via web service interfaces. However, these standards need a few extensions for workflows, particularly in the context of geoprocess metadata. We propose to fill this gap (i) by providing a metadata profile for the quality of processes, and (ii) by providing a framework, based on XPDL [4], to manage the quality information. Web Processing Services are used to implement a range of metadata analyses on the workflow in order to evaluate and present quality information at different levels of the workflow. This generates the quality metadata, which is stored in the XPDL file. The focus is (a) on visual representations of the quality, summarizing the quality information retrieved either from the standardized metadata profiles of the components or from non-standard quality information, e.g. Web 2.0 information, and (b) on the estimated qualities of the outputs derived from meta-propagation of uncertainties (a principle that we have introduced [5]). An a priori validation of the future decision-making supported by the workflow outputs is then provided using the meta-propagated qualities, obtained without running the workflow [6], together with a visualization on the workflow graph itself that points out where the workflow needs better data or better processes. [1] Leibovici, DG, Hobona, G, Stock, K, Jackson, M (2009) Qualifying geospatial workflow models for adaptive controlled validity and accuracy. In: IEEE 17th GeoInformatics, 1-5 [2] Leibovici, DG, Pourabdollah, A (2010) Workflow Uncertainty using a Metamodel Framework and Metadata for Data and Processes. OGC TC/PC Meetings, September 2010, Toulouse, France [3] OGC (2011) www.opengeospatial.org [4] XPDL (2008) Workflow Process Definition Interface - XML Process Definition Language. Workflow Management Coalition, Document WfMC-TC-1025, 2008 [5] Leibovici, DG, Pourabdollah, A, Jackson, M (2011) Meta-propagation of Uncertainties for Scientific Workflow Management in Interoperable Spatial Data Infrastructures. In: Proceedings of the European Geosciences Union (EGU2011), April 2011, Austria [6] Pourabdollah, A, Leibovici, DG, Jackson, M (2011) MetaPunT: an Open Source tool for Meta-Propagation of uncerTainties in Geospatial Processing. In: Proceedings of OSGIS2011, June 2011, Nottingham, UK

  5. Automatic depth grading tool to successfully adapt stereoscopic 3D content to digital cinema and home viewing environments

    NASA Astrophysics Data System (ADS)

    Thébault, Cédric; Doyen, Didier; Routhier, Pierre; Borel, Thierry

    2013-03-01

    To ensure an immersive, yet comfortable experience, significant work is required during post-production to adapt the stereoscopic 3D (S3D) content to the targeted display and its environment. On the one hand, the content needs to be reconverged using horizontal image translation (HIT) so as to harmonize the depth across the shots. On the other hand, to prevent edge violation, specific re-convergence is required and depending on the viewing conditions floating windows need to be positioned. In order to simplify this time-consuming work we propose a depth grading tool that automatically adapts S3D content to digital cinema or home viewing environments. Based on a disparity map, a stereo point of interest in each shot is automatically evaluated. This point of interest is used for depth matching, i.e. to position the objects of interest of consecutive shots in a same plane so as to reduce visual fatigue. The tool adapts the re-convergence to avoid edge-violation, hyper-convergence and hyper-divergence. Floating windows are also automatically positioned. The method has been tested on various types of S3D content, and the results have been validated by a stereographer.
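
    The horizontal image translation (HIT) step can be sketched in a few lines of numpy (an illustrative heuristic, not the authors' tool): shift the right-eye view so the shot's point of interest lands at zero disparity, i.e. at screen depth:

    import numpy as np

    def reconverge(right_img, poi_disparity):
        """Shift the right view by -poi_disparity pixels so the point of
        interest gets zero disparity; crop instead of wrapping at the border."""
        shift = -int(round(poi_disparity))
        out = np.zeros_like(right_img)
        if shift > 0:
            out[:, shift:] = right_img[:, :-shift]
        elif shift < 0:
            out[:, :shift] = right_img[:, -shift:]
        else:
            out = right_img.copy()
        return out

    right = np.arange(16.0).reshape(4, 4)
    print(reconverge(right, poi_disparity=2))  # columns moved left by 2 pixels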

  6. Composing Data Parallel Code for a SPARQL Graph Engine

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Castellana, Vito G.; Tumeo, Antonino; Villa, Oreste

    Big data analytics processes large amounts of data to extract knowledge from them. Semantic databases are big data applications that adopt the Resource Description Framework (RDF) to structure metadata through a graph-based representation. The graph-based representation provides several benefits, such as the possibility to perform in-memory processing with large amounts of parallelism. SPARQL is a language used to perform queries on RDF-structured data through graph matching. In this paper we present a tool that automatically translates SPARQL queries to parallel graph crawling and graph matching operations. The tool also supports complex SPARQL constructs, which require more than basic graph matching for their implementation. The tool generates parallel code annotated with OpenMP pragmas for x86 shared-memory multiprocessors (SMPs). With respect to commercial database systems such as Virtuoso, our approach reduces the memory occupation due to join operations and provides higher performance. We show the scaling of the automatically generated graph-matching code on a 48-core SMP.
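
    For readers unfamiliar with SPARQL graph matching, the following Python/rdflib sketch shows the matching semantics that such a tool must implement; it says nothing about the paper's actual translation to parallel OpenMP code:

    import rdflib

    TTL = """@prefix ex: <http://example.org/> .
    ex:alice a ex:Person ; ex:knows ex:bob .
    ex:bob   a ex:Person ; ex:knows ex:carol .
    """

    g = rdflib.Graph()
    g.parse(data=TTL, format="turtle")

    q = """PREFIX ex: <http://example.org/>
    SELECT ?a ?b WHERE { ?a a ex:Person . ?a ex:knows ?b . }"""

    for a, b in g.query(q):          # each row is one subgraph match
        print(a, "knows", b)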

  7. Protocols for Scholarly Communication

    NASA Astrophysics Data System (ADS)

    Pepe, A.; Yeomans, J.

    2007-10-01

    CERN, the European Organization for Nuclear Research, has operated an institutional preprint repository for more than 10 years. The repository contains over 850,000 records of which more than 450,000 are full-text OA preprints, mostly in the field of particle physics, and it is integrated with the library's holdings of books, conference proceedings, journals and other grey literature. In order to encourage effective propagation and open access to scholarly material, CERN is implementing a range of innovative library services into its document repository: automatic keywording, reference extraction, collaborative management tools and bibliometric tools. Some of these services, such as user reviewing and automatic metadata extraction, could make up an interesting testbed for future publishing solutions and certainly provide an exciting environment for e-science possibilities. The future protocol for scientific communication should guide authors naturally towards OA publication, and CERN wants to help reach a full open access publishing environment for the particle physics community and related sciences in the next few years.

  8. Intuitive web-based experimental design for high-throughput biomedical data.

    PubMed

    Friedrich, Andreas; Kenar, Erhan; Kohlbacher, Oliver; Nahnsen, Sven

    2015-01-01

    Big data bioinformatics aims at drawing biological conclusions from huge and complex biological datasets. Added value from the analysis of big data, however, is only possible if the data are accompanied by accurate metadata annotation. Particularly in high-throughput experiments, intelligent approaches are needed to keep track of the experimental design, including the conditions that are studied as well as information that might be of interest for failure analysis or for further experiments in the future. In addition to the management of this information, means for an integrated design and interfaces for structured data annotation are urgently needed by researchers. Here, we propose a factor-based experimental design approach that enables scientists to easily create large-scale experiments with the help of a web-based system. We present a novel implementation of a web-based interface allowing the collection of arbitrary metadata. To exchange and edit information we provide a spreadsheet-based, human-readable format. Subsequently, sample sheets with identifiers and meta-information for data generation facilities can be created. Data files created after measurement of the samples can be uploaded to a datastore, where they are automatically linked to the previously created experimental design model.
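
    The factor-based design itself is easy to sketch in Python: enumerate the cross product of the studied conditions and emit a sample sheet with generated identifiers (the sheet layout and identifier format here are hypothetical, not the system's actual output):

    import csv, itertools, sys

    factors = {
        "genotype": ["wildtype", "mutant"],
        "treatment": ["control", "drug"],
        "timepoint_h": [4, 24],
    }

    writer = csv.writer(sys.stdout, delimiter="\t")
    writer.writerow(["sample_id", *factors])
    # One row per combination of factor levels: 2 x 2 x 2 = 8 samples.
    for i, combo in enumerate(itertools.product(*factors.values()), start=1):
        writer.writerow([f"QS{i:04d}", *combo])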

  9. A spatial data handling system for retrieval of images by unrestricted regions of user interest

    NASA Technical Reports Server (NTRS)

    Dorfman, Erik; Cromp, Robert F.

    1992-01-01

    The Intelligent Data Management (IDM) project at NASA/Goddard Space Flight Center has prototyped an Intelligent Information Fusion System (IIFS), which automatically ingests metadata from remote sensor observations into a large catalog that is directly queryable by end users. The greatest challenge in the implementation of this catalog was supporting spatially driven searches, where the user has a possibly complex region of interest and wishes to recover those images that overlap all or simply a part of that region. A spatial data management system is described that is capable of storing and retrieving records of image data regardless of their source. This system was designed and implemented as part of the IIFS catalog. A new data structure, called a hypercylinder, is central to the design. The hypercylinder is specifically tailored for data distributed over the surface of a sphere, such as satellite observations of the Earth or space. Operations on the hypercylinder are regulated by two expert systems. The first governs the ingest of new metadata records and maintains the efficiency of the data structure as it grows. The second translates, plans, and executes users' spatial queries, performing incremental optimization as partial query results are returned.

  10. Metadata for Web Resources: How Metadata Works on the Web.

    ERIC Educational Resources Information Center

    Dillon, Martin

    This paper discusses bibliographic control of knowledge resources on the World Wide Web. The first section sets the context of the inquiry. The second section covers the following topics related to metadata: (1) definitions of metadata, including metadata as tags and as descriptors; (2) metadata on the Web, including general metadata systems,…

  11. Metadata Dictionary Database: A Proposed Tool for Academic Library Metadata Management

    ERIC Educational Resources Information Center

    Southwick, Silvia B.; Lampert, Cory

    2011-01-01

    This article proposes a metadata dictionary (MDD) be used as a tool for metadata management. The MDD is a repository of critical data necessary for managing metadata to create "shareable" digital collections. An operational definition of metadata management is provided. The authors explore activities involved in metadata management in…

  12. Fast and Accurate Metadata Authoring Using Ontology-Based Recommendations.

    PubMed

    Martínez-Romero, Marcos; O'Connor, Martin J; Shankar, Ravi D; Panahiazar, Maryam; Willrett, Debra; Egyedi, Attila L; Gevaert, Olivier; Graybeal, John; Musen, Mark A

    2017-01-01

    In biomedicine, high-quality metadata are crucial for finding experimental datasets, for understanding how experiments were performed, and for reproducing those experiments. Despite the recent focus on metadata, the quality of metadata available in public repositories continues to be extremely poor. A key difficulty is that the typical metadata acquisition process is time-consuming and error prone, with weak or nonexistent support for linking metadata to ontologies. There is a pressing need for methods and tools to speed up the metadata acquisition process and to increase the quality of metadata that are entered. In this paper, we describe a methodology and set of associated tools that we developed to address this challenge. A core component of this approach is a value recommendation framework that uses analysis of previously entered metadata and ontology-based metadata specifications to help users rapidly and accurately enter their metadata. We performed an initial evaluation of this approach using metadata from a public metadata repository.
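
    A toy Python sketch of the value-recommendation idea (the real framework also exploits ontology-based metadata specifications, which are omitted here): rank candidate values for a field by how often they co-occurred with the values the user has already entered:

    from collections import Counter

    previous_records = [
        {"organism": "human", "tissue": "liver"},
        {"organism": "human", "tissue": "liver"},
        {"organism": "human", "tissue": "brain"},
        {"organism": "mouse", "tissue": "liver"},
    ]

    def recommend(field, context, records, k=2):
        """Top-k values of `field` among records matching the current context."""
        counts = Counter(
            r[field] for r in records
            if field in r and all(r.get(f) == v for f, v in context.items())
        )
        return counts.most_common(k)

    print(recommend("tissue", {"organism": "human"}, previous_records))
    # [('liver', 2), ('brain', 1)]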

  13. Fast and Accurate Metadata Authoring Using Ontology-Based Recommendations

    PubMed Central

    Martínez-Romero, Marcos; O’Connor, Martin J.; Shankar, Ravi D.; Panahiazar, Maryam; Willrett, Debra; Egyedi, Attila L.; Gevaert, Olivier; Graybeal, John; Musen, Mark A.

    2017-01-01

    In biomedicine, high-quality metadata are crucial for finding experimental datasets, for understanding how experiments were performed, and for reproducing those experiments. Despite the recent focus on metadata, the quality of metadata available in public repositories continues to be extremely poor. A key difficulty is that the typical metadata acquisition process is time-consuming and error prone, with weak or nonexistent support for linking metadata to ontologies. There is a pressing need for methods and tools to speed up the metadata acquisition process and to increase the quality of metadata that are entered. In this paper, we describe a methodology and set of associated tools that we developed to address this challenge. A core component of this approach is a value recommendation framework that uses analysis of previously entered metadata and ontology-based metadata specifications to help users rapidly and accurately enter their metadata. We performed an initial evaluation of this approach using metadata from a public metadata repository. PMID:29854196

  14. Harvesting NASA's Common Metadata Repository (CMR)

    NASA Technical Reports Server (NTRS)

    Shum, Dana; Durbin, Chris; Norton, James; Mitchell, Andrew

    2017-01-01

    As part of NASA's Earth Observing System Data and Information System (EOSDIS), the Common Metadata Repository (CMR) stores metadata for over 30,000 datasets from both NASA and international providers along with over 300M granules. This metadata enables sub-second discovery and facilitates data access. While the CMR offers a robust temporal, spatial and keyword search functionality to the general public and international community, it is sometimes more desirable for international partners to harvest the CMR metadata and merge the CMR metadata into a partner's existing metadata repository. This poster will focus on best practices to follow when harvesting CMR metadata to ensure that any changes made to the CMR can also be updated in a partner's own repository. Additionally, since each partner has distinct metadata formats they are able to consume, the best practices will also include guidance on retrieving the metadata in the desired metadata format using CMR's Unified Metadata Model translation software.
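
    A minimal harvesting sketch in Python against CMR's public search API (the endpoint is real; the keyword, page size and the response fields used below are assumptions to check against the current API documentation):

    import json, urllib.request

    URL = ("https://cmr.earthdata.nasa.gov/search/collections.json"
           "?keyword=aerosol&page_size=5")

    with urllib.request.urlopen(URL) as resp:
        entries = json.load(resp)["feed"]["entry"]

    # A real harvester would record each concept id and revision so that
    # later CMR updates can be detected and merged into its own repository.
    for entry in entries:
        print(entry["id"], entry.get("title", ""))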

  15. Harvesting NASA's Common Metadata Repository

    NASA Astrophysics Data System (ADS)

    Shum, D.; Mitchell, A. E.; Durbin, C.; Norton, J.

    2017-12-01

    As part of NASA's Earth Observing System Data and Information System (EOSDIS), the Common Metadata Repository (CMR) stores metadata for over 30,000 datasets from both NASA and international providers along with over 300M granules. This metadata enables sub-second discovery and facilitates data access. While the CMR offers a robust temporal, spatial and keyword search functionality to the general public and international community, it is sometimes more desirable for international partners to harvest the CMR metadata and merge the CMR metadata into a partner's existing metadata repository. This poster will focus on best practices to follow when harvesting CMR metadata to ensure that any changes made to the CMR can also be updated in a partner's own repository. Additionally, since each partner has distinct metadata formats they are able to consume, the best practices will also include guidance on retrieving the metadata in the desired metadata format using CMR's Unified Metadata Model translation software.

  16. Automatic meta-data collection of STP observation data

    NASA Astrophysics Data System (ADS)

    Ishikura, S.; Kimura, E.; Murata, K.; Kubo, T.; Shinohara, I.

    2006-12-01

    For geoscience and STP (Solar-Terrestrial Physics) studies, various observations have been made by satellites and ground-based observatories. These data are saved and managed by many organizations, but there is no common procedure or rule for providing and/or sharing the data files. Researchers have found it difficult to search and analyze such different types of data distributed over the Internet. To support such cross-over analyses of observation data, we have developed the STARS (Solar-Terrestrial data Analysis and Reference System). The STARS consists of a client application (STARS-app), a meta-database (STARS-DB), a portal Web service (STARS-WS) and a download agent Web service (STARS DLAgent-WS). The STARS-DB includes directory information, access permissions, protocol information for retrieving data files, mission/team/data hierarchy information and user information. Users of the STARS are able to download observation data files without knowing the locations of the files, by using the STARS-DB. We have implemented the Portal-WS to retrieve meta-data from the meta-database. One reason we use a Web service is to overcome firewall restrictions, which have become stricter in recent years and make it difficult for the STARS client application to access the STARS-DB by sending SQL queries directly. Using the Web service, we succeeded in placing the STARS-DB behind the Portal-WS so that it is not exposed on the Internet. The STARS accesses the Portal-WS by sending a SOAP (Simple Object Access Protocol) request over HTTP, and meta-data are received as a SOAP response. The STARS DLAgent-WS provides clients with data files downloaded from data sites. The data files are provided using a variety of protocols (e.g., FTP, HTTP, FTPS and SFTP), individually selected at each site. The clients send a SOAP request with download request messages and receive observation data files as a SOAP response with a DIME attachment. By introducing the DLAgent-WS, we overcame the problem that the data management policies of each data site are independent. Another important issue is how to collect the meta-data of observation data files. So far, STARS-DB managers have added new records to the meta-database and updated them manually. Maintaining the meta-database has been troublesome because observation data are generated every day and the quantity of data files increases explosively. We have therefore attempted to automate the collection of the meta-data. In this research, we adopted RSS 1.0 (RDF Site Summary) as the format for exchanging meta-data in the STP field. RSS is an RDF vocabulary that provides a multipurpose extensible meta-data description and is suitable for syndication of meta-data. Most of the data in the present study are described in the CDF (Common Data Format), a self-describing data format. We convert meta-information extracted from the CDF data files into RSS files. The program that generates the RSS files is executed on each data site server once a day, and the RSS files provide information on new data files. The RSS files are collected by an RSS collection server once a day, and the meta-data are stored in the STARS-DB.
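
    A minimal Python sketch of the meta-data-to-RSS step (the item fields are illustrative, not the exact STARS vocabulary): wrap the meta-information extracted from a CDF data file in an RSS 1.0 (RDF) item:

    import xml.etree.ElementTree as ET

    RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    RSS = "http://purl.org/rss/1.0/"
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("", RSS)

    # Meta-information as it might be extracted from a CDF file (made up).
    meta = {"title": "satellite_X_20061201.cdf",
            "link": "ftp://example.org/data/satellite_X_20061201.cdf"}

    root = ET.Element(f"{{{RDF}}}RDF")
    item = ET.SubElement(root, f"{{{RSS}}}item",
                         {f"{{{RDF}}}about": meta["link"]})
    ET.SubElement(item, f"{{{RSS}}}title").text = meta["title"]
    ET.SubElement(item, f"{{{RSS}}}link").text = meta["link"]

    print(ET.tostring(root, encoding="unicode"))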

  17. Geophysical Event Casting: Assembling & Broadcasting Data Relevant to Events and Disasters

    NASA Astrophysics Data System (ADS)

    Manipon, G. M.; Wilson, B. D.

    2012-12-01

    Broadcast Atom feeds are already being used to publish metadata and support discovery of data collections, granules, and web services. Such data and service casting advertises the existence of new granules in a dataset and the services available to access or transform the data. Similarly, data and services relevant to studying topical geophysical events (earthquakes, hurricanes, etc.) or periodic/regional structures (El Nino, deep convection) can be broadcast by publishing new entries and links in a feed for that topic. By using the geoRSS conventions, the time and space location of the event (e.g. a moving hurricane track) is specified in the feed, along with a science description, images, relevant data granules, and links to useful web services (e.g. OGC/WMS). The topic cast is used to assemble all of the relevant data and images as they come in, and to publish the metadata (images, links, services) to a broad group of subscribers. All of the information in the feed is structured using standardized XML tags (e.g. georss for space and time, and tags pointing to external data and services), and is thus machine-readable, which is an improvement over collecting ad hoc links on a wiki. We have created a software suite in Python to generate such "event casts" when a geophysical event first happens, then update them with more information as it becomes available, and display them as an event album in a web browser. Figure 1 shows a snapshot of our Event Cast Browser displaying information from a set of casts about the hurricanes in the Western Pacific during the year 2011. The 19th cyclone is selected in the left panel, so the top right panels display the entries in that feed with metadata such as maximum wind speed, while the bottom right panel displays the hurricane track (positions every 12 hours) as KML in the Google Earth plug-in, where additional data and image layers from the feed can be turned on or off by the user. The software automatically converts georss space and time information to KML placemarks, and can also generate various KML visualizations for other data layers that are pointed to in the feed. The user can replay all of the data images as an animation over the several days as the cyclone develops. The goal of "event casting" is to standardize several metadata micro-formats and use them within Atom feeds to create a rich ecosystem of topical event data that can be automatically manipulated by scripts and many interfaces. For our event cast browser, the same code can display all kinds of casts, whether about hurricanes, fires, earthquakes, or even El Nino. The presentation will describe the event cast format and its standard micro-formats, the software to generate and augment casts, and the browser GUI with KML visualizations.
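
    A minimal Python sketch of one event-cast entry (all event values below are made up): an Atom entry carrying the event's time and location as a georss:point, plus a link to a data service:

    import xml.etree.ElementTree as ET

    ATOM = "http://www.w3.org/2005/Atom"
    GEORSS = "http://www.georss.org/georss"
    ET.register_namespace("", ATOM)
    ET.register_namespace("georss", GEORSS)

    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = "Cyclone 19W, fix at 12Z"
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = "2011-09-12T12:00:00Z"
    ET.SubElement(entry, f"{{{GEORSS}}}point").text = "21.5 129.0"  # lat lon
    ET.SubElement(entry, f"{{{ATOM}}}link",
                  href="http://example.org/wms?layers=wind_speed")

    print(ET.tostring(entry, encoding="unicode"))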

  18. Research on key technologies for data-interoperability-based metadata, data compression and encryption, and their application

    NASA Astrophysics Data System (ADS)

    Yu, Xu; Shao, Quanqin; Zhu, Yunhai; Deng, Yuejin; Yang, Haijun

    2006-10-01

    With the development of informatization and the separation between data management departments and application departments, spatial data sharing has become one of the most important objectives of spatial information infrastructure construction, and spatial metadata management systems, data transmission security and data compression are the key technologies for realizing spatial data sharing. This paper discusses key technologies for metadata based on data interoperability, and studies data compression algorithms such as the adaptive Huffman algorithm and the LZ77 and LZ78 algorithms. It applies digital signature techniques to spatial data, which can not only identify the transmitter of the spatial data but also promptly detect whether the spatial data have been tampered with during network transmission. Based on an analysis of symmetric encryption algorithms, including 3DES and AES, and the asymmetric encryption algorithm RSA, combined with a hash algorithm, it presents an improved hybrid encryption method for spatial data. Digital signature technology and digital watermarking technology are also discussed. A new solution for spatial data network distribution is then put forward, which adopts a three-layer architecture. Based on this framework, we present a spatial data network distribution system that is efficient and safe, and we demonstrate the feasibility and validity of the proposed solution.
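
    A sketch of the hybrid scheme in Python with the 'cryptography' package, modernized to AES-GCM and OAEP/PSS rather than the paper's exact primitives: AES encrypts the bulk spatial data, RSA wraps the AES key for the receiver, and an RSA signature over the data lets tampering be detected:

    import os
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    data = b"<spatial dataset bytes>"

    # Symmetric bulk encryption (AES-GCM also authenticates the ciphertext).
    aes_key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)
    ciphertext = AESGCM(aes_key).encrypt(nonce, data, None)

    # Asymmetric key wrapping: only the receiver can unwrap the AES key.
    receiver = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    oaep = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)
    wrapped_key = receiver.public_key().encrypt(aes_key, oaep)

    # Sender signs a SHA-256 digest so origin and integrity can be verified.
    sender = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                      salt_length=padding.PSS.MAX_LENGTH)
    signature = sender.sign(data, pss, hashes.SHA256())

    # Receiver side: unwrap the key, decrypt, verify (raises on failure).
    plain = AESGCM(receiver.decrypt(wrapped_key, oaep)).decrypt(
        nonce, ciphertext, None)
    sender.public_key().verify(signature, plain, pss, hashes.SHA256())
    print("verified:", plain == data)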

  19. Simplified Metadata Curation via the Metadata Management Tool

    NASA Astrophysics Data System (ADS)

    Shum, D.; Pilone, D.

    2015-12-01

    The Metadata Management Tool (MMT) is the newest capability developed as part of NASA Earth Observing System Data and Information System's (EOSDIS) efforts to simplify metadata creation and improve metadata quality. The MMT was developed via an agile methodology, taking into account inputs from GCMD's science coordinators and other end-users. In its initial release, the MMT uses the Unified Metadata Model for Collections (UMM-C) to allow metadata providers to easily create and update collection records in the ISO-19115 format. Through a simplified UI experience, metadata curators can create and edit collections without full knowledge of the NASA Best Practices implementation of ISO-19115 format, while still generating compliant metadata. More experienced users are also able to access raw metadata to build more complex records as needed. In future releases, the MMT will build upon recent work done in the community to assess metadata quality and compliance with a variety of standards through application of metadata rubrics. The tool will provide users with clear guidance as to how to easily change their metadata in order to improve their quality and compliance. Through these features, the MMT allows data providers to create and maintain compliant and high quality metadata in a short amount of time.

  20. Enriched Video Semantic Metadata: Authorization, Integration, and Presentation.

    ERIC Educational Resources Information Center

    Mu, Xiangming; Marchionini, Gary

    2003-01-01

    Presents an enriched video metadata framework including video authorization using the Video Annotation and Summarization Tool (VAST), a video metadata authorization system that integrates both semantic and visual metadata; metadata integration; and user-level applications. Results demonstrated that the enriched metadata were seamlessly…

  1. Automatic Organ Localization for Adaptive Radiation Therapy for Prostate Cancer

    DTIC Science & Technology

    2005-05-01

    and provides a framework for task 3. Key Research Accomplishments: * Comparison of manual segmentation with our automatic method, using several... well as manual segmentations by a different rater. * Computation of the actual cumulative dose delivered to both the cancerous and critical healthy... adaptive treatment of prostate or other cancer. As a result, all such work must be done manually. However, manual segmentation of the tumor and neighboring

  2. Redesigning the DOE Data Explorer to embed dataset relationships at the point of search and to reflect landing page organization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Studwell, Sara; Robinson, Carly; Elliott, Jannean

    Scientific research is producing ever-increasing amounts of data. Organizing and reflecting relationships across data collections, datasets, publications, and other research objects are essential functionalities of the modern science environment, yet challenging to implement. Landing pages are often used for providing ‘big picture’ contextual frameworks for datasets and data collections, and many large-volume data holders are utilizing them in thoughtful, creative ways. The benefits of their organizational efforts, however, are not realized unless the user eventually sees the landing page at the end point of their search. What if that organization and ‘big picture’ context could benefit the user at the beginning of the search? That is a challenging approach, but the Department of Energy’s (DOE) Office of Scientific and Technical Information (OSTI) is redesigning the database functionality of the DOE Data Explorer (DDE) with that goal in mind. Phase I is focused on redesigning the DDE database to leverage relationships between two existing distinct populations in DDE, data Projects and individual Datasets, and then adding a third intermediate population, data Collections. Mapped, structured linkages, designed to show user relationships, will allow users to make informed search choices. These linkages will be sustainable and scalable, created automatically with the use of new metadata fields and existing authorities. Phase II will study selected DOE Data ID Service clients, analyzing how their landing pages are organized, and how that organization might be used to improve DDE search capabilities. At the heart of both phases is the realization that adding more metadata information for cross-referencing may require additional effort for data scientists. Finally, OSTI’s approach seeks to leverage existing metadata and landing page intelligence without imposing an additional burden on the data creators.

  3. Automatic identification of high impact articles in PubMed to support clinical decision making.

    PubMed

    Bian, Jiantao; Morid, Mohammad Amin; Jonnalagadda, Siddhartha; Luo, Gang; Del Fiol, Guilherme

    2017-09-01

    The practice of evidence-based medicine involves integrating the latest best available evidence into patient care decisions. Yet, critical barriers exist for clinicians' retrieval of evidence that is relevant for a particular patient from primary sources such as randomized controlled trials and meta-analyses. To help address those barriers, we investigated machine learning algorithms that find clinical studies with high clinical impact from PubMed®. Our machine learning algorithms use a variety of features including bibliometric features (e.g., citation count), social media attention, journal impact factors, and citation metadata. The algorithms were developed and evaluated with a gold standard composed of 502 high impact clinical studies that are referenced in 11 clinical evidence-based guidelines on the treatment of various diseases. We tested the following hypotheses: (1) our high impact classifier outperforms a state-of-the-art classifier based on citation metadata and citation terms, and PubMed's® relevance sort algorithm; and (2) the performance of our high impact classifier does not decrease significantly after removing proprietary features such as citation count. The mean top 20 precision of our high impact classifier was 34% versus 11% for the state-of-the-art classifier and 4% for PubMed's® relevance sort (p=0.009); and the performance of our high impact classifier did not decrease significantly after removing proprietary features (mean top 20 precision=34% vs. 36%; p=0.085). The high impact classifier, using features such as bibliometrics, social media attention and MEDLINE® metadata, outperformed previous approaches and is a promising alternative to identifying high impact studies for clinical decision support.
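
    For illustration only, here is a toy supervised classifier over bibliometric-style features of the sort listed above, using scikit-learn. The feature values, labels, and choice of logistic regression are invented placeholders showing the general shape of the approach, not the paper's data, features, or model.

    ```python
    # Toy sketch: a supervised model over bibliometric/metadata features,
    # predicting whether a study is "high impact" (guideline-cited).
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Each row: [citation_count, altmetric_score, journal_impact_factor, is_rct]
    X = np.array([[250, 80.0, 47.8, 1],
                  [  3,  0.5,  2.1, 0],
                  [120, 15.0, 12.4, 1],
                  [  8,  1.0,  3.3, 0]])
    y = np.array([1, 0, 1, 0])   # 1 = referenced by an evidence-based guideline

    model = LogisticRegression().fit(X, y)
    # Probability that a new article is high impact:
    print(model.predict_proba([[60, 10.0, 8.5, 1]])[:, 1])
    ```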

  4. Redesigning the DOE Data Explorer to embed dataset relationships at the point of search and to reflect landing page organization

    DOE PAGES

    Studwell, Sara; Robinson, Carly; Elliott, Jannean

    2017-04-04

    Scientific research is producing ever-increasing amounts of data. Organizing and reflecting relationships across data collections, datasets, publications, and other research objects are essential functionalities of the modern science environment, yet challenging to implement. Landing pages are often used for providing ‘big picture’ contextual frameworks for datasets and data collections, and many large-volume data holders are utilizing them in thoughtful, creative ways. The benefits of their organizational efforts, however, are not realized unless the user eventually sees the landing page at the end point of their search. What if that organization and ‘big picture’ context could benefit the user at themore » beginning of the search? That is a challenging approach, but The Department of Energy’s (DOE) Office of Scientific and Technical Information (OSTI) is redesigning the database functionality of the DOE Data Explorer (DDE) with that goal in mind. Phase I is focused on redesigning the DDE database to leverage relationships between two existing distinct populations in DDE, data Projects and individual Datasets, and then adding a third intermediate population, data Collections. Mapped, structured linkages, designed to show user relationships, will allow users to make informed search choices. These linkages will be sustainable and scalable, created automatically with the use of new metadata fields and existing authorities. Phase II will study selected DOE Data ID Service clients, analyzing how their landing pages are organized, and how that organization might be used to improve DDE search capabilities. At the heart of both phases is the realization that adding more metadata information for cross-referencing may require additional effort for data scientists. Finally, OSTI’s approach seeks to leverage existing metadata and landing page intelligence without imposing an additional burden on the data creators.« less

  5. Assessing Metadata Quality of a Federally Sponsored Health Data Repository.

    PubMed

    Marc, David T; Beattie, James; Herasevich, Vitaly; Gatewood, Laël; Zhang, Rui

    2016-01-01

    The U.S. Federal Government developed HealthData.gov to disseminate healthcare datasets to the public. Metadata is provided for each dataset and is the sole source of information to find and retrieve data. This study employed automated quality assessments of the HealthData.gov metadata published from 2012 to 2014 to measure completeness, accuracy, and consistency of applying standards. The results demonstrated that metadata published in earlier years had lower completeness, accuracy, and consistency. Also, metadata that underwent modifications following their original creation were of higher quality. HealthData.gov did not uniformly apply the Dublin Core Metadata Initiative standard, a widely accepted metadata standard, to the metadata. These findings suggested that the HealthData.gov metadata suffered from quality issues, particularly related to information that wasn't frequently updated. The results supported the need for policies to standardize metadata and contributed to the development of automated measures of metadata quality.
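
    A hedged sketch of an automated completeness measure of the kind the study describes: score each record by the fraction of required fields that are present and non-empty. The required-field list below is an assumption for illustration, not HealthData.gov's actual rule set.

    ```python
    # Illustrative completeness check over Dublin Core-style fields.
    # REQUIRED is an assumed field list, not the study's actual criteria.
    REQUIRED = ["title", "description", "identifier", "publisher", "modified"]

    def completeness(record: dict) -> float:
        """Fraction of required fields that are present and non-empty."""
        filled = sum(1 for field in REQUIRED if record.get(field))
        return filled / len(REQUIRED)

    record = {"title": "Hospital Compare", "description": "", "identifier": "hc-1"}
    print(f"completeness: {completeness(record):.0%}")   # -> completeness: 40%
    ```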

  6. Assessing Metadata Quality of a Federally Sponsored Health Data Repository

    PubMed Central

    Marc, David T.; Beattie, James; Herasevich, Vitaly; Gatewood, Laël; Zhang, Rui

    2016-01-01

    The U.S. Federal Government developed HealthData.gov to disseminate healthcare datasets to the public. Metadata is provided for each dataset and is the sole source of information to find and retrieve data. This study employed automated quality assessments of the HealthData.gov metadata published from 2012 to 2014 to measure completeness, accuracy, and consistency of applying standards. The results demonstrated that metadata published in earlier years had lower completeness, accuracy, and consistency. Also, metadata that underwent modifications following their original creation were of higher quality. HealthData.gov did not uniformly apply the Dublin Core Metadata Initiative standard, a widely accepted metadata standard, to the metadata. These findings suggested that the HealthData.gov metadata suffered from quality issues, particularly related to information that wasn’t frequently updated. The results supported the need for policies to standardize metadata and contributed to the development of automated measures of metadata quality. PMID:28269883

  7. Bridging the semantic gap in sports

    NASA Astrophysics Data System (ADS)

    Li, Baoxin; Errico, James; Pan, Hao; Sezan, M. Ibrahim

    2003-01-01

    One of the major challenges facing current media management systems and the related applications is the so-called "semantic gap" between the rich meaning that a user desires and the shallowness of the content descriptions that are automatically extracted from the media. In this paper, we address the problem of bridging this gap in the sports domain. We propose a general framework for indexing and summarizing sports broadcast programs. The framework is based on a high-level model of sports broadcast video using the concept of an event, defined according to domain-specific knowledge for different types of sports. Within this general framework, we develop automatic event detection algorithms that are based on automatic analysis of the visual and aural signals in the media. We have successfully applied the event detection algorithms to different types of sports including American football, baseball, Japanese sumo wrestling, and soccer. Event modeling and detection contribute to the reduction of the semantic gap by providing rudimentary semantic information obtained through media analysis. We further propose a novel approach, which makes use of independently generated rich textual metadata, to fill the gap completely through synchronization of the information-laden textual data with the basic event segments. An MPEG-7 compliant prototype browsing system has been implemented to demonstrate semantic retrieval and summarization of sports video.

  8. Partnerships To Mine Unexploited Sources of Metadata.

    ERIC Educational Resources Information Center

    Reynolds, Regina Romano

    This paper discusses the metadata created for other purposes as a potential source of bibliographic data. The first section addresses collecting metadata by means of templates, including the Nordic Metadata Project's Dublin Core Metadata Template. The second section considers potential partnerships for re-purposing metadata for bibliographic use,…

  9. Using Multithreading for the Automatic Load Balancing of 2D Adaptive Finite Element Meshes

    NASA Technical Reports Server (NTRS)

    Heber, Gerd; Biswas, Rupak; Thulasiraman, Parimala; Gao, Guang R.; Bailey, David H. (Technical Monitor)

    1998-01-01

    In this paper, we present a multi-threaded approach for the automatic load balancing of adaptive finite element (FE) meshes. The platform of our choice is the EARTH multi-threaded system, which offers sufficient capabilities to tackle this problem. We implement the adaptation phase of FE applications on triangular meshes, and exploit the EARTH token mechanism to automatically balance the resulting irregular and highly nonuniform workload. We discuss the results of our experiments on EARTH-SP2, an implementation of EARTH on the IBM SP2, with different load balancing strategies that are built into the runtime system.

  10. A hierarchical structure for automatic meshing and adaptive FEM analysis

    NASA Technical Reports Server (NTRS)

    Kela, Ajay; Saxena, Mukul; Perucchio, Renato

    1987-01-01

    A new algorithm for generating automatically, from solid models of mechanical parts, finite element meshes that are organized as spatially addressable quaternary trees (for 2-D work) or octal trees (for 3-D work) is discussed. Because such meshes are inherently hierarchical as well as spatially addressable, they permit efficient substructuring techniques to be used for both global analysis and incremental remeshing and reanalysis. The global and incremental techniques are summarized and some results from an experimental closed loop 2-D system in which meshing, analysis, error evaluation, and remeshing and reanalysis are done automatically and adaptively are presented. The implementation of 3-D work is briefly discussed.

  11. Visa: AN Automatic Aware and Visual Aids Mechanism for Improving the Correct Use of Geospatial Data

    NASA Astrophysics Data System (ADS)

    Hong, J. H.; Su, Y. T.

    2016-06-01

    With the fast growth of internet-based sharing mechanisms and OpenGIS technology, users nowadays enjoy the luxury of quickly locating and accessing a variety of geospatial data for the tasks at hand. While this sharing innovation tremendously expands the possibilities of application and reduces development cost, users nevertheless have to deal with all kinds of "differences" implicitly hidden behind the acquired georesources. We argue that the next generation of GIS-based environments, internet-based or not, must have built-in knowledge to automatically and correctly assess the fitness of data use and present the analyzed results to users in an intuitive and meaningful way. The VISA approach proposed in this paper refers to four different types of visual aids that can be used for presenting analyzed results: virtual layer, informative window, symbol transformation, and augmented TOC. The VISA-enabled interface works in an automatic-aware fashion: the standardized metadata serve as the known facts about the selected geospatial resources, algorithms analyze the differences in temporality and quality among those resources, and the analyzed results are automatically transformed into visual aids. It successfully presents a new way of bridging the communication gap between systems and users. GIS has long been seen as a powerful integration tool, but its achievements would be highly restricted if it failed to provide a friendly and correct working platform.

  12. The LTER Network Information System: Improving Data Quality and Synthesis through Community Collaboration

    NASA Astrophysics Data System (ADS)

    Servilla, M.; Brunt, J.

    2011-12-01

    Emerging in the 1980s as a U.S. National Science Foundation funded research network, the Long Term Ecological Research (LTER) Network began with six sites and with the goal of performing comparative data collection and analysis of major biotic regions of North America. Today, the LTER Network includes 26 sites located in North America, Antarctica, Puerto Rico, and French Polynesia and has contributed a corpus of over 7,000 data sets to the public domain. The diversity of LTER research has led to a wealth of scientific data derived from atmospheric to terrestrial to oceanographic to anthropogenic studies. Such diversity, however, is a contributing factor to data being published with poor or inconsistent quality or to data lacking descriptive documentation sufficient for understanding their origin or performing derivative studies. It is for these reasons that the LTER community, in collaboration with the LTER Network Office, has embarked on the development of the LTER Network Information System (NIS) - an integrative data management approach to improve the process by which quality LTER data and metadata are assembled into a central archive, thereby enabling better discovery, analysis, and synthesis of derived data products. The mission of the LTER NIS is to promote advances in collaborative and synthetic ecological science at multiple temporal and spatial scales by providing the information management and technology infrastructure to increase: (1) the availability and quality of data from LTER sites, by the use and support of standardized approaches to metadata management and access to data; (2) the timeliness and number of LTER derived data products, by creating a suite of middleware programs and workflows that make it easy to create and maintain integrated data sets derived from LTER data; and (3) the knowledge generated from the synthesis of LTER data, by creating standardized access and easy-to-use applications to discover, access, and use LTER data. The LTER NIS will utilize the Provenance Aware Synthesis Tracking Architecture (PASTA), which will provide the LTER community a metadata-driven data-flow framework to automatically harvest data from LTER research sites and make it available through a well-defined software interface. We distinguish PASTA from the more generalized NIS by classifying framework components as critical and enabling cyberinfrastructure that, collectively, provide the services defined by the above mission. Data and metadata will have to pass a set of community-defined quality criteria before entry into PASTA, including the use of semantically informative metadata elements and the conformance of data to the structural descriptions provided by their metadata. As a result, consumers of data products from PASTA will be assured that the metadata are complete, include provenance information where applicable, and that the data are of the highest quality. Development of the NIS is being performed through community participation. Advisory groups, called "Tiger Teams", are enlisted from the general LTER membership to provide input to the design of the NIS. Other LTER working groups contribute community-based software into the NIS; these include modules for controlled vocabularies, scientific units, and personnel. We anticipate a 2014 release of the LTER NIS.

  13. CINERGI: Community Inventory of EarthCube Resources for Geoscience Interoperability

    NASA Astrophysics Data System (ADS)

    Zaslavsky, Ilya; Bermudez, Luis; Grethe, Jeffrey; Gupta, Amarnath; Hsu, Leslie; Lehnert, Kerstin; Malik, Tanu; Richard, Stephen; Valentine, David; Whitenack, Thomas

    2014-05-01

    Organizing geoscience data resources to support cross-disciplinary data discovery, interpretation, analysis and integration is challenging because of different information models, semantic frameworks, metadata profiles, catalogs, and services used in different geoscience domains, not to mention different research paradigms and methodologies. The central goal of CINERGI, a new project supported by the US National Science Foundation through its EarthCube Building Blocks program, is to create a methodology and assemble a large inventory of high-quality information resources capable of supporting data discovery needs of researchers in a wide range of geoscience domains. The key characteristics of the inventory are: 1) collaboration with and integration of metadata resources from a number of large data facilities; 2) reliance on international metadata and catalog service standards; 3) assessment of resource "interoperability-readiness"; 4) ability to cross-link and navigate data resources, projects, models, researcher directories, publications, usage information, etc.; 5) efficient inclusion of "long-tail" data, which are not appearing in existing domain repositories; 6) data registration at feature level where appropriate, in addition to common dataset-level registration, and 7) integration with parallel EarthCube efforts, in particular focused on EarthCube governance, information brokering, service-oriented architecture design and management of semantic information. We discuss challenges associated with accomplishing CINERGI goals, including defining the inventory scope; managing different granularity levels of resource registration; interaction with search systems of domain repositories; explicating domain semantics; metadata brokering, harvesting and pruning; managing provenance of the harvested metadata; and cross-linking resources based on the linked open data (LOD) approaches. At the higher level of the inventory, we register domain-wide resources such as domain catalogs, vocabularies, information models, data service specifications, identifier systems, and assess their conformance with international standards (such as those adopted by ISO and OGC, and used by INSPIRE) or de facto community standards using, in part, automatic validation techniques. The main level in CINERGI leverages a metadata aggregation platform (currently Geoportal Server) to organize harvested resources from multiple collections and contributed by community members during EarthCube end-user domain workshops or suggested online. The latter mechanism uses the SciCrunch toolkit originally developed within the Neuroscience Information Framework (NIF) project and now being extended to other communities. The inventory is designed to support requests such as "Find resources with theme X in geographic area S", "Find datasets with subject Y using query concept expansion", "Find geographic regions having data of type Z", "Find datasets that contain property P". With the added LOD support, additional types of requests, such as "Find example implementations of specification X", "Find researchers who have worked in Domain X, dataset Y, location L", "Find resources annotated by person X", will be supported. Project's website (http://workspace.earthcube.org/cinergi) provides access to the initial resource inventory, a gallery of EarthCube researchers, collections of geoscience models, metadata entry forms, and other software modules and inventories being integrated into the CINERGI system. 
Support from the US National Science Foundation under award NSF ICER-1343816 is gratefully acknowledged.

  14. A Window to the World: Lessons Learned from NASA's Collaborative Metadata Curation Effort

    NASA Astrophysics Data System (ADS)

    Bugbee, K.; Dixon, V.; Baynes, K.; Shum, D.; le Roux, J.; Ramachandran, R.

    2017-12-01

    Well-written descriptive metadata adds value to data by making data easier to discover and increases the use of data by providing context and indicating appropriateness of use. While many data centers acknowledge the importance of correct, consistent, and complete metadata, allocating resources to curate existing metadata is often difficult. To lower resource costs, many data centers seek guidance on best practices for curating metadata but struggle to identify those recommendations. In order to assist data centers in curating metadata and to develop best practices for creating and maintaining metadata, NASA has formed a collaborative effort to improve the Earth Observing System Data and Information System (EOSDIS) metadata in the Common Metadata Repository (CMR). This effort has taken significant steps in building consensus around metadata curation best practices. However, this effort has also revealed gaps in EOSDIS enterprise policies and procedures within the core metadata curation task. This presentation will explore the mechanisms used for building consensus on metadata curation, the gaps identified in policies and procedures, the lessons learned from collaborating with both the data centers and metadata curation teams, and the proposed next steps for the future.

  15. Automating Data Submission to a National Archive

    NASA Astrophysics Data System (ADS)

    Work, T. T.; Chandler, C. L.; Groman, R. C.; Allison, M. D.; Gegg, S. R.; Biological; Chemical Oceanography Data Management Office

    2010-12-01

    In late 2006, the U.S. National Science Foundation (NSF) funded the Biological and Chemical Oceanographic Data Management Office (BCO-DMO) at Woods Hole Oceanographic Institution (WHOI) to work closely with investigators to manage oceanographic data generated from their research projects. One of the final data management tasks is to ensure that the data are permanently archived at the U.S. National Oceanographic Data Center (NODC) or another appropriate national archiving facility. In the past, BCO-DMO submitted data to NODC as an email with attachments, including a PDF file (a manually completed metadata record) and one or more data files. This method is no longer feasible given the rate at which data sets are contributed to BCO-DMO. Working with collaborators at NODC, a more streamlined and automated workflow was developed to keep up with the increased volume of data that must be archived at NODC. We will describe our new workflow: a semi-automated approach for contributing data to NODC that includes a Federal Geographic Data Committee (FGDC) compliant Extensible Markup Language (XML) metadata file accompanied by comma-delimited data files. The FGDC XML file is populated from information stored in a MySQL database. A crosswalk described by an Extensible Stylesheet Language Transformation (XSLT) is used to transform the XML-formatted MySQL result set to an FGDC-compliant XML metadata file. To ensure data integrity, the MD5 algorithm is used to generate a checksum and manifest of the files submitted to NODC for permanent archive. The revised system supports preparation of detailed, standards-compliant metadata that facilitate data sharing and enable accurate reuse of multidisciplinary information. The approach is generic enough to be adapted for use by other data management groups.
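
    The checksum/manifest step lends itself to a short sketch: compute an MD5 digest per submitted file and write a manifest the archive can verify against. The file names and manifest format below are illustrative; BCO-DMO's actual tooling is not shown.

    ```python
    # Sketch of an MD5 checksum manifest for a data submission.
    import hashlib
    from pathlib import Path

    def md5sum(path: Path, chunk: int = 65536) -> str:
        """Stream the file through MD5 so large files need little memory."""
        digest = hashlib.md5()
        with path.open("rb") as f:
            while block := f.read(chunk):
                digest.update(block)
        return digest.hexdigest()

    # Hypothetical submission: one metadata file plus one data file.
    submission = [Path("metadata.xml"), Path("ctd_profiles.csv")]
    with open("MANIFEST.md5", "w") as manifest:
        for p in submission:
            manifest.write(f"{md5sum(p)}  {p.name}\n")
    ```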

  16. Color Imaging management in film processing

    NASA Astrophysics Data System (ADS)

    Tremeau, Alain; Konik, Hubert; Colantoni, Philippe

    2003-12-01

    The latest research projects in the laboratory LIGIV concerns capture, processing, archiving and display of color images considering the trichromatic nature of the Human Vision System (HSV). Among these projects one addresses digital cinematographic film sequences of high resolution and dynamic range. This project aims to optimize the use of content for the post-production operators and for the end user. The studies presented in this paper address the use of metadata to optimise the consumption of video content on a device of user's choice independent of the nature of the equipment that captured the content. Optimising consumption includes enhancing the quality of image reconstruction on a display. Another part of this project addresses the content-based adaptation of image display. Main focus is on Regions of Interest (ROI) operations, based on the ROI concepts of MPEG-7. The aim of this second part is to characterize and ensure the conditions of display even if display device or display media changes. This requires firstly the definition of a reference color space and the definition of bi-directional color transformations for each peripheral device (camera, display, film recorder, etc.). The complicating factor is that different devices have different color gamuts, depending on the chromaticity of their primaries and the ambient illumination under which they are viewed. To match the displayed image to the aimed appearance, all kind of production metadata (camera specification, camera colour primaries, lighting conditions) should be associated to the film material. Metadata and content build together rich content. The author is assumed to specify conditions as known from digital graphics arts. To control image pre-processing and image post-processing, these specifications should be contained in the film's metadata. The specifications are related to the ICC profiles but need additionally consider mesopic viewing conditions.

  17. Optimizing Input/Output Using Adaptive File System Policies

    NASA Technical Reports Server (NTRS)

    Madhyastha, Tara M.; Elford, Christopher L.; Reed, Daniel A.

    1996-01-01

    Parallel input/output characterization studies and experiments with flexible resource management algorithms indicate that adaptivity is crucial to file system performance. In this paper we propose an automatic technique for selecting and refining file system policies based on application access patterns and execution environment. An automatic classification framework allows the file system to select appropriate caching and pre-fetching policies, while performance sensors provide feedback used to tune policy parameters for specific system environments. To illustrate the potential performance improvements possible using adaptive file system policies, we present results from experiments involving classification-based and performance-based steering.
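
    A toy sketch of classification-based policy steering in the spirit of the paper: classify recent access offsets, map the class to a caching/prefetching policy, and let a performance sensor tune a policy parameter. The thresholds and policy names are invented for illustration, not the paper's actual classifier.

    ```python
    # Toy access-pattern classifier plus a feedback-tuned prefetch depth.
    def classify(offsets: list[int]) -> str:
        """Label a sequence of block offsets as sequential, strided, or irregular."""
        strides = {b - a for a, b in zip(offsets, offsets[1:])}
        if strides == {1}:
            return "sequential"
        return "strided" if len(strides) == 1 else "irregular"

    POLICY = {"sequential": "readahead",
              "strided": "stride-prefetch",
              "irregular": "no-prefetch"}

    def tune_depth(depth: int, hit_rate: float) -> int:
        # Performance sensor feedback: deepen prefetch while it pays off,
        # back off otherwise (0.8 is an arbitrary illustrative threshold).
        return depth + 1 if hit_rate > 0.8 else max(1, depth - 1)

    pattern = classify([0, 1, 2, 3, 4])
    print(POLICY[pattern], tune_depth(4, hit_rate=0.9))   # readahead 5
    ```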

  18. Evaluating and Evolving Metadata in Multiple Dialects

    NASA Astrophysics Data System (ADS)

    Kozimor, J.; Habermann, T.; Powers, L. A.; Gordon, S.

    2016-12-01

    Despite many long-term homogenization efforts, communities continue to develop focused metadata standards along with related recommendations and (typically) XML representations (aka dialects) for sharing metadata content. Different representations easily become obstacles to sharing information because each representation generally requires a set of tools and skills that are designed, built, and maintained specifically for that representation. In contrast, community recommendations are generally described, at least initially, at a more conceptual level and are more easily shared. For example, most communities agree that dataset titles should be included in metadata records although they write the titles in different ways. This situation has led to the development of metadata repositories that can ingest and output metadata in multiple dialects. As an operational example, the NASA Common Metadata Repository (CMR) includes three different metadata dialects (DIF, ECHO, and ISO 19115-2). These systems raise a new question for metadata providers: if I have a choice of metadata dialects, which should I use and how do I make that decision? We have developed a collection of metadata evaluation tools that can be used to evaluate metadata records in many dialects for completeness with respect to recommendations from many organizations and communities. We have applied these tools to over 8000 collection and granule metadata records in four different dialects. This large collection of identical content in multiple dialects enables us to address questions about metadata and dialect evolution and to answer those questions quantitatively. We will describe those tools and results from evaluating the NASA CMR metadata collection.
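
    One way to picture dialect-independent evaluation is a single conceptual check ("does the record have a dataset title?") routed through dialect-specific paths, as in the hedged sketch below. The XPaths are simplified approximations of DIF and ISO 19115, not the evaluation tools' actual rule definitions.

    ```python
    # One conceptual check, several dialect-specific extraction paths.
    import xml.etree.ElementTree as ET

    TITLE_XPATH = {
        # Approximate locations of a dataset title in two dialects:
        "dif": ".//{http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/}Entry_Title",
        "iso": (".//{http://www.isotc211.org/2005/gmd}title/"
                "{http://www.isotc211.org/2005/gco}CharacterString"),
    }

    def has_title(xml_text: str, dialect: str) -> bool:
        """Apply the 'dataset title present' recommendation to one record."""
        node = ET.fromstring(xml_text).find(TITLE_XPATH[dialect])
        return node is not None and bool((node.text or "").strip())

    dif = ('<DIF xmlns="http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/">'
           '<Entry_Title>Sea Surface Temperature</Entry_Title></DIF>')
    print(has_title(dif, "dif"))   # True
    ```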

  19. Locomotor adaptation is modulated by observing the actions of others

    PubMed Central

    Patel, Mitesh; Roberts, R. Edward; Riyaz, Mohammed U.; Ahmed, Maroof; Buckwell, David; Bunday, Karen; Ahmad, Hena; Kaski, Diego; Arshad, Qadeer

    2015-01-01

    Observing the motor actions of another person could facilitate compensatory motor behavior in the passive observer. Here we explored whether action observation alone can induce automatic locomotor adaptation in humans. To explore this possibility, we used the “broken escalator” paradigm. Conventionally this involves stepping upon a stationary sled after having previously experienced it actually moving (Moving trials). This history of motion produces a locomotor aftereffect when subsequently stepping onto a stationary sled. We found that viewing an actor perform the Moving trials was sufficient to generate a locomotor aftereffect in the observer, the size of which was significantly correlated with the size of the movement (postural sway) observed. Crucially, the effect is specific to watching the task being performed, as no motor adaptation occurs after simply viewing the sled move in isolation. These findings demonstrate that locomotor adaptation in humans can be driven purely by action observation, with the brain adapting motor plans in response to the size of the observed individual's motion. This mechanism may be mediated by a mirror neuron system that automatically adapts behavior to minimize movement errors and improve motor skills through social cues, although further neurophysiological studies are required to support this theory. These data suggest that merely observing the gait of another person in a challenging environment is sufficient to generate appropriate postural countermeasures, implying the existence of an automatic mechanism for adapting locomotor behavior. PMID:26156386

  20. EBI metagenomics--a new resource for the analysis and archiving of metagenomic data.

    PubMed

    Hunter, Sarah; Corbett, Matthew; Denise, Hubert; Fraser, Matthew; Gonzalez-Beltran, Alejandra; Hunter, Christopher; Jones, Philip; Leinonen, Rasko; McAnulla, Craig; Maguire, Eamonn; Maslen, John; Mitchell, Alex; Nuka, Gift; Oisel, Arnaud; Pesseat, Sebastien; Radhakrishnan, Rajesh; Rocca-Serra, Philippe; Scheremetjew, Maxim; Sterk, Peter; Vaughan, Daniel; Cochrane, Guy; Field, Dawn; Sansone, Susanna-Assunta

    2014-01-01

    Metagenomics is a relatively recently established but rapidly expanding field that uses high-throughput next-generation sequencing technologies to characterize the microbial communities inhabiting different ecosystems (including oceans, lakes, soil, tundra, plants and body sites). Metagenomics brings with it a number of challenges, including the management, analysis, storage and sharing of data. In response to these challenges, we have developed a new metagenomics resource (http://www.ebi.ac.uk/metagenomics/) that allows users to easily submit raw nucleotide reads for functional and taxonomic analysis by a state-of-the-art pipeline, and have them automatically stored (together with descriptive, standards-compliant metadata) in the European Nucleotide Archive.

  1. SOCIB Glider toolbox: from sensor to data repository

    NASA Astrophysics Data System (ADS)

    Pau Beltran, Joan; Heslop, Emma; Ruiz, Simón; Troupin, Charles; Tintoré, Joaquín

    2015-04-01

    Nowadays in oceanography, gliders constitute a mature, cost-effective technology for the acquisition of measurements independently of the sea state (unlike ships), providing subsurface data during sustained periods, including extreme weather events. The SOCIB glider toolbox is a set of MATLAB/Octave scripts and functions developed to manage the data collected by a glider fleet. They cover the main stages of the data management process, in both real-time and delayed-time modes: metadata aggregation, downloading, processing, and automatic generation of data products and figures. The toolbox is distributed under the GNU licence (http://www.gnu.org/copyleft/gpl.html) and is available at http://www.socib.es/users/glider/glider_toolbox.

  2. EBI metagenomics—a new resource for the analysis and archiving of metagenomic data

    PubMed Central

    Hunter, Sarah; Corbett, Matthew; Denise, Hubert; Fraser, Matthew; Gonzalez-Beltran, Alejandra; Hunter, Christopher; Jones, Philip; Leinonen, Rasko; McAnulla, Craig; Maguire, Eamonn; Maslen, John; Mitchell, Alex; Nuka, Gift; Oisel, Arnaud; Pesseat, Sebastien; Radhakrishnan, Rajesh; Rocca-Serra, Philippe; Scheremetjew, Maxim; Sterk, Peter; Vaughan, Daniel; Cochrane, Guy; Field, Dawn; Sansone, Susanna-Assunta

    2014-01-01

    Metagenomics is a relatively recently established but rapidly expanding field that uses high-throughput next-generation sequencing technologies to characterize the microbial communities inhabiting different ecosystems (including oceans, lakes, soil, tundra, plants and body sites). Metagenomics brings with it a number of challenges, including the management, analysis, storage and sharing of data. In response to these challenges, we have developed a new metagenomics resource (http://www.ebi.ac.uk/metagenomics/) that allows users to easily submit raw nucleotide reads for functional and taxonomic analysis by a state-of-the-art pipeline, and have them automatically stored (together with descriptive, standards-compliant metadata) in the European Nucleotide Archive. PMID:24165880

  3. Scientific Workflows + Provenance = Better (Meta-)Data Management

    NASA Astrophysics Data System (ADS)

    Ludaescher, B.; Cuevas-Vicenttín, V.; Missier, P.; Dey, S.; Kianmajd, P.; Wei, Y.; Koop, D.; Chirigati, F.; Altintas, I.; Belhajjame, K.; Bowers, S.

    2013-12-01

    The origin and processing history of an artifact is known as its provenance. Data provenance is an important form of metadata that explains how a particular data product came about, e.g., how and when it was derived in a computational process, which parameter settings and input data were used, etc. Provenance information provides transparency and helps to explain and interpret data products. Other common uses and applications of provenance include quality control, data curation, result debugging, and more generally, 'reproducible science'. Scientific workflow systems (e.g. Kepler, Taverna, VisTrails, and others) provide controlled environments for developing computational pipelines with built-in provenance support. Workflow results can then be explained in terms of workflow steps, parameter settings, input data, etc. using provenance that is automatically captured by the system. Scientific workflows themselves provide a user-friendly abstraction of the computational process and are thus a form of ('prospective') provenance in their own right. The full potential of provenance information is realized when combining workflow-level information (prospective provenance) with trace-level information (retrospective provenance). To this end, the DataONE Provenance Working Group (ProvWG) has developed an extension of the W3C PROV standard, called D-PROV. Whereas PROV provides a 'least common denominator' for exchanging and integrating provenance information, D-PROV adds new 'observables' that describe workflow-level information (e.g., the functional steps in a pipeline), as well as workflow-specific trace-level information (timestamps for each workflow step executed, the inputs and outputs used, etc.). Using examples, we will demonstrate how the combination of prospective and retrospective provenance provides added value in managing scientific data. The DataONE ProvWG is also developing tools based on D-PROV that allow scientists to get more mileage from provenance metadata. DataONE is a federation of member nodes that store data and metadata for discovery and access. By enriching metadata with provenance information, search and reuse of data is enhanced, and the 'social life' of data (being the product of many workflow runs, different people, etc.) is revealed. We are currently prototyping a provenance repository (PBase) to demonstrate what can be achieved with advanced provenance queries. The ProvExplorer and ProPub tools support advanced ad-hoc querying and visualization of provenance as well as customized provenance publications (e.g., to address privacy issues, or to focus provenance on relevant details). In a parallel line of work, we are exploring ways to add provenance support to widely-used scripting platforms (e.g. R and Python) and then expose that information via D-PROV.
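
    For readers unfamiliar with PROV, the sketch below builds a tiny retrospective-provenance record using the third-party Python 'prov' package. The namespace and entity/activity names are illustrative; D-PROV's workflow-level extensions are not shown here.

    ```python
    # Minimal retrospective provenance with the W3C PROV data model.
    from prov.model import ProvDocument

    doc = ProvDocument()
    doc.add_namespace("ex", "http://example.org/")   # illustrative namespace

    raw = doc.entity("ex:raw-data")
    product = doc.entity("ex:data-product")
    run = doc.activity("ex:workflow-run")
    scientist = doc.agent("ex:scientist")

    doc.used(run, raw)                     # the run consumed the raw data
    doc.wasGeneratedBy(product, run)       # ...and generated the product
    doc.wasAssociatedWith(run, scientist)  # under this agent's control
    doc.wasDerivedFrom(product, raw)       # derivation link for lineage queries

    print(doc.get_provn())                 # serialize as PROV-N
    ```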

  4. EOS ODL Metadata On-line Viewer

    NASA Astrophysics Data System (ADS)

    Yang, J.; Rabi, M.; Bane, B.; Ullman, R.

    2002-12-01

    We have recently developed and deployed an EOS ODL metadata on-line viewer. The EOS ODL metadata viewer is a web server that takes: 1) an EOS metadata file in Object Description Language (ODL), and 2) parameters, such as which metadata to view and what style of display to use, and returns an HTML or XML document displaying the requested metadata in the requested style. This tool was developed to address widespread complaints by the science community that the EOS Data and Information System (EOSDIS) metadata files in ODL are difficult to read, by allowing users to upload and view an ODL metadata file in different styles using a web browser. Users can choose to view all the metadata or part of the metadata, such as Collection metadata, Granule metadata, or Unsupported Metadata. Choices of display styles include 1) Web: a mouseable display with tabs and turn-down menus, 2) Outline: formatted and colored text, suitable for printing, 3) Generic: simple indented text, a direct representation of the underlying ODL metadata, and 4) None: no stylesheet is applied and the XML generated by the converter is returned directly. Not all display styles are implemented for all the metadata choices. For example, the Web style is only implemented for Collection and Granule metadata groups with known attribute fields, but not for Unsupported, Other, and All metadata. The overall strategy of the ODL viewer is to transform an ODL metadata file to viewable HTML in two steps. The first step is to convert the ODL metadata file to XML using a Java-based parser/translator called ODL2XML. The second step is to transform the XML to HTML using stylesheets. Both operations are done on the server side. This allows a lot of flexibility in the final result, and is very portable cross-platform. Perl CGI behind the Apache web server is used to run the Java ODL2XML, and then run the results through an XSLT processor. The EOS ODL viewer can be accessed from either a PC or a Mac using Internet Explorer 5.0+ or Netscape 4.7+.
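
    The second step (XML to styled HTML via a stylesheet) can be sketched in a few lines with Python's lxml, assuming hypothetical file names; the viewer's actual Perl CGI / Java implementation is described above.

    ```python
    # Sketch of the XML -> HTML step using an XSLT stylesheet with lxml.
    from lxml import etree

    xml_doc = etree.parse("metadata.xml")          # output of the ODL->XML step
    stylesheet = etree.parse("outline_style.xsl")  # one stylesheet per display style
    transform = etree.XSLT(stylesheet)

    html_doc = transform(xml_doc)                  # apply the transform server-side
    print(etree.tostring(html_doc, pretty_print=True).decode())
    ```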

  5. ISO 19115 Experiences in NASA's Earth Observing System (EOS) ClearingHOuse (ECHO)

    NASA Astrophysics Data System (ADS)

    Cechini, M. F.; Mitchell, A.

    2011-12-01

    Metadata is an important entity in the process of cataloging, discovering, and describing earth science data. As science research and the gathered data increase in complexity, so do the complexity and importance of descriptive metadata. To meet these growing needs, the metadata models required utilize richer and more mature metadata attributes. Categorizing, standardizing, and promulgating these metadata models to a politically, geographically, and scientifically diverse community is a difficult process. An integral component of metadata management within NASA's Earth Observing System Data and Information System (EOSDIS) is the Earth Observing System (EOS) ClearingHOuse (ECHO). ECHO is the core metadata repository for the EOSDIS data centers, providing a centralized mechanism for metadata and data discovery and retrieval. ECHO has undertaken an internal restructuring to meet the changing needs of scientists, the consistent advancement in technology, and the advent of new standards such as ISO 19115. These improvements were based on the following tenets for data discovery and retrieval:
    + There exists a set of 'core' metadata fields recommended for data discovery.
    + There exists a set of users who will require the entire metadata record for advanced analysis.
    + There exists a set of users who will require a 'core' set of metadata fields for discovery only.
    + There will never be a cessation of new formats or a total retirement of all old formats.
    + Users should be presented metadata in a consistent format of their choosing.
    In order to address the previously listed items, ECHO's new metadata processing paradigm utilizes the following approach:
    + Identify a cross-format set of 'core' metadata fields necessary for discovery.
    + Implement format-specific indexers to extract the 'core' metadata fields into an optimized query capability.
    + Archive the original metadata in its entirety for presentation to users requiring the full record.
    + Provide on-demand translation of 'core' metadata to any supported result format.
    Lessons learned by the ECHO team while implementing its new metadata approach to support usage of the ISO 19115 standard will be presented. These lessons learned highlight some discovered strengths and weaknesses in the ISO 19115 standard as it is introduced to an existing metadata processing system.
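
    A minimal sketch of the format-specific indexer tenet: one extractor per dialect, each mapping a native record onto the same 'core' discovery fields, while the full record is archived untouched. The field names and extractor logic are assumptions for illustration, not ECHO's actual code.

    ```python
    # One indexer per metadata format, all producing the same 'core' fields.
    CORE_FIELDS = ("title", "temporal_extent", "spatial_extent")

    def index_dif(record: dict) -> dict:
        return {"title": record.get("Entry_Title"),
                "temporal_extent": record.get("Temporal_Coverage"),
                "spatial_extent": record.get("Spatial_Coverage")}

    def index_echo(record: dict) -> dict:
        return {"title": record.get("DataSetId"),
                "temporal_extent": record.get("Temporal"),
                "spatial_extent": record.get("Spatial")}

    INDEXERS = {"dif": index_dif, "echo": index_echo}

    def index(record: dict, fmt: str) -> dict:
        """Extract core fields for querying; the full record stays archived."""
        core = INDEXERS[fmt](record)
        assert set(core) == set(CORE_FIELDS)
        return core

    print(index({"Entry_Title": "MODIS L1B", "Spatial_Coverage": "global"}, "dif"))
    ```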

  6. GUMP: Adapting Client/Server Messaging Protocols into Peer-to-Peer Serverless Environments

    DTIC Science & Technology

    2010-06-11

    and other related metadata, such as message receiver ID (for supporting multiple connections) and so forth. The Proxy consumes the message and uses...the underlying discovery subsystem and multicast to process the message and translate the request into behaviour suitable for the underlying...communication, i.e. a chat. Jingle (XEP-0166) [26] is a related specification that defines an extension to the XMPP protocol for initiating and

  7. Creating context for the experiment record. User-defined metadata: investigations into metadata usage in the LabTrove ELN.

    PubMed

    Willoughby, Cerys; Bird, Colin L; Coles, Simon J; Frey, Jeremy G

    2014-12-22

    The drive toward more transparency in research, the growing willingness to make data openly available, and the reuse of data to maximize the return on research investment all increase the importance of being able to find information and make links to the underlying data. The use of metadata in Electronic Laboratory Notebooks (ELNs) to curate experiment data is an essential ingredient for facilitating discovery. The University of Southampton has developed a Web browser-based ELN that enables users to add their own metadata to notebook entries. A survey of these notebooks was completed to assess user behavior and patterns of metadata usage within ELNs, while user perceptions and expectations were gathered through interviews and user-testing activities within the community. The findings indicate that while some groups are comfortable with metadata and are able to design a metadata structure that works effectively, many users are making little attempt to use it, thereby endangering their ability to recover data in the future. A survey of patterns of metadata use in these notebooks, together with feedback from the user community, indicated that while a few groups are comfortable with metadata and are able to design a metadata structure that works effectively, many users adopt a "minimum required" approach to metadata. To investigate whether the patterns of metadata use in LabTrove were unusual, a series of surveys were undertaken to investigate metadata usage in a variety of platforms supporting user-defined metadata. These surveys also provided the opportunity to investigate whether interface designs in these other environments might inform strategies for encouraging metadata creation and more effective use of metadata in LabTrove.

  8. Metadata squared: enhancing its usability for volunteered geographic information and the GeoWeb

    USGS Publications Warehouse

    Poore, Barbara S.; Wolf, Eric B.; Sui, Daniel Z.; Elwood, Sarah; Goodchild, Michael F.

    2013-01-01

    The Internet has brought many changes to the way geographic information is created and shared. One aspect that has not changed is metadata. Static spatial data quality descriptions were standardized in the mid-1990s and cannot accommodate the current climate of data creation where nonexperts are using mobile phones and other location-based devices on a continuous basis to contribute data to Internet mapping platforms. The usability of standard geospatial metadata is being questioned by academics and neogeographers alike. This chapter analyzes current discussions of metadata to demonstrate how the media shift that is occurring has affected requirements for metadata. Two case studies of metadata use are presented—online sharing of environmental information through a regional spatial data infrastructure in the early 2000s, and new types of metadata that are being used today in OpenStreetMap, a map of the world created entirely by volunteers. Changes in metadata requirements are examined for usability, the ease with which metadata supports coproduction of data by communities of users, how metadata enhances findability, and how the relationship between metadata and data has changed. We argue that traditional metadata associated with spatial data infrastructures is inadequate and suggest several research avenues to make this type of metadata more interactive and effective in the GeoWeb.

  9. Evolutions in Metadata Quality

    NASA Astrophysics Data System (ADS)

    Gilman, J.

    2016-12-01

    Metadata Quality is one of the chief drivers of discovery and use of NASA EOSDIS (Earth Observing System Data and Information System) data. Issues with metadata such as lack of completeness, inconsistency, and use of legacy terms directly hinder data use. As the central metadata repository for NASA Earth Science data, the Common Metadata Repository (CMR) has a responsibility to its users to ensure the quality of CMR search results. This talk will cover how we encourage metadata authors to improve the metadata through the use of integrated rubrics of metadata quality and outreach efforts. In addition we'll demonstrate Humanizers, a technique for dealing with the symptoms of metadata issues. Humanizers allow CMR administrators to identify specific metadata issues that are fixed at runtime when the data is indexed. An example Humanizer is the aliasing of processing level "Level 1" to "1" to improve consistency across collections. The CMR currently indexes 35K collections and 300M granules.
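
    A hedged sketch of what a Humanizer-style rule could look like: small normalizations applied to the indexed copy of a record, leaving the provider's archived metadata unchanged. Only the "Level 1" to "1" aliasing comes from the abstract; the rule structure and the second rule are illustrative assumptions.

    ```python
    # Illustrative index-time normalization rules ("humanizers").
    HUMANIZERS = [
        # From the abstract: alias processing level "Level 1" to "1".
        ("processing_level", lambda v: {"Level 1": "1", "Level 2": "2"}.get(v, v)),
        # Hypothetical extra rule: trim stray whitespace from platform names.
        ("platform", lambda v: v.strip()),
    ]

    def humanize(record: dict) -> dict:
        """Return the indexed copy; the stored record is left unchanged."""
        indexed = dict(record)
        for field, rule in HUMANIZERS:
            if field in indexed:
                indexed[field] = rule(indexed[field])
        return indexed

    print(humanize({"processing_level": "Level 1"}))  # {'processing_level': '1'}
    ```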

  10. Metadata Means Communication: The Challenges of Producing Useful Metadata

    NASA Astrophysics Data System (ADS)

    Edwards, P. N.; Batcheller, A. L.

    2010-12-01

    Metadata are increasingly perceived as an important component of data sharing systems. For instance, metadata accompanying atmospheric model output may indicate the grid size, grid type, and parameter settings used in the model configuration. We conducted a case study of a data portal in the atmospheric sciences using in-depth interviews, document review, and observation. Our analysis revealed a number of challenges in producing useful metadata. First, creating and managing metadata required considerable effort and expertise, yet responsibility for these tasks was ill-defined and diffused among many individuals, leading to errors, failure to capture metadata, and uncertainty about the quality of the primary data. Second, metadata ended up stored in many different forms and software tools, making it hard to manage versions and transfer between formats. Third, the exact meanings of metadata categories remained unsettled and misunderstood even among a small community of domain experts -- an effect we expect to be exacerbated when scientists from other disciplines wish to use these data. In practice, we found that metadata problems due to these obstacles are often overcome through informal, personal communication, such as conversations or email. We conclude that metadata serve to communicate the context of data production from the people who produce data to those who wish to use it. Thus while formal metadata systems are often public, critical elements of metadata (those embodied in informal communication) may never be recorded. Therefore, efforts to increase data sharing should include ways to facilitate inter-investigator communication. Instead of tackling metadata challenges only on the formal level, we can improve data usability for broader communities by better supporting metadata communication.

  11. Inheritance rules for Hierarchical Metadata Based on ISO 19115

    NASA Astrophysics Data System (ADS)

    Zabala, A.; Masó, J.; Pons, X.

    2012-04-01

    Mainly, ISO 19115 has been used to describe metadata for datasets and services. Furthermore, the ISO 19115 standard (as well as the new draft ISO 19115-1) includes a conceptual model that allows metadata to be described at different levels of granularity, structured in hierarchical levels, both for aggregated resources such as series and datasets, and for more disaggregated resources such as types of entities (feature type), types of attributes (attribute type), entities (feature instances) and attributes (attribute instances). In theory, applying a complete metadata structure to all hierarchical levels of metadata, from the whole series down to an individual feature attribute, is possible, but storing all metadata at all levels is completely impractical. An inheritance mechanism is needed to store each metadata and quality statement at the optimum hierarchical level and to allow easy and efficient documentation of metadata, both in an Earth observation scenario such as multi-satellite, multiband imagery, and in a complex vector topographic map that includes several feature types separated in layers (e.g. administrative limits, contour lines, edification polygons, road lines, etc.). Moreover, due to the traditional splitting of maps into tiles for map handling at detailed scales, or due to satellite characteristics, each of the previous thematic layers (e.g. 1:5000 roads for a country) or bands (Landsat-5 TM cover of the Earth) is tiled into several parts (sheets or scenes, respectively). According to the hierarchy in ISO 19115, the definition of general metadata can be supplemented by spatially specific metadata that, when required, either inherits or overrides the general case (G.1.3). Annex H of the standard states that only metadata exceptions are defined at lower levels, so it is not necessary to generate the full registry of metadata for each level but to link particular values to the general value that they inherit. Conceptually, the metadata registry is complete for each hierarchical level, but at the implementation level most of the metadata elements are not stored at both levels, only at the more generic one. This communication defines a metadata system that covers four levels, describes which metadata have to support series-layer inheritance and in which way, and explains how hierarchical levels are defined and stored. Metadata elements are classified according to the type of inheritance between products, series, tiles and datasets. It explains the metadata element classification and exemplifies it using core metadata elements. The communication also presents a metadata viewer and editing tool that uses the described model to propagate metadata elements and to show the user a complete set of metadata for each level in a transparent way. This tool is integrated in the MiraMon GIS software.
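
    The inheritance mechanism can be pictured as a walk up the hierarchy: each element is stored once at the most general level that applies, and lower levels record only exceptions. The toy sketch below illustrates the idea; the level names and metadata elements are invented for illustration, not the paper's actual model.

    ```python
    # Toy hierarchical metadata store with inheritance resolution.
    HIERARCHY = {}   # level name -> (parent level, metadata defined at this level)
    HIERARCHY["series"]  = (None,      {"license": "CC-BY", "crs": "EPSG:25831"})
    HIERARCHY["dataset"] = ("series",  {"acq_date": "2011-07-14"})
    HIERARCHY["tile_17"] = ("dataset", {"cloud_cover": "3%"})   # exception only

    def lookup(level: str, element: str):
        """Resolve an element by walking from the specific level up to the series."""
        while level is not None:
            parent, metadata = HIERARCHY[level]
            if element in metadata:
                return metadata[element]   # defined (or overridden) at this level
            level = parent                 # otherwise inherit from above
        raise KeyError(element)

    print(lookup("tile_17", "license"))    # inherited from the series: CC-BY
    ```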

  12. The role of metadata in managing large environmental science datasets. Proceedings

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Melton, R.B.; DeVaney, D.M.; French, J. C.

    1995-06-01

    The purpose of this workshop was to bring together computer science researchers and environmental sciences data management practitioners to consider the role of metadata in managing large environmental sciences datasets. The objectives included: establishing a common definition of metadata; identifying categories of metadata; defining problems in managing metadata; and defining problems related to linking metadata with primary data.

  13. Automatic water inventory, collecting, and dispensing unit

    NASA Technical Reports Server (NTRS)

    Hall, J. B., Jr.; Williams, E. F.

    1972-01-01

    Two cylindrical tanks with piston bladders and associated components for automatic filling and emptying use liquid-inventory readout devices to control water flow. The unit provides for adaptive water collection, storage, and dispensing in a weightless environment.

  14. Building Format-Agnostic Metadata Repositories

    NASA Astrophysics Data System (ADS)

    Cechini, M.; Pilone, D.

    2010-12-01

    This presentation will discuss the problems that surround persisting and discovering metadata in multiple formats; a set of tenets that must be addressed in a solution; and NASA’s Earth Observing System (EOS) ClearingHOuse’s (ECHO) proposed approach. In order to facilitate cross-discipline data analysis, Earth Scientists will potentially interact with more than one data source. The most common data discovery paradigm relies on services and/or applications facilitating the discovery and presentation of metadata. What is less consistent are the formats in which that metadata is expressed. As the number of sources and datasets utilized for research increases, it becomes more likely that a researcher will encounter conflicting metadata formats. Metadata repositories, such as the EOS ClearingHOuse (ECHO), along with data centers, must identify ways to address this issue. In order to define the solution to this problem, the following tenets are identified: - There exists a set of ‘core’ metadata fields recommended for data discovery. - There exists a set of users who will require the entire metadata record for advanced analysis. - There exists a set of users who will require a ‘core’ set of metadata fields for discovery only. - There will never be a cessation of new formats or a total retirement of all old formats. - Users should be presented metadata in a consistent format. ECHO has undertaken an effort to transform its metadata ingest and discovery services in order to support the growing set of metadata formats. In order to address the previously listed items, ECHO’s new metadata processing paradigm utilizes the following approach: - Identify a cross-format set of ‘core’ metadata fields necessary for discovery. - Implement format-specific indexers to extract the ‘core’ metadata fields into an optimized query capability. - Archive the original metadata in its entirety for presentation to users requiring the full record. - Provide on-demand translation of ‘core’ metadata to any supported result format. With this approach, the Earth Scientist is provided with a consistent data representation as they interact with a variety of datasets that utilize multiple metadata formats. They are then able to focus their efforts on their more critical research activities.
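
    The approach lends itself to a compact sketch. The following hypothetical Python fragment (names and the simplified DIF-like record structure are illustrative, not ECHO's API) shows per-format indexers feeding a shared 'core' index while each original record is archived whole:

        # Hypothetical sketch of the ingest paradigm described above: per-format
        # indexers extract a shared 'core' record for fast discovery, while the
        # original metadata document is archived whole for full-record requests.

        CORE_FIELDS = ("title", "start_date", "bounding_box")

        def index_dif(doc):
            # Format-specific extraction for a (simplified) DIF-like record.
            return {"title": doc["Entry_Title"],
                    "start_date": doc["Temporal_Coverage"]["Start_Date"],
                    "bounding_box": doc["Spatial_Coverage"]}

        INDEXERS = {"DIF": index_dif}     # add one indexer per supported format

        ARCHIVE, SEARCH_INDEX = [], []

        def ingest(fmt, doc):
            SEARCH_INDEX.append(INDEXERS[fmt](doc))  # optimized core for discovery
            ARCHIVE.append((fmt, doc))               # full record for presentation

        ingest("DIF", {"Entry_Title": "MODIS L3 SST",
                       "Temporal_Coverage": {"Start_Date": "2000-02-24"},
                       "Spatial_Coverage": (-180, -90, 180, 90)})
        print(SEARCH_INDEX[0]["title"])   # consistent core view across formats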

  15. Mining and Utilizing Dataset Relevancy from Oceanographic Dataset (MUDROD) Metadata, Usage Metrics, and User Feedback to Improve Data Discovery and Access

    NASA Astrophysics Data System (ADS)

    Li, Y.; Jiang, Y.; Yang, C. P.; Armstrong, E. M.; Huang, T.; Moroni, D. F.; McGibbney, L. J.

    2016-12-01

    Big oceanographic data have been produced, archived and made available online, but finding the right data for scientific research and application development is still a significant challenge. A long-standing problem in data discovery is how to find the interrelationships between keywords and data, as well as the relationships within each individually. Most previous research attempted to solve this problem by building domain-specific ontologies either manually or through automatic machine-learning techniques. The former is costly, labor intensive and hard to keep up to date, while the latter is prone to noise and may be difficult for humans to understand. Large-scale user behavior data modelling represents a largely untapped, unique, and valuable source for discovering semantic relationships among domain-specific vocabulary. In this article, we propose a search engine framework for mining and utilizing dataset relevancy from oceanographic dataset metadata, user behaviors, and existing ontology. The objective is to improve the discovery accuracy of oceanographic data and reduce the time scientists need to discover, download and reformat data for their projects. Experiments and a search example show that the proposed search engine helps both scientists and general users search with better ranking results, recommendations, and ontology navigation.
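
    One ingredient of such behavior mining can be sketched simply: datasets viewed within the same user session provide weak evidence of relatedness. The fragment below (illustrative only, not the MUDROD codebase) ranks related datasets by session co-occurrence:

        # Illustrative sketch: inferring dataset relatedness from user sessions,
        # where datasets viewed in the same session are weak evidence of
        # semantic relatedness. Session contents are hypothetical.

        from collections import Counter
        from itertools import combinations

        sessions = [
            ["sst_l4", "sst_anomaly", "wind_speed"],
            ["sst_l4", "sst_anomaly"],
            ["wind_speed", "wave_height"],
        ]

        co_views = Counter()
        for s in sessions:
            for a, b in combinations(sorted(set(s)), 2):
                co_views[(a, b)] += 1

        def related(dataset):
            # Rank candidate recommendations for a dataset by co-view counts.
            pairs = ((p, n) for p, n in co_views.items() if dataset in p)
            return sorted(((n, p[0] if p[1] == dataset else p[1])
                           for p, n in pairs), reverse=True)

        print(related("sst_l4"))   # [(2, 'sst_anomaly'), (1, 'wind_speed')]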

  16. Geopositioning with a quadcopter: Extracted feature locations and predicted accuracy without a priori sensor attitude information

    NASA Astrophysics Data System (ADS)

    Dolloff, John; Hottel, Bryant; Edwards, David; Theiss, Henry; Braun, Aaron

    2017-05-01

    This paper presents an overview of the Full Motion Video-Geopositioning Test Bed (FMV-GTB) developed to investigate algorithm performance and issues related to the registration of motion imagery and subsequent extraction of feature locations along with predicted accuracy. A case study is included corresponding to a video taken from a quadcopter. Registration of the corresponding video frames is performed without the benefit of a priori sensor attitude (pointing) information. In particular, tie points are automatically measured between adjacent frames using standard optical flow matching techniques from computer vision, an a priori estimate of sensor attitude is then computed based on supplied GPS sensor positions contained in the video metadata and a photogrammetric/search-based structure from motion algorithm, and then a Weighted Least Squares adjustment of all a priori metadata across the frames is performed. Extraction of absolute 3D feature locations, including their predicted accuracy based on the principles of rigorous error propagation, is then performed using a subset of the registered frames. Results are compared to known locations (check points) over a test site. Throughout this entire process, no external control information (e.g. surveyed points) is used other than for evaluation of solution errors and corresponding accuracy.
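
    The tie-point measurement step can be illustrated with standard computer-vision tooling. A sketch using OpenCV's pyramidal Lucas-Kanade tracker follows; the frame file names and detector parameters are placeholders, and the sketch assumes both frames load successfully:

        # Sketch of the tie-point step: detect features in one frame and track
        # them into the adjacent frame with standard optical-flow matching.

        import cv2

        f0 = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
        f1 = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

        # Detect well-textured corner features in the first frame ...
        pts0 = cv2.goodFeaturesToTrack(f0, maxCorners=500, qualityLevel=0.01,
                                       minDistance=8)

        # ... and track them into the next frame with pyramidal Lucas-Kanade.
        pts1, status, _ = cv2.calcOpticalFlowPyrLK(f0, f1, pts0, None)

        tie_points = [(p0.ravel(), p1.ravel())
                      for p0, p1, ok in zip(pts0, pts1, status) if ok]
        print(f"{len(tie_points)} tie points measured between adjacent frames")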

  17. In search of meta-knowledge

    NASA Technical Reports Server (NTRS)

    Lopez, Antonio M., Jr.

    1993-01-01

    Development of an Intelligent Information System (IIS) involves application of numerous artificial intelligence (AI) paradigms and advanced technologies. The National Aeronautics and Space Administration (NASA) is interested in an IIS that can automatically collect, classify, store and retrieve data, as well as develop, manipulate and restructure knowledge regarding the data and its application (Campbell et al., 1987, p.3). This interest stems in part from a NASA initiative in support of the interagency Global Change Research program. NASA's space data problems are so large and varied that scientific researchers will find it almost impossible to access the most suitable information from a software system if meta-information (metadata and meta-knowledge) is not embedded in that system. Even if more, faster, larger hardware is used, new innovative software systems will be required to organize, link, maintain, and properly archive the Earth Observing System (EOS) data that is to be stored and distributed by the EOS Data and Information System (EOSDIS) (Dozier, 1990). Although efforts are being made to specify the metadata that will be used in EOSDIS, meta-knowledge specification issues are not clear. With the expectation that EOSDIS might evolve into an IIS, this paper presents certain ideas on the concept of meta-knowledge and demonstrates how meta-knowledge might be represented in a pixel classification problem.

  18. Making Metadata Better with CMR and MMT

    NASA Technical Reports Server (NTRS)

    Gilman, Jason Arthur; Shum, Dana

    2016-01-01

    Ensuring complete, consistent, and high-quality metadata is a challenge for metadata providers and curators. The CMR and MMT systems give providers and curators options to build in metadata quality from the start and to assess and improve the quality of existing metadata.

  19. Evolution in Metadata Quality: Common Metadata Repository's Role in NASA Curation Efforts

    NASA Technical Reports Server (NTRS)

    Gilman, Jason; Shum, Dana; Baynes, Katie

    2016-01-01

    Metadata Quality is one of the chief drivers of discovery and use of NASA EOSDIS (Earth Observing System Data and Information System) data. Issues with metadata such as lack of completeness, inconsistency, and use of legacy terms directly hinder data use. As the central metadata repository for NASA Earth Science data, the Common Metadata Repository (CMR) has a responsibility to its users to ensure the quality of CMR search results. This poster covers how we use humanizers, a technique for dealing with the symptoms of metadata issues, as well as our plans for future metadata validation enhancements. The CMR currently indexes 35K collections and 300M granules.

  20. Describing environmental public health data: implementing a descriptive metadata standard on the environmental public health tracking network.

    PubMed

    Patridge, Jeff; Namulanda, Gonza

    2008-01-01

    The Environmental Public Health Tracking (EPHT) Network provides an opportunity to bring together diverse environmental and health effects data by integrating local, state, and national databases of environmental hazards, environmental exposures, and health effects. To help users locate data on the EPHT Network, the network will utilize descriptive metadata that provide critical information as to the purpose, location, content, and source of these data. Since 2003, the Centers for Disease Control and Prevention's EPHT Metadata Subgroup has been working to initiate the creation and use of descriptive metadata. Efforts undertaken by the group include the adoption of a metadata standard, creation of an EPHT-specific metadata profile, development of an open-source metadata creation tool, and promotion of the creation of descriptive metadata by changing the perception of metadata in the public health culture.

  1. PH5: HDF5 Based Format for Integrating and Archiving Seismic Data

    NASA Astrophysics Data System (ADS)

    Hess, D.; Azevedo, S.; Falco, N.; Beaudoin, B. C.

    2017-12-01

    PH5 is a seismic data format created by IRIS PASSCAL using HDF5. Building PH5 on HDF5 allows for portability and extensibility on a scale that is unavailable in older seismic data formats. PH5 is designed to evolve to accept new data types as they become available in the future and to operate on a variety of platforms (e.g. Mac, Linux, Windows). Exemplifying PH5's flexibility is the evolution from just handling active source seismic data to now including passive source, onshore-offshore, OBS and mixed source seismic data sets. In PH5, metadata is separated from the time series data and stored in a size- and performance-efficient manner that also allows for easy user interaction and output of the metadata in a format appropriate for the data set. PH5's full-fledged "Kitchen Software Suite" comprises tools for data ingestion (e.g. RefTek, SEG-Y, SEG-D, SEG-2, MSEED), metadata management, QC, waveform viewing, and data output. This software suite not only includes command line and GUI tools for interacting with PH5, it is also a comprehensive Python package to support the creation of software tools by the community to further enhance PH5. The PH5 software suite is currently being used in multiple capacities, including in-field for creating archive-ready data sets as well as by the IRIS Data Management Center (DMC) to offer an FDSN-compliant set of web services for serving PH5 data to the community in a variety of standard data and metadata formats (e.g. StationXML, QuakeML, EventXML, SAC + Poles and Zeroes, MiniSEED, and SEG-Y) as well as StationTXT and ShotText formats. These web services can be accessed via standard FDSN clients such as ObsPy, irisFetch.m, FetchData, and FetchMetadata. This presentation will highlight and demonstrate the benefits of PH5 as a next generation adaptable and extensible data format for use in both archiving and working with seismic data.
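
    For example, PH5-hosted metadata can be retrieved through ObsPy's generic FDSN client. The snippet below is a sketch; the "IRISPH5" endpoint shortcut, network code, and time window are assumptions that should be checked against current IRIS DMC documentation:

        # Sketch of accessing PH5-served metadata via a standard FDSN client.
        # The "IRISPH5" shortcut and query values are illustrative assumptions.

        from obspy import UTCDateTime
        from obspy.clients.fdsn import Client

        client = Client("IRISPH5")        # PH5 web-service endpoint at IRIS DMC
        t0 = UTCDateTime(2016, 1, 1)

        # Station metadata served from PH5, returned as FDSN StationXML.
        inventory = client.get_stations(network="XX", starttime=t0,
                                        endtime=t0 + 86400, level="channel")
        print(inventory)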

  2. maxdLoad2 and maxdBrowse: standards-compliant tools for microarray experimental annotation, data management and dissemination.

    PubMed

    Hancock, David; Wilson, Michael; Velarde, Giles; Morrison, Norman; Hayes, Andrew; Hulme, Helen; Wood, A Joseph; Nashar, Karim; Kell, Douglas B; Brass, Andy

    2005-11-03

    maxdLoad2 is a relational database schema and Java application for microarray experimental annotation and storage. It is compliant with all standards for microarray meta-data capture, including the specification of what data should be recorded, extensive use of standard ontologies and support for data exchange formats. The output from maxdLoad2 is of a form acceptable for submission to the ArrayExpress microarray repository at the European Bioinformatics Institute. maxdBrowse is a PHP web-application that makes contents of maxdLoad2 databases accessible via web-browser, the command-line and web-service environments. It thus acts as both a dissemination and data-mining tool. maxdLoad2 presents an easy-to-use interface to an underlying relational database and provides a full complement of facilities for browsing, searching and editing. There is a tree-based visualization of data connectivity and the ability to explore the links between any pair of data elements, irrespective of how many intermediate links lie between them. Its principal novel features are: the flexibility of the meta-data that can be captured, the tools provided for importing data from spreadsheets and other tabular representations, the tools provided for the automatic creation of structured documents, and the ability to browse and access the data via web and web-services interfaces. Within maxdLoad2 it is very straightforward to customise the meta-data that is being captured or change the definitions of the meta-data. These meta-data definitions are stored within the database itself, allowing client software to connect properly to a modified database without having to be specially configured. The meta-data definitions (configuration file) can also be centralized, allowing changes made in response to revisions of standards or terminologies to be propagated to clients without user intervention. maxdBrowse is hosted on a web-server and presents multiple interfaces to the contents of maxd databases. maxdBrowse emulates many of the browse and search features available in the maxdLoad2 application via a web-browser. This allows users who are not familiar with maxdLoad2 to browse and export microarray data from the database for their own analysis. The same browse and search features are also available via command-line and SOAP server interfaces. This both enables scripting of data export for use embedded in data repositories and analysis environments, and allows access to the maxd databases via web-service architectures. maxdLoad2 http://www.bioinf.man.ac.uk/microarray/maxd/ and maxdBrowse http://dbk.ch.umist.ac.uk/maxdBrowse are portable and compatible with all common operating systems and major database servers. They provide a powerful, flexible package for annotation of microarray experiments and a convenient dissemination environment. They are available for download and open-sourced under the Artistic License.

  3. DataUp: Helping manage and archive data within the researcher's workflow

    NASA Astrophysics Data System (ADS)

    Strasser, C.

    2012-12-01

    There are many barriers to data management and sharing among earth and environmental scientists; among the most significant is a lack of knowledge about best practices for data management, metadata standards, and appropriate data repositories for archiving and sharing data. We have developed an open-source add-in for Excel and an open-source web application intended to help researchers overcome these barriers. DataUp helps scientists to (1) determine whether their file is CSV compatible, (2) generate metadata in a standard format, (3) retrieve an identifier to facilitate data citation, and (4) deposit their data into a repository. The researcher does not need a prior relationship with a data repository to use DataUp; the newly implemented ONEShare repository, a DataONE member node, is available for any researcher to archive and share their data. By meeting researchers where they already work, in spreadsheets, DataUp becomes part of the researcher's workflow, and data management and sharing become easier. Future enhancement of DataUp will rely on members of the community adopting and adapting the DataUp tools to meet their unique needs, including connecting to analytical tools, adding new metadata schemas, and expanding the list of connected data repositories. DataUp is a collaborative project between Microsoft Research Connections, the University of California's California Digital Library, the Gordon and Betty Moore Foundation, and DataONE.

  4. Metadata: Standards for Retrieving WWW Documents (and Other Digitized and Non-Digitized Resources)

    NASA Astrophysics Data System (ADS)

    Rusch-Feja, Diann

    The use of metadata for indexing digitized and non-digitized resources for resource discovery in a networked environment is being implemented increasingly all over the world. Greater precision is achieved using metadata than by relying on universal search engines, and metadata can furthermore be used as a filtering mechanism for search results. An overview of various metadata sets is given, followed by a more focused presentation of Dublin Core Metadata, including examples of sub-elements and qualifiers. The Dublin Core Relation element, in particular, provides connections between the metadata of various related electronic resources, as well as the metadata for physical, non-digitized resources. This facilitates more comprehensive search results without losing precision and brings together different genres of information which would otherwise be searchable only in separate databases. Finally, the advantages of Dublin Core Metadata in comparison with library cataloging and universal search engines are discussed briefly, followed by a listing of types of implementations of Dublin Core Metadata.

  5. Department of the Interior metadata implementation guide—Framework for developing the metadata component for data resource management

    USGS Publications Warehouse

    Obuch, Raymond C.; Carlino, Jennifer; Zhang, Lin; Blythe, Jonathan; Dietrich, Christopher; Hawkinson, Christine

    2018-04-12

    The Department of the Interior (DOI) is a Federal agency with over 90,000 employees across 10 bureaus and 8 agency offices. Its primary mission is to protect and manage the Nation’s natural resources and cultural heritage; provide scientific and other information about those resources; and honor its trust responsibilities or special commitments to American Indians, Alaska Natives, and affiliated island communities. Data and information are critical in day-to-day operational decision making and scientific research. DOI is committed to creating, documenting, managing, and sharing high-quality data and metadata in and across its various programs that support its mission. Documenting data through metadata is essential in realizing the value of data as an enterprise asset. The completeness, consistency, and timeliness of metadata affect users’ ability to search for and discover the most relevant data for the intended purpose, and facilitate the interoperability and usability of these data among DOI bureaus and offices. Fully documented metadata describe data usability, quality, accuracy, provenance, and meaning. Across DOI, there are different maturity levels and phases of information and metadata management implementations. The Department has organized a committee consisting of bureau-level points of contact to collaborate on the development of more consistent, standardized, and more effective metadata management practices and guidance to support this shared mission and the information needs of the Department. DOI’s metadata implementation plans establish key roles and responsibilities associated with metadata management processes, procedures, and a series of actions defined in three major metadata implementation phases: (1) Getting Started—Planning Phase, (2) Implementing and Maintaining Operational Metadata Management Phase, and (3) the Next Steps towards Improving Metadata Management Phase. DOI’s phased approach for metadata management addresses some of the major data and metadata management challenges that exist across the diverse missions of the bureaus and offices. All employees who create, modify, or use data are involved with data and metadata management. Identifying, establishing, and formalizing the roles and responsibilities associated with metadata management are key to institutionalizing a framework of best practices, methodologies, processes, and common approaches throughout all levels of the organization; these are the foundation for effective data resource management. For executives and managers, metadata management strengthens their overarching views of data assets, holdings, and data interoperability, and clarifies how metadata management can help accelerate compliance with multiple policy mandates. For employees, data stewards, and data professionals, formalized metadata management will help with the consistency of definitions and approaches addressing data discoverability, data quality, and data lineage. In addition to data professionals and others associated with information technology, data stewards and program subject matter experts take on important metadata management roles and responsibilities as data flow through their respective business and science-related workflows. The responsibilities of establishing, practicing, and governing the actions associated with their specific metadata management roles are critical to successful metadata implementation.

  6. Data dissemination using gossiping in wireless sensor networks

    NASA Astrophysics Data System (ADS)

    Medidi, Muralidhar; Ding, Jin; Medidi, Sirisha

    2005-06-01

    Disseminating data among sensors is a fundamental operation in energy-constrained wireless sensor networks. We present a gossip-based adaptive protocol that improves the energy efficiency of this operation. To overcome the data-implosion problems associated with dissemination, our protocol uses meta-data to name data with high-level descriptors and negotiation to eliminate redundant transmissions of duplicate data in the network. Further, we adapt the gossiping to data aggregation possibilities in sensor networks. We simulated our data dissemination protocol and compared it to the SPIN protocol. We find that our protocol reduces energy consumption by about 20% relative to the alternatives, while significantly improving on the dissemination rate of plain gossiping.
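
    The meta-data negotiation idea can be sketched as a three-step exchange in the style of SPIN's ADV/REQ/DATA handshake. The toy fragment below (not the authors' simulator) shows how advertising a compact descriptor first suppresses redundant transmissions of the full payload:

        # Toy sketch of meta-data negotiation: a node advertises a high-level
        # descriptor (ADV); the neighbor requests the payload (REQ) only if it
        # has not seen that data, so duplicate DATA transmissions are avoided.

        class Node:
            def __init__(self, name):
                self.name = name
                self.seen = set()     # descriptors of data already held

            def advertise(self, neighbor, descriptor):
                if neighbor.request(descriptor):       # ADV -> REQ if data is new
                    neighbor.receive(descriptor)       # DATA sent only on request
                    print(f"{self.name} -> {neighbor.name}: sent {descriptor}")
                else:
                    print(f"{self.name} -> {neighbor.name}: duplicate suppressed")

            def request(self, descriptor):
                return descriptor not in self.seen

            def receive(self, descriptor):
                self.seen.add(descriptor)

        a, b = Node("a"), Node("b")
        a.seen.add("temp/region4/epoch7")
        a.advertise(b, "temp/region4/epoch7")   # b requests and receives the data
        a.advertise(b, "temp/region4/epoch7")   # no DATA sent the second time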

  7. Making Interoperability Easier with the NASA Metadata Management Tool

    NASA Astrophysics Data System (ADS)

    Shum, D.; Reese, M.; Pilone, D.; Mitchell, A. E.

    2016-12-01

    ISO 19115 has enabled interoperability amongst tools, yet many users find it hard to build ISO metadata for their collections because the standard can be large and overly flexible for their needs. The Metadata Management Tool (MMT), part of NASA's Earth Observing System Data and Information System (EOSDIS), offers users a modern, easy-to-use, browser-based tool to develop ISO-compliant metadata. Through a simplified UI experience, metadata curators can create and edit collections without any understanding of the complex ISO 19115 format, while still generating compliant metadata. The MMT is also able to assess the completeness of collection-level metadata by evaluating it against a variety of metadata standards. The tool provides users with clear guidance as to how to change their metadata in order to improve its quality and compliance. It is based on NASA's Unified Metadata Model for Collections (UMM-C), a simpler metadata model that can be cleanly mapped to ISO 19115. This allows metadata authors and curators to meet ISO compliance requirements faster and more accurately. The MMT and UMM-C have been developed in an agile fashion, with recurring end-user tests and reviews to continually refine the tool, the model and the ISO mappings. This process is allowing for continual improvement and evolution to meet the community's needs.

  8. Hybrid polylingual object model: an efficient and seamless integration of Java and native components on the Dalvik virtual machine.

    PubMed

    Huang, Yukun; Chen, Rong; Wei, Jingbo; Pei, Xilong; Cao, Jing; Prakash Jayaraman, Prem; Ranjan, Rajiv

    2014-01-01

    JNI on the Android platform is often observed to have low efficiency and high coding complexity. Although many researchers have investigated the JNI mechanism, few of them solve the efficiency and the complexity problems of JNI on the Android platform simultaneously. In this paper, a hybrid polylingual object (HPO) model is proposed to allow a CAR object to be accessed as a Java object, and vice versa, in the Dalvik virtual machine. It is an acceptable substitute for JNI to reuse CAR-compliant components in Android applications in a seamless and efficient way. The metadata injection mechanism is designed to support the automatic mapping and reflection between CAR objects and Java objects. A prototype virtual machine, called HPO-Dalvik, is implemented by extending the Dalvik virtual machine to support the HPO model. Lifespan management, garbage collection, and data type transformation of HPO objects are also handled automatically in the HPO-Dalvik virtual machine. The experimental results show that the HPO model outperforms standard JNI, with lower overhead on the native side and better execution performance, while requiring no JNI bridging code.

  9. Automatic multi-label annotation of abdominal CT images using CBIR

    NASA Astrophysics Data System (ADS)

    Xue, Zhiyun; Antani, Sameer; Long, L. Rodney; Thoma, George R.

    2017-03-01

    We present a technique to annotate multiple organs shown in 2-D abdominal/pelvic CT images using CBIR. This annotation task is motivated by our research interests in visual question-answering (VQA). We aim to apply results from this effort in Open-i, a multimodal biomedical search engine developed by the National Library of Medicine (NLM). Understanding the visual content of biomedical images is a necessary step for VQA. Though sufficient annotation information about an image may be available in related textual metadata, not all of it may be useful as descriptive tags, particularly for anatomy on the image. In this paper, we develop and evaluate a multi-label image annotation method using CBIR. We evaluate our method on two 2-D CT image datasets we generated from 3-D volumetric data obtained from a multi-organ segmentation challenge hosted in MICCAI 2015. Shape and spatial layout information is used to encode visual characteristics of the anatomy. We adapt a weighted voting scheme to assign multiple labels to the query image by combining the labels of the images identified as similar by the method. Key parameters that may affect the annotation performance, such as the number of images used in the label voting and the threshold for excluding labels that have low weights, are studied. The method uses a coarse-to-fine retrieval strategy that integrates classification with nearest-neighbor search. Results from our evaluation (using the MICCAI CT image datasets as well as figures from Open-i) are presented.
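
    The weighted voting step admits a compact sketch. In the hypothetical fragment below, labels of retrieved similar images vote with similarity-derived weights, and low-weight labels are dropped; the scores, labels, and 0.5 threshold are illustrative, not the paper's tuned values:

        # Sketch of similarity-weighted label voting for multi-label annotation:
        # labels of the top-k retrieved images vote in proportion to similarity,
        # and labels whose accumulated weight falls below a threshold are dropped.

        from collections import defaultdict

        # (similarity score, organ labels) for the retrieved images; illustrative.
        retrieved = [(0.95, {"liver", "spleen"}),
                     (0.90, {"liver", "left kidney"}),
                     (0.40, {"gallbladder"})]

        def annotate(retrieved, threshold=0.5):
            votes = defaultdict(float)
            total = sum(sim for sim, _ in retrieved)
            for sim, labels in retrieved:
                for label in labels:
                    votes[label] += sim / total     # similarity-weighted vote
            return {lbl for lbl, w in votes.items() if w >= threshold}

        print(annotate(retrieved))   # {'liver'}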

  10. BreakingNews: Article Annotation by Image and Text Processing.

    PubMed

    Ramisa, Arnau; Yan, Fei; Moreno-Noguer, Francesc; Mikolajczyk, Krystian

    2018-05-01

    Building upon recent Deep Neural Network architectures, current approaches lying in the intersection of Computer Vision and Natural Language Processing have achieved unprecedented breakthroughs in tasks like automatic captioning or image retrieval. Most of these learning methods, though, rely on large training sets of images associated with human annotations that specifically describe the visual content. In this paper we propose to go a step further and explore the more complex cases where textual descriptions are loosely related to the images. We focus on the particular domain of news articles in which the textual content often expresses connotative and ambiguous relations that are only suggested but not directly inferred from images. We introduce an adaptive CNN architecture that shares most of the structure for multiple tasks including source detection, article illustration and geolocation of articles. Deep Canonical Correlation Analysis is deployed for article illustration, and a new loss function based on Great Circle Distance is proposed for geolocation. Furthermore, we present BreakingNews, a novel dataset with approximately 100K news articles including images, text and captions, and enriched with heterogeneous meta-data (such as GPS coordinates and user comments). We show this dataset to be appropriate to explore all aforementioned problems, for which we provide a baseline performance using various Deep Learning architectures, and different representations of the textual and visual features. We report very promising results and bring to light several limitations of current state-of-the-art in this kind of domain, which we hope will help spur progress in the field.

  11. GraphMeta: Managing HPC Rich Metadata in Graphs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dai, Dong; Chen, Yong; Carns, Philip

    High-performance computing (HPC) systems face increasingly critical metadata management challenges, especially in the approaching exascale era. These challenges arise not only from exploding metadata volumes, but also from increasingly diverse metadata, which contains data provenance and arbitrary user-defined attributes in addition to traditional POSIX metadata. This ‘rich’ metadata is becoming critical to supporting advanced data management functionality such as data auditing and validation. In our prior work, we identified a graph-based model as a promising solution to uniformly manage HPC rich metadata due to its flexibility and generality. However, at the same time, graph-based HPC rich metadata management also introduces significant challenges to the underlying infrastructure. In this study, we first identify the challenges on the underlying infrastructure to support scalable, high-performance rich metadata management. Based on that, we introduce GraphMeta, a graph-based engine designed for this use case. It achieves performance scalability by introducing a new graph partitioning algorithm and a write-optimal storage engine. We evaluate GraphMeta under both synthetic and real HPC metadata workloads, compare it with other approaches, and demonstrate its advantages in terms of efficiency and usability for rich metadata management in HPC systems.
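
    The graph model itself is easy to sketch: entities such as files, jobs, and users become vertices, while provenance relations and user-defined attributes become labeled edges. The fragment below is a toy illustration, not GraphMeta's API:

        # Toy sketch of the graph model for rich metadata: vertices are entities
        # (files, jobs, users); labeled edges carry provenance and attributes.
        # All names and relation labels are illustrative.

        edges = [
            ("job:sim_42", "wasRunBy", "user:alice"),
            ("file:out.h5", "wasGeneratedBy", "job:sim_42"),
            ("file:out.h5", "attr:format", "HDF5"),
        ]

        def neighbors(vertex, relation=None):
            """Outgoing edges from a vertex, optionally filtered by relation."""
            return [(r, dst) for src, r, dst in edges
                    if src == vertex and relation in (None, r)]

        # Provenance query: how was out.h5 produced, and what are its attributes?
        print(neighbors("file:out.h5"))
        print(neighbors("file:out.h5", "wasGeneratedBy"))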

  12. Adaptive Data Gathering in Mobile Sensor Networks Using Speedy Mobile Elements

    PubMed Central

    Lai, Yongxuan; Xie, Jinshan; Lin, Ziyu; Wang, Tian; Liao, Minghong

    2015-01-01

    Data gathering is a key operation for applications in wireless sensor networks; it is a particularly challenging problem in mobile sensor networks, where all nodes are mobile and the communications among them are opportunistic. This paper proposes an efficient data gathering scheme called ADG that adopts speedy mobile elements as the mobile data collector and takes advantage of the movement patterns of the network. ADG first extracts the network meta-data at initial epochs and calculates a set of proxy nodes based on the meta-data. Data gathering is then mapped onto the Proxy node Time Slot Allocation (PTSA) problem, which schedules the time slots and orders according to which the data collector can gather the maximal amount of data within a limited period. Finally, the collector follows the schedule and picks up the sensed data from the proxy nodes through one hop of message transmissions. ADG learns the period when nodes are relatively stationary, so that the collector is able to pick up the data from them during the limited data gathering period. Moreover, proxy nodes and data gathering points can also be updated in a timely manner so that the collector adapts to changes in node movements. Extensive experimental results show that the proposed scheme outperforms other data gathering schemes in terms of message transmission cost and data gathering rate, especially under the constraint of a limited data gathering period. PMID:26389903

  13. MINC 2.0: A Flexible Format for Multi-Modal Images.

    PubMed

    Vincent, Robert D; Neelin, Peter; Khalili-Mahani, Najmeh; Janke, Andrew L; Fonov, Vladimir S; Robbins, Steven M; Baghdadi, Leila; Lerch, Jason; Sled, John G; Adalat, Reza; MacDonald, David; Zijdenbos, Alex P; Collins, D Louis; Evans, Alan C

    2016-01-01

    It is often useful that an imaging data format can afford rich metadata, be flexible, scale to very large file sizes, support multi-modal data, and have strong inbuilt mechanisms for data provenance. Beginning in 1992, MINC was developed as a system for flexible, self-documenting representation of neuroscientific imaging data with arbitrary orientation and dimensionality. The MINC system incorporates three broad components: a file format specification, a programming library, and a growing set of tools. In the early 2000s the MINC developers created MINC 2.0, which added support for 64-bit file sizes, internal compression, and a number of other modern features. Because of its extensible design, it has been easy to incorporate details of provenance in the header metadata, including an explicit processing history, unique identifiers, and vendor-specific scanner settings. This makes MINC ideal for use in large scale imaging studies and databases. It also makes it easy to adapt to new scanning sequences and modalities.
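
    Since MINC 2.0 volumes are HDF5 containers, generic HDF5 tooling can walk the header metadata. A sketch with h5py follows; the file name is a placeholder, and the sketch simply dumps whatever attributes the file carries rather than assuming a fixed MINC schema:

        # Sketch: inspecting the header metadata of an HDF5-based volume
        # (such as a MINC 2.0 file) with generic HDF5 tooling.

        import h5py

        with h5py.File("scan.mnc", "r") as f:
            def show(name, obj):
                # Print every attribute attached to each group/dataset.
                for key, value in obj.attrs.items():
                    print(f"{name}: {key} = {value!r}")
            f.visititems(show)   # walk the whole object tree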

  14. Metabolonote: A Wiki-Based Database for Managing Hierarchical Metadata of Metabolome Analyses

    PubMed Central

    Ara, Takeshi; Enomoto, Mitsuo; Arita, Masanori; Ikeda, Chiaki; Kera, Kota; Yamada, Manabu; Nishioka, Takaaki; Ikeda, Tasuku; Nihei, Yoshito; Shibata, Daisuke; Kanaya, Shigehiko; Sakurai, Nozomu

    2015-01-01

    Metabolomics – technology for comprehensive detection of small molecules in an organism – lags behind the other “omics” in terms of publication and dissemination of experimental data. Among the reasons for this are the difficulty of precisely recording information about complicated analytical experiments (metadata), the existence of various databases with their own metadata descriptions, and the low reusability of the published data, resulting in submitters (the researchers who generate the data) being insufficiently motivated. To tackle these issues, we developed Metabolonote, a Semantic MediaWiki-based database designed specifically for managing metabolomic metadata. We also defined a metadata and data description format, called “Togo Metabolome Data” (TogoMD), with an ID system that is required for unique access to each level of the tree-structured metadata such as study purpose, sample, analytical method, and data analysis. Separation of the management of metadata from that of data and permission to attach related information to the metadata provide advantages for submitters, readers, and database developers. The metadata are enriched with information such as links to comparable data, thereby functioning as a hub of related data resources. They also enhance not only readers’ understanding and use of data but also submitters’ motivation to publish the data. The metadata are computationally shared among other systems via APIs, which facilitate the construction of novel databases by database developers. A permission system that allows publication of immature metadata and feedback from readers also helps submitters to improve their metadata. Hence, this aspect of Metabolonote, as a metadata preparation tool, is complementary to high-quality and persistent data repositories such as MetaboLights. A total of 808 metadata records for analyzed data obtained from 35 biological species are currently published. Metabolonote and related tools are available free of cost at http://metabolonote.kazusa.or.jp/. PMID:25905099

  15. Metabolonote: a wiki-based database for managing hierarchical metadata of metabolome analyses.

    PubMed

    Ara, Takeshi; Enomoto, Mitsuo; Arita, Masanori; Ikeda, Chiaki; Kera, Kota; Yamada, Manabu; Nishioka, Takaaki; Ikeda, Tasuku; Nihei, Yoshito; Shibata, Daisuke; Kanaya, Shigehiko; Sakurai, Nozomu

    2015-01-01

    Metabolomics - technology for comprehensive detection of small molecules in an organism - lags behind the other "omics" in terms of publication and dissemination of experimental data. Among the reasons for this are the difficulty of precisely recording information about complicated analytical experiments (metadata), the existence of various databases with their own metadata descriptions, and the low reusability of the published data, resulting in submitters (the researchers who generate the data) being insufficiently motivated. To tackle these issues, we developed Metabolonote, a Semantic MediaWiki-based database designed specifically for managing metabolomic metadata. We also defined a metadata and data description format, called "Togo Metabolome Data" (TogoMD), with an ID system that is required for unique access to each level of the tree-structured metadata such as study purpose, sample, analytical method, and data analysis. Separation of the management of metadata from that of data and permission to attach related information to the metadata provide advantages for submitters, readers, and database developers. The metadata are enriched with information such as links to comparable data, thereby functioning as a hub of related data resources. They also enhance not only readers' understanding and use of data but also submitters' motivation to publish the data. The metadata are computationally shared among other systems via APIs, which facilitate the construction of novel databases by database developers. A permission system that allows publication of immature metadata and feedback from readers also helps submitters to improve their metadata. Hence, this aspect of Metabolonote, as a metadata preparation tool, is complementary to high-quality and persistent data repositories such as MetaboLights. A total of 808 metadata records for analyzed data obtained from 35 biological species are currently published. Metabolonote and related tools are available free of cost at http://metabolonote.kazusa.or.jp/.

  16. Method and apparatus for telemetry adaptive bandwidth compression

    NASA Technical Reports Server (NTRS)

    Graham, Olin L.

    1987-01-01

    Methods and apparatus are provided for automatic and/or manual adaptive bandwidth compression of telemetry. An adaptive sampler samples a video signal from a scanning sensor and generates a sequence of sampled fields. Each field, together with range-rate information from the sensor, is then sequentially transmitted to and stored in a multiple and adaptive field storage means. The field storage means then, in response to an automatic or manual control signal, transfers the stored sampled field signals to a video monitor in a form for sequential or simultaneous display of a desired number of stored signal fields. The sampling ratio of the adaptive sampler, the relative proportion of available communication bandwidth allocated respectively to transmitted data and video information, and the number of fields simultaneously displayed are manually or automatically selectively adjustable in functional relationship to each other and to the detected range rate. In one embodiment, when relatively little or no scene motion is detected, the control signal maximizes the sampling ratio and causes simultaneous display of all stored fields, thus maximizing resolution and the bandwidth available for data transmission. When increased scene motion is detected, the control signal is adjusted accordingly to cause display of fewer fields. If greater resolution is desired, the control signal is adjusted to increase the sampling ratio.
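
    The control logic of that embodiment can be caricatured in a few lines. The thresholds below are invented for illustration only; they map detected scene motion (range rate) to a sampling ratio and a number of simultaneously displayed fields:

        # Toy sketch of the adaptive control logic: low scene motion maximizes
        # the sampling ratio (freeing bandwidth for data) and displays all
        # stored fields; faster motion trades both away. Thresholds invented.

        def control(range_rate, max_fields=8):
            """Map detected scene motion to (sampling_ratio, fields_displayed)."""
            if range_rate < 0.1:                  # essentially static scene
                return max_fields, max_fields     # max ratio, all fields shown
            if range_rate < 1.0:                  # moderate motion
                return 4, 4
            return 1, 1                           # fast motion: full-rate video

        for rate in (0.0, 0.5, 2.0):
            print(rate, control(rate))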

  17. Metadata (MD)

    Treesearch

    Robert E. Keane

    2006-01-01

    The Metadata (MD) table in the FIREMON database is used to record any information about the sampling strategy or data collected using the FIREMON sampling procedures. The MD method records metadata pertaining to a group of FIREMON plots, such as all plots in a specific FIREMON project. FIREMON plots are linked to metadata using a unique metadata identifier that is...

  18. Adaptive Intelligent Support to Improve Peer Tutoring in Algebra

    ERIC Educational Resources Information Center

    Walker, Erin; Rummel, Nikol; Koedinger, Kenneth R.

    2014-01-01

    Adaptive collaborative learning support (ACLS) involves collaborative learning environments that adapt their characteristics, and sometimes provide intelligent hints and feedback, to improve individual students' collaborative interactions. ACLS often involves a system that can automatically assess student dialogue, model effective and…

  19. Dynamic Non-Hierarchical File Systems for Exascale Storage

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Long, Darrell E.; Miller, Ethan L

    This constitutes the final report for “Dynamic Non-Hierarchical File Systems for Exascale Storage”. The ultimate goal of this project was to improve data management in scientific computing and high-end computing (HEC) applications, and to achieve this goal we proposed: to develop the first, HEC-targeted, file system featuring rich metadata and provenance collection, extreme scalability, and future storage hardware integration as core design goals, and to evaluate and develop a flexible non-hierarchical file system interface suitable for providing more powerful and intuitive data management interfaces to HEC and scientific computing users. Data management is swiftly becoming a serious problem in the scientific community – while copious amounts of data are good for obtaining results, finding the right data is often daunting and sometimes impossible. Scientists participating in a Department of Energy workshop noted that most of their time was spent “...finding, processing, organizing, and moving data and it’s going to get much worse”. Scientists should not be forced to become data mining experts in order to retrieve the data they want, nor should they be expected to remember the naming convention they used several years ago for a set of experiments they now wish to revisit. Ideally, locating the data you need would be as easy as browsing the web. Unfortunately, existing data management approaches are usually based on hierarchical naming, a 40 year-old technology designed to manage thousands of files, not exabytes of data. Today’s systems do not take advantage of the rich array of metadata that current high-end computing (HEC) file systems can gather, including content-based metadata and provenance information. As a result, current metadata search approaches are typically ad hoc and often work by providing a parallel management system to the “main” file system, as is done in Linux (the locate utility), personal computers, and enterprise search appliances. These search applications are often optimized for a single file system, making it difficult to move files and their metadata between file systems. Users have tried to solve this problem in several ways, including the use of separate databases to index file properties, the encoding of file properties into file names, and separately gathering and managing provenance data, but none of these approaches has worked well, either due to limited usefulness or scalability, or both. Our research addressed several key issues: High-performance, real-time metadata harvesting: extracting important attributes from files dynamically and immediately updating indexes used to improve search; Transparent, automatic, and secure provenance capture: recording the data inputs and processing steps used in the production of each file in the system; Scalable indexing: indexes that are optimized for integration with the file system; Dynamic file system structure: our approach provides dynamic directories similar to those in semantic file systems, but these are the native organization rather than a feature grafted onto a conventional system. In addition to these goals, our research effort will include evaluating the impact of new storage technologies on the file system design and performance. In particular, the indexing and metadata harvesting functions can potentially benefit from the performance improvements promised by new storage class memories.

  20. Approaches to the automatic generation and control of finite element meshes

    NASA Technical Reports Server (NTRS)

    Shephard, Mark S.

    1987-01-01

    The algorithmic approaches being taken to the development of finite element mesh generators capable of automatically discretizing general domains without the need for user intervention are discussed. It is demonstrated that because of the modeling demands placed on an automatic mesh generator, all the approaches taken to date produce unstructured meshes. Consideration is also given to both a priori and a posteriori mesh control devices for automatic mesh generators, as well as their integration with geometric modeling and adaptive analysis procedures.

  1. Documentation Resources on the ESIP Wiki

    NASA Technical Reports Server (NTRS)

    Habermann, Ted; Kozimor, John; Gordon, Sean

    2017-01-01

    The ESIP community includes data providers and users that communicate with one another through datasets and the metadata that describe them. Improving this communication depends on consistent, high-quality metadata. The ESIP Documentation Cluster and the wiki play an important central role in facilitating this communication. We will describe and demonstrate sections of the wiki that provide information about metadata concept definitions, metadata recommendations, metadata dialects, and guidance pages. We will also describe and demonstrate the ISO Explorer, a tool that the community is developing to help metadata creators.

  2. Fast processing of digital imaging and communications in medicine (DICOM) metadata using multiseries DICOM format.

    PubMed

    Ismail, Mahmoud; Philbin, James

    2015-04-01

    The digital imaging and communications in medicine (DICOM) information model combines pixel data and its metadata in a single object. There are user scenarios that only need metadata manipulation, such as deidentification and study migration. Most picture archiving and communication systems use a database to store and update the metadata rather than updating the raw DICOM files themselves. The multiseries DICOM (MSD) format separates metadata from pixel data and eliminates duplicate attributes. This work promotes storing DICOM studies in MSD format to reduce the metadata processing time. A set of experiments are performed that update the metadata of a set of DICOM studies for deidentification and migration. The studies are stored in both the traditional single frame DICOM (SFD) format and the MSD format. The results show that it is faster to update studies' metadata in MSD format than in SFD format because the bulk data is separated in MSD and is not retrieved from the storage system. In addition, it is space efficient to store the deidentified studies in MSD format as it shares the same bulk data object with the original study. In summary, separation of metadata from pixel data using the MSD format provides fast metadata access and speeds up applications that process only the metadata.
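
    The kind of metadata-only pass that motivates MSD can be sketched with pydicom. With conventional single-frame files, the closest analogue is to stop parsing before the pixel data element, as below; the file path is a placeholder and this is not the authors' MSD toolchain:

        # Sketch of a metadata-only deidentification pass with pydicom: parsing
        # stops before PixelData, so the bulk pixel data is never read, which is
        # the access pattern the MSD format makes cheap by design.

        import pydicom

        ds = pydicom.dcmread("study/slice001.dcm", stop_before_pixels=True)
        ds.PatientName = "ANONYMOUS"      # overwrite identifying attributes
        ds.PatientID = "000000"
        ds.remove_private_tags()          # drop vendor-private elements
        print(ds.PatientName)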

  3. Transforming Dermatologic Imaging for the Digital Era: Metadata and Standards.

    PubMed

    Caffery, Liam J; Clunie, David; Curiel-Lewandrowski, Clara; Malvehy, Josep; Soyer, H Peter; Halpern, Allan C

    2018-01-17

    Imaging is increasingly being used in dermatology for documentation, diagnosis, and management of cutaneous disease. The lack of standards for dermatologic imaging is an impediment to clinical uptake. Standardization can occur in image acquisition, terminology, interoperability, and metadata. This paper presents the International Skin Imaging Collaboration position on standardization of metadata for dermatologic imaging. Metadata is essential to ensure that dermatologic images are properly managed and interpreted. There are two standards-based approaches to recording and storing metadata in dermatologic imaging. The first uses standard consumer image file formats, and the second is the file format and metadata model developed for the Digital Imaging and Communication in Medicine (DICOM) standard. DICOM would appear to provide an advantage over consumer image file formats for metadata, as it includes all the patient, study, and technical metadata necessary to use images clinically. Consumer image file formats, in contrast, include only technical metadata and need to be used in conjunction with another actor (for example, an electronic medical record) to supply the patient and study metadata. The use of DICOM may have some ancillary benefits in dermatologic imaging, including leveraging DICOM network and workflow services, interoperability of images and metadata, leveraging existing enterprise imaging infrastructure, greater patient safety, and better compliance with legislative requirements for image retention.

  4. Fast processing of digital imaging and communications in medicine (DICOM) metadata using multiseries DICOM format

    PubMed Central

    Ismail, Mahmoud; Philbin, James

    2015-01-01

    The digital imaging and communications in medicine (DICOM) information model combines pixel data and its metadata in a single object. There are user scenarios that only need metadata manipulation, such as deidentification and study migration. Most picture archiving and communication systems use a database to store and update the metadata rather than updating the raw DICOM files themselves. The multiseries DICOM (MSD) format separates metadata from pixel data and eliminates duplicate attributes. This work promotes storing DICOM studies in MSD format to reduce the metadata processing time. A set of experiments are performed that update the metadata of a set of DICOM studies for deidentification and migration. The studies are stored in both the traditional single frame DICOM (SFD) format and the MSD format. The results show that it is faster to update studies’ metadata in MSD format than in SFD format because the bulk data is separated in MSD and is not retrieved from the storage system. In addition, it is space efficient to store the deidentified studies in MSD format as it shares the same bulk data object with the original study. In summary, separation of metadata from pixel data using the MSD format provides fast metadata access and speeds up applications that process only the metadata. PMID:26158117

  5. ISO, FGDC, DIF and Dublin Core - Making Sense of Metadata Standards for Earth Science Data

    NASA Astrophysics Data System (ADS)

    Jones, P. R.; Ritchey, N. A.; Peng, G.; Toner, V. A.; Brown, H.

    2014-12-01

    Metadata standards provide common definitions of metadata fields for information exchange across user communities. Despite the broad adoption of metadata standards for Earth science data, there are still heterogeneous and incompatible representations of information due to differences between the many standards in use and how each standard is applied. Federal agencies are required to manage and publish metadata in different metadata standards and formats for various data catalogs. In 2014, the NOAA National Climatic Data Center (NCDC) managed metadata for its scientific datasets in ISO 19115-2 in XML, GCMD Directory Interchange Format (DIF) in XML, DataCite Schema in XML, Dublin Core in XML, and Data Catalog Vocabulary (DCAT) in JSON, with more standards and profiles of standards planned. Of these standards, the ISO 19115-series metadata is the most complete and feature-rich, and for this reason it is used by NCDC as the source for the other metadata standards. We will discuss the capabilities of metadata standards and how these standards are being implemented to document datasets. Successful implementations include developing translations and displays using XSLTs, creating links to related data and resources, documenting dataset lineage, and establishing best practices. Benefits, gaps, and challenges will be highlighted with suggestions for improved approaches to metadata storage and maintenance.
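
    The XSLT-based translation mentioned above can be sketched in a few lines with lxml. Both file names are placeholders, and the stylesheet itself would carry the actual ISO-to-DIF crosswalk logic:

        # Sketch of standard-to-standard translation via XSLT: an authoritative
        # ISO 19115-2 record is transformed into another target standard by a
        # stylesheet that encodes the field-by-field crosswalk.

        from lxml import etree

        source = etree.parse("dataset_iso19115.xml")   # authoritative ISO record
        xslt = etree.parse("iso2dif.xsl")              # ISO -> DIF crosswalk
        translate = etree.XSLT(xslt)

        dif_record = translate(source)
        print(etree.tostring(dif_record, pretty_print=True).decode())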

  6. Automated Deployment of Advanced Controls and Analytics in Buildings

    NASA Astrophysics Data System (ADS)

    Pritoni, Marco

    Buildings use 40% of primary energy in the US. Recent studies show that developing energy analytics and enhancing control strategies can significantly improve their energy performance. However, the deployment of advanced control software applications has been mostly limited to academic studies. Larger-scale implementations are prevented by the significant engineering time and customization required, due to significant differences among buildings. This study demonstrates how physics-inspired data-driven models can be used to develop portable analytics and control applications for buildings. Specifically, I demonstrate application of these models in all phases of the deployment of advanced controls and analytics in buildings: in the first phase, "Site Preparation and Interface with Legacy Systems" I used models to discover or map relationships among building components, automatically gathering metadata (information about data points) necessary to run the applications. During the second phase: "Application Deployment and Commissioning", models automatically learn system parameters, used for advanced controls and analytics. In the third phase: "Continuous Monitoring and Verification" I utilized models to automatically measure the energy performance of a building that has implemented advanced control strategies. In the conclusions, I discuss future challenges and suggest potential strategies for these innovative control systems to be widely deployed in the market. This dissertation provides useful new tools in terms of procedures, algorithms, and models to facilitate the automation of deployment of advanced controls and analytics and accelerate their wide adoption in buildings.

  7. From the inside-out: Retrospectives on a metadata improvement process to advance the discoverability of NASA's earth science data

    NASA Astrophysics Data System (ADS)

    Hernández, B. E.; Bugbee, K.; le Roux, J.; Beaty, T.; Hansen, M.; Staton, P.; Sisco, A. W.

    2017-12-01

    Earth observation (EO) data collected as part of NASA's Earth Observing System Data and Information System (EOSDIS) is now searchable via the Common Metadata Repository (CMR). The Analysis and Review of CMR (ARC) Team at Marshall Space Flight Center has been tasked with reviewing all NASA metadata records in the CMR (~7,000 records). Each collection-level record and its constituent granule-level metadata are reviewed both for completeness and for compliance with the CMR's set of metadata standards, as specified in the Unified Metadata Model (UMM). NASA's Distributed Active Archive Centers (DAACs) have been harmonizing priority metadata records within the context of the inter-agency federal Big Earth Data Initiative (BEDI), which seeks to improve the discoverability, accessibility, and usability of EO data. Thus, the first phase of this project constitutes reviewing BEDI metadata records, while the second phase will constitute reviewing the remaining non-BEDI records in CMR. This presentation will discuss the ARC team's findings in terms of the overall quality of BEDI records across all DAACs as well as compliance with UMM standards. For instance, only a fifth of the collection-level metadata fields needed correction, compared to a quarter of the granule-level fields. It should be noted that the degree to which DAACs' metadata did not comply with the UMM standards may reflect multiple factors, such as recent changes in the UMM standards and the utilization of different metadata formats (e.g. DIF 10, ECHO 10, ISO 19115-1) across the DAACs. Insights, constructive criticism, and lessons learned from this metadata review process will be contributed by both ORNL and SEDAC. Further inquiry along such lines may lead to insights which may improve the metadata curation process moving forward. In terms of the broader implications for metadata compliance with the UMM standards, this research has shown that a large proportion of the prioritized collections have already been made compliant, although the process of improving metadata quality is ongoing and iterative. Further research is also warranted into whether or not the gains in metadata quality are also driving gains in data use.

  8. Forum Guide to Metadata: The Meaning behind Education Data. NFES 2009-805

    ERIC Educational Resources Information Center

    National Forum on Education Statistics, 2009

    2009-01-01

    The purpose of this guide is to empower people to more effectively use data as information. To accomplish this, the publication explains what metadata are; why metadata are critical to the development of sound education data systems; what components comprise a metadata system; what value metadata bring to data management and use; and how to…

  9. Metadata Effectiveness in Internet Discovery: An Analysis of Digital Collection Metadata Elements and Internet Search Engine Keywords

    ERIC Educational Resources Information Center

    Yang, Le

    2016-01-01

    This study analyzed digital item metadata and keywords from Internet search engines to learn what metadata elements actually facilitate discovery of digital collections through Internet keyword searching and how significantly each metadata element affects the discovery of items in a digital repository. The study found that keywords from Internet…

  10. Automation in visual inspection tasks: X-ray luggage screening supported by a system of direct, indirect or adaptable cueing with low and high system reliability.

    PubMed

    Chavaillaz, Alain; Schwaninger, Adrian; Michel, Stefan; Sauer, Juergen

    2018-05-25

    The present study evaluated three automation modes for improving performance in an X-ray luggage screening task. 140 participants were asked to detect the presence of prohibited items in X-ray images of cabin luggage. Twenty participants conducted this task without automatic support (control group), whereas the others worked with indirect cues (the system indicated target presence without specifying its location), direct cues (the system pointed out the exact target location), or adaptable automation (participants could freely choose among no cue, direct cues, and indirect cues). Furthermore, the reliability of the automatic support was manipulated (low vs. high). The results showed a clear advantage for direct cues with regard to detection performance and response time. No benefits were observed for adaptable automation. Finally, high automation reliability led to better performance and higher operator trust. Overall, the findings confirmed that automatic support systems for luggage screening should be designed such that they provide direct, highly reliable cues.

  11. A novel framework for assessing metadata quality in epidemiological and public health research settings

    PubMed Central

    McMahon, Christiana; Denaxas, Spiros

    2016-01-01

    Metadata are critical in epidemiological and public health research. However, a lack of biomedical metadata quality frameworks and limited awareness of the implications of poor-quality metadata render data analyses problematic. In this study, we created and evaluated a novel framework to assess the metadata quality of epidemiological and public health research datasets. We performed a literature review and surveyed stakeholders to enhance our understanding of biomedical metadata quality assessment. The review identified 11 studies and nine quality dimensions, none of which were specifically aimed at biomedical metadata. 96 individuals completed the survey; of those who submitted data, most assessed metadata quality only sometimes, and eight did not at all. Our framework has four sections: a) general information; b) tools and technologies; c) usability; and d) management and curation. We evaluated the framework using three test cases and sought expert feedback. The framework can assess biomedical metadata quality systematically and robustly. PMID:27570670

  12. A novel framework for assessing metadata quality in epidemiological and public health research settings.

    PubMed

    McMahon, Christiana; Denaxas, Spiros

    2016-01-01

    Metadata are critical in epidemiological and public health research. However, a lack of biomedical metadata quality frameworks and limited awareness of the implications of poor-quality metadata render data analyses problematic. In this study, we created and evaluated a novel framework to assess the metadata quality of epidemiological and public health research datasets. We performed a literature review and surveyed stakeholders to enhance our understanding of biomedical metadata quality assessment. The review identified 11 studies and nine quality dimensions, none of which were specifically aimed at biomedical metadata. 96 individuals completed the survey; of those who submitted data, most assessed metadata quality only sometimes, and eight did not at all. Our framework has four sections: a) general information; b) tools and technologies; c) usability; and d) management and curation. We evaluated the framework using three test cases and sought expert feedback. The framework can assess biomedical metadata quality systematically and robustly.

  13. Automatic and user-centric approaches to video summary evaluation

    NASA Astrophysics Data System (ADS)

    Taskiran, Cuneyt M.; Bentley, Frank

    2007-01-01

    Automatic video summarization has become an active research topic in content-based video processing. However, not much emphasis has been placed on developing rigorous summary evaluation methods or on developing summarization systems based on a clear understanding of user needs, obtained through user-centered design. In this paper we address these two topics and propose an automatic video summary evaluation algorithm adapted from the text summarization domain.

  14. Towards Data Value-Level Metadata for Clinical Studies.

    PubMed

    Zozus, Meredith Nahm; Bonner, Joseph

    2017-01-01

    While several standards for metadata describing clinical studies exist, comprehensive metadata to support traceability of data from clinical studies has not been articulated. We examine uses of metadata in clinical studies. We examine and enumerate seven sources of data value-level metadata in clinical studies inclusive of research designs across the spectrum of the National Institutes of Health definition of clinical research. The sources of metadata inform categorization in terms of metadata describing the origin of a data value, the definition of a data value, and operations to which the data value was subjected. The latter is further categorized into information about changes to a data value, movement of a data value, retrieval of a data value, and data quality checks, constraints or assessments to which the data value was subjected. The implications of tracking and managing data value-level metadata are explored.
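
    One way to make the categorization above concrete is a record structure that carries, for each data value, its origin, its definition, and the operations applied to it. The Python sketch below is a hypothetical illustration of that categorization, not a schema proposed by the authors.

    ```python
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Operation:
        """One operation applied to a data value (change, movement, retrieval, or QC check)."""
        kind: str        # e.g. "change", "movement", "retrieval", "quality_check"
        timestamp: str   # when the operation occurred
        detail: str      # free-text description of what was done

    @dataclass
    class DataValueMetadata:
        """Value-level metadata: origin, definition, and operation history."""
        value: str
        origin: str                     # where the value came from (e.g. a CRF page)
        definition: str                 # what the value means (e.g. "systolic BP, mmHg")
        operations: List[Operation] = field(default_factory=list)

    bp = DataValueMetadata("128", origin="site 04, visit 2 CRF",
                           definition="systolic blood pressure, mmHg")
    bp.operations.append(Operation("quality_check", "2017-03-01", "range check 60-250 passed"))
    print(bp)
    ```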

  15. Adaptive Self Tuning

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peterson, Matthew; Draelos, Timothy; Knox, Hunter

    2017-05-02

    The AST software includes numeric methods to 1) adjust STA/LTA signal detector trigger level (TL) values and 2) filter detections for a network of sensors. AST adapts TL values to the current state of the environment by leveraging cooperation within a neighborhood of sensors. The key metric that guides the dynamic tuning is consistency of each sensor with its nearest neighbors: TL values are automatically adjusted on a per station basis to be more or less sensitive to produce consistent agreement of detections in its neighborhood. The AST algorithm adapts in near real-time to changing conditions in an attempt to automatically self-tune a signal detector to identify (detect) only signals from events of interest.
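
    As a rough illustration of the tuning idea (not the AST implementation itself), the sketch below computes a short-term/long-term average ratio and nudges a station's trigger level up or down depending on whether it over- or under-detects relative to its neighbors; the window lengths, step size, and consistency rule are invented for the example.

    ```python
    import numpy as np

    def sta_lta(x, n_sta=50, n_lta=500):
        """Classic STA/LTA ratio: short-term average over long-term average."""
        sta = np.convolve(np.abs(x), np.ones(n_sta) / n_sta, mode="same")
        lta = np.convolve(np.abs(x), np.ones(n_lta) / n_lta, mode="same")
        return sta / np.maximum(lta, 1e-12)

    def adjust_trigger_level(tl, my_detections, neighbor_detections, step=0.05):
        """Nudge the trigger level toward neighborhood consistency:
        raise TL if this station detects more than its neighbors (too sensitive),
        lower TL if it detects fewer (not sensitive enough)."""
        mean_neighbors = np.mean(neighbor_detections)
        if my_detections > mean_neighbors:
            return tl * (1 + step)
        elif my_detections < mean_neighbors:
            return tl * (1 - step)
        return tl

    rng = np.random.default_rng(0)
    trace = rng.normal(size=5000)
    trace[2000:2050] += 5.0                     # synthetic "event"
    tl = 3.0
    detections = int(np.sum(sta_lta(trace) > tl))
    tl = adjust_trigger_level(tl, detections, neighbor_detections=[40, 55, 60])
    print("detections:", detections, "new TL:", round(tl, 3))
    ```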

  16. Managing Complex Change in Clinical Study Metadata

    PubMed Central

    Brandt, Cynthia A.; Gadagkar, Rohit; Rodriguez, Cesar; Nadkarni, Prakash M.

    2004-01-01

    In highly functional metadata-driven software, the interrelationships within the metadata become complex, and maintenance becomes challenging. We describe an approach to metadata management that uses a knowledge-base subschema to store centralized information about metadata dependencies and use cases involving specific types of metadata modification. Our system borrows ideas from production-rule systems in that some of this information is a high-level specification that is interpreted and executed dynamically by a middleware engine. Our approach is implemented in TrialDB, a generic clinical study data management system. We review approaches that have been used for metadata management in other contexts and describe the features, capabilities, and limitations of our system. PMID:15187070

  17. Distributed Information System for Dynamic Ocean Data in Indonesia

    NASA Astrophysics Data System (ADS)

    Romero, Laia; Sala, Joan; Polo, Isabel; Cases, Oscar; López, Alejandro; Jolibois, Tony; Carbou, Jérome

    2014-05-01

    Information systems are widely used to enable access to scientific data by different user communities. The MyOcean information system is a good example of such applications in Europe. The present work describes a distributed information system for Ocean Numerical Model (ONM) data developed within the INDESO project, which is focused on Infrastructure Development of Space Oceanography in Indonesia. INDESO, part of the Blue Revolution policy conducted by the Indonesian government for the sustainable development of fisheries and aquaculture, presents challenging requirements in terms of service performance, reliability, security, and overall usability. Following state-of-the-art technologies for scientific data networks, this robust information system provides a high level of interoperability of services to discover, view, and access INDESO dynamic ONM scientific data. The entire system is automatically updated four times a day, including dataset metadata, taking into account every new file available in the data repositories. The INDESO system architecture has been designed in great part around the extension and integration of open-source, flexible, and mature technologies. It involves three separate modules: web portal, dissemination gateway, and user administration. Supporting gridded and non-gridded data, the INDESO information system features search-based data discovery, data access by temporal and spatial subset extraction, direct download and FTP, and multiple-layer visualization of datasets. A complex authorization system has been designed and applied throughout all components, in order to enable service authorization at the dataset level, according to the different user profiles stated in the data policy. Finally, a web portal has been developed as the single entry point and standardized interface to all data services (discover, view, and access). Apache SOLR has been implemented as the search server, allowing faceted browsing among ocean data products and the connection to an external catalogue of metadata records. ncWMS and Godiva2 have been the basis of the viewing server and client technologies developed; MOTU has been used for data subsetting and intelligent management of data queues, and has allowed the deployment of a centralised download interface applicable to all ONM products. Unidata's Thredds server has been employed to provide file metadata and remote access to ONM data. CAS has been used as the single sign-on protocol for all data services. The user management application developed has been based on GOSA2. Joomla and Bootstrap have been the technologies used for the web portal, compatible with mobile phone and tablet devices. The result is an information system that is scalable and easy to use, operate, and maintain. This will facilitate the extensive use of ocean numerical model data by the scientific community in Indonesia. Constituted mostly of open-source solutions, the system is able to meet strict operational requirements and carry out complex functions. It is feasible to adapt this architecture to different static and dynamic oceanographic data sources and large data volumes, in an accessible, fast, and comprehensive manner.
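
    As a concrete illustration of the faceted-search layer, the Python sketch below issues the kind of Solr select query such a portal might run; the host, core name, and field names are placeholder assumptions, since the abstract does not specify the INDESO schema.

    ```python
    import requests

    # Placeholder host, core, and field names; the real INDESO schema is not given above.
    SOLR_URL = "http://localhost:8983/solr/ocean_products/select"

    params = {
        "q": "temperature",                    # free-text query over product metadata
        "fq": "region:indonesia",              # filter query restricting the region
        "facet": "true",
        "facet.field": ["variable", "model"],  # facets offered for browsing
        "wt": "json",
        "rows": 10,
    }
    resp = requests.get(SOLR_URL, params=params, timeout=10)
    resp.raise_for_status()
    for doc in resp.json()["response"]["docs"]:
        print(doc.get("id"), doc.get("title"))
    ```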

  18. A Digital Broadcast Item (DBI) enabling metadata repository for digital, interactive television (digiTV) feedback channel networks

    NASA Astrophysics Data System (ADS)

    Lugmayr, Artur R.; Mailaparampil, Anurag; Tico, Florina; Kalli, Seppo; Creutzburg, Reiner

    2003-01-01

    Digital television (digiTV) is an additional multimedia environment, where metadata is one key element for the description of arbitrary content. This implies adequate structures for content description, which is provided by XML metadata schemes (e.g. MPEG-7, MPEG-21). Content and metadata management is the task of a multimedia repository, from which digiTV clients - equipped with an Internet connection - can access rich additional multimedia types over an "All-HTTP" protocol layer. Within this research work, we focus on conceptual design issues of a metadata repository for the storage of metadata, accessible from the feedback channel of a local set-top box. Our concept describes the whole heterogeneous life-cycle chain of XML metadata from the service provider to the digiTV equipment, device independent representation of content, accessing and querying the metadata repository, management of metadata related to digiTV, and interconnection of basic system components (http front-end, relational database system, and servlet container). We present our conceptual test configuration of a metadata repository that is aimed at a real-world deployment, done within the scope of the future interaction (fiTV) project at the Digital Media Institute (DMI) Tampere (www.futureinteraction.tv).

  19. Metazen – metadata capture for metagenomes

    PubMed Central

    2014-01-01

    Background As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. Unfortunately, these tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Results Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Conclusions Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility. PMID:25780508

  20. Metazen - metadata capture for metagenomes.

    PubMed

    Bischof, Jared; Harrison, Travis; Paczian, Tobias; Glass, Elizabeth; Wilke, Andreas; Meyer, Folker

    2014-01-01

    As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. Unfortunately, these tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility.

  1. Improving Access to NASA Earth Science Data through Collaborative Metadata Curation

    NASA Astrophysics Data System (ADS)

    Sisco, A. W.; Bugbee, K.; Shum, D.; Baynes, K.; Dixon, V.; Ramachandran, R.

    2017-12-01

    The NASA-developed Common Metadata Repository (CMR) is a high-performance metadata system that currently catalogs over 375 million Earth science metadata records. It serves as the authoritative metadata management system of NASA's Earth Observing System Data and Information System (EOSDIS), enabling NASA Earth science data to be discovered and accessed by a worldwide user community. The size of the EOSDIS data archive is steadily increasing, and the ability to manage and query this archive depends on the input of high quality metadata to the CMR. Metadata that does not provide adequate descriptive information diminishes the CMR's ability to effectively find and serve data to users. To address this issue, an innovative and collaborative review process is underway to systematically improve the completeness, consistency, and accuracy of metadata for approximately 7,000 data sets archived by NASA's twelve EOSDIS data centers, or Distributed Active Archive Centers (DAACs). The process involves automated and manual metadata assessment of both collection and granule records by a team of Earth science data specialists at NASA Marshall Space Flight Center. The team communicates results to DAAC personnel, who then make revisions and reingest improved metadata into the CMR. Implementation of this process relies on a network of interdisciplinary collaborators leveraging a variety of communication platforms and long-range planning strategies. Curating metadata at this scale and resolving metadata issues through community consensus improves the CMR's ability to serve current and future users and also introduces best practices for stewarding the next generation of Earth Observing System data. This presentation will detail the metadata curation process, its outcomes thus far, and also share the status of ongoing curation activities.
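
    CMR exposes a public search API, and a query like the one below (endpoint and parameters as publicly documented, to the best of my knowledge) is one way a curation team or a user can inspect collection records programmatically; treat the response layout as an assumption to verify against the current API documentation.

    ```python
    import requests

    # Public CMR search endpoint; the parameters shown are a small, common subset.
    CMR_SEARCH = "https://cmr.earthdata.nasa.gov/search/collections.json"

    params = {
        "keyword": "sea surface temperature",  # free-text search over collections
        "page_size": 5,                        # number of collection records to return
    }
    resp = requests.get(CMR_SEARCH, params=params, timeout=30)
    resp.raise_for_status()
    for entry in resp.json()["feed"]["entry"]:
        print(entry["id"], "-", entry.get("title", "<no title>"))
    ```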

  2. CMR Metadata Curation

    NASA Technical Reports Server (NTRS)

    Shum, Dana; Bugbee, Kaylin

    2017-01-01

    This talk explains the ongoing metadata curation activities in the Common Metadata Repository. It explores tools that exist today which are useful for building quality metadata and also opens up the floor for discussions on other potentially useful tools.

  3. Creating Access Points to Instrument-Based Atmospheric Data: Perspectives from the ARM Metadata Manager

    NASA Astrophysics Data System (ADS)

    Troyan, D.

    2016-12-01

    The Atmospheric Radiation Measurement (ARM) program has been collecting data from instruments in diverse climate regions for nearly twenty-five years. These data are made available to all interested parties at no cost via specially designed tools found on the ARM website (www.arm.gov). Metadata is created and applied to the various datastreams to facilitate information retrieval using the ARM website, the ARM Data Discovery Tool, and data quality reporting tools. Over the last year, the Metadata Manager - a relatively new position within the ARM program - created two documents that summarize the state of ARM metadata processes: ARM Metadata Workflow, and ARM Metadata Standards. These documents serve as guides to the creation and management of ARM metadata. With many of ARM's data functions spread around the Department of Energy national laboratory complex and with many of the original architects of the metadata structure no longer working for ARM, there is increased importance on using these documents to resolve issues from data flow bottlenecks and inaccurate metadata to improving data discovery and organizing web pages. This presentation will provide some examples from the workflow and standards documents. The examples will illustrate the complexity of the ARM metadata processes and the efficiency by which the metadata team works towards achieving the goal of providing access to data collected under the auspices of the ARM program.

  4. Efficient processing of MPEG-21 metadata in the binary domain

    NASA Astrophysics Data System (ADS)

    Timmerer, Christian; Frank, Thomas; Hellwagner, Hermann; Heuer, Jörg; Hutter, Andreas

    2005-10-01

    XML-based metadata is widely adopted across different communities, and plenty of commercial and open-source tools for processing and transforming it are available on the market. However, all of these tools have one thing in common: they operate on plain-text-encoded metadata, which may become a burden in constrained and streaming environments, i.e., when metadata needs to be processed together with multimedia content on the fly. In this paper we present an efficient approach for transforming such metadata encoded using MPEG's Binary Format for Metadata (BiM) without additional en-/decoding overheads, i.e., within the binary domain. To this end, we have developed an event-based push parser for BiM-encoded metadata which transforms the metadata by a limited set of processing instructions - based on traditional XML transformation techniques - operating on bit patterns instead of cost-intensive string comparisons.

  5. A model for enhancing Internet medical document retrieval with "medical core metadata".

    PubMed

    Malet, G; Munoz, F; Appleyard, R; Hersh, W

    1999-01-01

    Finding documents on the World Wide Web relevant to a specific medical information need can be difficult. The goal of this work is to define a set of document content description tags, or metadata encodings, that can be used to promote disciplined search access to Internet medical documents. The authors based their approach on a proposed metadata standard, the Dublin Core Metadata Element Set, which has recently been submitted to the Internet Engineering Task Force. Their model also incorporates the National Library of Medicine's Medical Subject Headings (MeSH) vocabulary and MEDLINE-type content descriptions. The model defines a medical core metadata set that can be used to describe the metadata for a wide variety of Internet documents. The authors propose that their medical core metadata set be used to assign metadata to medical documents to facilitate document retrieval by Internet search engines.
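
    In practice, such a medical core metadata set could be embedded in a document as Dublin Core-style elements qualified with MeSH terms. The snippet below generates HTML meta tags in that spirit; the exact tag names and the qualification scheme are illustrative, not the authors' published encoding.

    ```python
    # Illustrative generation of Dublin Core-style meta tags with MeSH subject terms.
    # Tag names follow common DC conventions; the paper's exact encoding may differ.
    doc = {
        "DC.title": "Management of Type 2 Diabetes",
        "DC.creator": "Example Clinic",
        "DC.date": "1999-01-01",
        "DC.subject": ["Diabetes Mellitus, Type 2", "Hypoglycemic Agents"],  # MeSH headings
    }

    lines = []
    for name, value in doc.items():
        values = value if isinstance(value, list) else [value]
        for v in values:
            scheme = ' scheme="MeSH"' if name == "DC.subject" else ""
            lines.append(f'<meta name="{name}"{scheme} content="{v}">')

    print("\n".join(lines))
    ```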

  6. Developing Cyberinfrastructure Tools and Services for Metadata Quality Evaluation

    NASA Astrophysics Data System (ADS)

    Mecum, B.; Gordon, S.; Habermann, T.; Jones, M. B.; Leinfelder, B.; Powers, L. A.; Slaughter, P.

    2016-12-01

    Metadata and data quality are at the core of reusable and reproducible science. While great progress has been made over the years, much of the metadata collected only addresses data discovery, covering concepts such as titles and keywords. Improving metadata beyond the discoverability plateau means documenting detailed concepts within the data such as sampling protocols, instrumentation used, and variables measured. Given that metadata commonly do not describe their data at this level, how might we improve the state of things? Giving scientists and data managers easy-to-use tools to evaluate metadata quality that utilize community-driven recommendations is the key to producing high-quality metadata. To achieve this goal, we created a set of cyberinfrastructure tools and services that integrate with existing metadata and data curation workflows and can be used to improve metadata and data quality across the sciences. These tools work across metadata dialects (e.g., ISO 19115, FGDC, EML) and can be used to assess aspects of quality beyond what is internal to the metadata, such as the congruence between the metadata and the data it describes. The system makes use of a user-friendly mechanism for expressing a suite of checks as code in popular data science programming languages such as Python and R. This reduces the burden on scientists and data managers to learn yet another language. We demonstrated these services and tools in three ways. First, we evaluated a large corpus of datasets in the DataONE federation of data repositories against a metadata recommendation modeled after existing recommendations such as the LTER best practices and the Attribute Convention for Dataset Discovery (ACDD). Second, we showed how this service can be used to display metadata and data quality information to data producers during the data submission and metadata creation process, and to data consumers through data catalog search and access tools. Third, we showed how the centrally deployed DataONE quality service can achieve major efficiency gains by allowing member repositories to customize and use recommendations that fit their specific needs without having to create de novo infrastructure at their site.
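
    A check in such a suite might be as small as the following sketch, which flags dataset columns that appear in the data but are not documented in the metadata; the record layout and result format are invented for the example and are not DataONE's actual check format.

    ```python
    def check_attributes_documented(metadata_attrs, data_columns):
        """Congruence check: every column in the data file should be described
        by an attribute entry in the metadata, and vice versa."""
        undocumented = sorted(set(data_columns) - set(metadata_attrs))
        unused = sorted(set(metadata_attrs) - set(data_columns))
        status = "SUCCESS" if not undocumented and not unused else "FAILURE"
        return {"status": status,
                "undocumented_columns": undocumented,
                "documented_but_absent": unused}

    result = check_attributes_documented(
        metadata_attrs=["site", "date", "temp_c"],
        data_columns=["site", "date", "temp_c", "salinity"],
    )
    print(result)   # salinity is present in the data but missing from the metadata
    ```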

  7. The New Online Metadata Editor for Generating Structured Metadata

    NASA Astrophysics Data System (ADS)

    Devarakonda, R.; Shrestha, B.; Palanisamy, G.; Hook, L.; Killeffer, T.; Boden, T.; Cook, R. B.; Zolly, L.; Hutchison, V.; Frame, M. T.; Cialella, A. T.; Lazer, K.

    2014-12-01

    Nobody is better suited to "describe" data than the scientist who created it. This "description" of the data is called metadata. In general terms, metadata represents the who, what, when, where, why, and how of the dataset. eXtensible Markup Language (XML) is the preferred output format for metadata, as it makes the metadata portable and, more importantly, suitable for system discoverability. The newly developed ORNL Metadata Editor (OME) is a Web-based tool that allows users to create and maintain XML files containing key information, or metadata, about the research. Metadata include information about the specific projects, parameters, time periods, and locations associated with the data. Such information helps put the research findings in context. In addition, the metadata produced using OME will allow other researchers to find these data via metadata clearinghouses like Mercury [1][2]. Researchers simply use the ORNL Metadata Editor to enter relevant metadata into a Web-based form. How is OME helping big data centers like the ORNL DAAC? The ORNL DAAC is one of NASA's Earth Observing System Data and Information System (EOSDIS) data centers managed by the ESDIS Project. The ORNL DAAC archives data produced by NASA's Terrestrial Ecology Program. The DAAC provides data and information relevant to biogeochemical dynamics, ecological data, and environmental processes, critical for understanding the dynamics relating to the biological components of the Earth's environment. Typically, the data produced, archived, and analyzed are at the scale of multiple petabytes, which makes discoverability very challenging. Without proper metadata associated with the data, it is difficult to find the data you are looking for and equally difficult to use and understand the data. OME will allow data centers like the ORNL DAAC to produce meaningful, high-quality, standards-based, descriptive information about their data products, in turn helping with data discoverability and interoperability. References: [1] Devarakonda, Ranjeet, et al. "Mercury: reusable metadata management, data discovery and access system." Earth Science Informatics 3.1-2 (2010): 87-94. [2] Wilson, Bruce E., et al. "Mercury Toolset for Spatiotemporal Metadata." NASA Technical Reports Server (NTRS) (2010).

  8. Automatic Adaptation to Fast Input Changes in a Time-Invariant Neural Circuit

    PubMed Central

    Bharioke, Arjun; Chklovskii, Dmitri B.

    2015-01-01

    Neurons must faithfully encode signals that can vary over many orders of magnitude despite having only limited dynamic ranges. For a correlated signal, this dynamic range constraint can be relieved by subtracting away components of the signal that can be predicted from the past, a strategy known as predictive coding, which relies on learning the input statistics. However, the statistics of natural input signals can also vary over very short time scales, e.g., following saccades across a visual scene. To maintain a reduced transmission cost for signals with rapidly varying statistics, neuronal circuits implementing predictive coding must also rapidly adapt their properties. Experimentally, in different sensory modalities, sensory neurons have shown such adaptations within 100 ms of an input change. Here, we show first that linear neurons connected in a feedback inhibitory circuit can implement predictive coding. We then show that adding a rectification nonlinearity to such a feedback inhibitory circuit allows it to automatically adapt and approximate the performance of an optimal linear predictive coding network, over a wide range of inputs, while keeping its underlying temporal and synaptic properties unchanged. We demonstrate that the resulting changes to the linearized temporal filters of this nonlinear network match the fast adaptations observed experimentally in different sensory modalities, in different vertebrate species. Therefore, the nonlinear feedback inhibitory network can provide automatic adaptation to fast varying signals, maintaining the dynamic range necessary for accurate neuronal transmission of natural inputs. PMID:26247884
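
    The circuit's core mechanism, subtracting a feedback prediction with a rectification nonlinearity on the output, can be sketched numerically. The discrete-time model below, with invented parameter values, is only a schematic of the idea described in the abstract, not the authors' model.

    ```python
    import numpy as np

    def feedback_inhibition(x, w=10.0, tau=5.0, rectify=True):
        """Schematic predictive-coding loop: the output drives a leaky
        inhibitory unit whose activity is subtracted from the next input."""
        y = np.zeros_like(x)
        inhib = 0.0
        for t in range(len(x)):
            out = x[t] - w * inhib                     # subtract the feedback prediction
            y[t] = max(out, 0.0) if rectify else out   # rectification nonlinearity
            inhib += (y[t] - inhib) / tau              # leaky integration of the output
        return y

    # A step change in input statistics produces a large transient, after which
    # the transmitted output decays back to a small steady value, relieving
    # the dynamic-range constraint.
    x = np.concatenate([np.full(100, 1.0), np.full(100, 10.0)])
    y = feedback_inhibition(x)
    print("steady before step:", round(y[99], 3),
          "transient at step:", round(y[100], 3),
          "steady after step:", round(y[-1], 3))
    ```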

  9. Integrating historical clinical and financial data for pharmacological research.

    PubMed

    Deshmukh, Vikrant G; Sower, N Brett; Hunter, Cheri Y; Mitchell, Joyce A

    2011-11-18

    Retrospective research requires longitudinal data, and repositories derived from electronic health records (EHR) can be sources of such data. With Health Information Technology for Economic and Clinical Health (HITECH) Act meaningful use provisions, many institutions are expected to adopt EHRs, but may be left with large amounts of financial and historical clinical data, which can differ significantly from data obtained from newer systems, due to lack or inconsistent use of controlled medical terminologies (CMT) in older systems. We examined different approaches for semantic enrichment of financial data with CMT, and integration of clinical data from disparate historical and current sources for research. Snapshots of financial data from 1999, 2004 and 2009 were mapped automatically to the current inpatient pharmacy catalog, and enriched with RxNorm. Administrative metadata from financial and dispensing systems, RxNorm and two commercial pharmacy vocabularies were used to integrate data from current and historical inpatient pharmacy modules, and the outpatient EHR. Data integration approaches were compared using percentages of automated matches, and effects on cohort size of a retrospective study. During 1999-2009, 71.52%-90.08% of items in use from the financial catalog were enriched using RxNorm; 64.95%-70.37% of items in use from the historical inpatient system were integrated using RxNorm, 85.96%-91.67% using a commercial vocabulary, 87.19%-94.23% using financial metadata, and 77.20%-94.68% using dispensing metadata. During 1999-2009, 48.01%-30.72% of items in use from the outpatient catalog were integrated using RxNorm, and 79.27%-48.60% using a commercial vocabulary. In a cohort of 16304 inpatients obtained from clinical systems, 4172 (25.58%) were found exclusively through integration of historical clinical data, while 15978 (98%) could be identified using semantically enriched financial data. Data integration using metadata from financial/dispensing systems and pharmacy vocabularies were comparable. Given the current state of EHR adoption, semantic enrichment of financial data and integration of historical clinical data would allow the repurposing of these data for research. With the push for HITECH meaningful use, institutions that are transitioning to newer EHRs will be able to use their older financial and clinical data for research using these methods.
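
    Mapping a catalog item to RxNorm can be done programmatically through the National Library of Medicine's RxNav REST service; the call below reflects my understanding of that public API and should be treated as an assumption to verify, and the drug name is just an example.

    ```python
    import requests

    def rxnorm_cui(drug_name: str):
        """Look up RxNorm concept identifiers (RxCUIs) by drug name via the
        public RxNav REST API (endpoint as publicly documented)."""
        url = "https://rxnav.nlm.nih.gov/REST/rxcui.json"
        resp = requests.get(url, params={"name": drug_name}, timeout=30)
        resp.raise_for_status()
        return resp.json().get("idGroup", {}).get("rxnormId", [])

    print(rxnorm_cui("aspirin"))   # e.g. ['1191'] if the name resolves
    ```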

  10. Integrating historical clinical and financial data for pharmacological research

    PubMed Central

    2011-01-01

    Background Retrospective research requires longitudinal data, and repositories derived from electronic health records (EHR) can be sources of such data. With Health Information Technology for Economic and Clinical Health (HITECH) Act meaningful use provisions, many institutions are expected to adopt EHRs, but may be left with large amounts of financial and historical clinical data, which can differ significantly from data obtained from newer systems, due to lack or inconsistent use of controlled medical terminologies (CMT) in older systems. We examined different approaches for semantic enrichment of financial data with CMT, and integration of clinical data from disparate historical and current sources for research. Methods Snapshots of financial data from 1999, 2004 and 2009 were mapped automatically to the current inpatient pharmacy catalog, and enriched with RxNorm. Administrative metadata from financial and dispensing systems, RxNorm and two commercial pharmacy vocabularies were used to integrate data from current and historical inpatient pharmacy modules, and the outpatient EHR. Data integration approaches were compared using percentages of automated matches, and effects on cohort size of a retrospective study. Results During 1999-2009, 71.52%-90.08% of items in use from the financial catalog were enriched using RxNorm; 64.95%-70.37% of items in use from the historical inpatient system were integrated using RxNorm, 85.96%-91.67% using a commercial vocabulary, 87.19%-94.23% using financial metadata, and 77.20%-94.68% using dispensing metadata. During 1999-2009, 48.01%-30.72% of items in use from the outpatient catalog were integrated using RxNorm, and 79.27%-48.60% using a commercial vocabulary. In a cohort of 16304 inpatients obtained from clinical systems, 4172 (25.58%) were found exclusively through integration of historical clinical data, while 15978 (98%) could be identified using semantically enriched financial data. Conclusions Data integration using metadata from financial/dispensing systems and pharmacy vocabularies were comparable. Given the current state of EHR adoption, semantic enrichment of financial data and integration of historical clinical data would allow the repurposing of these data for research. With the push for HITECH meaningful use, institutions that are transitioning to newer EHRs will be able to use their older financial and clinical data for research using these methods. PMID:22099213

  11. Enhancing the Value of Sensor-based Observations by Capturing the Knowledge of How An Observation Came to Be

    NASA Astrophysics Data System (ADS)

    Fredericks, J.; Rueda-Velasquez, C. A.

    2016-12-01

    As we move from keeping data on our disks to sharing it with the world, often in real-time, we are obligated to also tell an unknown user how our observations were made. Shared data must carry not only ownership metadata, unit descriptions, and content-formatting information; the provider must also share the information needed to assess the data for potential re-use. A user must be able to assess the limitations and capabilities of the sensor, as it is configured, to understand its value. For example, the way an instrument is configured typically affects the data accuracy and operational limits of the sensor: an operator may sacrifice data accuracy to achieve a broader operational range, and vice versa. If you are looking at newly discovered data, it is important to be able to find all of the information that relates to assessing the data quality for your particular application. Traditionally, metadata are captured by data managers who usually do not know how the data were collected. By the time data are distributed, this knowledge is often gone, buried within notebooks or hidden in documents that are not machine-harvestable and often not human-readable. In a recently funded NSF EarthCube Integrative Activity called X-DOMES (Cross-Domain Observational Metadata in EnviroSensing), mechanisms are underway to enable the capture of sensor and deployment metadata by sensor manufacturers and field operators. The support has enabled the development of a community ontology repository (COR) within the Earth Science Information Partnership (ESIP) community, fostering easy creation of resolvable terms for the broader community. This tool enables non-experts to easily develop W3C standards-based content, promoting the implementation of Semantic Web technologies for enhanced discovery of content and interoperability in workflows. The X-DOMES project is also developing a SensorML Viewer/Editor to provide an easy interface for sensor manufacturers and field operators to fully describe sensor capabilities and configuration/deployment content, automatically generating it in machine-harvestable encodings that can be referenced by data managers and/or associated with the data through web services, such as the OGC SWE Sensor Observation Service.

  12. USGIN ISO metadata profile

    NASA Astrophysics Data System (ADS)

    Richard, S. M.

    2011-12-01

    The USGIN project has drafted and is using a specification for the use of ISO 19115/19/39 metadata, recommendations for simple metadata content, and a proposal for a URI scheme to identify resources using resolvable http URIs (see http://lab.usgin.org/usgin-profiles). The principal target use case is a catalog in which resources can be registered and described by data providers for discovery by users. We are currently using the ESRI Geoportal (Open Source), with configuration files for the USGIN profile. The metadata offered by the catalog must provide sufficient content to guide search engines to locate requested resources, to describe the resource content, provenance, and quality so users can determine whether the resource will serve the intended usage, and finally to enable human users and software clients to obtain or access the resource. In order to achieve an operational federated catalog system, provisions in the ISO specification must be restricted and usage clarified to reduce the heterogeneity of 'standard' metadata and service implementations, such that a single client can search against different catalogs and the metadata returned by catalogs can be parsed reliably to locate required information. Usage of the complex ISO 19139 XML schema allows for a great deal of structured metadata content, but the heterogeneity in approaches to content encoding has hampered development of sophisticated client software that can take advantage of the rich metadata; the lack of such clients in turn reduces the motivation for metadata producers to produce content-rich metadata. If the only significant use of the detailed, structured metadata is to format it into text for people to read, then the detailed information could be put in free-text elements and be just as useful. In order for complex metadata encoding and content to be useful, there must be clear and unambiguous conventions on the encoding that are utilized by the community that wishes to take advantage of advanced metadata content. The use cases for the detailed content must be well understood, and the degree of metadata complexity should be determined by the requirements of those use cases. The ISO standard provides sufficient flexibility that relatively simple metadata records can be created that will serve for text-indexed search/discovery, resource evaluation by a user reading text content from the metadata, and access to the resource via http, ftp, or well-known service protocols (e.g. Thredds; OGC WMS, WFS, WCS).

  13. PH5 for integrating and archiving different data types

    NASA Astrophysics Data System (ADS)

    Azevedo, Steve; Hess, Derick; Beaudoin, Bruce

    2016-04-01

    PH5 is IRIS PASSCAL's file organization of HDF5 used for seismic data. The extensibility and portability of HDF5 allow the PH5 format to evolve and operate on a variety of platforms and interfaces. To make PH5 even more flexible, the seismic metadata are separated from the time series data in order to achieve gains in performance as well as ease of use and to simplify user interaction. This separation affords easy updates to metadata after the data are archived, without having to access waveform data. To date, PH5 has been used for integrating and archiving active source, passive source, and onshore-offshore seismic data sets with the IRIS Data Management Center (DMC). Active development to make PH5 fully compatible with FDSN web services and to deliver StationXML is near completion. We are also exploring the feasibility of utilizing QuakeML for active seismic source representation. The PH5 software suite, PIC KITCHEN, comprises in-field tools that include data ingestion (e.g. RefTek format, SEG-Y, and SEG-D), metadata management tools including QC, and a waveform review tool. These tools enable building archive-ready data in the field during active source experiments, greatly decreasing the time to produce research-ready data sets. Once data are archived, our online request page generates a unique web form and pre-populates much of it based on the metadata provided to it from the PH5 file. The data requester can then intuitively select the extraction parameters as well as the data subsets they wish to receive (current output formats include SEG-Y, SAC, and mseed). The web interface then passes this on to the PH5 processing tools to generate the requested seismic data and e-mail the requester a link to the data set automatically as soon as the data are ready. The PH5 file organization was originally designed to hold seismic time series data and metadata from controlled source experiments using RefTek data loggers. The flexibility of HDF5 has enabled us to extend the use of PH5 in several areas, one of which is using PH5 to handle very large data sets. PH5 is also good at integrating data from various types of seismic experiments, such as OBS, onshore-offshore, controlled source, and passive recording. HDF5 is capable of holding practically any type of digital data, so integrating GPS data with seismic data is possible. Since PH5 is a common format and data contained in HDF5 are accessible randomly, it has been easy to extend PH5 to include new input and output data formats as community needs arise.
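
    The separation of metadata from time series can be pictured with a toy HDF5 layout like the one below, written with h5py; this is a simplified illustration of the design principle, not the actual PH5 group structure.

    ```python
    import h5py
    import numpy as np

    # Toy layout illustrating the principle: waveforms and metadata live in
    # separate groups, so metadata can be updated without touching the traces.
    with h5py.File("toy_ph5.h5", "w") as f:
        data = f.create_group("waveforms")
        data.create_dataset("station_001/trace", data=np.random.randn(1000))

        meta = f.create_group("metadata")
        st = meta.create_group("station_001")
        st.attrs["latitude"] = 34.05        # station coordinates
        st.attrs["longitude"] = -106.9
        st.attrs["sample_rate_hz"] = 250.0

    # Correct a metadata error later without rewriting any waveform data.
    with h5py.File("toy_ph5.h5", "r+") as f:
        f["metadata/station_001"].attrs["latitude"] = 34.06

    with h5py.File("toy_ph5.h5", "r") as f:
        print(dict(f["metadata/station_001"].attrs))
    ```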

  14. ExcelAutomat: a tool for systematic processing of files as applied to quantum chemical calculations

    NASA Astrophysics Data System (ADS)

    Laloo, Jalal Z. A.; Laloo, Nassirah; Rhyman, Lydia; Ramasami, Ponnadurai

    2017-07-01

    The processing of the input and output files of quantum chemical calculations often necessitates a spreadsheet as a key component of the workflow. Spreadsheet packages with a built-in programming language editor can automate the steps involved and thus provide a direct link between processing files and the spreadsheet. This helps to reduce user interventions as well as the need to switch between different programs to carry out each step. The ExcelAutomat tool is the implementation of this method in Microsoft Excel (MS Excel) using the default Visual Basic for Applications (VBA) programming language. The code in ExcelAutomat was adapted to work with the platform-independent open-source LibreOffice Calc, which also supports VBA. ExcelAutomat provides an interface through the spreadsheet to automate repetitive tasks such as merging input files, splitting, parsing and compiling data from output files, and generation of unique filenames. Selected extracted parameters can be retrieved as variables which can be included in custom codes for a tailored approach. ExcelAutomat works with Gaussian files and is adapted for use with other computational packages including the non-commercial GAMESS. ExcelAutomat is available as a downloadable MS Excel workbook or as a LibreOffice workbook.

  15. ExcelAutomat: a tool for systematic processing of files as applied to quantum chemical calculations.

    PubMed

    Laloo, Jalal Z A; Laloo, Nassirah; Rhyman, Lydia; Ramasami, Ponnadurai

    2017-07-01

    The processing of the input and output files of quantum chemical calculations often necessitates a spreadsheet as a key component of the workflow. Spreadsheet packages with a built-in programming language editor can automate the steps involved and thus provide a direct link between processing files and the spreadsheet. This helps to reduce user interventions as well as the need to switch between different programs to carry out each step. The ExcelAutomat tool is the implementation of this method in Microsoft Excel (MS Excel) using the default Visual Basic for Applications (VBA) programming language. The code in ExcelAutomat was adapted to work with the platform-independent open-source LibreOffice Calc, which also supports VBA. ExcelAutomat provides an interface through the spreadsheet to automate repetitive tasks such as merging input files, splitting, parsing and compiling data from output files, and generation of unique filenames. Selected extracted parameters can be retrieved as variables which can be included in custom codes for a tailored approach. ExcelAutomat works with Gaussian files and is adapted for use with other computational packages including the non-commercial GAMESS. ExcelAutomat is available as a downloadable MS Excel workbook or as a LibreOffice workbook.

  16. Improving Scientific Metadata Interoperability And Data Discoverability using OAI-PMH

    NASA Astrophysics Data System (ADS)

    Devarakonda, Ranjeet; Palanisamy, Giri; Green, James M.; Wilson, Bruce E.

    2010-12-01

    While general-purpose search engines (such as Google or Bing) are useful for finding many things on the Internet, they are often of limited usefulness for locating Earth science data relevant (for example) to a specific spatiotemporal extent. By contrast, tools that search repositories of structured metadata can locate relevant datasets with fairly high precision, but the search is limited to that particular repository. Federated searches (such as Z39.50) have been used, but can be slow, and the comprehensiveness can be limited by downtime in any search partner. An alternative approach to improve comprehensiveness is for a repository to harvest metadata from other repositories, possibly with limits based on subject matter or access permissions. Searches through harvested metadata can be extremely responsive, and the search tool can be customized with semantic augmentation appropriate to the community of practice being served. However, there are a number of different protocols for harvesting metadata, with some challenges for ensuring that updates are propagated and for collaborations with repositories using differing metadata standards. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a standard that is seeing increased use as a means of exchanging structured metadata. OAI-PMH implementations must support Dublin Core as a metadata standard, with other metadata formats as optional. We have developed tools which enable our structured search tool (Mercury; http://mercury.ornl.gov) to consume metadata from OAI-PMH services in any of the metadata formats we support (Dublin Core, Darwin Core, FGDC CSDGM, GCMD DIF, EML, and ISO 19115/19137). We are also making ORNL DAAC metadata available through OAI-PMH for other metadata tools to utilize, such as the NASA Global Change Master Directory (GCMD). This paper describes Mercury capabilities with multiple metadata formats, in general, and, more specifically, the results of our OAI-PMH implementations and the lessons learned. References: [1] R. Devarakonda, G. Palanisamy, B.E. Wilson, and J.M. Green, "Mercury: reusable metadata management data discovery and access system", Earth Science Informatics, vol. 3, no. 1, pp. 87-94, May 2010. [2] R. Devarakonda, G. Palanisamy, J.M. Green, B.E. Wilson, "Data sharing and retrieval using OAI-PMH", Earth Science Informatics DOI: 10.1007/s12145-010-0073-0, (2010). [3] Devarakonda, R.; Palanisamy, G.; Green, J.; Wilson, B. E. "Mercury: An Example of Effective Software Reuse for Metadata Management Data Discovery and Access", Eos Trans. AGU, 89(53), Fall Meet. Suppl., IN11A-1019 (2008).
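
    A minimal OAI-PMH harvest loop, using only the standard ListRecords verb and the resumption-token mechanism defined by the protocol, might look like the following Python sketch; the endpoint URL is a placeholder.

    ```python
    import requests
    import xml.etree.ElementTree as ET

    OAI = "http://example.org/oai"          # placeholder endpoint
    NS = {"oai": "http://www.openarchives.org/OAI/2.0/",
          "dc": "http://purl.org/dc/elements/1.1/"}

    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    while True:
        root = ET.fromstring(requests.get(OAI, params=params, timeout=60).content)
        for rec in root.iter("{http://www.openarchives.org/OAI/2.0/}record"):
            title = rec.find(".//dc:title", NS)
            if title is not None:
                print(title.text)
        token = root.find(".//oai:resumptionToken", NS)
        if token is None or not (token.text or "").strip():
            break                           # no more pages to harvest
        # Per the protocol, a resumption request carries only verb + token.
        params = {"verb": "ListRecords", "resumptionToken": token.text}
    ```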

  17. FIREFLY LUCIFERASE ATP ASSAY DEVELOPMENT FOR MONITORING BACTERIAL CONCENTRATIONS IN WATER SUPPLIES

    EPA Science Inventory

    This research program was initiated to develop a rapid, automatable system for measuring total viable microorganisms in potable drinking water supplies using the firefly luciferase ATP assay. The assay was adapted to an automatable flow system that provided comparable sensitivity...

  18. An Approach to Information Management for AIR7000 with Metadata and Ontologies

    DTIC Science & Technology

    2009-10-01

    …metadata. We then propose an approach based on Semantic Technologies, including the Resource Description Framework (RDF) and Upper Ontologies, for the… mandating specific metadata schemas can result in interoperability problems. For example, many standards within the ADO mandate the use of XML for metadata… such problems, we propose an architecture in which different metadata schemes can interoperate. By using RDF (Resource Description Framework) as a…

  19. Making Interoperability Easier with NASA's Metadata Management Tool (MMT)

    NASA Technical Reports Server (NTRS)

    Shum, Dana; Reese, Mark; Pilone, Dan; Baynes, Katie

    2016-01-01

    While the ISO 19115 collection-level metadata format meets many users' needs for interoperable metadata, it can be cumbersome to create correctly. Through the MMT's simple UI experience, metadata curators can create and edit collections which are compliant with ISO 19115 without full knowledge of the NASA Best Practices implementation of the ISO 19115 format. Users are guided through the metadata creation process by a forms-based editor, complete with field information, validation hints, and picklists. Once a record is completed, users can download the metadata in any of the supported formats with just two clicks.

  20. Predicting structured metadata from unstructured metadata.

    PubMed

    Posch, Lisa; Panahiazar, Maryam; Dumontier, Michel; Gevaert, Olivier

    2016-01-01

    Enormous amounts of biomedical data have been and are being produced by investigators all over the world. However, one crucial and limiting factor in data reuse is accurate, structured and complete description of the data, or data about the data, defined as metadata. We propose a framework to predict structured metadata terms from unstructured metadata for improving the quality and quantity of metadata, using the Gene Expression Omnibus (GEO) microarray database. Our framework consists of classifiers trained using term frequency-inverse document frequency (TF-IDF) features and a second approach based on topics modeled using a Latent Dirichlet Allocation (LDA) model to reduce the dimensionality of the unstructured data. Our results on the GEO database show that structured metadata terms are predicted most accurately using the TF-IDF approach, followed by LDA, with both outperforming the majority-vote baseline. While some accuracy is lost through the dimensionality reduction of LDA, the difference is small for elements with few possible values, and there is a large improvement over the majority classifier baseline. Overall this is a promising approach for metadata prediction that is likely to be applicable to other datasets and has implications for researchers interested in biomedical metadata curation and metadata prediction. © The Author(s) 2016. Published by Oxford University Press.
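
    The TF-IDF approach can be sketched with scikit-learn: vectorize the unstructured free-text metadata and train a classifier per structured element. The tiny corpus and labels below are invented for illustration and are far simpler than real GEO records.

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Invented toy examples: free-text sample descriptions and one structured
    # element ("organism") to predict.
    texts = [
        "total RNA extracted from mouse liver tissue",
        "human peripheral blood mononuclear cells, untreated",
        "mouse embryonic fibroblasts, passage 3",
        "human HeLa cells treated with drug X",
    ]
    labels = ["mouse", "human", "mouse", "human"]

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    model.fit(texts, labels)

    print(model.predict(["liver tissue from adult mouse"]))   # expected: ['mouse']
    ```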

  1. Metazen – metadata capture for metagenomes

    DOE PAGES

    Bischof, Jared; Harrison, Travis; Paczian, Tobias; ...

    2014-12-08

    Background: As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. These tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Results: Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Conclusion: Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility.

  2. Metazen – metadata capture for metagenomes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bischof, Jared; Harrison, Travis; Paczian, Tobias

    Background: As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. These tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. Results: Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. Conclusion: Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility.

  3. Predicting structured metadata from unstructured metadata

    PubMed Central

    Posch, Lisa; Panahiazar, Maryam; Dumontier, Michel; Gevaert, Olivier

    2016-01-01

    Enormous amounts of biomedical data have been and are being produced by investigators all over the world. However, one crucial and limiting factor in data reuse is accurate, structured and complete description of the data, or data about the data, defined as metadata. We propose a framework to predict structured metadata terms from unstructured metadata for improving the quality and quantity of metadata, using the Gene Expression Omnibus (GEO) microarray database. Our framework consists of classifiers trained using term frequency-inverse document frequency (TF-IDF) features and a second approach based on topics modeled using a Latent Dirichlet Allocation (LDA) model to reduce the dimensionality of the unstructured data. Our results on the GEO database show that structured metadata terms are predicted most accurately using the TF-IDF approach, followed by LDA, with both outperforming the majority-vote baseline. While some accuracy is lost through the dimensionality reduction of LDA, the difference is small for elements with few possible values, and there is a large improvement over the majority classifier baseline. Overall this is a promising approach for metadata prediction that is likely to be applicable to other datasets and has implications for researchers interested in biomedical metadata curation and metadata prediction. Database URL: http://www.yeastgenome.org/ PMID:28637268

  4. Why can't I manage my digital images like MP3s? The evolution and intent of multimedia metadata

    NASA Astrophysics Data System (ADS)

    Goodrum, Abby; Howison, James

    2005-01-01

    This paper considers the deceptively simple question: why can't digital images be managed in the simple and effective manner in which digital music files are managed? We make the case that the answer lies in different treatments of metadata in different domains with different goals. A central difference between the two formats stems from the fact that digital music metadata lookup services are collaborative and automate the movement from a digital file to the appropriate metadata, while image metadata services do not. To understand why this difference exists we examine the divergent evolution of metadata standards for digital music and digital images, and observe that the processes differ in interesting ways according to their intent. Specifically, music metadata was developed primarily for personal file management and community resource sharing, while the focus of image metadata has largely been on information retrieval. We argue that lessons from MP3 metadata can assist individuals facing their growing personal image management challenges. Our focus therefore is not on metadata for cultural heritage institutions or the publishing industry; it is limited to the personal libraries growing on our hard drives. This bottom-up approach to file management, combined with p2p distribution, radically altered the music landscape. Might such an approach have a similar impact on image publishing? This paper outlines plans for improving the personal management of digital images, doing image metadata and file management the MP3 way, and considers the likelihood of success.

  5. Why can't I manage my digital images like MP3s? The evolution and intent of multimedia metadata

    NASA Astrophysics Data System (ADS)

    Goodrum, Abby; Howison, James

    2004-12-01

    This paper considers the deceptively simple question: why can't digital images be managed in the simple and effective manner in which digital music files are managed? We make the case that the answer lies in different treatments of metadata in different domains with different goals. A central difference between the two formats stems from the fact that digital music metadata lookup services are collaborative and automate the movement from a digital file to the appropriate metadata, while image metadata services do not. To understand why this difference exists we examine the divergent evolution of metadata standards for digital music and digital images, and observe that the processes differ in interesting ways according to their intent. Specifically, music metadata was developed primarily for personal file management and community resource sharing, while the focus of image metadata has largely been on information retrieval. We argue that lessons from MP3 metadata can assist individuals facing their growing personal image management challenges. Our focus therefore is not on metadata for cultural heritage institutions or the publishing industry; it is limited to the personal libraries growing on our hard drives. This bottom-up approach to file management, combined with p2p distribution, radically altered the music landscape. Might such an approach have a similar impact on image publishing? This paper outlines plans for improving the personal management of digital images, doing image metadata and file management the MP3 way, and considers the likelihood of success.

  6. The Role of Metadata Standards in EOSDIS Search and Retrieval Applications

    NASA Technical Reports Server (NTRS)

    Pfister, Robin

    1999-01-01

    Metadata standards play a critical role in data search and retrieval systems. Metadata tie software to data so the data can be processed, stored, searched, retrieved and distributed; without metadata, these actions are not possible. Populating metadata to describe science data is an important service to the end-user community, so that a user who is unfamiliar with the data can easily find and learn about a particular dataset before making an order decision. Once a good set of standards is in place, the accuracy with which data search can be performed depends on the degree to which metadata standards are adhered to during product definition. NASA's Earth Observing System Data and Information System (EOSDIS) provides examples of how metadata standards are used in data search and retrieval.

  7. openPDS: protecting the privacy of metadata through SafeAnswers.

    PubMed

    de Montjoye, Yves-Alexandre; Shmueli, Erez; Wang, Samuel S; Pentland, Alex Sandy

    2014-01-01

    The rise of smartphones and web services made possible the large-scale collection of personal metadata. Information about individuals' locations, phone call logs, or web searches is collected and used intensively by organizations and big data researchers. Metadata has, however, yet to realize its full potential. Privacy and legal concerns, as well as the lack of technical solutions for personal metadata management, are preventing metadata from being shared and reconciled under the control of the individual. This lack of access and control is furthermore fueling growing concerns, as it prevents individuals from understanding and managing the risks associated with the collection and use of their data. Our contribution is two-fold: (1) we describe openPDS, a personal metadata management framework that allows individuals to collect, store, and give fine-grained access to their metadata to third parties; it has been implemented in two field studies; (2) we introduce and analyze SafeAnswers, a new and practical way of protecting the privacy of metadata at an individual level. SafeAnswers turns a hard anonymization problem into a more tractable security one: it allows services to ask questions whose answers are calculated against the metadata, instead of trying to anonymize individuals' metadata. The dimensionality of the data shared with the services is reduced from high-dimensional metadata to low-dimensional answers that are less likely to be re-identifiable and to contain sensitive information. These answers can then be shared directly, individually or in aggregate. openPDS and SafeAnswers provide a new way of dynamically protecting personal metadata, thereby supporting the creation of smart data-driven services and data science research.

  8. openPDS: Protecting the Privacy of Metadata through SafeAnswers

    PubMed Central

    de Montjoye, Yves-Alexandre; Shmueli, Erez; Wang, Samuel S.; Pentland, Alex Sandy

    2014-01-01

    The rise of smartphones and web services made possible the large-scale collection of personal metadata. Information about individuals' locations, phone call logs, or web searches is collected and used intensively by organizations and big data researchers. Metadata has, however, yet to realize its full potential. Privacy and legal concerns, as well as the lack of technical solutions for personal metadata management, are preventing metadata from being shared and reconciled under the control of the individual. This lack of access and control is furthermore fueling growing concerns, as it prevents individuals from understanding and managing the risks associated with the collection and use of their data. Our contribution is two-fold: (1) we describe openPDS, a personal metadata management framework that allows individuals to collect, store, and give fine-grained access to their metadata to third parties; it has been implemented in two field studies; (2) we introduce and analyze SafeAnswers, a new and practical way of protecting the privacy of metadata at an individual level. SafeAnswers turns a hard anonymization problem into a more tractable security one: it allows services to ask questions whose answers are calculated against the metadata, instead of trying to anonymize individuals' metadata. The dimensionality of the data shared with the services is reduced from high-dimensional metadata to low-dimensional answers that are less likely to be re-identifiable and to contain sensitive information. These answers can then be shared directly, individually or in aggregate. openPDS and SafeAnswers provide a new way of dynamically protecting personal metadata, thereby supporting the creation of smart data-driven services and data science research. PMID:25007320
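
    The SafeAnswers idea, answering approved questions next to the metadata instead of exporting the metadata itself, can be illustrated with a minimal sketch. The schema, the question name, and the approval mechanism below are invented for illustration and are not the openPDS API:

        # Sketch of the SafeAnswers pattern: a service submits a question, the
        # answer is computed against the raw metadata inside the personal data
        # store, and only a low-dimensional answer leaves the store.
        personal_metadata = {
            "call_log": [{"hour": 9}, {"hour": 13}, {"hour": 23}, {"hour": 22}],
            "locations": [{"lat": 48.85, "lon": 2.35}, {"lat": 48.86, "lon": 2.34}],
        }

        def safe_answer(question: str) -> float:
            """Evaluate an approved question against raw metadata; return only the answer."""
            if question == "fraction_of_calls_at_night":
                calls = personal_metadata["call_log"]
                return sum(1 for c in calls if c["hour"] >= 21 or c["hour"] < 6) / len(calls)
            raise PermissionError("question not in the approved set")

        print(safe_answer("fraction_of_calls_at_night"))  # 0.5 -- no call log leaves the store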

  9. A Research Agenda and Vision for Data Science

    NASA Astrophysics Data System (ADS)

    Mattmann, C. A.

    2014-12-01

    Big Data has emerged as a first-class citizen in the research community, spanning disciplines in the domain sciences. Astronomy is pushing velocity with new ground-based instruments such as the Square Kilometre Array (SKA) and its unprecedented data rates (700 TB/sec!); Earth science is pushing the boundaries of volume, with the international Intergovernmental Panel on Climate Change (IPCC), climate modeling, and remote sensing communities increasing total archive sizes into the Exabyte scale; airborne missions from NASA, such as the JPL Airborne Snow Observatory (ASO), are increasing velocity and decreasing the overall turnaround time required to receive products and make them available to water managers and decision makers. Proteomics and the computational biology community are sequencing genomes and providing near-real-time answers to clinicians, researchers, and ultimately to patients, helping to process, understand, and create diagnoses. Data complexity is on the rise, and the norm is no longer hundreds of metadata attributes but thousands to hundreds of thousands, including complex interrelationships between data, metadata, and knowledge. I published a vision for data science in Nature in 2013 that encapsulates four thrust areas that I believe the computer science, Big Data, and data science communities need to attack over the next decade to make fundamental progress on the data volume, velocity, and complexity challenges arising from domain sciences such as those described above. These areas are: (1) rapid and unobtrusive algorithm integration; (2) intelligent and automatic data movement; (3) automated and rapid extraction of text, metadata, and language from heterogeneous file formats; and (4) participation and people power via open source communities. In this talk I will revisit these four areas and describe current progress, future work, and challenges ahead as we move forward in this exciting age of Data Science.

  10. Optimal and adaptive methods of processing hydroacoustic signals (review)

    NASA Astrophysics Data System (ADS)

    Malyshkin, G. S.; Sidel'nikov, G. B.

    2014-09-01

    Different methods of optimal and adaptive processing of hydroacoustic signals under multipath propagation and scattering are considered. Advantages and drawbacks of the classical adaptive (Capon, MUSIC, and Johnson) algorithms and "fast" projection algorithms are analyzed for the case of multipath propagation and scattering of strong signals. The classical optimal approaches to detecting multipath signals are presented. A mechanism of controlled normalization of strong signals is proposed to automatically detect weak signals. The results of simulating the operation of different detection algorithms for a linear equidistant array under multipath propagation and scattering are presented. An automatic detector is analyzed, based on classical or fast projection algorithms, which estimates the background using either median filtering or the method of bilateral spatial contrast.

  11. Progress in defining a standard for file-level metadata

    NASA Technical Reports Server (NTRS)

    Williams, Joel; Kobler, Ben

    1996-01-01

    In the following narrative, metadata required to locate a file on a tape or collection of tapes will be referred to as file-level metadata. This paper describes the rationale for, and the history of, the effort to define a standard for this metadata.

  12. Achieving interoperability for metadata registries using comparative object modeling.

    PubMed

    Park, Yu Rang; Kim, Ju Han

    2010-01-01

    Achieving data interoperability between organizations relies upon agreed meaning and representation (metadata) of data. For managing and registering metadata, many organizations have built metadata registries (MDRs) in various domains based on the international standard MDR framework, ISO/IEC 11179. Following this trend, two public MDRs have been created in the biomedical domain: the United States Health Information Knowledgebase (USHIK) and the cancer Data Standards Registry and Repository (caDSR), from the U.S. Department of Health & Human Services and the National Cancer Institute (NCI), respectively. Most MDRs are implemented with indiscriminate extensions to satisfy organization-specific needs and to work around the semantic and structural limitations of ISO/IEC 11179. As a result, it is difficult to achieve interoperability among multiple MDRs. In this paper, we propose an integrated metadata object model for achieving interoperability among multiple MDRs. To evaluate this model, we developed an XML Schema Definition (XSD)-based metadata exchange format. We created an XSD-based metadata exporter supporting both the integrated metadata object model and organization-specific MDR formats.

  13. Request queues for interactive clients in a shared file system of a parallel computing system

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bent, John M.; Faibish, Sorin

    Interactive requests are processed from users of log-in nodes. A metadata server node is provided for use in a file system shared by one or more interactive nodes and one or more batch nodes. The interactive nodes comprise interactive clients to execute interactive tasks and the batch nodes execute batch jobs for one or more batch clients. The metadata server node comprises a virtual machine monitor; an interactive client proxy to store metadata requests from the interactive clients in an interactive client queue; a batch client proxy to store metadata requests from the batch clients in a batch client queue; and a metadata server to store the metadata requests from the interactive client queue and the batch client queue in a metadata queue based on an allocation of resources by the virtual machine monitor. The metadata requests can be prioritized, for example, based on one or more of a predefined policy and predefined rules.
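
    A minimal sketch of the queueing scheme in this claim, assuming a simple weighted allocation policy (the 3:1 weight and the request strings are invented for illustration), might look like the following:

        # Sketch: merge metadata requests from an interactive queue and a batch
        # queue into a single metadata-server queue according to a resource
        # allocation. A weight of 3 means up to three interactive requests are
        # drained for every batch request; the policy itself is an assumption.
        from collections import deque

        interactive_q = deque(["stat /home/a", "open /home/b", "readdir /home"])
        batch_q = deque(["create /scratch/job1.out", "create /scratch/job2.out"])

        def drain(interactive_q, batch_q, interactive_weight=3):
            metadata_q = []
            while interactive_q or batch_q:
                for _ in range(interactive_weight):   # favor interactive clients
                    if interactive_q:
                        metadata_q.append(interactive_q.popleft())
                if batch_q:
                    metadata_q.append(batch_q.popleft())
            return metadata_q

        print(drain(interactive_q, batch_q))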

  14. Apparatus enables automatic microanalysis of body fluids

    NASA Technical Reports Server (NTRS)

    Soffen, G. A.; Stuart, J. L.

    1966-01-01

    Apparatus will automatically and quantitatively determine body fluid constituents which are amenable to analysis by fluorometry or colorimetry. The results of the tests are displayed as percentages of full scale deflection on a strip-chart recorder. The apparatus can also be adapted for microanalysis of various other fluids.

  15. SU-C-202-03: A Tool for Automatic Calculation of Delivered Dose Variation for Off-Line Adaptive Therapy Using Cone Beam CT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, B; Lee, S; Chen, S

    Purpose: Monitoring the delivered dose is an important task for adaptive radiotherapy (ART) and for determining when to re-plan. A software tool which enables automatic delivered dose calculation using cone-beam CT (CBCT) has been developed and tested. Methods: The tool consists of four components: a CBCT Collecting Module (CCM), a Plan Registration Module (PRM), a Dose Calculation Module (DCM), and an Evaluation and Action Module (EAM). The CCM is triggered periodically (e.g., at 1:00 AM each day) to search for newly acquired CBCTs of patients of interest and then export the DICOM files of the images and related registrations defined in ARIA, followed by triggering the PRM. The PRM imports the DICOM images and registrations and links the CBCTs to the related treatment plan of the patient in the planning system (RayStation V4.5, RaySearch, Stockholm, Sweden). A pre-determined CT-to-density table is automatically generated for dose calculation. The current version of the DCM uses a rigid registration which regards the treatment isocenter of the CBCT as the isocenter of the treatment plan. It then starts the dose calculation automatically. The EAM evaluates the plan using pre-determined plan evaluation parameters: PTV dose-volume metrics and critical organ doses. The tool has been tested for 10 patients. Results: Automatic plans are generated and saved in the order of the treatment dates in the Adaptive Planning module of the RayStation planning system, without any manual intervention. Once the CTV dose deviates by more than 3%, both email and page alerts are sent to the physician and the physicist of the patient so that the case can be examined closely. Conclusion: The tool is capable of performing automatic dose tracking and alerting clinicians when an action is needed. It is clinically useful for off-line adaptive therapy to catch any gross error. A practical way of determining the alarm level for OARs is under development.
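
    The alerting step described above reduces to a simple threshold check; the sketch below is a hypothetical illustration (field names and the notify() hook are invented, not the tool's actual interface):

        # Sketch of the evaluation-and-action step: compare delivered CTV dose
        # against the planned dose and alert when the deviation exceeds 3%.
        # The record layout and notify() hook are placeholders.
        def notify(message: str) -> None:
            print("ALERT:", message)  # stand-in for the email/page alerts described above

        def check_ctv_deviation(planned_dose_gy: float, delivered_dose_gy: float,
                                threshold: float = 0.03) -> bool:
            """Return True (and alert) if the relative CTV dose deviation exceeds threshold."""
            deviation = abs(delivered_dose_gy - planned_dose_gy) / planned_dose_gy
            if deviation > threshold:
                notify(f"CTV dose deviates by {deviation:.1%}; review required")
                return True
            return False

        check_ctv_deviation(planned_dose_gy=60.0, delivered_dose_gy=57.9)  # ~3.5% -> alert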

  16. Making metadata usable in a multi-national research setting.

    PubMed

    Ellul, Claire; Foord, Joanna; Mooney, John

    2013-11-01

    SECOA (Solutions for Environmental Contrasts in Coastal Areas) is a multi-national research project examining the effects of human mobility on urban settlements in fragile coastal environments. This paper describes the setting up of a SECOA metadata repository for non-specialist researchers such as environmental scientists and tourism experts. Conflicting usability requirements of two groups - metadata creators and metadata users - are identified along with associated limitations of current metadata standards. A description is given of a configurable metadata system designed to grow as the project evolves. This work is of relevance for similar projects such as INSPIRE. Copyright © 2012 Elsevier Ltd and The Ergonomics Society. All rights reserved.

  17. Inter-University Upper Atmosphere Global Observation Network (IUGONET) Metadata Database and Its Interoperability

    NASA Astrophysics Data System (ADS)

    Yatagai, A. I.; Iyemori, T.; Ritschel, B.; Koyama, Y.; Hori, T.; Abe, S.; Tanaka, Y.; Shinbori, A.; Umemura, N.; Sato, Y.; Yagi, M.; Ueno, S.; Hashiguchi, N. O.; Kaneda, N.; Belehaki, A.; Hapgood, M. A.

    2013-12-01

    The IUGONET is a Japanese program to build a metadata database for ground-based observations of the upper atmosphere [1]. The project began in 2009 with five Japanese institutions which archive data observed by radars, magnetometers, photometers, radio telescopes, helioscopes, and so on, at various altitudes from the Earth's surface to the Sun. Systems have been developed to allow searching of the metadata described above. We have been updating the system and adding new and updated metadata. The IUGONET development team adopted the SPASE metadata model [2] to describe the upper atmosphere data. This model is used as the common metadata format by the virtual observatories for solar-terrestrial physics. It includes metadata referring to each data file (called a 'Granule'), which enables a search for data files as well as data sets. Further details are described in [2] and [3]. Currently, three additional Japanese institutions are being incorporated into IUGONET. Furthermore, metadata of observations of the troposphere, taken at the observatories of the middle and upper atmosphere radar at Shigaraki and the meteor radar in Indonesia, have been incorporated. These additions will contribute to efficient interdisciplinary scientific research. In the beginning of 2013, the registration of the 'Observatory' and 'Instrument' metadata was completed, which makes it easy to get an overview of the metadata database. The number of registered metadata records as of the end of July totalled 8.8 million, including 793 observatories and 878 instruments. It is important to promote interoperability and/or metadata exchange between the database development groups. A memorandum of agreement has been signed with the European Near-Earth Space Data Infrastructure for e-Science (ESPAS) project, which has objectives similar to IUGONET's with regard to a framework for formal collaboration. Furthermore, observations by satellites and the International Space Station are being incorporated with a view to creating and linking metadata databases. The development of effective data systems will contribute to the progress of scientific research on solar-terrestrial physics, climate and the geophysical environment. Any kind of cooperation, metadata input and feedback, especially for linkage of the databases, is welcomed. References 1. Hayashi, H. et al., Inter-university Upper Atmosphere Global Observation Network (IUGONET), Data Sci. J., 12, WDS179-184, 2013. 2. King, T. et al., SPASE 2.0: A standard data model for space physics, Earth Sci. Inform., 3, 67-73, 2010, doi:10.1007/s12145-010-0053-4. 3. Hori, T., et al., Development of IUGONET metadata format and metadata management system, J. Space Sci. Info. Jpn., 105-111, 2012. (in Japanese)

  18. Towards Precise Metadata-set for Discovering 3D Geospatial Models in Geo-portals

    NASA Astrophysics Data System (ADS)

    Zamyadi, A.; Pouliot, J.; Bédard, Y.

    2013-09-01

    Accessing 3D geospatial models, eventually at no cost and for unrestricted use, is certainly an important issue as they become popular among participatory communities, consultants, and officials. Various geo-portals, mainly established for 2D resources, have tried to provide access to existing 3D resources such as digital elevation models, LIDAR, or classic topographic data. Describing the content of data, metadata is a key component of data discovery in geo-portals. An inventory of seven online geo-portals and commercial catalogues shows that the metadata referring to 3D information differs greatly from one geo-portal to another, and even for similar 3D resources within the same geo-portal. The inventory considered 971 data resources affiliated with elevation. 51% of them were from three geo-portals running at Canadian federal and municipal levels whose metadata resources did not consider 3D models by any definition. Regarding the remaining 49%, which refer to 3D models, different definitions of terms and metadata were found, resulting in confusion and misinterpretation. The overall assessment of these geo-portals clearly shows that the provided metadata do not integrate specific and common information about 3D geospatial models. Accordingly, the main objective of this research is to improve 3D geospatial model discovery in geo-portals by adding a specific metadata-set. Based on the knowledge and current practices in 3D modeling and in 3D data acquisition and management, a set of metadata is proposed to increase suitability for 3D geospatial models. This metadata-set enables the definition of genuine classes, fields, and code-lists for a 3D metadata profile. The main structure of the proposal contains 21 metadata classes. These classes are grouped into three packages: General and Complementary, covering contextual and structural information, and Availability, covering the transition from storage to delivery format. The proposed metadata-set is compared with the Canadian Geospatial Data Infrastructure (CGDI) metadata, which is an implementation of the North American Profile of ISO 19115. The comparison analyzes the two metadata sets against three simulated scenarios for discovering needed 3D geospatial datasets. Considering specific metadata about 3D geospatial models, the proposed metadata-set has six additional classes on geometric dimension, level of detail, geometric modeling, topology, and appearance information. In addition, classes on data acquisition, preparation, and modeling, and on physical availability, have been specialized for 3D geospatial models.

  19. Interactive Visualization Systems and Data Integration Methods for Supporting Discovery in Collections of Scientific Information

    DTIC Science & Technology

    2011-05-01

    iTunes illustrate the difference between the centralized approach of digital library systems and the distributed approach of container file formats...metadata in a container file format. Apple’s iTunes uses a centralized metadata approach and allows users to maintain song metadata in a single...one iTunes library to another the metadata must be copied separately or reentered in the new library. This demonstrates the utility of storing metadata

  20. Collaborative Metadata Curation in Support of NASA Earth Science Data Stewardship

    NASA Technical Reports Server (NTRS)

    Sisco, Adam W.; Bugbee, Kaylin; le Roux, Jeanne; Staton, Patrick; Freitag, Brian; Dixon, Valerie

    2018-01-01

    A growing collection of NASA Earth science data is archived and distributed by EOSDIS's 12 Distributed Active Archive Centers (DAACs). Each collection and granule is described by a metadata record housed in the Common Metadata Repository (CMR). Multiple metadata standards are in use, and core elements of each are mapped to and from a common model, the Unified Metadata Model (UMM). This work was done by the Analysis and Review of CMR (ARC) Team.

  1. Mitogenome metadata: current trends and proposed standards.

    PubMed

    Strohm, Jeff H T; Gwiazdowski, Rodger A; Hanner, Robert

    2016-09-01

    Mitogenome metadata are descriptive terms about the sequence and its specimen description that allow both to be digitally discoverable and interoperable. Here, we review a sampling of mitogenome metadata published in the journal Mitochondrial DNA between 2005 and 2014. Specifically, we focused on a subset of metadata fields that are available for GenBank records and specified by the Genomics Standards Consortium (GSC) and other biodiversity metadata standards, and we assessed their presence across three main categories: collection, biological, and taxonomic information. To do this we reviewed 146 mitogenome manuscripts and their associated GenBank records, and scored them for 13 metadata fields. We also explored the potential for mitogenome misidentification using their sequence diversity and taxonomic metadata on the Barcode of Life Datasystems (BOLD). For this, we focused on all Lepidoptera and Perciformes mitogenomes included in the review, along with additional mitogenome sequence data mined from GenBank. Overall, we found that none of the 146 mitogenome projects provided all the metadata we looked for, and only 17 projects provided at least one category of metadata across the three main categories. Comparisons using mtDNA sequences from BOLD suggest that some mitogenomes may be misidentified. Lastly, we appreciate the research potential of mitogenomes announced through this journal, and we conclude with a suggestion of 13 metadata fields, available on GenBank, that, if provided in a mitogenome's GenBank record, would increase their research value.
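
    The scoring procedure, checking each record for the presence of a fixed set of metadata fields, is straightforward to sketch. The field names below are illustrative stand-ins; the study specifies its own 13 GenBank fields:

        # Sketch: score records for metadata completeness across a fixed field
        # set. The field list here is illustrative; the study scores 13 specific
        # GenBank fields across collection, biological, and taxonomic categories.
        FIELDS = ["country", "collection_date", "collected_by",   # collection
                  "isolate", "sex", "tissue_type",                # biological
                  "identified_by", "specimen_voucher"]            # taxonomic

        def completeness(record: dict) -> float:
            """Fraction of expected metadata fields present and non-empty."""
            present = sum(1 for f in FIELDS if record.get(f))
            return present / len(FIELDS)

        record = {"country": "Canada", "collection_date": "2012-06-01", "isolate": "LEP-42"}
        print(f"{completeness(record):.0%} of expected fields provided")  # 38%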

  2. Design and implementation of a fault-tolerant and dynamic metadata database for clinical trials

    NASA Astrophysics Data System (ADS)

    Lee, J.; Zhou, Z.; Talini, E.; Documet, J.; Liu, B.

    2007-03-01

    In recent imaging-based clinical trials, quantitative image analysis (QIA) and computer-aided diagnosis (CAD) methods are increasing in productivity due to higher-resolution imaging capabilities. A radiology core doing clinical trials has been analyzing more treatment methods, and there is a growing quantity of metadata that needs to be stored and managed. These radiology centers are also collaborating with many off-site imaging field sites and need a way to communicate metadata with one another within a secure infrastructure. Our solution is to implement a data storage grid with a fault-tolerant and dynamic metadata database design to unify metadata from different clinical trial experiments and field sites. Although metadata from images follows the DICOM standard, clinical trials also produce metadata specific to regions of interest and quantitative image analysis. We have implemented a data access and integration (DAI) server layer through which multiple field sites can access multiple metadata databases in the data grid via a single web-based grid service. Centralizing metadata database management simplifies the task of adding new databases into the grid and also decreases the risk of the configuration errors seen in peer-to-peer grids. In this paper, we address the design and implementation of a data grid metadata storage that offers fault tolerance and dynamic integration for imaging-based clinical trials.

  3. Metadata and Service at the GFZ ISDC Portal

    NASA Astrophysics Data System (ADS)

    Ritschel, B.

    2008-05-01

    The online service portal of the GFZ Potsdam Information System and Data Center (ISDC) is an access point for all manner of geoscientific geodata, its corresponding metadata, scientific documentation, and software tools. At present almost 2000 national and international users and user groups have the opportunity to request Earth science data from a portfolio of 275 different product types and more than 20 million single data files with a total volume of approximately 12 TByte. The majority of the data and information the portal currently offers to the public are global geomonitoring products, such as satellite orbit and Earth gravity field data as well as geomagnetic and atmospheric data for exploration. These products for Earth's changing system are provided via state-of-the-art retrieval techniques. The data product catalog system behind these techniques is based on the extensive use of standardized metadata, which describe the different geoscientific product types and data products in a uniform way. Whereas all ISDC product types are specified by NASA's Directory Interchange Format (DIF), Version 9.0 parent XML DIF metadata files, the individual data files are described by extended DIF metadata documents. Depending on the start of the scientific project, one part of the data files is described by extended DIF, Version 6 metadata documents and the other part is specified by child XML DIF metadata documents. Both the product-type-dependent parent DIF metadata documents and the data-file-dependent child DIF metadata documents are derived from a base-DIF.xsd XML schema file. The ISDC metadata philosophy defines a geoscientific product as a package consisting of mostly one, or sometimes more than one, data file plus one extended DIF metadata file. Because NASA's DIF metadata standard was developed to specify a collection of data only, the extension of the DIF standard consists of new and specific attributes, which are necessary for the explicit identification of single data files and the set-up of a comprehensive Earth science data catalog. The huge ISDC data catalog is realized by product-type-dependent tables filled with data-file-related metadata, which have relations to corresponding metadata tables. The product-type-describing parent DIF XML metadata documents are stored and managed in ORACLE's XML storage structures. In order to improve the interoperability of the ISDC service portal, the existing proprietary catalog system will be extended by an ISO 19115-based web catalog service. In addition to this development, there is an ISDC-related semantic network of different kinds of metadata resources, including standardized and non-standardized metadata documents and literature, as well as Web 2.0 user-generated information derived from tagging activities and social navigation data.

  4. Predicting biomedical metadata in CEDAR: A study of Gene Expression Omnibus (GEO).

    PubMed

    Panahiazar, Maryam; Dumontier, Michel; Gevaert, Olivier

    2017-08-01

    A crucial and limiting factor in data reuse is the lack of accurate, structured, and complete descriptions of data, known as metadata. Towards improving the quantity and quality of metadata, we propose a novel metadata prediction framework to learn associations from existing metadata that can be used to predict metadata values. We evaluate our framework in the context of experimental metadata from the Gene Expression Omnibus (GEO). We applied four rule mining algorithms to the most common structured metadata elements (sample type, molecular type, platform, label type and organism) from over 1.3 million GEO records. We examined the quality of well-supported rules from each algorithm and visualized the dependencies among metadata elements. Finally, we evaluated the performance of the algorithms in terms of accuracy, precision, recall, and F-measure. We found that PART is the best algorithm, outperforming Apriori, Predictive Apriori, and Decision Table. All algorithms perform significantly better in predicting class values than the majority-vote classifier. We found that the performance of the algorithms is related to the dimensionality of the GEO elements: the average performance of all algorithms increases as the dimensionality of the unique values of these elements decreases (2697 platforms, 537 organisms, 454 labels, 9 molecules, and 5 types). Our work suggests that experimental metadata such as present in GEO can be accurately predicted using rule mining algorithms. Our work has implications for both prospective and retrospective augmentation of metadata quality, which are geared towards making data easier to find and reuse. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
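
    As a rough illustration of the rule-learning idea (not the PART algorithm itself), one can count co-occurrences of element values and keep the dominant implication for each conditioning value; the toy records below are invented:

        # Toy sketch of learning "if organism = X then platform = Y" style rules
        # from co-occurrence counts. Real rule miners (PART, Apriori) are far more
        # general; this only captures the intuition of predicting one metadata
        # element from another.
        from collections import Counter, defaultdict

        records = [
            {"organism": "Homo sapiens", "platform": "GPL570"},
            {"organism": "Homo sapiens", "platform": "GPL570"},
            {"organism": "Mus musculus", "platform": "GPL1261"},
            {"organism": "Homo sapiens", "platform": "GPL96"},
        ]

        cooccur = defaultdict(Counter)
        for r in records:
            cooccur[r["organism"]][r["platform"]] += 1

        rules = {org: counts.most_common(1)[0] for org, counts in cooccur.items()}
        for org, (platform, support) in rules.items():
            print(f"IF organism={org!r} THEN platform={platform!r} (support={support})")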

  5. Mercury Toolset for Spatiotemporal Metadata

    NASA Technical Reports Server (NTRS)

    Wilson, Bruce E.; Palanisamy, Giri; Devarakonda, Ranjeet; Rhyne, B. Timothy; Lindsley, Chris; Green, James

    2010-01-01

    Mercury (http://mercury.ornl.gov) is a set of tools for federated harvesting, searching, and retrieving metadata, particularly spatiotemporal metadata. Version 3.0 of the Mercury toolset provides orders of magnitude improvements in search speed, support for additional metadata formats, integration with Google Maps for spatial queries, facetted type search, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. It provides a single portal to very quickly search for data and information contained in disparate data management systems, each of which may use different metadata formats. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow the users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury periodically (typically daily) harvests metadata sources through a collection of interfaces and re-indexes these metadata to provide extremely rapid search capabilities, even over collections with tens of millions of metadata records. A number of both graphical and application interfaces have been constructed within Mercury, to enable both human users and other computer programs to perform queries. Mercury was also designed to support multiple different projects, so that the particular fields that can be queried and used with search filters are easy to configure for each different project.

  6. Mercury Toolset for Spatiotemporal Metadata

    NASA Astrophysics Data System (ADS)

    Devarakonda, Ranjeet; Palanisamy, Giri; Green, James; Wilson, Bruce; Rhyne, B. Timothy; Lindsley, Chris

    2010-06-01

    Mercury (http://mercury.ornl.gov) is a set of tools for federated harvesting, searching, and retrieving metadata, particularly spatiotemporal metadata. Version 3.0 of the Mercury toolset provides orders of magnitude improvements in search speed, support for additional metadata formats, integration with Google Maps for spatial queries, facetted type search, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. It provides a single portal to very quickly search for data and information contained in disparate data management systems, each of which may use different metadata formats. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow the users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury periodically (typically daily) harvests metadata sources through a collection of interfaces and re-indexes these metadata to provide extremely rapid search capabilities, even over collections with tens of millions of metadata records. A number of both graphical and application interfaces have been constructed within Mercury, to enable both human users and other computer programs to perform queries. Mercury was also designed to support multiple different projects, so that the particular fields that can be queried and used with search filters are easy to configure for each different project.
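
    The harvest-then-index pattern described in the two Mercury records above can be sketched as follows; the provider endpoints, record shapes, and index structure are simplified assumptions, not Mercury's implementation:

        # Sketch of the federated-harvest pattern: periodically pull metadata
        # records from distributed providers, then build one centralized keyword
        # index that serves all searches. Providers and records are invented.
        providers = {
            "site-a": [{"id": "a1", "title": "soil moisture grid 2009"}],
            "site-b": [{"id": "b7", "title": "soil respiration flux tower"}],
        }

        def harvest_all(providers):
            """One harvesting pass over all providers (Mercury runs this ~daily)."""
            for site, records in providers.items():
                for rec in records:
                    yield site, rec

        def build_index(harvested):
            index = {}
            for site, rec in harvested:
                for word in rec["title"].split():
                    index.setdefault(word, []).append((site, rec["id"]))
            return index

        index = build_index(harvest_all(providers))
        print(index["soil"])  # -> [('site-a', 'a1'), ('site-b', 'b7')]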

  7. SWE-based Observation Data Delivery from the Instrument to the User - Sensor Web Technology in the NeXOS Project

    NASA Astrophysics Data System (ADS)

    Jirka, Simon; del Rio, Joaquin; Toma, Daniel; Martinez, Enoc; Delory, Eric; Pearlman, Jay; Rieke, Matthes; Stasch, Christoph

    2017-04-01

    With the rapidly evolving technology for building Web-based (spatial) information infrastructures and Sensor Webs, there are new opportunities to improve the process by which ocean data are collected and managed. A central element in this development is the suite of Sensor Web Enablement (SWE) standards specified by the Open Geospatial Consortium (OGC). This framework of standards comprises, on the one hand, data models and formats for measurement data (ISO/OGC Observations and Measurements, O&M) and metadata describing measurement processes and sensors (OGC Sensor Model Language, SensorML). On the other hand, the SWE standards comprise (Web service) interface specifications for pull-based access to observation data (OGC Sensor Observation Service, SOS) and for controlling or configuring sensors (OGC Sensor Planning Service, SPS). Within the European INSPIRE framework, too, the SWE standards play an important role, as the SOS is the recommended download service interface for O&M-encoded observation data sets. In the context of the EU-funded Oceans of Tomorrow initiative, the NeXOS (Next generation, Cost-effective, Compact, Multifunctional Web Enabled Ocean Sensor Systems Empowering Marine, Maritime and Fisheries Management) project is developing a new generation of in-situ sensors that make use of the SWE standards to facilitate the data publication process and the integration into Web-based information infrastructures. This includes the development of a dedicated firmware for instruments and sensor platforms (SEISI, Smart Electronic Interface for Sensors and Instruments) maintained by the Universitat Politècnica de Catalunya (UPC). Among other features, SEISI makes use of OGC SWE standards such as OGC-PUCK to enable a plug-and-play mechanism for sensors based on SensorML-encoded metadata. Thus, if a new instrument is attached to a SEISI-based platform, the platform automatically configures the connection to the instrument, generates data files compliant with the ISO/OGC Observations and Measurements standard, and initiates the data transmission into the NeXOS Sensor Web infrastructure. Besides these platform-related developments, NeXOS has realised the full path of data transmission from the sensor to the end-user application. The conceptual architecture design is implemented by a series of open source SWE software packages provided by 52°North. This comprises several SWE server components (i.e. the OGC Sensor Observation Service), tools for data visualisation (e.g. the 52°North Helgoland SOS viewer), and an editor for providing SensorML-based metadata (52°North smle). As a result, NeXOS has demonstrated how the SWE standards help to improve marine observation data collection. In this presentation, we will present the experiences and findings of the NeXOS project and provide recommendations for future work.
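
    For readers unfamiliar with the SOS interface mentioned above, a pull-based GetObservation request in the standard's key-value (KVP) binding looks roughly like the following; the endpoint URL, offering, and observed-property URIs are placeholders, since each deployment publishes its own identifiers in its capabilities document:

        # Sketch of an OGC SOS 2.0 GetObservation request using the KVP binding.
        # The service URL and identifiers below are placeholders.
        import requests

        SOS_ENDPOINT = "https://example.org/sos"  # placeholder endpoint

        params = {
            "service": "SOS",
            "version": "2.0.0",
            "request": "GetObservation",
            "offering": "ocean-platform-1",                                   # placeholder
            "observedProperty": "http://example.org/sea_water_temperature",   # placeholder
            "responseFormat": "http://www.opengis.net/om/2.0",                # O&M-encoded result
        }

        response = requests.get(SOS_ENDPOINT, params=params, timeout=30)
        print(response.status_code, response.headers.get("Content-Type"))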

  8. Metadata Realities for Cyberinfrastructure: Data Authors as Metadata Creators

    ERIC Educational Resources Information Center

    Mayernik, Matthew Stephen

    2011-01-01

    As digital data creation technologies become more prevalent, data and metadata management are necessary to make data available, usable, sharable, and storable. Researchers in many scientific settings, however, have little experience or expertise in data and metadata management. In this dissertation, I explore the everyday data and metadata…

  9. Automatic Training of Rat Cyborgs for Navigation.

    PubMed

    Yu, Yipeng; Wu, Zhaohui; Xu, Kedi; Gong, Yongyue; Zheng, Nenggan; Zheng, Xiaoxiang; Pan, Gang

    2016-01-01

    A rat cyborg system refers to a biological rat implanted with microelectrodes in its brain, via which outer electrical stimuli can be delivered into the brain in vivo to control its behaviors. Rat cyborgs have various applications in emergencies, such as search and rescue in disasters. Before a rat cyborg becomes controllable, considerable effort is required to train it to adapt to the electrical stimuli. In this paper, we build a vision-based automatic training system for rat cyborgs to replace the time-consuming manual training procedure. A hierarchical framework is proposed to facilitate the co-learning between rats and machines. In the framework, the behavioral states of a rat cyborg are visually sensed by a camera, a parameterized state machine is employed to model the training action transitions triggered by the rat's behavioral states, and an adaptive adjustment policy is developed to adaptively adjust the stimulation intensity. The experimental results of three rat cyborgs prove the effectiveness of our system. To the best of our knowledge, this study is the first to tackle automatic training of animal cyborgs.

  10. Automatic Training of Rat Cyborgs for Navigation

    PubMed Central

    Yu, Yipeng; Wu, Zhaohui; Xu, Kedi; Gong, Yongyue; Zheng, Nenggan; Zheng, Xiaoxiang; Pan, Gang

    2016-01-01

    A rat cyborg system refers to a biological rat implanted with microelectrodes in its brain, via which outer electrical stimuli can be delivered into the brain in vivo to control its behaviors. Rat cyborgs have various applications in emergencies, such as search and rescue in disasters. Before a rat cyborg becomes controllable, considerable effort is required to train it to adapt to the electrical stimuli. In this paper, we build a vision-based automatic training system for rat cyborgs to replace the time-consuming manual training procedure. A hierarchical framework is proposed to facilitate the co-learning between rats and machines. In the framework, the behavioral states of a rat cyborg are visually sensed by a camera, a parameterized state machine is employed to model the training action transitions triggered by the rat's behavioral states, and an adaptive adjustment policy is developed to adaptively adjust the stimulation intensity. The experimental results of three rat cyborgs prove the effectiveness of our system. To the best of our knowledge, this study is the first to tackle automatic training of animal cyborgs. PMID:27436999
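
    The parameterized state machine with adaptive stimulation intensity can be sketched abstractly. The states, transitions, and adjustment rule below are simplified assumptions for illustration, not the authors' actual parameters:

        # Abstract sketch of the training loop: a camera-derived behavioral state
        # drives a state machine that picks a training action, and the stimulation
        # intensity is adapted from the rat's response. All rules are invented
        # simplifications of the paper's framework.
        TRANSITIONS = {  # (current state, observed behavior) -> training action
            ("idle", "moving"): "reward",
            ("idle", "stopped"): "stimulate_forward",
            ("turning", "wrong_direction"): "stimulate_turn",
        }

        def adapt_intensity(intensity: float, responded: bool) -> float:
            """Lower intensity when the rat responds, raise it otherwise, within safe bounds."""
            step = 0.1
            intensity += -step if responded else step
            return min(max(intensity, 0.2), 1.0)

        intensity = 0.5
        for state, behavior, responded in [("idle", "stopped", False),
                                           ("idle", "stopped", True)]:
            action = TRANSITIONS.get((state, behavior), "wait")
            intensity = adapt_intensity(intensity, responded)
            print(action, round(intensity, 2))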

  11. Spectral saliency via automatic adaptive amplitude spectrum analysis

    NASA Astrophysics Data System (ADS)

    Wang, Xiaodong; Dai, Jialun; Zhu, Yafei; Zheng, Haiyong; Qiao, Xiaoyan

    2016-03-01

    Suppressing nonsalient patterns by smoothing the amplitude spectrum at an appropriate scale has been shown to effectively detect visual saliency in the frequency domain. Different filter scales are required for different types of salient objects. We observe that the optimal scale for smoothing the amplitude spectrum shares a specific relation with the size of the salient region. Based on this observation and the bottom-up saliency detection characterized by spectrum scale-space analysis for natural images, we propose to detect visual saliency, especially with salient objects of different sizes and locations, via automatic adaptive amplitude spectrum analysis. We not only provide a new criterion for automatic optimal scale selection but also retain the saliency maps corresponding to different salient objects, combining their meaningful saliency information by adaptive weighted combination. The performance of quantitative and qualitative comparisons is evaluated by three different kinds of metrics on the four most widely used datasets and one up-to-date large-scale dataset. The experimental results validate that our method outperforms the existing state-of-the-art saliency models for predicting human eye fixations in terms of accuracy and robustness.
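
    The underlying operation, smoothing the amplitude spectrum at some scale while keeping the phase, is compact enough to sketch. This is the generic frequency-domain saliency recipe under assumed parameters, not the paper's full adaptive scale-selection and weighting scheme:

        # Generic frequency-domain saliency sketch: smooth the log-amplitude
        # spectrum at one scale, keep the original phase, and invert. The paper's
        # contribution (automatic scale selection and adaptive combination of
        # maps) is not reproduced here.
        import numpy as np
        from scipy.ndimage import gaussian_filter

        def spectral_saliency(image: np.ndarray, scale: float = 3.0) -> np.ndarray:
            f = np.fft.fft2(image.astype(float))
            log_amp = np.log1p(np.abs(f))
            phase = np.angle(f)
            smoothed = gaussian_filter(log_amp, sigma=scale)  # suppress regular patterns
            residual = log_amp - smoothed
            recon = np.fft.ifft2(np.exp(residual) * np.exp(1j * phase))
            saliency = np.abs(recon) ** 2
            return gaussian_filter(saliency, sigma=2.0)       # post-smoothing

        img = np.zeros((64, 64)); img[28:36, 28:36] = 1.0     # one salient block
        print(spectral_saliency(img).argmax())                # peaks near the block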

  12. NetCDF4/HDF5 and Linked Data in the Real World - Enriching Geoscientific Metadata without Bloat

    NASA Astrophysics Data System (ADS)

    Ip, Alex; Car, Nicholas; Druken, Kelsey; Poudjom-Djomani, Yvette; Butcher, Stirling; Evans, Ben; Wyborn, Lesley

    2017-04-01

    NetCDF4 has become the dominant generic format for many forms of geoscientific data, leveraging (and constraining) the versatile HDF5 container format, while providing metadata conventions for interoperability. However, the encapsulation of detailed metadata within each file can lead to metadata "bloat", and difficulty in maintaining consistency where metadata is replicated to multiple locations. Complex conceptual relationships are also difficult to represent in simple key-value netCDF metadata. Linked Data provides a practical mechanism to address these issues by associating the netCDF files and their internal variables with complex metadata stored in Semantic Web vocabularies and ontologies, while complying with and complementing existing metadata conventions. One of the stated objectives of the netCDF4/HDF5 formats is that they should be self-describing: containing metadata sufficient for cataloguing and using the data. However, this objective can be regarded as only partially-met where details of conventions and definitions are maintained externally to the data files. For example, one of the most widely used netCDF community standards, the Climate and Forecasting (CF) Metadata Convention, maintains standard vocabularies for a broad range of disciplines across the geosciences, but this metadata is currently neither readily discoverable nor machine-readable. We have previously implemented useful Linked Data and netCDF tooling (ncskos) that associates netCDF files, and individual variables within those files, with concepts in vocabularies formulated using the Simple Knowledge Organization System (SKOS) ontology. NetCDF files contain Uniform Resource Identifier (URI) links to terms represented as SKOS Concepts, rather than plain-text representations of those terms, so we can use simple, standardised web queries to collect and use rich metadata for the terms from any Linked Data-presented SKOS vocabulary. Geoscience Australia (GA) manages a large volume of diverse geoscientific data, much of which is being translated from proprietary formats to netCDF at NCI Australia. This data is made available through the NCI National Environmental Research Data Interoperability Platform (NERDIP) for programmatic access and interdisciplinary analysis. The netCDF files contain both scientific data variables (e.g. gravity, magnetic or radiometric values), but also domain-specific operational values (e.g. specific instrument parameters) best described fully in formal vocabularies. Our ncskos codebase provides access to multiple stores of detailed external metadata in a standardised fashion. Geophysical datasets are generated from a "survey" event, and GA maintains corporate databases of all surveys and their associated metadata. It is impractical to replicate the full source survey metadata into each netCDF dataset so, instead, we link the netCDF files to survey metadata using public Linked Data URIs. These URIs link to Survey class objects which we model as a subclass of Activity objects as defined by the PROV Ontology, and we provide URI resolution for them via a custom Linked Data API which draws current survey metadata from GA's in-house databases. We have demonstrated that Linked Data is a practical way to associate netCDF data with detailed, external metadata. This allows us to ensure that catalogued metadata is kept consistent with metadata points-of-truth, and we can infer complex conceptual relationships not possible with netCDF key-value attributes alone.
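
    The ncskos pattern described above, storing a resolvable URI in a variable attribute instead of a plain-text term, can be sketched with the netCDF4 Python library. The file, variable, attribute, and URI names here are illustrative assumptions, not the exact ncskos convention:

        # Sketch: attach a Linked Data URI to a netCDF variable attribute so that
        # rich metadata can be fetched from a SKOS vocabulary at read time rather
        # than embedded in the file. Attribute and URI names are illustrative.
        import netCDF4
        import numpy as np

        with netCDF4.Dataset("survey.nc", "w") as ds:
            ds.createDimension("point", 3)
            grav = ds.createVariable("gravity", "f8", ("point",))
            grav[:] = np.array([9.780, 9.781, 9.779])
            # Plain-text convention would be: grav.long_name = "gravity anomaly"
            # Linked Data convention: point at a vocabulary concept instead.
            grav.skos_concept_uri = "http://vocabulary.example.org/def/gravity-anomaly"
            ds.survey_uri = "http://pid.example.org/survey/194"  # PROV-style Survey link

        with netCDF4.Dataset("survey.nc") as ds:
            print(ds["gravity"].skos_concept_uri)  # resolve this URI to fetch metadata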

  13. Logs Analysis of Adapted Pedagogical Scenarios Generated by a Simulation Serious Game Architecture

    ERIC Educational Resources Information Center

    Callies, Sophie; Gravel, Mathieu; Beaudry, Eric; Basque, Josianne

    2017-01-01

    This paper presents an architecture designed for simulation serious games, which automatically generates game-based scenarios adapted to learner's learning progression. We present three central modules of the architecture: (1) the learner model, (2) the adaptation module and (3) the logs module. The learner model estimates the progression of the…

  14. Unsupervised MDP Value Selection for Automating ITS Capabilities

    ERIC Educational Resources Information Center

    Stamper, John; Barnes, Tiffany

    2009-01-01

    We seek to simplify the creation of intelligent tutors by using student data acquired from standard computer aided instruction (CAI) in conjunction with educational data mining methods to automatically generate adaptive hints. In our previous work, we have automatically generated hints for logic tutoring by constructing a Markov Decision Process…

  15. Distributed Multi-interface Catalogue for Geospatial Data

    NASA Astrophysics Data System (ADS)

    Nativi, S.; Bigagli, L.; Mazzetti, P.; Mattia, U.; Boldrini, E.

    2007-12-01

    Several geosciences communities (e.g. atmospheric science, oceanography, hydrology) have developed tailored data and metadata models and service protocol specifications for enabling online data discovery, inventory, evaluation, access and download. These specifications are conceived either profiling geospatial information standards or extending the well-accepted geosciences data models and protocols in order to capture more semantics. These artifacts have generated a set of related catalog -and inventory services- characterizing different communities, initiatives and projects. In fact, these geospatial data catalogs are discovery and access systems that use metadata as the target for query on geospatial information. The indexed and searchable metadata provide a disciplined vocabulary against which intelligent geospatial search can be performed within or among communities. There exists a clear need to conceive and achieve solutions to implement interoperability among geosciences communities, in the context of the more general geospatial information interoperability framework. Such solutions should provide search and access capabilities across catalogs, inventory lists and their registered resources. Thus, the development of catalog clearinghouse solutions is a near-term challenge in support of fully functional and useful infrastructures for spatial data (e.g. INSPIRE, GMES, NSDI, GEOSS). This implies the implementation of components for query distribution and virtual resource aggregation. These solutions must implement distributed discovery functionalities in an heterogeneous environment, requiring metadata profiles harmonization as well as protocol adaptation and mediation. We present a catalog clearinghouse solution for the interoperability of several well-known cataloguing systems (e.g. OGC CSW, THREDDS catalog and data services). The solution implements consistent resource discovery and evaluation over a dynamic federation of several well-known cataloguing and inventory systems. Prominent features include: 1)Support to distributed queries over a hierarchical data model, supporting incremental queries (i.e. query over collections, to be subsequently refined) and opaque/translucent chaining; 2)Support to several client protocols, through a compound front-end interface module. This allows to accommodate a (growing) number of cataloguing standards, or profiles thereof, including the OGC CSW interface, ebRIM Application Profile (for Core ISO Metadata and other data models), and the ISO Application Profile. The presented catalog clearinghouse supports both the opaque and translucent pattern for service chaining. In fact, the clearinghouse catalog may be configured either to completely hide the underlying federated services or to provide clients with services information. In both cases, the clearinghouse solution presents a higher level interface (i.e. OGC CSW) which harmonizes multiple lower level services (e.g. OGC CSW, WMS and WCS, THREDDS, etc.), and handles all control and interaction with them. In the translucent case, client has the option to directly access the lower level services (e.g. to improve performances). In the GEOSS context, the solution has been experimented both as a stand-alone user application and as a service framework. The first scenario allows a user to download a multi-platform client software and query a federation of cataloguing systems, that he can customize at will. 
The second scenario supports server-side deployment and can be flexibly adapted to several use cases, such as an intranet proxy or a catalog broker.

  16. Hybrid PolyLingual Object Model: An Efficient and Seamless Integration of Java and Native Components on the Dalvik Virtual Machine

    PubMed Central

    Huang, Yukun; Chen, Rong; Wei, Jingbo; Pei, Xilong; Cao, Jing; Prakash Jayaraman, Prem; Ranjan, Rajiv

    2014-01-01

    JNI in the Android platform is often observed to have low efficiency and high coding complexity. Although many researchers have investigated the JNI mechanism, few of them solve the efficiency and complexity problems of JNI in the Android platform simultaneously. In this paper, a hybrid polylingual object (HPO) model is proposed to allow a CAR object to be accessed as a Java object, and vice versa, in the Dalvik virtual machine. It is an acceptable substitute for JNI for reusing CAR-compliant components in Android applications in a seamless and efficient way. The metadata injection mechanism is designed to support the automatic mapping and reflection between CAR objects and Java objects. A prototype virtual machine, called HPO-Dalvik, is implemented by extending the Dalvik virtual machine to support the HPO model. Lifespan management, garbage collection, and data type transformation of HPO objects are also handled automatically in the HPO-Dalvik virtual machine. The experimental results show that the HPO model outperforms standard JNI, with lower overhead on the native side and better execution performance, with no JNI bridging code required. PMID:25110745

  17. Computerized image analysis for quantitative neuronal phenotyping in zebrafish.

    PubMed

    Liu, Tianming; Lu, Jianfeng; Wang, Ye; Campbell, William A; Huang, Ling; Zhu, Jinmin; Xia, Weiming; Wong, Stephen T C

    2006-06-15

    An integrated microscope image analysis pipeline is developed for automatic analysis and quantification of phenotypes in zebrafish with altered expression of Alzheimer's disease (AD)-linked genes. We hypothesize that a slight impairment of neuronal integrity in a large number of zebrafish carrying the mutant genotype can be detected through the computerized image analysis method. Key functionalities of our zebrafish image processing pipeline include quantification of neuron loss in zebrafish embryos due to knockdown of AD-linked genes, automatic detection of defective somites, and quantitative measurement of gene expression levels in zebrafish with altered expression of AD-linked genes or treatment with a chemical compound. These quantitative measurements enable the archival of analyzed results and relevant metadata. The structured database is organized for statistical analysis and data modeling to better understand neuronal integrity and phenotypic changes of zebrafish under different perturbations. Our results show that the computerized analysis is comparable to manual counting, with equivalent accuracy and improved efficacy and consistency. Development of such an automated data analysis pipeline represents a significant step forward in achieving accurate and reproducible quantification of neuronal phenotypes in large-scale or high-throughput zebrafish imaging studies.

  18. Content Metadata Standards for Marine Science: A Case Study

    USGS Publications Warehouse

    Riall, Rebecca L.; Marincioni, Fausto; Lightsom, Frances L.

    2004-01-01

    The U.S. Geological Survey developed a content metadata standard to meet the demands of organizing electronic resources in the marine sciences for a broad, heterogeneous audience. The metadata standard is used by the Marine Realms Information Bank project, a Web-based public distributed library of marine science from academic institutions and government agencies. The development and deployment of this metadata standard serve as a model, complete with lessons about mistakes, for the creation of similarly specialized metadata standards for digital libraries.

  19. Concurrent array-based queue

    DOEpatents

    Heidelberger, Philip; Steinmacher-Burow, Burkhard

    2015-01-06

    According to one embodiment, a method for implementing an array-based queue in memory of a memory system that includes a controller includes configuring, in the memory, metadata of the array-based queue. The configuring comprises defining, in metadata, an array start location in the memory for the array-based queue, defining, in the metadata, an array size for the array-based queue, defining, in the metadata, a queue top for the array-based queue and defining, in the metadata, a queue bottom for the array-based queue. The method also includes the controller serving a request for an operation on the queue, the request providing the location in the memory of the metadata of the queue.
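
    The claimed structure maps naturally onto a circular buffer whose bookkeeping lives in a separate metadata record. A minimal Python sketch follows; the class and field names are illustrative, not taken from the patent.

```python
# Minimal sketch of the claimed structure: a circular array-based queue whose
# bookkeeping (array start location, array size, top, bottom) lives in a
# metadata record, and a controller that serves requests against it.
class QueueMetadata:
    def __init__(self, array_start, array_size):
        self.array_start = array_start  # start location of the array in memory
        self.array_size = array_size    # number of slots in the array
        self.top = 0                    # index of the next element to dequeue
        self.bottom = 0                 # index of the next free slot
        self.count = 0                  # current number of queued elements


class Controller:
    def __init__(self, memory, meta):
        self.memory = memory  # flat "memory" modeled as a Python list
        self.meta = meta      # location of the queue's metadata

    def enqueue(self, value):
        m = self.meta
        if m.count == m.array_size:
            raise OverflowError("queue full")
        self.memory[m.array_start + m.bottom] = value
        m.bottom = (m.bottom + 1) % m.array_size  # wrap around the array
        m.count += 1

    def dequeue(self):
        m = self.meta
        if m.count == 0:
            raise IndexError("queue empty")
        value = self.memory[m.array_start + m.top]
        m.top = (m.top + 1) % m.array_size
        m.count -= 1
        return value


memory = [None] * 64
q = Controller(memory, QueueMetadata(array_start=16, array_size=8))
q.enqueue("a"); q.enqueue("b")
assert q.dequeue() == "a"
```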

  20. WGISS-45 International Directory Network (IDN) Report

    NASA Technical Reports Server (NTRS)

    Morahan, Michael

    2018-01-01

    The objective of this presentation is to provide IDN (International Directory Network) updates on features and activities to the Committee on Earth Observation Satellites (CEOS) Working Group on Information Systems and Services (WGISS) and the provider community. The following topics will be discussed during the presentation: Transition of Providers' DIF-9 (Directory Interchange Format-9) to DIF-10 Metadata Records in the Common Metadata Repository (CMR); GCMD (Global Change Master Directory) Keyword Update; DIF-10 and UMM-C (Unified Metadata Model-Collections) Schema Changes; Metadata Validation of Provider Metadata; docBUILDER for Submitting IDN Metadata to the CMR (i.e. Registration); and Mapping the WGClimate Essential Climate Variable (ECV) Inventory to IDN Records.

  1. WebEAV: automatic metadata-driven generation of web interfaces to entity-attribute-value databases.

    PubMed

    Nadkarni, P M; Brandt, C M; Marenco, L

    2000-01-01

    The task of creating and maintaining a front end to a large institutional entity-attribute-value (EAV) database can be cumbersome when using traditional client-server technology. Switching to Web technology as a delivery vehicle solves some of these problems but introduces others. In particular, Web development environments tend to be primitive, and many features that client-server developers take for granted are missing. WebEAV is a generic framework for Web development that is intended to streamline the process of Web application development for databases having a significant EAV component. It also addresses some challenging user interface issues that arise when any complex system is created. The authors describe the architecture of WebEAV and provide an overview of its features with suitable examples.
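
    The core idea of metadata-driven interface generation can be illustrated in a few lines: each EAV attribute describes its own datatype, and the front end renders a matching input widget. The attribute dictionaries below are invented for illustration and do not reflect WebEAV's actual schema.

```python
# Minimal sketch of metadata-driven form generation for an EAV schema: each
# attribute row carries its own datatype, and the generator emits a matching
# HTML input. Attribute names and datatypes are illustrative.
ATTRIBUTES = [
    {"name": "heart_rate", "label": "Heart rate (bpm)", "datatype": "integer"},
    {"name": "diagnosis", "label": "Diagnosis", "datatype": "text"},
    {"name": "visit_date", "label": "Visit date", "datatype": "date"},
]

WIDGETS = {"integer": "number", "text": "text", "date": "date"}


def render_form(attributes):
    """Emit one labeled <input> per EAV attribute, typed from its metadata."""
    rows = []
    for attr in attributes:
        widget = WIDGETS.get(attr["datatype"], "text")
        rows.append(
            f'<label>{attr["label"]}'
            f' <input name="{attr["name"]}" type="{widget}"></label>'
        )
    return "<form>\n  " + "\n  ".join(rows) + "\n</form>"


print(render_form(ATTRIBUTES))
```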

  2. Exploring Characterizations of Learning Object Repositories Using Data Mining Techniques

    NASA Astrophysics Data System (ADS)

    Segura, Alejandra; Vidal, Christian; Menendez, Victor; Zapata, Alfredo; Prieto, Manuel

    Learning object repositories provide a platform for the sharing of Web-based educational resources. As these repositories evolve independently, it is difficult for users to have a clear picture of the kind of contents they give access to. Metadata can be used to automatically extract a characterization of these resources by using machine learning techniques. This paper presents an exploratory study, carried out on the contents of four public repositories, that uses clustering and association rule mining algorithms to extract characterizations of repository contents. The results of the analysis include potential relationships between different attributes of learning objects that may be useful for understanding the kind of resources available and eventually developing search mechanisms that consider repository descriptions as a criterion in federated search.
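
    A minimal sketch of this kind of exploratory analysis, assuming scikit-learn is available: one-hot encode categorical LOM-style metadata attributes and cluster the records. The sample records and attribute names are invented.

```python
# Minimal sketch: one-hot encode categorical learning-object metadata and
# cluster the records. Records and attribute names are illustrative.
from sklearn.cluster import KMeans
from sklearn.feature_extraction import DictVectorizer

records = [
    {"format": "video", "interactivity": "low", "audience": "learner"},
    {"format": "text", "interactivity": "low", "audience": "teacher"},
    {"format": "simulation", "interactivity": "high", "audience": "learner"},
    {"format": "text", "interactivity": "medium", "audience": "learner"},
]

vec = DictVectorizer(sparse=False)   # one-hot encodes the string attributes
X = vec.fit_transform(records)
labels = KMeans(n_clusters=2, n_init=10).fit_predict(X)

# Inspect which records land in which cluster; association rule mining over
# the same encoded attributes would be the complementary step.
for record, label in zip(records, labels):
    print(label, record)
```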

  3. Explorative Analyses of Nursing Research Data.

    PubMed

    Kim, Hyeoneui; Jang, Imho; Quach, Jimmy; Richardson, Alex; Kim, Jaemin; Choi, Jeeyae

    2016-10-26

    As a first step of pursuing the vision of "big data science in nursing," we described the characteristics of nursing research data reported in 194 published nursing studies. We also explored how completely the Version 1 metadata specification of biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) represents these metadata. The metadata items of the nursing studies were all related to one or more of the bioCADDIE metadata entities. However, values of many metadata items of the nursing studies were not sufficiently represented through the bioCADDIE metadata. This was partly due to the differences in the scope of the content that the bioCADDIE metadata are designed to represent. The 194 nursing studies reported a total of 1,181 unique data items, the majority of which take non-numeric values. This indicates the importance of data standardization to enable the integrative analyses of these data to support big data science in nursing. © The Author(s) 2016.

  4. A Model for Enhancing Internet Medical Document Retrieval with “Medical Core Metadata”

    PubMed Central

    Malet, Gary; Munoz, Felix; Appleyard, Richard; Hersh, William

    1999-01-01

    Objective: Finding documents on the World Wide Web relevant to a specific medical information need can be difficult. The goal of this work is to define a set of document content description tags, or metadata encodings, that can be used to promote disciplined search access to Internet medical documents. Design: The authors based their approach on a proposed metadata standard, the Dublin Core Metadata Element Set, which has recently been submitted to the Internet Engineering Task Force. Their model also incorporates the National Library of Medicine's Medical Subject Headings (MeSH) vocabulary and Medline-type content descriptions. Results: The model defines a medical core metadata set that can be used to describe the metadata for a wide variety of Internet documents. Conclusions: The authors propose that their medical core metadata set be used to assign metadata to medical documents to facilitate document retrieval by Internet search engines. PMID:10094069

  5. MPEG-7: standard metadata for multimedia content

    NASA Astrophysics Data System (ADS)

    Chang, Wo

    2005-08-01

    The eXtensible Markup Language (XML) metadata technology for describing media contents has emerged as a dominant mode of making media searchable for both human and machine consumption. To realize this promise, many online Web applications are pushing this concept to its fullest potential. However, a good metadata model requires a robust standardization effort so that the metadata content and its structure can reach maximum usage across various applications. An effective media content description technology should also use standard metadata structures, especially when dealing with various multimedia contents. A metadata technology called MPEG-7 content description has emerged from the ISO MPEG standards body with the charter of defining standard metadata to describe audiovisual content. This paper gives an overview of MPEG-7 technology and the impact it can bring to the next generation of multimedia indexing and retrieval applications.

  6. Quality Assurance for Digital Learning Object Repositories: Issues for the Metadata Creation Process

    ERIC Educational Resources Information Center

    Currier, Sarah; Barton, Jane; O'Beirne, Ronan; Ryan, Ben

    2004-01-01

    Metadata enables users to find the resources they require, therefore it is an important component of any digital learning object repository. Much work has already been done within the learning technology community to assure metadata quality, focused on the development of metadata standards, specifications and vocabularies and their implementation…

  7. A Model for the Creation of Human-Generated Metadata within Communities

    ERIC Educational Resources Information Center

    Brasher, Andrew; McAndrew, Patrick

    2005-01-01

    This paper considers situations for which detailed metadata descriptions of learning resources are necessary, and focuses on human generation of such metadata. It describes a model which facilitates human production of good quality metadata by the development and use of structured vocabularies. Using examples, this model is applied to single and…

  8. Enhancing SCORM Metadata for Assessment Authoring in E-Learning

    ERIC Educational Resources Information Center

    Chang, Wen-Chih; Hsu, Hui-Huang; Smith, Timothy K.; Wang, Chun-Chia

    2004-01-01

    With the rapid development of distance learning and the XML technology, metadata play an important role in e-Learning. Nowadays, many distance learning standards, such as SCORM, AICC CMI, IEEE LTSC LOM and IMS, use metadata to tag learning materials. However, most metadata models are used to define learning materials and test problems. Few…

  9. Development of Health Information Search Engine Based on Metadata and Ontology

    PubMed Central

    Song, Tae-Min; Jin, Dal-Lae

    2014-01-01

    Objectives: The aim of the study was to develop a metadata- and ontology-based health information search engine ensuring semantic interoperability to collect and provide health information using different application programs. Methods: Health information metadata ontology was developed using a distributed semantic Web content publishing model based on vocabularies used to index the contents generated by the information producers as well as those used to search the contents by the users. The vocabulary for the health information ontology was mapped to the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), and a list of about 1,500 terms was proposed. The metadata schema used in this study was developed by adding an element describing the target audience to the Dublin Core Metadata Element Set. Results: A metadata schema and an ontology ensuring interoperability of health information available on the internet were developed. The metadata- and ontology-based health information search engine developed in this study produced better search results than existing search engines. Conclusions: A health information search engine based on metadata and ontology will provide reliable health information to both information producers and information consumers. PMID:24872907

  10. Development of health information search engine based on metadata and ontology.

    PubMed

    Song, Tae-Min; Park, Hyeoun-Ae; Jin, Dal-Lae

    2014-04-01

    The aim of the study was to develop a metadata- and ontology-based health information search engine ensuring semantic interoperability to collect and provide health information using different application programs. Health information metadata ontology was developed using a distributed semantic Web content publishing model based on vocabularies used to index the contents generated by the information producers as well as those used to search the contents by the users. The vocabulary for the health information ontology was mapped to the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), and a list of about 1,500 terms was proposed. The metadata schema used in this study was developed by adding an element describing the target audience to the Dublin Core Metadata Element Set. A metadata schema and an ontology ensuring interoperability of health information available on the internet were developed. The metadata- and ontology-based health information search engine developed in this study produced better search results than existing search engines. A health information search engine based on metadata and ontology will provide reliable health information to both information producers and information consumers.
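
    Both versions of this abstract describe extending the Dublin Core element set with an element for the target audience. A minimal sketch of such a record as a Python mapping; the values, and the SNOMED CT code, are purely illustrative.

```python
# Minimal sketch of a Dublin Core record extended with a target-audience
# element, as described above. All values are illustrative.
health_record = {
    "dc:title": "Managing type 2 diabetes through diet",
    "dc:creator": "Example Health Agency",   # hypothetical publisher
    "dc:subject": "diabetes mellitus type 2 (SNOMED CT concept)",
    "dc:language": "en",
    "dc:date": "2014-04-01",
    "audience": ["patient", "caregiver"],     # the added element
}

# A search engine can then filter on the added element directly:
matches = [r for r in [health_record] if "patient" in r["audience"]]
```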

  11. Metadata in the Wild: An Empirical Survey of OPeNDAP-accessible Metadata and its Implications for Discovery

    NASA Astrophysics Data System (ADS)

    Hardy, D.; Janée, G.; Gallagher, J.; Frew, J.; Cornillon, P.

    2006-12-01

    The OPeNDAP Data Access Protocol (DAP) is a community standard for sharing scientific data across the Internet. Data providers using DAP have adopted a variety of metadata conventions to improve data utility, such as COARDS (1995) and CF (2003). Our results show, however, that metadata do not follow these conventions in practice. We collected metadata from over a hundred DAP servers, tens of thousands of data objects, and hundreds of collections. We found that a minority claim to adhere to a metadata convention, and a small percentage accurately adhere to their stated convention. We present descriptive statistics of our survey and highlight common traits such as well-populated attributes. Our empirical results indicate that unified search services cannot rely solely on metadata conventions. Although we encourage all providers to adopt a small subset of the CF convention for discovery purposes, we have no evidence to suggest that improved conventions would simplify the fundamental problem of heterogeneity. Large-scale discovery services must find methods for integrating incompatible metadata.
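
    The adherence check implied by this survey can be sketched as follows, assuming a dataset's global and per-variable attributes have already been read into plain dictionaries. The rule set is a small illustrative subset of CF, not the survey's actual methodology.

```python
# Minimal sketch: test whether a dataset that claims CF compliance actually
# carries the attributes a discovery service would rely on. The "required"
# attribute set is an illustrative CF subset.
def claims_cf(global_attrs):
    return "CF" in global_attrs.get("Conventions", "")


def cf_violations(variables):
    """Return (variable, missing-attribute) pairs for a tiny CF subset."""
    required = ("units", "standard_name")
    return [
        (name, attr)
        for name, attrs in variables.items()
        for attr in required
        if attr not in attrs
    ]


dataset = {
    "global": {"Conventions": "CF-1.0"},
    "variables": {
        "sst": {"units": "K", "standard_name": "sea_surface_temperature"},
        "chl": {"units": "mg m-3"},  # claims CF, but missing standard_name
    },
}

if claims_cf(dataset["global"]):
    print(cf_violations(dataset["variables"]))  # [('chl', 'standard_name')]
```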

  12. AstroGrid-D: Grid technology for astronomical science

    NASA Astrophysics Data System (ADS)

    Enke, Harry; Steinmetz, Matthias; Adorf, Hans-Martin; Beck-Ratzka, Alexander; Breitling, Frank; Brüsemeister, Thomas; Carlson, Arthur; Ensslin, Torsten; Högqvist, Mikael; Nickelt, Iliya; Radke, Thomas; Reinefeld, Alexander; Reiser, Angelika; Scholl, Tobias; Spurzem, Rainer; Steinacker, Jürgen; Voges, Wolfgang; Wambsganß, Joachim; White, Steve

    2011-02-01

    We present the status and results of AstroGrid-D, a joint effort of astrophysicists and computer scientists to employ grid technology for scientific applications. AstroGrid-D provides access to a network of distributed machines through a set of commands as well as software interfaces. It allows simple use of compute and storage facilities and makes it possible to schedule or monitor compute tasks and data management. It is based on the Globus Toolkit middleware (GT4). Chapter 1 describes the context which led to the demand for advanced software solutions in astrophysics, and we state the goals of the project. We then present characteristic astrophysical applications that have been implemented on AstroGrid-D in chapter 2. We describe simulations of different complexity, compute-intensive calculations running on multiple sites (Section 2.1), and advanced applications for specific scientific purposes (Section 2.2), such as a connection to robotic telescopes (Section 2.2.3). These examples show how grid execution improves, for example, the scientific workflow. Chapter 3 explains the software tools and services that we adapted or newly developed. Section 3.1 focuses on the administrative aspects of the infrastructure, for managing users and monitoring activity. Section 3.2 characterises the central components of our architecture: the AstroGrid-D information service to collect and store metadata, a file management system, the data management system, and a job manager for automatic submission of compute tasks. We summarise the successfully established infrastructure in chapter 4, concluding with our future plans to establish AstroGrid-D as a platform of modern e-Astronomy.

  13. A posteriori error determination and grid adaptation for AMR and ALE computational fluid dynamics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lapenta, G. M.

    2002-01-01

    We discuss grid adaptation for application to AMR and ALE codes. Two new contributions are presented. First, we present a new method to locate the regions where truncation error is being created due to insufficient accuracy: the operator recovery error origin (OREO) detector. The OREO detector is automatic, reliable, easy to implement and extremely inexpensive. Second, a new grid motion technique is presented for application to ALE codes. The method is based on the Brackbill-Saltzman approach, but it is directly linked to the OREO detector and moves the grid automatically to minimize the error.

  14. The design of digital-adaptive controllers for VTOL aircraft

    NASA Technical Reports Server (NTRS)

    Stengel, R. F.; Broussard, J. R.; Berry, P. W.

    1976-01-01

    Design procedures for VTOL automatic control systems have been developed and are presented. Using linear-optimal estimation and control techniques as a starting point, digital-adaptive control laws have been designed for the VALT Research Aircraft, a tandem-rotor helicopter which is equipped for fully automatic flight in terminal area operations. These control laws are designed to interface with velocity-command and attitude-command guidance logic, which could be used in short-haul VTOL operations. Developments reported here include new algorithms for designing non-zero-set-point digital regulators, design procedures for rate-limited systems, and algorithms for dynamic control trim setting.

  15. A metadata template for ocean acidification data

    NASA Astrophysics Data System (ADS)

    Jiang, L.

    2014-12-01

    Metadata is structured information that describes, explains, and locates an information resource (e.g., data). It is often coarsely described as data about data, and documents information such as what was measured, by whom, when, where, and how it was sampled and analyzed, and with what instruments. Metadata is essential to ensure the survivability and accessibility of the data into the future. With the rapid expansion of biological-response ocean acidification (OA) studies, the lack of a common metadata template to document such data has become a significant gap for ocean acidification data management efforts. In this paper, we present a metadata template that can be applied to a broad spectrum of OA studies, including those studying the biological responses of organisms to ocean acidification. The "variable metadata section", which includes the variable name, observation type, whether the variable is a manipulation condition or response variable, and the biological subject on which the variable is studied, forms the core of this metadata template. Additional metadata elements, such as principal investigators, temporal and spatial coverage, platforms for the sampling, and data citation, are essential components to complete the template. We explain the structure of the template, and define many metadata elements that may be unfamiliar to researchers. For that reason, this paper can serve as a user's manual for the template.
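
    A minimal sketch of the "variable metadata section" as a data structure, using only the four core elements named above; the field names and example values are ours, and the actual template defines many more elements.

```python
# Minimal sketch of the core "variable metadata section" described above.
# Field names and example values are illustrative.
from dataclasses import dataclass


@dataclass
class VariableMetadata:
    variable_name: str        # the measured or manipulated quantity
    observation_type: str     # e.g. laboratory measurement, field observation
    is_manipulation: bool     # manipulation condition vs. response variable
    biological_subject: str   # organism on which the variable is studied


shell_growth = VariableMetadata(
    variable_name="net calcification rate",
    observation_type="laboratory measurement",
    is_manipulation=False,                    # a response variable
    biological_subject="Crassostrea gigas",   # Pacific oyster (illustrative)
)
```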

  16. A Shared Infrastructure for Federated Search Across Distributed Scientific Metadata Catalogs

    NASA Astrophysics Data System (ADS)

    Reed, S. A.; Truslove, I.; Billingsley, B. W.; Grauch, A.; Harper, D.; Kovarik, J.; Lopez, L.; Liu, M.; Brandt, M.

    2013-12-01

    The vast amount of science metadata can be overwhelming and highly complex. Comprehensive analysis and sharing of metadata is difficult since institutions often publish to their own repositories. There are many disjoint standards used for publishing scientific data, making it difficult to discover and share information from different sources. Services that publish metadata catalogs often have different protocols, formats, and semantics. The research community is limited by the exclusivity of separate metadata catalogs, so federated search interfaces capable of unified search queries across multiple sources are desirable. Aggregation of metadata catalogs also enables users to critique metadata more rigorously. With these motivations in mind, the National Snow and Ice Data Center (NSIDC) and the Advanced Cooperative Arctic Data and Information Service (ACADIS) implemented two search interfaces for the community. Both the NSIDC Search and the ACADIS Arctic Data Explorer (ADE) use a common infrastructure, which keeps maintenance costs low. The search clients are designed to make OpenSearch requests against Solr, an open source search platform. Solr applies indexes to specific fields of the metadata, which in this instance optimizes queries containing keywords, spatial bounds and temporal ranges. NSIDC metadata is reused by both search interfaces, but the ADE also brokers additional sources. Users can quickly find relevant metadata with minimal effort, which ultimately lowers research costs. This presentation will highlight the reuse of data and code between NSIDC and ACADIS, discuss challenges and milestones for each project, and identify the creation and use of Open Source libraries.
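
    A minimal sketch of the kind of query such a client issues, assuming a hypothetical Solr endpoint and illustrative field names (e.g. temporal, spatial); real deployments define their own schema and OpenSearch-to-Solr mapping.

```python
# Minimal sketch: a keyword + temporal + spatial metadata query against a
# Solr select endpoint. The URL and field names are hypothetical.
import requests

SOLR = "https://search.example.org/solr/metadata/select"  # hypothetical

params = {
    "q": "sea ice extent",                                    # keyword terms
    "fq": [                                                   # filter queries
        "temporal:[2010-01-01T00:00:00Z TO 2012-12-31T23:59:59Z]",
        "spatial:[60,-180 TO 90,180]",                        # bounding box
    ],
    "rows": 25,
    "wt": "json",
}

results = requests.get(SOLR, params=params, timeout=30).json()
for doc in results["response"]["docs"]:
    print(doc.get("title"))
```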

  17. Metadata Design in the New PDS4 Standards - Something for Everybody

    NASA Astrophysics Data System (ADS)

    Raugh, Anne C.; Hughes, John S.

    2015-11-01

    The Planetary Data System (PDS) archives, supports, and distributes data of diverse targets, from diverse sources, to diverse users. One of the core problems addressed by the PDS4 data standard redesign was that of metadata - how to accommodate the increasingly sophisticated demands of search interfaces, analytical software, and observational documentation into label standards without imposing limits and constraints that would impinge on the quality or quantity of metadata that any particular observer or team could supply. And yet, as an archive, PDS must have detailed documentation for the metadata in the labels it supports, or the institutional knowledge encoded into those attributes will be lost - putting the data at risk. The PDS4 metadata solution is based on a three-step approach. First, it is built on two key ISO standards: ISO 11179 "Information Technology - Metadata Registries", which provides a common framework and vocabulary for defining metadata attributes; and ISO 14721 "Space Data and Information Transfer Systems - Open Archival Information System (OAIS) Reference Model", which provides the framework for the information architecture that enforces the object-oriented paradigm for metadata modeling. Second, PDS has defined a hierarchical system that allows it to divide its metadata universe into namespaces ("data dictionaries", conceptually), and more importantly to delegate stewardship for a single namespace to a local authority. This means that a mission can develop its own data model with a high degree of autonomy and effectively extend the PDS model to accommodate its own metadata needs within the common ISO 11179 framework. Finally, within a single namespace - even the core PDS namespace - existing metadata structures can be extended and new structures added to the model as new needs are identified. This poster illustrates the PDS4 approach to metadata management and highlights the expected return on the development investment for PDS, users and data preparers.

  18. A SensorML-based Metadata Model and Registry for Ocean Observatories: a Contribution from European Projects NeXOS and FixO3

    NASA Astrophysics Data System (ADS)

    Delory, E.; Jirka, S.

    2016-02-01

    Discovering sensors and observation data is important when enabling the exchange of oceanographic data between observatories and the scientists who need the data sets for their work. To better support this discovery process, one task of the European project FixO3 (Fixed-point Open Ocean Observatories) addresses the question of which elements are needed to develop a better registry for sensors. This has resulted in four items which are addressed by the FixO3 project in cooperation with further European projects such as NeXOS (http://www.nexosproject.eu/). 1) Metadata description format: to store and retrieve information about sensors and platforms it is necessary to have a common approach to providing and encoding the metadata. For this purpose, the OGC Sensor Model Language (SensorML) 2.0 standard was selected. In particular, the opportunity to distinguish between sensor types and instances offers new chances for more efficient provision and maintenance of sensor metadata. 2) Conversion of existing metadata into a SensorML 2.0 representation: in order to ensure sustainable re-use of already provided metadata content (e.g. from the ESONET-FixO3 yellow pages), it is important to provide a mechanism capable of transforming these already available metadata sets into the new SensorML 2.0 structure. 3) Metadata editor: users cannot be expected to manually edit XML-based descriptions of sensors and platforms, so a visual interface is necessary to help during metadata creation. We will outline a prototype of this editor, building upon the development of the ESONET sensor registry interface. 4) Sensor metadata store: a server is needed for storing and querying the created sensor descriptions. For this purpose different options exist, which will be discussed. In summary, we will present a set of different elements enabling sensor discovery, ranging from metadata formats, metadata conversion and editing to metadata storage. Furthermore, the current development status will be demonstrated.
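
    A minimal sketch of building a SensorML 2.0-style description with the Python standard library; only a few elements are shown, the identifiers are invented, and a real document must validate against the full OGC SensorML 2.0 schema.

```python
# Minimal sketch of a SensorML 2.0-style sensor description. Identifiers are
# hypothetical; only a small fragment of the schema is shown.
import xml.etree.ElementTree as ET

SML = "http://www.opengis.net/sensorml/2.0"
GML = "http://www.opengis.net/gml/3.2"
ET.register_namespace("sml", SML)
ET.register_namespace("gml", GML)

desc = ET.Element(f"{{{SML}}}PhysicalSystem")
desc.set(f"{{{GML}}}id", "ctd-001")                       # hypothetical instance id
ET.SubElement(desc, f"{{{GML}}}description").text = (
    "CTD probe deployed on a fixed-point ocean observatory"
)
ET.SubElement(
    desc, f"{{{GML}}}identifier", codeSpace="uniqueID"
).text = "urn:example:sensor:ctd-001"                     # hypothetical URN

print(ET.tostring(desc, encoding="unicode"))
```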

  19. HDF-EOS Dump Tools

    NASA Astrophysics Data System (ADS)

    Prasad, U.; Rahabi, A.

    2001-05-01

    The following utilities, developed for dumping HDF-EOS format data, are of special use for Earth science data from NASA's Earth Observing System (EOS). This poster demonstrates their use and application. The first four tools take HDF-EOS data files as input. HDF-EOS Metadata Dumper - metadmp: extracts metadata from EOS data granules. It operates by simply copying blocks of metadata from the file to the standard output; it does not process the metadata in any way. Since all metadata in EOS granules is encoded in the Object Description Language (ODL), the output of metadmp will be in the form of complete ODL statements. EOS data granules may contain up to three different sets of metadata (Core, Archive, and Structural Metadata). HDF-EOS Contents Dumper - heosls: displays the contents of HDF-EOS files. This utility provides detailed information on the POINT, SWATH, and GRID data sets in the files; for example, it will list the geolocation fields, data fields and objects. HDF-EOS ASCII Dumper - asciidmp: extracts fields from EOS data granules into plain ASCII text. The output from asciidmp should be easily human readable; with minor editing, it can be made ingestible by any application with ASCII import capabilities. HDF-EOS Binary Dumper - bindmp: dumps HDF-EOS objects in binary format. This is useful for feeding its output into existing programs that do not understand HDF, for example custom software and COTS products. HDF-EOS User Friendly Metadata - UFM: useful for viewing ECS metadata. UFM takes an EOSDIS ODL metadata file and produces an HTML report of the metadata for display in a web browser. HDF-EOS METCHECK - METCHECK: can be invoked from either a Unix or DOS environment with a set of command-line options that direct the tool's inputs and output. METCHECK validates the inventory metadata in a (.met) file using the descriptor file (.desc) as the reference: it takes a (.desc) file and a (.met) ODL file as inputs, and generates a simple output file containing the results of the checking process.
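
    The behavior of a metadata dumper like metadmp can be sketched as copying an ODL metadata block out of a granule without interpreting it. In the sketch below the granule is treated as raw bytes and the block is located by its ODL GROUP delimiters; this illustrates the idea only and is not the actual tool, which reads the blocks from named file attributes.

```python
# Minimal sketch: locate and emit an ODL metadata block, unprocessed, the way
# a simple metadata dumper would. The group name and file path are examples.
import re


def dump_odl(path, group="INVENTORYMETADATA"):
    """Return the complete ODL GROUP ... END_GROUP block, or None."""
    data = open(path, "rb").read().decode("latin-1", errors="replace")
    pattern = re.compile(
        rf"GROUP\s*=\s*{group}.*?END_GROUP\s*=\s*{group}", re.DOTALL
    )
    match = pattern.search(data)
    return match.group(0) if match else None


# print(dump_odl("granule.hdf"))  # emits complete ODL statements to stdout
```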

  20. Evolving Metadata in NASA Earth Science Data Systems

    NASA Astrophysics Data System (ADS)

    Mitchell, A.; Cechini, M. F.; Walter, J.

    2011-12-01

    NASA's Earth Observing System (EOS) is a coordinated series of satellites for long term global observations. NASA's Earth Observing System Data and Information System (EOSDIS) is a petabyte-scale archive of environmental data that supports global climate change research by providing end-to-end services from EOS instrument data collection to science data processing to full access to EOS and other earth science data. On a daily basis, the EOSDIS ingests, processes, archives and distributes over 3 terabytes of data from NASA's Earth Science missions, representing over 3500 data products spanning a variety of science disciplines. EOSDIS is currently comprised of 12 discipline-specific data centers that are collocated with centers of science discipline expertise. Metadata is used in all aspects of NASA's Earth Science data lifecycle, from the initial measurement gathering to the accessing of data products. Missions use metadata in their science data products when describing information such as the instrument/sensor, operational plan, and geographic region. Acting as the curator of the data products, data centers employ metadata for preservation, access and manipulation of data. EOSDIS provides a centralized metadata repository called the Earth Observing System (EOS) ClearingHouse (ECHO) for data discovery and access via a service-oriented architecture (SOA) between data centers and science data users. ECHO receives inventory metadata from data centers, which generate metadata files that comply with the ECHO Metadata Model. NASA's Earth Science Data and Information System (ESDIS) Project established a Tiger Team to study and make recommendations regarding the adoption of the international metadata standard ISO 19115 in EOSDIS. The result was a technical report recommending an evolution of NASA data systems towards a consistent application of ISO 19115 and related standards, including the creation of a NASA-specific convention for core ISO 19115 elements. As part of NASA's effort to continually evolve its data systems, ECHO enhanced the method by which it receives inventory metadata from the data centers to allow for multiple metadata formats, including ISO 19115. ECHO's metadata model will also be mapped to the NASA-specific convention for ingesting science metadata into the ECHO system. As NASA's new Earth Science missions and data centers migrate to the ISO 19115 standards, EOSDIS is developing metadata management resources to assist in reading, writing and parsing ISO 19115 compliant metadata. To foster interoperability with other agencies and international partners, NASA is working to ensure that a common ISO 19115 convention is developed, enhancing data sharing capabilities and other data analysis initiatives. NASA is also investigating the use of ISO 19115 standards to encode data quality, lineage and provenance with stored values. A common metadata standard across NASA's Earth Science data systems promotes interoperability, enhances data utilization and removes levels of uncertainty found in data products.

  1. Metadata Management on the SCEC PetaSHA Project: Helping Users Describe, Discover, Understand, and Use Simulation Data in a Large-Scale Scientific Collaboration

    NASA Astrophysics Data System (ADS)

    Okaya, D.; Deelman, E.; Maechling, P.; Wong-Barnum, M.; Jordan, T. H.; Meyers, D.

    2007-12-01

    Large scientific collaborations, such as the SCEC Petascale Cyberfacility for Physics-based Seismic Hazard Analysis (PetaSHA) Project, involve interactions between many scientists who exchange ideas and research results. These groups must organize, manage, and make accessible their community materials of observational data, derivative (research) results, computational products, and community software. The integration of scientific workflows as a paradigm to solve complex computations provides advantages of efficiency, reliability, repeatability, choices, and ease of use. The underlying resource needed for a scientific workflow to function and create discoverable and exchangeable products is the construction, tracking, and preservation of metadata. In the scientific workflow environment there is a two-tier structure of metadata. Workflow-level metadata and provenance describe operational steps, identity of resources, execution status, and product locations and names. Domain-level metadata essentially define the scientific meaning of data, codes and products. To a large degree the metadata at these two levels are separate. However, between these two levels is a subset of metadata produced at one level but is needed by the other. This crossover metadata suggests that some commonality in metadata handling is needed. SCEC researchers are collaborating with computer scientists at SDSC, the USC Information Sciences Institute, and Carnegie Mellon Univ. in order to perform earthquake science using high-performance computational resources. A primary objective of the "PetaSHA" collaboration is to perform physics-based estimations of strong ground motion associated with real and hypothetical earthquakes located within Southern California. Construction of 3D earth models, earthquake representations, and numerical simulation of seismic waves are key components of these estimations. Scientific workflows are used to orchestrate the sequences of scientific tasks and to access distributed computational facilities such as the NSF TeraGrid. Different types of metadata are produced and captured within the scientific workflows. One workflow within PetaSHA ("Earthworks") performs a linear sequence of tasks with workflow and seismological metadata preserved. Downstream scientific codes ingest these metadata produced by upstream codes. The seismological metadata uses attribute-value pairing in plain text; an identified need is to use more advanced handling methods. Another workflow system within PetaSHA ("Cybershake") involves several complex workflows in order to perform statistical analysis of ground shaking due to thousands of hypothetical but plausible earthquakes. Metadata management has been challenging due to its construction around a number of legacy scientific codes. We describe difficulties arising in the scientific workflow due to the lack of this metadata and suggest corrective steps, which in some cases include the cultural shift of domain science programmers coding for metadata.

  2. Evaluating the privacy properties of telephone metadata.

    PubMed

    Mayer, Jonathan; Mutchler, Patrick; Mitchell, John C

    2016-05-17

    Since 2013, a stream of disclosures has prompted reconsideration of surveillance law and policy. One of the most controversial principles, both in the United States and abroad, is that communications metadata receives substantially less protection than communications content. Several nations currently collect telephone metadata in bulk, including on their own citizens. In this paper, we attempt to shed light on the privacy properties of telephone metadata. Using a crowdsourcing methodology, we demonstrate that telephone metadata is densely interconnected, can trivially be reidentified, and can be used to draw sensitive inferences.

  3. Studies of Big Data metadata segmentation between relational and non-relational databases

    NASA Astrophysics Data System (ADS)

    Golosova, M. V.; Grigorieva, M. A.; Klimentov, A. A.; Ryabinkin, E. A.; Dimitrov, G.; Potekhin, M.

    2015-12-01

    In recent years the concept of Big Data has become well established in IT. Systems managing large data volumes produce metadata that describe the data and workflows. These metadata are used to obtain information about the current system state and for statistical and trend analysis of the processes these systems drive. Over time, the amount of stored metadata can grow dramatically. In this article we present our studies demonstrating how metadata storage scalability and performance can be improved by using a hybrid RDBMS/NoSQL architecture.
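
    A minimal sketch of the segmentation idea, with sqlite3 standing in for the RDBMS and a JSON document map standing in for the NoSQL store: fixed, frequently queried fields go to a relational table, while the variable remainder of each record is stored schema-free. All field names are illustrative.

```python
# Minimal sketch of hybrid metadata segmentation: relational table for fixed
# fields, schema-free JSON documents for everything else.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT, created TEXT)"
)
documents = {}  # id -> schema-free metadata, i.e. the "NoSQL" side


def store(task_id, status, created, extra):
    conn.execute("INSERT INTO tasks VALUES (?, ?, ?)", (task_id, status, created))
    documents[task_id] = json.dumps(extra)


store(1, "done", "2015-02-10", {"site": "CERN", "attempts": 3, "tags": ["mc"]})

# The relational side answers the fixed, indexed queries; the document side
# keeps the variable remainder without schema migrations.
row = conn.execute("SELECT id FROM tasks WHERE status = 'done'").fetchone()
print(row[0], json.loads(documents[row[0]]))
```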

  4. Evaluating the privacy properties of telephone metadata

    PubMed Central

    Mayer, Jonathan; Mutchler, Patrick; Mitchell, John C.

    2016-01-01

    Since 2013, a stream of disclosures has prompted reconsideration of surveillance law and policy. One of the most controversial principles, both in the United States and abroad, is that communications metadata receives substantially less protection than communications content. Several nations currently collect telephone metadata in bulk, including on their own citizens. In this paper, we attempt to shed light on the privacy properties of telephone metadata. Using a crowdsourcing methodology, we demonstrate that telephone metadata is densely interconnected, can trivially be reidentified, and can be used to draw sensitive inferences. PMID:27185922

  5. Incorporating ISO Metadata Using HDF Product Designer

    NASA Technical Reports Server (NTRS)

    Jelenak, Aleksandar; Kozimor, John; Habermann, Ted

    2016-01-01

    The need to store increasing amounts of metadata of varying complexity in HDF5 files is rapidly outgrowing the capabilities of the Earth science metadata conventions currently in use. Until now, data producers have had little choice but to come up with ad hoc solutions to this challenge. Such solutions, in turn, pose a wide range of issues for data managers, distributors, and, ultimately, data users. The HDF Group is experimenting with a novel approach of using ISO 19115 metadata objects as a catch-all container for all the metadata that cannot be fitted into the current Earth science data conventions. This presentation will showcase how the HDF Product Designer software can be utilized to help data producers include various ISO metadata objects in their products.

  6. Multiple-Diode-Laser Gas-Detection Spectrometer

    NASA Technical Reports Server (NTRS)

    Webster, Christopher R.; Beer, Reinhard; Sander, Stanley P.

    1988-01-01

    Small concentrations of selected gases measured automatically. Proposed multiple-laser-diode spectrometer is part of a system for automatically measuring concentrations of selected gases at the part-per-billion level. An array of laser/photodetector pairs measures the infrared absorption spectrum of the atmosphere along probing laser beams. Adaptable to terrestrial uses such as monitoring pollution or controlling industrial processes.

  7. Automatic Data Processing, 4-1. Military Curriculum Materials for Vocational and Technical Education.

    ERIC Educational Resources Information Center

    Army Ordnance Center and School, Aberdeen Proving Ground, MD.

    These two texts and student workbook for a secondary/postsecondary-level correspondence course in automatic data processing comprise one of a number of military-developed curriculum packages selected for adaptation to vocational instruction and curriculum development in a civilian setting. The purpose stated for the individualized, self-paced…

  8. Speaker-Machine Interaction in Automatic Speech Recognition. Technical Report.

    ERIC Educational Resources Information Center

    Makhoul, John I.

    The feasibility and limitations of speaker adaptation in improving the performance of a "fixed" (speaker-independent) automatic speech recognition system were examined. A fixed vocabulary of 55 syllables is used in the recognition system which contains 11 stops and fricatives and five tense vowels. The results of an experiment on speaker…

  9. The Automatic Sweetheart: An Assignment in a History of Psychology Course

    ERIC Educational Resources Information Center

    Sibicky, Mark E.

    2007-01-01

    This article describes an assignment in a History of Psychology course used to enhance student retention of material and increase student interest and discussion of the long-standing debate between humanistic and mechanistic models in psychology. Adapted from William James's (1955) automatic sweetheart question, the assignment asks students to…

  10. Text Structuration Leading to an Automatic Summary System: RAFI.

    ERIC Educational Resources Information Center

    Lehman, Abderrafih

    1999-01-01

    Describes the design and construction of Resume Automatique a Fragments Indicateurs (RAFI), a system of automatic text summary which sums up scientific and technical texts. The RAFI system transforms a long source text into several versions of more condensed texts, using discourse analysis, to make searching easier; it could be adapted to the…

  11. Viewing and Editing Earth Science Metadata MOBE: Metadata Object Browser and Editor in Java

    NASA Astrophysics Data System (ADS)

    Chase, A.; Helly, J.

    2002-12-01

    Metadata is an important, yet often neglected, aspect of successful archival efforts. However, generating robust, useful metadata is often a time-consuming and tedious task. We have been approaching this problem from two directions: first, by automating metadata creation, pulling from known sources of data; and second, as detailed here, by developing friendly software for human interaction with the metadata. MOBE and COBE (Metadata Object Browser and Editor, and Canonical Object Browser and Editor, respectively) are Java applications for editing and viewing metadata and digital objects. MOBE has already been designed and deployed, and is currently being integrated into other areas of the SIOExplorer project. COBE is in the design and development stage, being created with the same considerations in mind as those for MOBE. Metadata creation, viewing, data object creation, and data object viewing, when taken on a small scale, are all relatively simple tasks. Computer science, however, has an infamous reputation for transforming the simple into the complex. As a system scales upward to become more robust, new features arise and additional functionality is added to the software being written to manage the system. The software that emerges from such an evolution, though powerful, is often complex and difficult to use. With MOBE the focus is on a tool that does a small number of tasks very well. The result has been an application that enables users to manipulate metadata in an intuitive and effective way. This allows for a tool that serves its purpose without introducing additional cognitive load onto the user, an end goal we continue to pursue.

  12. The Metadata Cloud: The Last Piece of a Distributed Data System Model

    NASA Astrophysics Data System (ADS)

    King, T. A.; Cecconi, B.; Hughes, J. S.; Walker, R. J.; Roberts, D.; Thieman, J. R.; Joy, S. P.; Mafi, J. N.; Gangloff, M.

    2012-12-01

    Distributed data systems have existed ever since systems were networked together. Over the years the model for distributed data systems has evolved from basic file transfer to client-server to multi-tiered to grid and finally to cloud-based systems. Initially, metadata was tightly coupled to the data, either by embedding the metadata in the same file containing the data or by co-locating the metadata in commonly named files. As the sources of data multiplied, data volumes increased, and services specialized to improve efficiency, a cloud system model emerged. In a cloud system, computing and storage are provided as services, with accessibility emphasized over physical location. Computation and data clouds are common implementations. Effectively using the data and computation capabilities requires metadata. When metadata is stored separately from the data, a metadata cloud is formed. With a metadata cloud, information and knowledge about data resources can migrate efficiently from system to system, enabling services and allowing the data to remain efficiently stored until used. This is especially important with "Big Data", where movement of the data is limited by bandwidth. We examine how the metadata cloud completes a general distributed data system model, how standards play a role, and relate this to the existing types of cloud computing. We also look at the major science data systems in existence and compare each to the generalized cloud system model.

  13. Turning Data into Information: Assessing and Reporting GIS Metadata Integrity Using Integrated Computing Technologies

    ERIC Educational Resources Information Center

    Mulrooney, Timothy J.

    2009-01-01

    A Geographic Information System (GIS) serves as the tangible and intangible means by which spatially related phenomena can be created, analyzed and rendered. GIS metadata serves as the formal framework to catalog information about a GIS data set. Metadata is independent of the encoded spatial and attribute information. GIS metadata is a subset of…

  14. Integrating XQuery-Enabled SCORM XML Metadata Repositories into an RDF-Based E-Learning P2P Network

    ERIC Educational Resources Information Center

    Qu, Changtao; Nejdl, Wolfgang

    2004-01-01

    Edutella is an RDF-based E-Learning P2P network that is aimed to accommodate heterogeneous learning resource metadata repositories in a P2P manner and further facilitate the exchange of metadata between these repositories based on RDF. Whereas Edutella provides RDF metadata repositories with a quite natural integration approach, XML metadata…

  15. Raising orphans from a metadata morass: A researcher's guide to re-use of public 'omics data.

    PubMed

    Bhandary, Priyanka; Seetharam, Arun S; Arendsee, Zebulun W; Hur, Manhoi; Wurtele, Eve Syrkin

    2018-02-01

    More than 15 petabases of raw RNA-seq data are now accessible through public repositories. Acquisition of other 'omics data types is expanding, though most lack a centralized archival repository. Data reuse provides a tremendous opportunity to extract new knowledge from existing experiments, and offers a unique opportunity for robust, multi-'omics analyses by merging metadata (information about experimental design, biological samples, protocols) and data from multiple experiments. We illustrate how predictive research can be accelerated by meta-analysis with a study of orphan (species-specific) genes. Computational predictions are critical to infer orphan function because their coding sequences provide very few clues. The metadata in public databases is often confusing; a test case with Zea mays mRNA-seq data reveals a high proportion of missing, misleading or incomplete metadata. This metadata morass significantly diminishes the insight that can be extracted from these data. We provide tips for data submitters and users, including specific recommendations to improve metadata quality through greater use of controlled vocabularies and through metadata reviews. Finally, we advocate for a unified, straightforward metadata submission and retrieval system. Copyright © 2017 Elsevier B.V. All rights reserved.

  16. Evaluation of three automatic oxygen therapy control algorithms on ventilated low birth weight neonates.

    PubMed

    Morozoff, Edmund P; Smyth, John A

    2009-01-01

    Neonates with underdeveloped lungs often require oxygen therapy. During the course of oxygen therapy, elevated levels of blood oxygenation (hyperoxemia) must be avoided, or the risk of chronic lung disease or retinal damage is increased. Low levels of blood oxygen (hypoxemia) may lead to permanent brain tissue damage and, in some cases, mortality. A closed-loop controller that automatically administers oxygen therapy using three algorithms - state machine, adaptive model, and proportional-integral-derivative (PID) - was applied to seven ventilated low birth weight neonates and compared to manual oxygen therapy. All three automatic control algorithms demonstrated their ability to improve on manual oxygen therapy by increasing periods of normoxemia and reducing the need for manual FiO2 adjustments. Of the three control algorithms, the adaptive model showed the best performance, with 0.25 manual adjustments per hour and 73% of time spent within the target +/- 3% SpO2.
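
    Of the three algorithms, the PID variant is the easiest to sketch: the controller adjusts the fraction of inspired oxygen (FiO2) from the SpO2 error at each sample. The gains and limits below are invented for illustration only and are in no way clinical values.

```python
# Minimal sketch of a PID loop for oxygen therapy: compute the SpO2 error,
# accumulate its integral and derivative, and nudge FiO2 accordingly.
# Gains and limits are illustrative, not clinical.
def make_pid(kp, ki, kd, fio2_min=0.21, fio2_max=1.0):
    state = {"integral": 0.0, "prev_error": None}

    def step(spo2_target, spo2_measured, fio2_current, dt):
        error = spo2_target - spo2_measured
        state["integral"] += error * dt
        derivative = 0.0
        if state["prev_error"] is not None:
            derivative = (error - state["prev_error"]) / dt
        state["prev_error"] = error
        adjustment = kp * error + ki * state["integral"] + kd * derivative
        # Clamp to the physically meaningful FiO2 range (21%-100% oxygen).
        return min(fio2_max, max(fio2_min, fio2_current + adjustment))

    return step


controller = make_pid(kp=0.01, ki=0.001, kd=0.005)  # illustrative gains
fio2 = controller(spo2_target=92, spo2_measured=88, fio2_current=0.30, dt=5.0)
print(round(fio2, 3))
```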

  17. Science friction: data, metadata, and collaboration.

    PubMed

    Edwards, Paul N; Mayernik, Matthew S; Batcheller, Archer L; Bowker, Geoffrey C; Borgman, Christine L

    2011-10-01

    When scientists from two or more disciplines work together on related problems, they often face what we call 'science friction'. As science becomes more data-driven, collaborative, and interdisciplinary, demand increases for interoperability among data, tools, and services. Metadata--usually viewed simply as 'data about data', describing objects such as books, journal articles, or datasets--serve key roles in interoperability. Yet we find that metadata may be a source of friction between scientific collaborators, impeding data sharing. We propose an alternative view of metadata, focusing on its role in an ephemeral process of scientific communication, rather than as an enduring outcome or product. We report examples of highly useful, yet ad hoc, incomplete, loosely structured, and mutable, descriptions of data found in our ethnographic studies of several large projects in the environmental sciences. Based on this evidence, we argue that while metadata products can be powerful resources, usually they must be supplemented with metadata processes. Metadata-as-process suggests the very large role of the ad hoc, the incomplete, and the unfinished in everyday scientific work.

  18. Recipes for Semantic Web Dog Food — The ESWC and ISWC Metadata Projects

    NASA Astrophysics Data System (ADS)

    Möller, Knud; Heath, Tom; Handschuh, Siegfried; Domingue, John

    Semantic Web conferences such as ESWC and ISWC offer prime opportunities to test and showcase semantic technologies. Conference metadata about people, papers and talks is diverse in nature, and neither so small as to be uninteresting nor so big as to be unmanageable. Many metadata-related challenges that may arise in the Semantic Web at large are also present here. Metadata must be generated from sources which are often unstructured and hard to process, and may originate from many different players, so suitable workflows must be established. Moreover, the generated metadata must use appropriate formats and vocabularies, and be served in a way that is consistent with the principles of linked data. This paper reports on the metadata efforts from ESWC and ISWC, identifies specific issues and barriers encountered during the projects, and discusses how these were approached. Recommendations are made as to how these may be addressed in the future, and we discuss how these solutions may generalize to metadata production for the Semantic Web at large.

  19. Automatic segmentation of right ventricle on ultrasound images using sparse matrix transform and level set

    NASA Astrophysics Data System (ADS)

    Qin, Xulei; Cong, Zhibin; Halig, Luma V.; Fei, Baowei

    2013-03-01

    An automatic framework is proposed to segment the right ventricle on ultrasound images. This method can automatically segment both epicardial and endocardial boundaries from a continuous echocardiography series by combining sparse matrix transform (SMT), a training model, and a localized region-based level set. First, the sparse matrix transform extracts the main motion regions of the myocardium as eigenimages by analyzing the statistical information of these images. Second, a training model of the right ventricle is registered to the extracted eigenimages in order to automatically detect the main location of the right ventricle and the corresponding transform relationship between the training model and the SMT-extracted results in the series. Third, the training model is then adjusted as an adapted initialization for the segmentation of each image in the series. Finally, based on the adapted initializations, a localized region-based level set algorithm is applied to segment both epicardial and endocardial boundaries of the right ventricle from the whole series. Experimental results from real subject data validated the performance of the proposed framework in segmenting the right ventricle from echocardiography. The mean Dice scores for the epicardial and endocardial boundaries are 89.1%+/-2.3% and 83.6%+/-7.3%, respectively. The automatic segmentation method based on sparse matrix transform and level set can provide a useful tool for quantitative cardiac imaging.

  20. BASINS Metadata

    EPA Pesticide Factsheets

    Metadata, or data about data, describes the content, quality, condition, and other characteristics of data. Geospatial metadata are critical to data discovery and serve as the fuel for the Geospatial One-Stop data portal.

  1. Visualization of JPEG Metadata

    NASA Astrophysics Data System (ADS)

    Malik Mohamad, Kamaruddin; Deris, Mustafa Mat

    There is much more information embedded in a JPEG image than just the graphics. Visualization of this metadata would benefit digital forensic investigators viewing embedded data, including data from corrupted images where no graphics can be displayed, and assist in evidence collection for cases such as child pornography or steganography. Tools such as metadata readers, editors and extractors are already available, but they mostly focus on visualizing the attribute information of JPEG Exif. However, none visualize metadata by consolidating the marker summary, header structure, Huffman tables and quantization tables in a single program. In this paper, metadata visualization is done by developing a program able to summarize all existing markers, the header structure, Huffman tables and quantization tables of a JPEG. The result shows that visualization of metadata makes viewing the hidden information within a JPEG easier.
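
    The marker-summary part of such a tool can be sketched by walking a JPEG's marker segments and reporting each marker and segment length; Huffman (DHT) and quantization (DQT) tables would then be decoded from their segment payloads. A minimal Python sketch:

```python
# Minimal sketch: walk JPEG marker segments and print a marker summary.
# Stops at SOS, where entropy-coded scan data begins.
import struct

NAMES = {0xC0: "SOF0", 0xC4: "DHT", 0xD8: "SOI", 0xD9: "EOI",
         0xDA: "SOS", 0xDB: "DQT", 0xE0: "APP0", 0xE1: "APP1 (Exif)"}


def summarize_markers(path):
    data = open(path, "rb").read()
    i = 0
    while i + 1 < len(data):
        if data[i] != 0xFF:
            i += 1
            continue
        marker = data[i + 1]
        name = NAMES.get(marker, f"0x{marker:02X}")
        if marker in (0xD8, 0xD9) or 0xD0 <= marker <= 0xD7:
            print(f"{name}: standalone marker")      # no length field
            i += 2
        elif marker == 0xDA:
            print(f"{name}: scan data follows")
            break                                    # stop the simple scan
        else:
            (length,) = struct.unpack(">H", data[i + 2:i + 4])
            print(f"{name}: {length} bytes")         # length includes itself
            i += 2 + length


# summarize_markers("photo.jpg")
```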

  2. Validation of automatic landmark identification for atlas-based segmentation for radiation treatment planning of the head-and-neck region

    NASA Astrophysics Data System (ADS)

    Leavens, Claudia; Vik, Torbjørn; Schulz, Heinrich; Allaire, Stéphane; Kim, John; Dawson, Laura; O'Sullivan, Brian; Breen, Stephen; Jaffray, David; Pekar, Vladimir

    2008-03-01

    Manual contouring of target volumes and organs at risk in radiation therapy is extremely time-consuming, in particular for treating the head-and-neck area, where a single patient treatment plan can take several hours to contour. As radiation treatment delivery moves towards adaptive treatment, the need for more efficient segmentation techniques will increase. We are developing a method for automatic model-based segmentation of the head and neck. This process can be broken down into three main steps: i) automatic landmark identification in the image dataset of interest, ii) automatic landmark-based initialization of deformable surface models to the patient image dataset, and iii) adaptation of the deformable models to the patient-specific anatomical boundaries of interest. In this paper, we focus on validating the first step, quantifying the results of our automatic landmark identification. We use an image atlas formed by applying thin-plate spline (TPS) interpolation to ten atlas datasets, using 27 manually identified landmarks in each atlas/training dataset. The principal variation modes returned by principal component analysis (PCA) of the landmark positions were used by an automatic registration algorithm, which sought the corresponding landmarks in the clinical dataset of interest using a controlled random search algorithm. With a run time of 60 seconds for the random search, a root-mean-square (rms) distance to the ground-truth landmark positions of 9.5 +/- 0.6 mm was obtained for the identified landmarks. Automatic segmentation of the brain, mandible and brain stem, using the detected landmarks, is demonstrated.
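
    The statistical step can be sketched in a few lines of NumPy: stack the manually identified landmark sets from the training atlases and take the principal variation modes of the landmark positions via SVD. The shapes below mirror the setup described above (10 training sets, 27 landmarks in 3-D), but the data are random stand-ins.

```python
# Minimal sketch of PCA over training landmark sets: each atlas contributes
# one flattened landmark vector; SVD of the centered stack yields the
# principal variation modes the search optimizes over.
import numpy as np

rng = np.random.default_rng(0)
landmarks = rng.normal(size=(10, 27, 3))    # stand-in for atlas landmarks

X = landmarks.reshape(10, -1)               # one 81-vector per atlas
mean_shape = X.mean(axis=0)
U, s, Vt = np.linalg.svd(X - mean_shape, full_matrices=False)

modes = Vt                                  # principal variation modes
variances = s**2 / (len(X) - 1)             # variance along each mode

# A candidate landmark configuration is the mean shape plus a weighted sum
# of the leading modes; the controlled random search varies these weights.
candidate = mean_shape + 1.5 * np.sqrt(variances[0]) * modes[0]
print(candidate.reshape(27, 3).shape)
```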

  3. Use of a metadata documentation and search tool for large data volumes: The NGEE arctic example

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Devarakonda, Ranjeet; Hook, Leslie A; Killeffer, Terri S

    The Online Metadata Editor (OME) is a web-based tool that helps document scientific data in a well-structured, popular scientific metadata format. In this paper, we discuss the newest tool that Oak Ridge National Laboratory (ORNL) has developed to generate, edit, and manage metadata, and how it is helping data-intensive science centers and projects, such as the U.S. Department of Energy's Next Generation Ecosystem Experiments (NGEE) in the Arctic, to prepare metadata and make their big data produce big science and lead to new discoveries.

  4. Measuring and tracking the flow of climate change adaptation aid to the developing world

    NASA Astrophysics Data System (ADS)

    Donner, Simon D.; Kandlikar, Milind; Webber, Sophie

    2016-05-01

    The developed world has pledged to mobilize at least US$100 billion per year of ‘new’ and ‘additional’ funds by 2020 to help the developing world respond to climate change. Tracking this finance is particularly problematic for climate change adaptation, as there is no clear definition of what separates adaptation aid from standard development aid. Here we use a historical database of overseas development assistance projects to test the effect of different accounting assumptions on the delivery of adaptation finance to the developing countries of Oceania, using machine algorithms developed from a manual pilot study. The results show that explicit adaptation finance grew to 3%-4% of all development aid to Oceania by the 2008-2012 period, but that total adaptation finance could be as high as 37% of all aid, depending on potentially politically motivated assumptions about what counts as adaptation. There was also an uneven distribution of adaptation aid between countries facing similar challenges, such as Kiribati, the Marshall Islands, and the Federated States of Micronesia. The analysis indicates that data allowing individual projects to be weighted by their climate change relevance is needed. A robust and mandatory metadata system for all aid projects would allow multilateral aid agencies and independent third parties to perform their own analyses using different assumptions and definitions, and serve as a key check on international climate aid promises.

  5. Assisted editing of SensorML with EDI. A bottom-up scenario towards the definition of sensor profiles.

    NASA Astrophysics Data System (ADS)

    Oggioni, Alessandro; Tagliolato, Paolo; Fugazza, Cristiano; Bastianini, Mauro; Pavesi, Fabio; Pepe, Monica; Menegon, Stefano; Basoni, Anna; Carrara, Paola

    2015-04-01

    Sensor observation systems for environmental data have become increasingly important in recent years. The EGU's Informatics in Oceanography and Ocean Science track stressed the importance of management tools and solutions for marine infrastructures. We think that full interoperability among sensor systems is still an open issue and that the solution to this involves providing appropriate metadata. Several open source applications implement the SWE specification and, particularly, the Sensor Observation Service (SOS) standard. These applications allow for the exchange of data and metadata in XML format between computer systems. However, there is a lack of metadata editing tools supporting end users in this activity. Generally speaking, it is hard for users to provide sensor metadata in the SensorML format without dedicated tools. In particular, such a tool should ease metadata editing by providing, for standard sensors, all the invariant information to be included in sensor metadata, thus allowing the user to concentrate on the metadata items that are related to the specific deployment. RITMARE, the Italian flagship project on marine research, envisages a subproject, SP7, for the set-up of the project's spatial data infrastructure. SP7 developed EDI, a general purpose, template-driven metadata editor that is composed of a backend web service and an HTML5/javascript client. EDI can be customized for managing the creation of generic metadata encoded as XML. Once tailored to a specific metadata format, EDI presents users with a web form with advanced auto completion and validation capabilities. In the case of sensor metadata (SensorML versions 1.0.1 and 2.0), the EDI client is instructed to send an "insert sensor" request to an SOS endpoint in order to save the metadata in an SOS server. In the first phase of project RITMARE, EDI has been used to simplify the creation of SensorML metadata from scratch by the researchers and data managers involved. An interesting by-product of this ongoing work is a growing archive of predefined sensor descriptions. This information is being collected in order to further ease metadata creation in the next phase of the project. Users will be able to choose among a number of sensor and sensor platform prototypes: these will be specific instances on which it will be possible to define, in a bottom-up approach, "sensor profiles". We report on the outcome of this activity.
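
    Schematically, saving a SensorML description through an SOS transactional interface amounts to posting an InsertSensor request to the service endpoint. Below is a heavily simplified Python sketch with a hypothetical endpoint and a placeholder request body; a real request is a complete SWES InsertSensor XML document wrapping the SensorML description:

        import requests

        # Hypothetical endpoint and a placeholder body; a real InsertSensor
        # request wraps the SensorML in a swes:InsertSensor envelope with the
        # observable properties and metadata the service requires.
        SOS_ENDPOINT = "https://example.org/sos/service"

        insert_sensor_xml = """<swes:InsertSensor ...>
          <!-- SensorML 2.0 description produced by the metadata editor goes here -->
        </swes:InsertSensor>"""

        response = requests.post(
            SOS_ENDPOINT,
            data=insert_sensor_xml.encode("utf-8"),
            headers={"Content-Type": "application/xml"},
        )
        print(response.status_code, response.text[:200])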

  6. Recognizing lexical and semantic change patterns in evolving life science ontologies to inform mapping adaptation.

    PubMed

    Dos Reis, Julio Cesar; Dinh, Duy; Da Silveira, Marcos; Pruski, Cédric; Reynaud-Delaître, Chantal

    2015-03-01

    Mappings established between life science ontologies require significant effort to keep them up to date, due to the size and frequent evolution of these ontologies. Consequently, automatic methods for applying modifications to mappings are highly demanded. The accuracy of such methods relies on the available description of the evolution of ontologies, especially regarding concepts involved in mappings. However, from one ontology version to another, a deeper understanding of the ontology changes relevant for supporting mapping adaptation is typically lacking. This research work defines a set of change patterns at the level of concept attributes, and proposes original methods to automatically recognize instances of these patterns based on the similarity between attributes denoting the evolving concepts. This investigation evaluates the benefits of the proposed methods and the influence of the recognized change patterns on selecting the strategies for mapping adaptation. The summary of the findings is as follows: (1) the Precision (>60%) and Recall (>35%) achieved by comparing manually identified change patterns with the automatic ones; (2) a set of potential impacts of the recognized change patterns on the way mappings are adapted. We found that the detected correlations cover ∼66% of the mapping adaptation actions with a positive impact; and (3) the influence of the similarity coefficient calculated between concept attributes on the performance of the recognition algorithms. The experimental evaluations conducted with real life science ontologies showed the effectiveness of our approach to accurately characterize ontology evolution at the level of concept attributes. This investigation confirmed the relevance of the proposed change patterns to support decisions on mapping adaptation. Copyright © 2014 Elsevier B.V. All rights reserved.
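
    To make the idea of attribute-level change patterns concrete, here is a small Python sketch (not the authors' algorithm) that scores the string similarity between the attributes of an evolving concept in two ontology versions and assigns rough change labels; the attribute values and threshold are illustrative.

        from difflib import SequenceMatcher

        def attribute_similarity(a, b):
            """String similarity between two concept attributes (0..1)."""
            return SequenceMatcher(None, a.lower(), b.lower()).ratio()

        def classify_change(old_attrs, new_attrs, threshold=0.8):
            """Very rough change-pattern labels for one evolving concept."""
            patterns = []
            for old in old_attrs:
                best = max(new_attrs, key=lambda new: attribute_similarity(old, new))
                score = attribute_similarity(old, best)
                if score == 1.0:
                    patterns.append((old, best, "unchanged"))
                elif score >= threshold:
                    patterns.append((old, best, "revised wording"))
                else:
                    patterns.append((old, None, "removed or replaced"))
            return patterns

        old = ["myocardial infarct", "heart attack"]        # illustrative attributes
        new = ["myocardial infarction", "heart attack"]
        for row in classify_change(old, new):
            print(row)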

  7. Publishing NASA Metadata as Linked Open Data for Semantic Mashups

    NASA Astrophysics Data System (ADS)

    Wilson, Brian; Manipon, Gerald; Hua, Hook

    2014-05-01

    Data providers are now publishing more metadata in more interoperable forms, e.g. Atom or RSS 'casts', as Linked Open Data (LOD), or as ISO Metadata records. A major effort on the part of the NASA's Earth Science Data and Information System (ESDIS) project is the aggregation of metadata that enables greater data interoperability among scientific data sets regardless of source or application. Both the Earth Observing System (EOS) ClearingHOuse (ECHO) and the Global Change Master Directory (GCMD) repositories contain metadata records for NASA (and other) datasets and provided services. These records contain typical fields for each dataset (or software service) such as the source, creation date, cognizant institution, related access URLs, and domain and variable keywords to enable discovery. Under a NASA ACCESS grant, we demonstrated how to publish the ECHO and GCMD dataset and services metadata as LOD in the RDF format. Both sets of metadata are now queryable at SPARQL endpoints and available for integration into "semantic mashups" in the browser. It is straightforward to reformat sets of XML metadata, including ISO, into simple RDF and then later refine and improve the RDF predicates by reusing known namespaces such as Dublin Core, georss, etc. All scientific metadata should be part of the LOD world. In addition, we developed an "instant" drill-down and browse interface that provides faceted navigation so that the user can discover and explore the 25,000 datasets and 3000 services. The available facets and the free-text search box appear in the left panel, and the instantly updated results for the dataset search appear in the right panel. The user can constrain the value of a metadata facet simply by clicking on a word (or phrase) in the "word cloud" of values for each facet. The display section for each dataset includes the important metadata fields, a full description of the dataset, potentially some related URLs, and a "search" button that points to an OpenSearch GUI that is pre-configured to search for granules within the dataset. We will present our experiences with converting NASA metadata into LOD, discuss the challenges, illustrate some of the enabled mashups, and demonstrate the latest version of the "instant browse" interface for navigating multiple metadata collections.
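
    Reformatting a metadata record into RDF with reused namespaces is straightforward with a library such as rdflib. A minimal Python sketch, using a hypothetical dataset record and Dublin Core terms (DCAT is used here for the dataset type; the actual ECHO/GCMD predicates may differ):

        from rdflib import Graph, Literal, URIRef
        from rdflib.namespace import DCAT, DCTERMS, RDF

        # Hypothetical dataset record of the kind held in ECHO/GCMD.
        record = {
            "id": "https://example.nasa.gov/dataset/SAMPLE_L3",
            "title": "Sample Level-3 Ocean Product",
            "creator": "Example NASA DAAC",
            "created": "2000-02-24",
        }

        g = Graph()
        g.bind("dcterms", DCTERMS)
        g.bind("dcat", DCAT)
        ds = URIRef(record["id"])
        g.add((ds, RDF.type, DCAT.Dataset))
        g.add((ds, DCTERMS.title, Literal(record["title"])))
        g.add((ds, DCTERMS.creator, Literal(record["creator"])))
        g.add((ds, DCTERMS.created, Literal(record["created"])))

        print(g.serialize(format="turtle"))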

  8. Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs and Increasing Value

    NASA Astrophysics Data System (ADS)

    Myers, J.; Hedstrom, M.; Plale, B. A.; Kumar, P.; McDonald, R.; Kooper, R.; Marini, L.; Kouper, I.; Chandrasekar, K.

    2013-12-01

    What if everything that researchers know about their data, and everything their applications know, were directly available to curators? What if all the information that data consumers discover and infer about data were also available? What if curation and preservation activities occurred incrementally, during research projects instead of after they end, and could be leveraged to make it easier to manage research data from the moment of its creation? These are questions that the Sustainable Environments - Actionable Data (SEAD) project, funded as part of the National Science Foundation's DataNet partnership, was designed to answer. Data curation is challenging, but it is made more difficult by the historical separation of data production, data use, and formal curation activities across organizations, locations, and applications, and across time. Modern computing and networking technologies allow a much different approach in which data and metadata can easily flow between these activities throughout the data lifecycle, and in which heterogeneous and evolving data and metadata can be managed. Sustainability research, SEAD's initial focus area, is a clear example of an area where the nature of the research (cross-disciplinary, integrating heterogeneous data from independent sources, small teams, rapid evolution of sensing and analysis techniques) and the barriers and costs inherent in traditional methods have limited adoption of existing curation tools and techniques, to the detriment of overall scientific progress. To explore these ideas and create a sustainable curation capability for communities such as sustainability research, the SEAD team has developed and is now deploying an interacting set of open source data services that demonstrate this approach. These services provide end-to-end support for management of data during research projects; publication of that data into long-term archives; and integration of it into community networks of publications, research center activities, and synthesis efforts. They build on a flexible 'semantic content management' architecture and incorporate notions of 'active' and 'social' curation - continuous, incremental curation activities performed by the data producers (active) and the community (social) that are motivated by a range of direct benefits. Examples include the use of metadata (tags) to allow generation of custom geospatial maps, automated metadata extraction to generate rich data pages for known formats, and the use of information about data authorship to allow automatic updates of personal and project research profiles when data is published. In this presentation, we describe the core capabilities of SEAD's services and their application in sustainability research. We also outline the key features of the SEAD architecture - the use of global semantic identifiers, extensible data and metadata models, web services to manage context shifts, scalable cloud storage - and highlight how this approach is particularly well suited to extension by independent third parties. We conclude with thoughts on how this approach can be applied to challenging issues such as exposing 'dark' data and reducing duplicate creation of derived data products, and can provide a new level of analytics for community analysis and coordination.

  9. The PDS4 Metadata Management System

    NASA Astrophysics Data System (ADS)

    Raugh, A. C.; Hughes, J. S.

    2018-04-01

    We present the key features of the Planetary Data System (PDS) PDS4 Information Model as an extendable metadata management system for planetary metadata related to data structure, analysis/interpretation, and provenance.

  10. In-hardware demonstration of model-independent adaptive tuning of noisy systems with arbitrary phase drift

    DOE PAGES

    Scheinker, Alexander; Baily, Scott; Young, Daniel; ...

    2014-08-01

    In this work, an implementation of a recently developed model-independent adaptive control scheme for tuning uncertain and time-varying systems is demonstrated on the Los Alamos linear particle accelerator. The main benefits of the algorithm are its simplicity, its ability to handle an arbitrary number of components without increased complexity, and its extreme robustness to measurement noise, a property which is both analytically proven and demonstrated in the experiments performed. We report on the application of this algorithm for simultaneous tuning of two buncher radio frequency (RF) cavities, in order to maximize beam acceptance into the accelerating electromagnetic field cavities of the machine, with the tuning based only on a noisy measurement of the surviving beam current downstream of the two bunching cavities. The algorithm automatically responds to arbitrary phase shifts of the cavity phases, re-tuning the cavity settings and maximizing beam acceptance. Because it is model independent, it can be utilized for continuous adaptation to time-variation of a large system, such as due to thermal drift or damage to components, in which the remaining functional components would be automatically re-tuned to compensate for the failing ones. We start by discussing the general model-independent adaptive scheme and how it may be digitally applied to a large class of multi-parameter uncertain systems, and then present our experimental results.

  11. Adaptive sleep-wake discrimination for wearable devices.

    PubMed

    Karlen, Walter; Floreano, Dario

    2011-04-01

    Sleep/wake classification systems that rely on physiological signals suffer from intersubject differences that make accurate classification with a single, subject-independent model difficult. To overcome the limitations of intersubject variability, we suggest a novel online adaptation technique that updates the sleep/wake classifier in real time. The objective of the present study was to evaluate the performance of a newly developed adaptive classification algorithm that was embedded on a wearable sleep/wake classification system called SleePic. The algorithm processed ECG and respiratory effort signals for the classification task and applied behavioral measurements (obtained from accelerometer and press-button data) for the automatic adaptation task. When trained as a subject-independent classifier algorithm, the SleePic device was only able to correctly classify 74.94 ± 6.76% of the human-rated sleep/wake data. By using the suggested automatic adaptation method, the mean classification accuracy could be significantly improved to 92.98 ± 3.19%. A subject-independent classifier based on activity data only showed a comparable accuracy of 90.44 ± 3.57%. We demonstrated that subject-independent models used for online sleep-wake classification can successfully be adapted to previously unseen subjects without the intervention of human experts or off-line calibration.
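
    As a sketch of the general idea (not the SleePic algorithm), online adaptation can be implemented with any incrementally trainable classifier: the model is first fit on pooled subject-independent data, then updated whenever a behavioral proxy label arrives. A minimal Python example with scikit-learn and placeholder features:

        import numpy as np
        from sklearn.linear_model import SGDClassifier

        rng = np.random.default_rng(0)
        classes = np.array([0, 1])                    # 0 = wake, 1 = sleep

        # Subject-independent model, trained offline on pooled data (placeholders
        # stand in for ECG and respiratory-effort features).
        X_pool = rng.normal(size=(500, 6))
        y_pool = rng.integers(0, 2, size=500)
        clf = SGDClassifier(loss="log_loss", random_state=0)
        clf.partial_fit(X_pool, y_pool, classes=classes)

        # Online adaptation: whenever a behavioral measurement (accelerometer
        # activity, press-button response) yields a proxy label, update the model.
        for _ in range(100):
            x_new = rng.normal(size=(1, 6))
            proxy_label = rng.integers(0, 2, size=1)  # placeholder behavioral label
            clf.partial_fit(x_new, proxy_label)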

  12. Physiological Self-Regulation and Adaptive Automation

    NASA Technical Reports Server (NTRS)

    Prinzell, Lawrence J.; Pope, Alan T.; Freeman, Frederick G.

    2007-01-01

    Adaptive automation has been proposed as a solution to current problems of human-automation interaction. Past research has shown the potential of this advanced form of automation to enhance pilot engagement and lower cognitive workload. However, there have been concerns voiced regarding issues, such as automation surprises, associated with the use of adaptive automation. This study examined the use of psychophysiological self-regulation training with adaptive automation that may help pilots deal with these problems through the enhancement of cognitive resource management skills. Eighteen participants were assigned to 3 groups (self-regulation training, false feedback, and control) and performed resource management, monitoring, and tracking tasks from the Multiple Attribute Task Battery. The tracking task was cycled between 3 levels of task difficulty (automatic, adaptive aiding, manual) on the basis of the electroencephalogram-derived engagement index. The other two tasks remained in automatic mode, which included a single automation failure. Those participants who had received self-regulation training performed significantly better and reported lower National Aeronautics and Space Administration Task Load Index scores than participants in the false feedback and control groups. The theoretical and practical implications of these results for adaptive automation are discussed.
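
    The EEG engagement index used in this line of research is commonly reported as the ratio of beta power to the sum of alpha and theta power. A toy Python sketch of index-based task allocation, with hypothetical band powers and switching thresholds:

        def engagement_index(theta, alpha, beta):
            """EEG engagement index, commonly reported as beta / (alpha + theta)."""
            return beta / (alpha + theta)

        def next_mode(index, baseline, current_mode):
            """Toy allocation rule: if engagement drifts below baseline, hand the
            task back to the operator; if it drifts above, automate it."""
            if index < baseline * 0.9:
                return "manual"          # re-engage the operator
            if index > baseline * 1.1:
                return "automatic"       # offload the operator
            return current_mode

        # Hypothetical band powers (arbitrary units) over consecutive epochs.
        epochs = [(4.0, 6.0, 5.0), (4.2, 6.5, 4.1), (3.5, 5.0, 7.2)]
        baseline = engagement_index(*epochs[0])
        mode = "adaptive aiding"
        for theta, alpha, beta in epochs[1:]:
            mode = next_mode(engagement_index(theta, alpha, beta), baseline, mode)
            print(mode)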

  13. Adaptive Personalized Training Games for Individual and Collaborative Rehabilitation of People with Multiple Sclerosis

    PubMed Central

    2014-01-01

    Any rehabilitation involves people who are unique individuals with their own characteristics and rehabilitation needs, including patients suffering from Multiple Sclerosis (MS). The pronounced variation in MS symptoms and disease severity creates a need to accommodate patient diversity and support adaptive, personalized training that meets every patient's rehabilitation needs. In this paper, we focus on integrating adaptivity and personalization in rehabilitation training for MS patients. We introduced the automatic adjustment of difficulty levels as an adaptation that can be provided in individual and collaborative rehabilitation training exercises for MS patients. Two user studies were carried out with nine MS patients to investigate the outcome of this adaptation. The findings showed that adaptive personalized training trajectories were successfully provided to MS patients according to their individual training progress, which was appreciated by the patients and the therapist. They considered the automatic adjustment of difficulty levels to provide more variety in the training and to minimize the therapist's involvement in setting up the training. With regard to social interaction in the collaborative training exercise, we observed some social behaviors between the patients and their training partner which indicated the development of social interaction during the training. PMID:24982862

  14. Towards autonomous neuroprosthetic control using Hebbian reinforcement learning.

    PubMed

    Mahmoudi, Babak; Pohlmeyer, Eric A; Prins, Noeline W; Geng, Shijia; Sanchez, Justin C

    2013-12-01

    Our goal was to design an adaptive neuroprosthetic controller that could learn the mapping from neural states to prosthetic actions and automatically adjust adaptation using only a binary evaluative feedback as a measure of desirability/undesirability of performance. Hebbian reinforcement learning (HRL) in a connectionist network was used for the design of the adaptive controller. The method combines the efficiency of supervised learning with the generality of reinforcement learning. The convergence properties of this approach were studied using both closed-loop control simulations and open-loop simulations that used primate neural data from robot-assisted reaching tasks. The HRL controller was able to perform classification and regression tasks using its episodic and sequential learning modes, respectively. In our experiments, the HRL controller quickly achieved convergence to an effective control policy, followed by robust performance. The controller also automatically stopped adapting the parameters after converging to a satisfactory control policy. Additionally, when the input neural vector was reorganized, the controller resumed adaptation to maintain performance. By estimating an evaluative feedback directly from the user, the HRL control algorithm may provide an efficient method for autonomous adaptation of neuroprosthetic systems. This method may enable the user to teach the controller the desired behavior using only a simple feedback signal.
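
    As a generic illustration of reward-modulated Hebbian learning (not the authors' exact HRL network), the sketch below correlates pre- and post-synaptic activity and gates the weight update with a scalar binary feedback signal; all dimensions and the feedback rule are placeholders.

        import numpy as np

        rng = np.random.default_rng(1)
        n_inputs, n_outputs = 8, 2                 # placeholder dimensions
        W = rng.normal(scale=0.1, size=(n_outputs, n_inputs))
        eta = 0.05                                 # learning rate

        def act(x, W):
            """Stochastic binary output units; sampling provides exploration."""
            p = 1.0 / (1.0 + np.exp(-W @ x))
            return (rng.random(len(p)) < p).astype(float)

        for step in range(1000):
            x = rng.normal(size=n_inputs)          # placeholder neural state
            y = act(x, W)
            r = 1.0 if y[0] > y[1] else -1.0       # placeholder binary feedback
            # Reward-modulated Hebbian update: pre/post activity correlation,
            # gated by the scalar evaluative feedback.
            W += eta * r * np.outer(y - 0.5, x)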

  15. Morphological self-organizing feature map neural network with applications to automatic target recognition

    NASA Astrophysics Data System (ADS)

    Zhang, Shijun; Jing, Zhongliang; Li, Jianxun

    2005-01-01

    The rotation-invariant feature of the target is obtained using the multi-direction feature extraction property of the steerable filter. Combining the morphological top-hat transform with the self-organizing feature map neural network, the adaptive topological region is selected. Using the erosion operation, shrinkage of the topological region is achieved. The steerable-filter-based morphological self-organizing feature map neural network is applied to automatic target recognition of binary standard patterns and real-world infrared sequence images. Compared with the Hamming network and morphological shared-weight networks, the proposed method achieves a higher recognition rate, robust adaptability, quicker training, and better generalization.
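
    A minimal Python sketch of the morphological ingredients named here, using scikit-image with a placeholder image: a white top-hat transform to select compact bright structures, followed by erosion to shrink the selected region (the paper's full pipeline, with steerable filters and the SOFM network, is not reproduced).

        import numpy as np
        from skimage.morphology import disk, erosion, white_tophat

        # Hypothetical grayscale frame (e.g., one infrared image), values in [0, 1].
        rng = np.random.default_rng(0)
        image = rng.random((128, 128))

        # Top-hat transform: keeps bright structures smaller than the structuring
        # element, a common way to isolate compact target candidates.
        selem = disk(5)
        tophat = white_tophat(image, selem)

        # Erosion shrinks the selected region, as in the region-shrinkage step.
        mask = tophat > tophat.mean() + 2 * tophat.std()
        shrunk = erosion(mask, disk(1))
        print(mask.sum(), "->", shrunk.sum())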

  16. Development of a prototype automatic controller for liquid cooling garment inlet temperature

    NASA Technical Reports Server (NTRS)

    Weaver, C. S.; Webbon, B. W.; Montgomery, L. D.

    1982-01-01

    The development of computer control of liquid cooled garment (LCG) inlet temperature is described. An adaptive model of the LCG is used to predict the heat-removal rates for various inlet temperatures. An experimental system that contains a microcomputer was constructed. The LCG inlet and outlet temperatures and the heat exchanger outlet temperature form the inputs to the computer. The adaptive-model prediction method of control was successful during tests in which the inlet temperature was automatically chosen by the computer. It is concluded that the program can be implemented in a microprocessor of a size that is practical for a life-support backpack.

  17. Control Automation in Undersea Search and Manipulation

    NASA Technical Reports Server (NTRS)

    Weltman, Gershon; Freedy, Amos

    1974-01-01

    Automatic decision making and control mechanisms of the type termed "adaptive" or "intelligent" offer unique advantages for exploration and manipulation of the undersea environment, particularly at great depths. Because they are able to carry out human-like functions autonomously, such mechanisms can aid and extend the capabilities of the human operator. This paper reviews past and present work in the areas of adaptive control and robotics with the purpose of establishing logical guidelines for the application of automatic techniques underwater. Experimental research data are used to illustrate the importance of information feedback, personnel training, and methods of control allocation in the interaction between operator and intelligent machine.

  18. A New Feedback-Based Method for Parameter Adaptation in Image Processing Routines.

    PubMed

    Khan, Arif Ul Maula; Mikut, Ralf; Reischl, Markus

    2016-01-01

    The parametrization of automatic image processing routines is time-consuming when many image processing parameters are involved. An expert can tune parameters sequentially to get the desired results. This may not be productive for applications with difficult image analysis tasks, e.g. when high noise and shading levels are present in an image or images vary in their characteristics due to different acquisition conditions. Parameters then need to be tuned simultaneously. We propose a framework to improve standard image segmentation methods by using feedback-based automatic parameter adaptation. Moreover, we compare algorithms by implementing them in a feedforward fashion and then adapting their parameters. This comparison is evaluated on a benchmark data set containing image distortions of increasing severity, which enables us to compare standard image segmentation algorithms in feedback vs. feedforward implementations by evaluating their segmentation quality and robustness. We also propose an efficient way of performing automatic image analysis when only abstract ground truth is present. Such a framework evaluates the robustness of different image processing pipelines using a graded data set. This is useful for both end-users and experts.
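
    As a toy illustration of feedback-based parameter adaptation under abstract ground truth (not the paper's framework), the sketch below tunes a single threshold until the number of segmented objects matches an expected count, assuming the object count decreases as the threshold rises:

        import numpy as np
        from scipy import ndimage

        def segment(image, threshold):
            """Toy pipeline: threshold, then count connected components."""
            labeled, count = ndimage.label(image > threshold)
            return labeled, count

        def adapt_threshold(image, expected_count, lo=0.0, hi=1.0, iters=20):
            """Bisection on the threshold until the object count matches the
            abstract ground truth (an expected object count)."""
            for _ in range(iters):
                mid = 0.5 * (lo + hi)
                _, count = segment(image, mid)
                if count == expected_count:
                    return mid
                if count > expected_count:   # too many objects: raise threshold
                    lo = mid
                else:                        # too few: lower it
                    hi = mid
            return 0.5 * (lo + hi)

        rng = np.random.default_rng(0)
        image = ndimage.gaussian_filter(rng.random((64, 64)), sigma=2)
        print("tuned threshold:", round(adapt_threshold(image, 12), 3))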

  19. Automatic vasculature identification in coronary angiograms by adaptive geometrical tracking.

    PubMed

    Xiao, Ruoxiu; Yang, Jian; Goyal, Mahima; Liu, Yue; Wang, Yongtian

    2013-01-01

    Owing to the uneven distribution of contrast agent and the perspective projection of X-ray imaging, the vasculature in an angiographic image has low contrast and is generally superposed on other organic tissues; it is therefore very difficult to identify the vasculature and quantitatively estimate blood flow directly from angiographic images. In this paper, we propose a fully automatic algorithm named adaptive geometrical vessel tracking (AGVT) for coronary artery identification in X-ray angiograms. Initially, a ridge enhancement (RE) image is obtained utilizing multiscale Hessian information. Then, automatic initialization procedures, including seed point detection and initial direction determination, are performed on the RE image. The extracted ridge points are adjusted to the geometrical centerline points adaptively through diameter estimation. Bifurcations are identified by discriminating the connecting relationships of the tracked ridge points. Finally, all the tracked centerlines are merged and smoothed by classifying the connecting components on the vascular structures. Synthetic angiographic images and clinical angiograms are used to evaluate the performance of the proposed algorithm. The proposed algorithm is compared with two other vascular tracking techniques in terms of efficiency and accuracy, demonstrating the successful application of the proposed segmentation and extraction scheme to vasculature identification.
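
    The ridge enhancement step can be approximated off the shelf with a multiscale Hessian vesselness filter. A minimal Python sketch using the Frangi filter from scikit-image on a placeholder image, with seed points taken from the strongest ridge responses (the paper's actual RE computation and seed detection may differ):

        import numpy as np
        from skimage.filters import frangi

        # Hypothetical angiogram frame; a real one would be a grayscale X-ray image.
        rng = np.random.default_rng(0)
        angiogram = rng.random((256, 256))

        # Multiscale Hessian vesselness (Frangi filter) as a stand-in for the
        # paper's ridge-enhancement (RE) image.
        re_image = frangi(angiogram, sigmas=range(1, 6))

        # Seed points for tracking: ridge responses above a high percentile.
        seeds = np.argwhere(re_image > np.percentile(re_image, 99.5))
        print(len(seeds), "candidate seed points")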

  20. A New Feedback-Based Method for Parameter Adaptation in Image Processing Routines

    PubMed Central

    Mikut, Ralf; Reischl, Markus

    2016-01-01

    The parametrization of automatic image processing routines is time-consuming when many image processing parameters are involved. An expert can tune parameters sequentially to get the desired results. This may not be productive for applications with difficult image analysis tasks, e.g. when high noise and shading levels are present in an image or images vary in their characteristics due to different acquisition conditions. Parameters then need to be tuned simultaneously. We propose a framework to improve standard image segmentation methods by using feedback-based automatic parameter adaptation. Moreover, we compare algorithms by implementing them in a feedforward fashion and then adapting their parameters. This comparison is evaluated on a benchmark data set containing image distortions of increasing severity, which enables us to compare standard image segmentation algorithms in feedback vs. feedforward implementations by evaluating their segmentation quality and robustness. We also propose an efficient way of performing automatic image analysis when only abstract ground truth is present. Such a framework evaluates the robustness of different image processing pipelines using a graded data set. This is useful for both end-users and experts. PMID:27764213

  1. Metadata improvements driving new tools and services at a NASA data center

    NASA Astrophysics Data System (ADS)

    Moroni, D. F.; Hausman, J.; Foti, G.; Armstrong, E. M.

    2011-12-01

    The NASA Physical Oceanography DAAC (PO.DAAC) is responsible for distributing and maintaining satellite derived oceanographic data from a number of NASA and non-NASA missions for the physical disciplines of ocean winds, sea surface temperature, ocean topography and gravity. Currently its holdings consist of over 600 datasets with a data archive in excess of 200 Terabytes. The PO.DAAC has recently embarked on a metadata quality and completeness project to migrate, update and improve metadata records for over 300 public datasets. An interactive database management tool has been developed to allow data scientists to enter, update and maintain metadata records. This tool communicates directly with PO.DAAC's Data Management and Archiving System (DMAS), which serves as the new archival and distribution backbone as well as a permanent repository of dataset and granule-level metadata. Although we will briefly discuss the tool, the more important ramifications are the ability to now expose, propagate and leverage the metadata in a number of ways. First, the metadata are exposed through a faceted and free-text search interface directly from Drupal-based PO.DAAC web pages, allowing for quick browsing and data discovery, especially by "drilling" through the various facet levels that organize datasets by time/space resolution, processing level, sensor, measurement type etc. Furthermore, the metadata can now be exposed through web services to produce metadata records in a number of different formats such as FGDC and ISO 19115, or potentially propagated to visualization and subsetting tools, and other discovery interfaces. The fundamental concept is that the metadata forms the essential bridge between the user, and the tool or discovery mechanism for a broad range of ocean Earth science data records.

  2. Study of Adaptive Mathematical Models for Deriving Automated Pilot Performance Measurement Techniques. Volume I. Model Development.

    ERIC Educational Resources Information Center

    Connelly, Edward A.; And Others

    A new approach to deriving human performance measures and criteria for use in automatically evaluating trainee performance is documented in this report. The ultimate application of the research is to provide methods for automatically measuring pilot performance in a flight simulator or from recorded in-flight data. An efficient method of…

  3. Study of Adaptive Mathematical Models for Deriving Automated Pilot Performance Measurement Techniques. Volume II. Appendices. Final Report.

    ERIC Educational Resources Information Center

    Connelly, E. M.; And Others

    A new approach to deriving human performance measures and criteria for use in automatically evaluating trainee performance is described. Ultimately, this approach will allow automatic measurement of pilot performance in a flight simulator or from recorded in-flight data. An efficient method of representing performance data within a computer is…

  4. Using Automatic Item Generation to Meet the Increasing Item Demands of High-Stakes Educational and Occupational Assessment

    ERIC Educational Resources Information Center

    Arendasy, Martin E.; Sommer, Markus

    2012-01-01

    The use of new test administration technologies such as computerized adaptive testing in high-stakes educational and occupational assessments demands large item pools. Classic item construction processes and previous approaches to automatic item generation faced the problems of a considerable loss of items after the item calibration phase. In this…

  5. 38 CFR 17.157 - Definition-adaptive equipment.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... includes, but is not limited to, a basic automatic transmission, power steering, power brakes, power window lifts, power seats, air-conditioning equipment when necessary for the health and safety of the veteran... MEDICAL Automotive Equipment and Driver Training § 17.157 Definition-adaptive equipment. The term...

  6. EPA Metadata Style Guide Keywords and EPA Organization Names

    EPA Pesticide Factsheets

    The following keywords and EPA organization names listed below, along with EPA’s Metadata Style Guide, are intended to provide suggestions and guidance to assist with the standardization of metadata records.

  7. Interpreting the ASTM 'content standard for digital geospatial metadata'

    USGS Publications Warehouse

    Nebert, Douglas D.

    1996-01-01

    ASTM and the Federal Geographic Data Committee have developed a content standard for spatial metadata to facilitate documentation, discovery, and retrieval of digital spatial data using vendor-independent terminology. Spatial metadata elements are identifiable quality and content characteristics of a data set that can be tied to a geographic location or area. Several Office of Management and Budget Circulars and initiatives have been issued that specify improved cataloguing of and accessibility to federal data holdings. An Executive Order further requires the use of the metadata content standard to document digital spatial data sets. Collection and reporting of spatial metadata for field investigations performed for the federal government is an anticipated requirement. This paper provides an overview of the draft spatial metadata content standard and a description of how the standard could be applied to investigations collecting spatially-referenced field data.

  8. Workflows for ingest of research data into digital archives - tests with Archivematica

    NASA Astrophysics Data System (ADS)

    Kirchner, I.; Bertelmann, R.; Gebauer, P.; Hasler, T.; Hirt, M.; Klump, J. F.; Peters-Kotting, W.; Rusch, B.; Ulbricht, D.

    2013-12-01

    Publication of research data and future re-use of measured data require the long-term preservation of digital objects. The ISO OAIS reference model defines responsibilities for long-term preservation of digital objects and although there is software available to support preservation of digital data, there are still problems remaining to be solved. A key task in preservation is to make the datasets ready for ingest into the archive, which is called the creation of Submission Information Packages (SIPs) in the OAIS model. This includes the creation of appropriate preservation metadata. Scientists need to be trained to deal with different types of data and to heighten their awareness of quality metadata. Other problems arise during the assembly of SIPs and during ingest into the archive, because file format validators may produce conflicting output for identical data files and these conflicts are difficult to resolve automatically. Also, validation and identification tools are notorious for their poor performance. In the project EWIG, Zuse Institute Berlin acts as an infrastructure facility, while the Institute for Meteorology at FU Berlin and the German Research Centre for Geosciences GFZ act as two different data producers. The aim of the project is to develop workflows for the transfer of research data into digital archives and the future re-use of data from long-term archives, with emphasis on data from the geosciences. The technical work is supplemented by interviews with data practitioners at several institutions to identify problems in digital preservation workflows and by the development of university teaching materials to train students in the curation of research data and metadata. The free and open-source software Archivematica [1] is used as digital preservation system. The creation and ingest of SIPs has to meet several archival standards and be compatible with the Metadata Encoding and Transmission Standard (METS). The two data producers use different software in their workflows to test the assembly of SIPs and ingest of SIPs into the archive. GFZ Potsdam uses a combination of eSciDoc [2], panMetaDocs [3], and bagit [4] to collect research data and assemble SIPs for ingest into Archivematica, while the Institute for Meteorology at FU Berlin evaluates a variety of software solutions to describe data and publications and to generate SIPs. [1] http://www.archivematica.org [2] http://www.escidoc.org [3] http://panmetadocs.sf.net [4] http://sourceforge.net/projects/loc-xferutils/

  9. Classification of Twitter Users Who Tweet About E-Cigarettes

    PubMed Central

    Miano, Thomas; Chew, Robert; Eggers, Matthew; Nonnemaker, James

    2017-01-01

    Background Despite concerns about their health risks, e‑cigarettes have gained popularity in recent years. Concurrent with the recent increase in e‑cigarette use, social media sites such as Twitter have become a common platform for sharing information about e-cigarettes and to promote marketing of e‑cigarettes. Monitoring the trends in e‑cigarette–related social media activity requires timely assessment of the content of posts and the types of users generating the content. However, little is known about the diversity of the types of users responsible for generating e‑cigarette–related content on Twitter. Objective The aim of this study was to demonstrate a novel methodology for automatically classifying Twitter users who tweet about e‑cigarette–related topics into distinct categories. Methods We collected approximately 11.5 million e‑cigarette–related tweets posted between November 2014 and October 2016 and obtained a random sample of Twitter users who tweeted about e‑cigarettes. Trained human coders examined the handles’ profiles and manually categorized each as one of the following user types: individual (n=2168), vaper enthusiast (n=334), informed agency (n=622), marketer (n=752), and spammer (n=1021). Next, the Twitter metadata as well as a sample of tweets for each labeled user were gathered, and features that reflect users’ metadata and tweeting behavior were analyzed. Finally, multiple machine learning algorithms were tested to identify a model with the best performance in classifying user types. Results Using a classification model that included metadata and features associated with tweeting behavior, we were able to predict with relatively high accuracy five different types of Twitter users that tweet about e‑cigarettes (average F1 score=83.3%). Accuracy varied by user type, with F1 scores of individuals, informed agencies, marketers, spammers, and vaper enthusiasts being 91.1%, 84.4%, 81.2%, 79.5%, and 47.1%, respectively. Vaper enthusiasts were the most challenging user type to predict accurately and were commonly misclassified as marketers. The inclusion of additional tweet-derived features that capture tweeting behavior was found to significantly improve the model performance—an overall F1 score gain of 10.6%—beyond metadata features alone. Conclusions This study provides a method for classifying five different types of users who tweet about e‑cigarettes. Our model achieved high levels of classification performance for most groups, and examining the tweeting behavior was critical in improving the model performance. Results can help identify groups engaged in conversations about e‑cigarettes online to help inform public health surveillance, education, and regulatory efforts. PMID:28951381
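
    In outline, the classification step is a standard supervised learning setup: a feature matrix combining profile metadata with tweeting-behavior features, five user-type labels, and macro-averaged F1 for evaluation. A minimal scikit-learn sketch with synthetic placeholder features (the study's actual features and model may differ):

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        n_users = 1000

        # Hypothetical feature matrix: profile metadata (followers, statuses,
        # account age) plus tweeting-behavior features (URL rate, retweet rate,
        # posting regularity) -- the kind of features the study combines.
        X = rng.normal(size=(n_users, 6))
        y = rng.integers(0, 5, size=n_users)   # 5 user types, e.g. 0 = individual

        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        scores = cross_val_score(clf, X, y, cv=5, scoring="f1_macro")
        print("macro F1: %.3f +/- %.3f" % (scores.mean(), scores.std()))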

  10. The Importance of Metadata in System Development and IKM

    DTIC Science & Technology

    2003-02-01

    Defence R&D Canada. The Importance of Metadata in System Development and IKM. Anthony W. Isenor. Technical Memorandum DRDC Atlantic TM 2003-011, February 2003. ...it is important for searches and providing relevant information to the client. A comparison of metadata standards was conducted with emphasis on

  11. The Global Streamflow Indices and Metadata Archive (GSIM) - Part 1: The production of a daily streamflow archive and metadata

    NASA Astrophysics Data System (ADS)

    Do, Hong Xuan; Gudmundsson, Lukas; Leonard, Michael; Westra, Seth

    2018-04-01

    This is the first part of a two-paper series presenting the Global Streamflow Indices and Metadata archive (GSIM), a worldwide collection of metadata and indices derived from more than 35 000 daily streamflow time series. This paper focuses on the compilation of the daily streamflow time series based on 12 free-to-access streamflow databases (seven national databases and five international collections). It also describes the development of three metadata products (freely available at https://doi.pangaea.de/10.1594/PANGAEA.887477): (1) a GSIM catalogue collating basic metadata associated with each time series, (2) catchment boundaries for the contributing area of each gauge, and (3) catchment metadata extracted from 12 gridded global data products representing essential properties such as land cover type, soil type, and climate and topographic characteristics. The quality of the delineated catchment boundary is also made available and should be consulted in GSIM application. The second paper in the series then explores production and analysis of streamflow indices. Having collated an unprecedented number of stations and associated metadata, GSIM can be used to advance large-scale hydrological research and improve understanding of the global water cycle.

  12. Design and Implementation of a Metadata-rich File System

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ames, S; Gokhale, M B; Maltzahn, C

    2010-01-19

    Despite continual improvements in the performance and reliability of large scale file systems, the management of user-defined file system metadata has changed little in the past decade. The mismatch between the size and complexity of large scale data stores and their ability to organize and query their metadata has led to a de facto standard in which raw data is stored in traditional file systems, while related, application-specific metadata is stored in relational databases. This separation of data and semantic metadata requires considerable effort to maintain consistency and can result in complex, slow, and inflexible system operation. To address these problems, we have developed the Quasar File System (QFS), a metadata-rich file system in which files, user-defined attributes, and file relationships are all first class objects. In contrast to hierarchical file systems and relational databases, QFS defines a graph data model composed of files and their relationships. QFS incorporates Quasar, an XPATH-extended query language for searching the file system. Results from our QFS prototype show the effectiveness of this approach. Compared to the de facto standard, the QFS prototype shows superior ingest performance and comparable query performance on user metadata-intensive operations and superior performance on normal file metadata operations.

  13. Achieving Sub-Second Search in the CMR

    NASA Astrophysics Data System (ADS)

    Gilman, J.; Baynes, K.; Pilone, D.; Mitchell, A. E.; Murphy, K. J.

    2014-12-01

    The Common Metadata Repository (CMR) is the next generation Earth Science Metadata catalog for NASA's Earth Observing data. It joins together the holdings from the EOS Clearing House (ECHO) and the Global Change Master Directory (GCMD), creating a unified, authoritative source for EOSDIS metadata. The CMR allows ingest in many different formats while providing consistent search behavior and retrieval in any supported format. Performance is a critical component of the CMR, ensuring improved data discovery and client interactivity. The CMR delivers sub-second search performance for any of the common query conditions (including spatial) across hundreds of millions of metadata granules. It also allows the addition of new metadata concepts such as visualizations, parameter metadata, and documentation. The CMR's goals presented many challenges. This talk will describe the CMR architecture, design, and innovations that were made to achieve its goals. This includes: * Architectural features like immutability and backpressure. * Data management techniques such as caching and parallel loading that give big performance gains. * Open-source and COTS tools like the Elasticsearch search engine. * Adoption of Clojure, a functional programming language for the Java Virtual Machine. * Development of a custom spatial search plugin for Elasticsearch and why it was necessary. * Introduction of a unified model for metadata that maps every supported metadata format to a consistent domain model.

  14. Syntactic and Semantic Validation without a Metadata Management System

    NASA Technical Reports Server (NTRS)

    Pollack, Janine; Gokey, Christopher D.; Kendig, David; Olsen, Lola; Wharton, Stephen W. (Technical Monitor)

    2001-01-01

    The ability to maintain quality information is essential to securing confidence in any system for which the information serves as a data source. NASA's Global Change Master Directory (GCMD), an online Earth science data locator, holds over 9000 data set descriptions and is in a constant state of flux as metadata are created and updated on a daily basis. In such a system, maintaining the consistency and integrity of these metadata is crucial. The GCMD has developed a metadata management system utilizing XML, controlled vocabulary, and Java technologies to ensure the metadata not only adhere to valid syntax, but also exhibit proper semantics.
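
    In outline, such a system layers a semantic check over standard schema validation. A minimal Python sketch with lxml, using hypothetical file names, element names, and vocabulary (the GCMD's actual schemas and keyword lists differ):

        from lxml import etree

        # Syntactic check: validate the record against an XML Schema
        # (hypothetical paths; the GCMD uses its own DIF schema).
        schema = etree.XMLSchema(etree.parse("dif_schema.xsd"))
        doc = etree.parse("dataset_record.xml")
        if not schema.validate(doc):
            print(schema.error_log)

        # Semantic check: ensure keyword fields use the controlled vocabulary.
        CONTROLLED_KEYWORDS = {"OCEANS", "ATMOSPHERE", "LAND SURFACE"}
        for kw in doc.findall(".//Keyword"):
            if kw.text not in CONTROLLED_KEYWORDS:
                print("uncontrolled keyword:", kw.text)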

  15. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Weers, Jon; Anderson, Arlene

    All data submitted to the U.S. Department of Energy's Geothermal Data Repository (GDR) is eventually made public. The metadata for these data submissions is searchable in multiple data catalogs, including the GDR catalog and the data catalog on OpenEI.org. Because it is a node on the National Geothermal Data System (NGDS), all data on the GDR are also discoverable through the NGDS. Each GDR submission is assigned a Digital Object Identifier (DOI), and as a byproduct of this assignment, these submissions are automatically registered in the Office of Scientific and Technical Information (OSTI) DataCite catalog. From there, these data are federated to additional sites, both domestic and international, including Science.gov and WorldWideScience.org. This paper will explore in detail the wide reach of data submitted to the GDR and how this exposure can dramatically increase the utility of submitted data.

  16. Interoperability format translation and transformation between IFC architectural design file and simulation file formats

    DOEpatents

    Chao, Tian-Jy; Kim, Younghun

    2015-02-03

    Automatically translating a building architecture file format (Industry Foundation Class) to a simulation file, in one aspect, may extract data and metadata used by a target simulation tool from a building architecture file. Interoperability data objects may be created and the extracted data is stored in the interoperability data objects. A model translation procedure may be prepared to identify a mapping from a Model View Definition to a translation and transformation function. The extracted data may be transformed using the data stored in the interoperability data objects, an input Model View Definition template, and the translation and transformation function to convert the extracted data to correct geometric values needed for a target simulation file format used by the target simulation tool. The simulation file in the target simulation file format may be generated.
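
    Reading entities and identifying metadata out of an IFC file is the kind of extraction step this patent describes. A minimal sketch using the open-source ifcopenshell library and a hypothetical file name (the patented translator's actual mapping through Model View Definitions is far more involved):

        import ifcopenshell

        # Open the IFC building model (hypothetical file name).
        model = ifcopenshell.open("building.ifc")

        # Extract the kinds of objects a simulation tool typically needs,
        # together with identifying metadata; a full translator would also map
        # geometry through a Model View Definition.
        extracted = []
        for wall in model.by_type("IfcWall"):
            extracted.append({
                "global_id": wall.GlobalId,
                "name": wall.Name,
            })
        print(len(extracted), "walls extracted")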

  17. Evaluating non-relational storage technology for HEP metadata and meta-data catalog

    NASA Astrophysics Data System (ADS)

    Grigorieva, M. A.; Golosova, M. V.; Gubin, M. Y.; Klimentov, A. A.; Osipova, V. V.; Ryabinkin, E. A.

    2016-10-01

    Large-scale scientific experiments produce vast volumes of data. These data are stored, processed and analyzed in a distributed computing environment. The life cycle of an experiment is managed by specialized software such as Distributed Data Management and Workload Management Systems. In order to be interpreted and mined, experimental data must be accompanied by auxiliary metadata, which are recorded at each data processing step. Metadata describe scientific data and represent scientific objects or results of scientific experiments, allowing them to be shared by various applications, recorded in databases or published via the Web. Processing and analysis of the constantly growing volume of auxiliary metadata is a challenging task, no simpler than the management and processing of the experimental data itself. Furthermore, metadata sources are often loosely coupled and may lead to end-user inconsistencies in combined information queries. To aggregate and synthesize a range of primary metadata sources, and enhance them with flexible, schema-less addition of aggregated data, we are developing the Data Knowledge Base architecture serving as the intelligence behind GUIs and APIs.

  18. Survey data and metadata modelling using document-oriented NoSQL

    NASA Astrophysics Data System (ADS)

    Rahmatuti Maghfiroh, Lutfi; Gusti Bagus Baskara Nugraha, I.

    2018-03-01

    Survey data that are collected from year to year are subject to metadata changes, yet they need to be stored in an integrated manner so that statistical data can be retrieved faster and more easily. A data warehouse (DW) can be used for this purpose, but the change of variables in every period cannot be accommodated by a traditional DW, which cannot handle variable changes via Slowly Changing Dimensions (SCD). Previous research handled the change of variables in a DW by managing metadata with a multiversion DW (MVDW), designed using the relational model. Other research has found that a non-relational model in a NoSQL database offers faster reading times than the relational model. We therefore propose managing metadata changes using NoSQL. This study proposes a DW model to manage change and algorithms to retrieve data with metadata changes. Evaluation of the proposed models and algorithms shows that a database with the proposed design can retrieve data with metadata changes properly. This paper contributes to comprehensive data analysis with metadata changes (especially for survey data) in integrated storage.
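
    A minimal sketch of the document-oriented approach with pymongo, where each survey wave carries its own metadata version so variable changes between years need no schema migration (collection and field names are illustrative):

        from pymongo import MongoClient

        client = MongoClient("mongodb://localhost:27017")   # hypothetical server
        db = client.survey_warehouse

        # Each survey wave is stored as a document carrying its own metadata
        # version, so variable changes between years need no schema migration.
        db.waves.insert_one({
            "survey": "household",
            "year": 2017,
            "metadata_version": 3,
            "variables": [{"name": "income", "unit": "USD"},
                          {"name": "hh_size", "unit": "persons"}],
            "records": [{"income": 42000, "hh_size": 3}],
        })

        # Retrieving data across metadata versions is a plain query plus a merge
        # over each wave's variable list.
        for wave in db.waves.find({"survey": "household"}):
            names = [v["name"] for v in wave["variables"]]
            print(wave["year"], names)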

  19. Comparison of a brain-based adaptive system and a manual adaptable system for invoking automation.

    PubMed

    Bailey, Nathan R; Scerbo, Mark W; Freeman, Frederick G; Mikulka, Peter J; Scott, Lorissa A

    2006-01-01

    Two experiments are presented examining adaptive and adaptable methods for invoking automation. Empirical investigations of adaptive automation have focused on methods used to invoke automation or on automation-related performance implications. However, no research has addressed whether performance benefits associated with brain-based systems exceed those in which users have control over task allocations. Participants performed monitoring and resource management tasks as well as a tracking task that shifted between automatic and manual modes. In the first experiment, participants worked with an adaptive system that used their electroencephalographic signals to switch the tracking task between automatic and manual modes. Participants were also divided between high- and low-reliability conditions for the system-monitoring task as well as high- and low-complacency potential. For the second experiment, participants operated an adaptable system that gave them manual control over task allocations. Results indicated increased situation awareness (SA) of gauge instrument settings for individuals high in complacency potential using the adaptive system. In addition, participants who had control over automation performed more poorly on the resource management task and reported higher levels of workload. A comparison between systems also revealed enhanced SA of gauge instrument settings and decreased workload in the adaptive condition. The present results suggest that brain-based adaptive automation systems may enhance perceptual level SA while reducing mental workload relative to systems requiring user-initiated control. Potential applications include automated systems for which operator monitoring performance and high-workload conditions are of concern.

  20. Improving the accessibility and re-use of environmental models through provision of model metadata - a scoping study

    NASA Astrophysics Data System (ADS)

    Riddick, Andrew; Hughes, Andrew; Harpham, Quillon; Royse, Katherine; Singh, Anubha

    2014-05-01

    There has been increasing interest from both academic and commercial organisations over recent years in developing hydrologic and other environmental models in response to some of the major challenges facing the environment, for example environmental change and its effects and ensuring water resource security. This has resulted in a significant investment in modelling by many organisations, both in terms of financial resources and intellectual capital. To capitalise on the effort of producing models, it is necessary for the models to be both discoverable and appropriately described; if this is not done, the effort of producing them is wasted. However, whilst there are some recognised metadata standards relating to datasets, these may not completely address the needs of modellers, for example regarding input data. There also appears to be a lack of metadata schemes configured to encourage the discovery and re-use of the models themselves. The lack of an established standard for model metadata is considered to be a factor inhibiting the more widespread use of environmental models, particularly the use of linked model compositions which fuse hydrologic models with models from other environmental disciplines. This poster presents the results of a Natural Environment Research Council (NERC) funded scoping study to understand the requirements of modellers and other end users for metadata about data and models. A user consultation exercise using an on-line questionnaire was undertaken to capture the views of a wide spectrum of stakeholders on how they currently manage metadata for modelling. This provided strong confirmation of our original supposition that there is a lack of systems and facilities to capture metadata about models. A number of specific gaps in current provision for data and model metadata were also identified, including the need for a standard means to record detailed information about the modelling environment and the model code used, to assist the selection of models for linked compositions. Existing best practice, including the use of current metadata standards (e.g. ISO 19110, ISO 19115 and ISO 19119) and the metadata components of WaterML, was also evaluated. In addition to commonly used metadata attributes (e.g. spatial reference information) there was significant interest in recording a variety of additional metadata attributes, including more detailed information about temporal data and estimates of data accuracy and uncertainty. This poster describes the key results of the study, including a number of gaps in the provision of metadata for modelling, and outlines how these might be addressed. Overall, the scoping study has highlighted significant interest in addressing this issue within the environmental modelling community. There is therefore an impetus for on-going research, and we are seeking to take this forward through collaboration with other interested organisations. Progress towards an internationally recognised model metadata standard is suggested.

  1. INITIAL APPLICATION OF THE ADAPTIVE GRID AIR POLLUTION MODEL

    EPA Science Inventory

    The paper discusses an adaptive-grid algorithm used in air pollution models. The algorithm reduces errors related to insufficient grid resolution by automatically refining the grid scales in regions of high interest. Meanwhile the grid scales are coarsened in other parts of the d...

  2. Descriptive Metadata: Emerging Standards.

    ERIC Educational Resources Information Center

    Ahronheim, Judith R.

    1998-01-01

    Discusses metadata, digital resources, cross-disciplinary activity, and standards. Highlights include Standard Generalized Markup Language (SGML); Extensible Markup Language (XML); Dublin Core; Resource Description Framework (RDF); Text Encoding Initiative (TEI); Encoded Archival Description (EAD); art and cultural-heritage metadata initiatives;…

  3. Automated Metadata Extraction

    DTIC Science & Technology

    2008-06-01

    provides a means for file owners to add metadata which can then be used by iTunes for cataloging and searching [4]. Metadata can be stored in different...based and contain AAC data formats [3]. Specifically, Apple uses Protected AAC to encode copy-protected music titles purchased from the iTunes Music...Store [4]. The files purchased from the iTunes Music Store include the following metadata. • Name • Email address of purchaser • Year • Album

  4. A Solution to Metadata: Using XML Transformations to Automate Metadata

    DTIC Science & Technology

    2010-06-01

    developed their own metadata standards—Directory Interchange Format (DIF), Ecological Metadata Language (EML), and International Organization for...mented all their data using the EML standard. However, when later attempting to publish to a data clearinghouse—such as the Geospatial One-Stop (GOS...construct calls to its transform(s) method by providing the type of the incoming content (e.g., eml), the type of the resulting content (e.g., fgdc) and
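
    The fragmentary abstract above describes constructing calls to a transform(s) method that converts between metadata standards such as EML and FGDC. A minimal sketch of the same idea using XSLT via lxml follows; the stylesheet is a toy stand-in for a real EML-to-FGDC mapping, not the system's actual transform.

    ```python
    # Sketch of an XSLT-driven metadata transform. The stylesheet is a toy
    # stand-in for a real EML-to-FGDC mapping.
    from lxml import etree

    eml_doc = etree.XML(
        "<eml><dataset><title>Example dataset</title></dataset></eml>"
    )

    # Toy stylesheet: copy the EML title into an FGDC-style skeleton.
    xslt_doc = etree.XML("""
    <xsl:stylesheet version="1.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:template match="/eml">
        <metadata>
          <idinfo>
            <citation>
              <title><xsl:value-of select="dataset/title"/></title>
            </citation>
          </idinfo>
        </metadata>
      </xsl:template>
    </xsl:stylesheet>
    """)

    transform = etree.XSLT(xslt_doc)
    fgdc_doc = transform(eml_doc)
    print(etree.tostring(fgdc_doc, pretty_print=True).decode())
    ```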

  5. The Department of Defense Net-Centric Data Strategy: Implementation Requires a Joint Community of Interest (COI) Working Group and Joint COI Oversight Council

    DTIC Science & Technology

    2007-05-17

    metadata formats, metadata repositories, enterprise portals and federated search engines that make data visible, available, and usable to users...develop an enterprise-wide data sharing plan, establishment of mission area governance processes for CIOs, DISA development of federated search specifications

  6. FRAMES Metadata Reporting Templates for Ecohydrological Observations, version 1.1

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Christianson, Danielle; Varadharajan, Charuleka; Christoffersen, Brad

    FRAMES is a set of Excel metadata files and package-level descriptive metadata designed to facilitate and improve the capture of desired metadata for ecohydrological observations. The metadata are bundled with data files into a data package and submitted to a data repository (e.g. the NGEE Tropics Data Repository) via a web form. FRAMES standardizes reporting of diverse ecohydrological and biogeochemical data for synthesis across a range of spatiotemporal scales and incorporates many best data science practices. This version of FRAMES supports observations for primarily automated measurements collected by permanently located sensors, including sap flow (tree water use), leaf surface temperature, soil water content, dendrometry (stem diameter growth increment), and solar radiation. Version 1.1 extends the controlled vocabulary and incorporates functionality to facilitate programmatic use of data and FRAMES metadata (R code available at the NGEE Tropics Data Repository).

  7. Collection Metadata Solutions for Digital Library Applications

    NASA Technical Reports Server (NTRS)

    Hill, Linda L.; Janee, Greg; Dolin, Ron; Frew, James; Larsgaard, Mary

    1999-01-01

    Within a digital library, collections may range from an ad hoc set of objects that serve a temporary purpose to established library collections intended to persist through time. The objects in these collections vary widely, from library and data center holdings to pointers to real-world objects, such as geographic places, and the various metadata schemas that describe them. The key to integrated use of such a variety of collections in a digital library is collection metadata that represents the inherent and contextual characteristics of a collection. The Alexandria Digital Library (ADL) Project has designed and implemented collection metadata for several purposes: in XML form, the collection metadata "registers" the collection with the user interface client; in HTML form, it is used for user documentation; eventually, it will be used to describe the collection to network search agents; and it is used for internal collection management, including mapping the object metadata attributes to the common search parameters of the system.

  8. METADATA REGISTRY, ISO/IEC 11179

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pon, R K; Buttler, D J

    2008-01-03

    ISO/IEC-11179 is an international standard that documents the standardization and registration of metadata to make data understandable and shareable. This standardization and registration allows for easier locating, retrieving, and transmitting of data from disparate databases. The standard defines how metadata are conceptually modeled and how they are shared among parties, but does not define how data are physically represented as bits and bytes. The standard consists of six parts. Part 1 provides a high-level overview of the standard and defines the basic element of a metadata registry - a data element. Part 2 defines the procedures for registering classification schemes and classifying administered items in a metadata registry (MDR). Part 3 specifies the structure of an MDR. Part 4 specifies requirements and recommendations for constructing definitions for data and metadata. Part 5 defines how administered items are named and identified. Part 6 defines how administered items are registered and assigned an identifier.
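
    As a rough illustration of the Part 1 building block, the sketch below models a data element and its registration in a registry. The class and field names are ours for illustration; they do not reproduce the standard's formal metamodel.

    ```python
    # Illustrative sketch of an ISO/IEC 11179 data element held in a registry
    # of administered items. Names are ours, not the standard's formal model.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class DataElement:
        identifier: str    # Parts 5/6: naming, identification, registration
        name: str
        definition: str    # Part 4: rules for constructing definitions
        value_domain: str  # permissible values, e.g. a datatype or code list

    registry: dict[str, DataElement] = {}

    def register(item: DataElement) -> None:
        """Part 6 in miniature: registration files the item under its identifier."""
        registry[item.identifier] = item

    register(DataElement(
        identifier="urn:example:mdr:de:0001",   # hypothetical identifier scheme
        name="sea surface temperature",
        definition="Temperature of the ocean surface layer, in kelvin.",
        value_domain="xs:double",
    ))
    print(sorted(registry))
    ```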

  9. The Energy Industry Profile of ISO/DIS 19115-1: Facilitating Discovery and Evaluation of, and Access to Distributed Information Resources

    NASA Astrophysics Data System (ADS)

    Hills, S. J.; Richard, S. M.; Doniger, A.; Danko, D. M.; Derenthal, L.; Energistics Metadata Work Group

    2011-12-01

    A diverse group of organizations representative of the international community involved in disciplines relevant to the upstream petroleum industry (energy companies; suppliers and publishers of information to the energy industry; vendors of software applications used by the industry; and partner government and academic organizations) has engaged in the Energy Industry Metadata Standards Initiative. This Initiative envisions the use of standard metadata within the community to enable significant improvements in the efficiency with which users discover, evaluate, and access distributed information resources. The metadata standard needed to realize this vision is the initiative's primary deliverable. In addition to developing the metadata standard, the initiative is promoting its adoption to accelerate realization of the vision, and publishing metadata exemplars conformant with the standard. Implementation of the standard by community members, in the form of published metadata which document the information resources each organization manages, will allow use of tools requiring consistent metadata for efficient discovery and evaluation of, and access to, information resources. While metadata are expected to be widely accessible, access to associated information resources may be more constrained. The initiative is being conducted by Energistics' Metadata Work Group, in collaboration with the USGIN Project. Energistics is a global standards group in the oil and natural gas industry. The Work Group determined early in the initiative, based on input solicited from 40+ organizations and on an assessment of existing metadata standards, to develop the target metadata standard as a profile of a revised version of ISO 19115, formally the "Energy Industry Profile of ISO/DIS 19115-1 v1.0" (EIP). The Work Group is participating on the ISO/TC 211 project team responsible for the revision of ISO 19115, now ready for "Draft International Standard" (DIS) status. With ISO 19115 an established, capability-rich, open standard for geographic metadata, EIP v1 is expected to be widely acceptable within the community and readily sustainable over the long term. The EIP design, also per community requirements, will enable discovery, evaluation, and access to the types of information resources considered important to the community, including structured and unstructured digital resources, and physical assets such as hardcopy documents and material samples. This presentation will briefly review the development of this initiative as well as the current and planned Work Group activities. More time will be spent providing an overview of EIP v1, including the requirements it prescribes, design efforts made to enable automated metadata capture and processing, and the structure and content of its documentation, which was written to minimize ambiguity and facilitate implementation. The Work Group considers EIP v1 a solid initial design for interoperable metadata, and a first step toward the vision of the Initiative.

  10. Usability and Interoperability Improvements for an EASE-Grid 2.0 Passive Microwave Data Product Using CF Conventions

    NASA Astrophysics Data System (ADS)

    Hardman, M.; Brodzik, M. J.; Long, D. G.

    2017-12-01

    Beginning in 1978, the satellite passive microwave data record has been a mainstay of remote sensing of the cryosphere, providing twice-daily, near-global spatial coverage for monitoring changes in hydrologic and cryospheric parameters that include precipitation, soil moisture, surface water, vegetation, snow water equivalent, sea ice concentration and sea ice motion. Historical versions of the gridded passive microwave data sets were produced as flat binary files described in human-readable documentation. This format is error-prone and makes it difficult to reliably include all processing and provenance information. Funded by NASA MEaSUREs, we have completely reprocessed the gridded data record that includes SMMR, SSM/I-SSMIS and AMSR-E. The new Calibrated Enhanced-Resolution Brightness Temperature (CETB) Earth System Data Record (ESDR) files are self-describing. Our approach to the new data set was to create netCDF4 files that use standard metadata conventions and best practices to incorporate file-level, machine- and human-readable contents, geolocation, processing and provenance metadata. We followed the flexible and adaptable Climate and Forecast (CF-1.6) Conventions with respect to their coordinate conventions and map projection parameters. Additionally, we made use of the Attribute Conventions for Dataset Discovery (ACDD-1.3), which provide file-level conventions with spatio-temporal bounds that enable indexing software to search for coverage. Our CETB files also include temporal coverage and spatial resolution in the file-level metadata for human readability. We made use of the JPL CF/ACDD Compliance Checker to guide this work. We tested our file format with real software, for example the netCDF Command-line Operators (NCO), power tools that give fine-grained control over spatio-temporal subsetting and concatenation of files. The GDAL tools understand the CF metadata and produce fully compliant GeoTIFF files from our data. ArcMap can then reproject the GeoTIFF files on-the-fly and work with other geolocated data such as coastlines, with no special work required. We expect this combination of standards and well-tested interoperability to significantly improve the usability of this important ESDR for the Earth science community.
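
    A minimal sketch of the file layout this describes, using the netCDF4 Python library: ACDD-1.3 discovery attributes at file level plus CF-style variable metadata. The values, grid, and projection below are illustrative only; the real CETB variables and parameters differ.

    ```python
    # Sketch of a self-describing netCDF4 file with ACDD file-level discovery
    # attributes and CF-style variable metadata. Values are illustrative.
    import numpy as np
    from netCDF4 import Dataset

    ds = Dataset("cetb_example.nc", "w", format="NETCDF4")
    ds.setncatts({                      # ACDD-1.3 file-level discovery metadata
        "Conventions": "CF-1.6, ACDD-1.3",
        "title": "Example gridded brightness temperature",
        "time_coverage_start": "2003-01-01T00:00:00Z",
        "time_coverage_end": "2003-01-01T12:00:00Z",
        "geospatial_lat_min": -90.0,
        "geospatial_lat_max": 90.0,
    })
    ds.createDimension("y", 180)
    ds.createDimension("x", 360)

    crs = ds.createVariable("crs", "i4")  # CF grid-mapping container variable
    crs.grid_mapping_name = "lambert_azimuthal_equal_area"  # illustrative choice

    tb = ds.createVariable("TB", "f4", ("y", "x"), zlib=True)
    tb.standard_name = "brightness_temperature"  # CF variable metadata
    tb.units = "K"
    tb.grid_mapping = "crs"
    tb[:] = np.full((180, 360), 250.0, dtype="f4")
    ds.close()
    ```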

  11. Auto-adaptive finite element meshes

    NASA Technical Reports Server (NTRS)

    Richter, Roland; Leyland, Penelope

    1995-01-01

    Accurate capturing of discontinuities within compressible flow computations is achieved by coupling a suitable solver with an automatic adaptive mesh algorithm for unstructured triangular meshes. The mesh adaptation procedures developed rely on non-hierarchical dynamical local refinement/derefinement techniques, which hence enable structural optimization as well as geometrical optimization. The methods described are applied to a number of the ICASE test cases, which are particularly interesting for unsteady flow simulations.

  12. Continuously Adaptive vs. Discrete Changes of Task Difficulty in the Training of a Complex Perceptual-Motor Task.

    ERIC Educational Resources Information Center

    Wood, Milton E.

    The purpose of the effort was to determine the benefits to be derived from the adaptive training technique of automatically adjusting task difficulty as a function of student skill during early learning of a complex perceptual-motor task. A digital computer provided the task dynamics, scoring, and adaptive control of a second-order, two-axis,…

  13. Using a linked data approach to aid development of a metadata portal to support Marine Strategy Framework Directive (MSFD) implementation

    NASA Astrophysics Data System (ADS)

    Wood, Chris

    2016-04-01

    Under the Marine Strategy Framework Directive (MSFD), EU Member States are mandated to achieve or maintain 'Good Environmental Status' (GES) in their marine areas by 2020, through a series of Programmes of Measures (PoMs). The Celtic Seas Partnership (CSP), an EU LIFE+ project, aims to support policy makers, special-interest groups, users of the marine environment, and other interested stakeholders on MSFD implementation in the Celtic Seas geographical area. As part of this support, a metadata portal has been built to provide a signposting service to datasets that are relevant to MSFD within the Celtic Seas. To ensure that the metadata have the widest possible reach, a linked data approach was employed to construct the database. Although the metadata are stored in a traditional RDBMS, they are exposed as linked data via the D2RQ platform, allowing virtual RDF graphs to be generated. SPARQL queries can be executed against the end-point, allowing any user to manipulate the metadata. D2RQ's mapping language, based on Turtle, was used to map a wide range of relevant ontologies to the metadata (e.g. the Provenance Ontology (prov-o), Ocean Data Ontology (odo), Dublin Core Elements and Terms (dc & dcterms), Friend of a Friend (foaf), and geospatial ontologies (geo)), allowing users to browse the metadata either via SPARQL queries or by using D2RQ's HTML interface. The metadata were further enhanced by mapping relevant parameters to the NERC Vocabulary Server, itself built on a SPARQL endpoint. Additionally, a custom web front-end was built to enable users to browse the metadata and express queries through an intuitive graphical user interface that requires no prior knowledge of SPARQL. As well as providing the means to browse the data via MSFD-related parameters (Descriptor, Criteria, and Indicator), the metadata records include the dataset's country of origin, the list of organisations involved in the management of the data, and links to any relevant INSPIRE-compliant services relating to the dataset. The web front-end therefore enables users to effectively filter, sort, or search the metadata. As the MSFD timeline requires Member States to review their progress on achieving or maintaining GES every six years, the timely development of this metadata portal will not only aid interested stakeholders in understanding how Member States are meeting their targets, but also shows how linked data can be used effectively to support policy makers and associated legislative bodies.
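
    Any SPARQL 1.1 client can interrogate an endpoint like the one described. A minimal sketch with SPARQLWrapper follows, using a hypothetical endpoint URL and a generic Dublin Core title query:

    ```python
    # Sketch of querying a D2RQ-style SPARQL endpoint. The endpoint URL is
    # hypothetical; any SPARQL 1.1 endpoint answers the same pattern.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://example.org/msfd-metadata/sparql")  # placeholder
    sparql.setQuery("""
        PREFIX dcterms: <http://purl.org/dc/terms/>
        SELECT ?dataset ?title WHERE {
            ?dataset dcterms:title ?title .
        } LIMIT 10
    """)
    sparql.setReturnFormat(JSON)

    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["dataset"]["value"], "-", row["title"]["value"])
    ```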

  14. Automatic Artifact Removal from Electroencephalogram Data Based on A Priori Artifact Information.

    PubMed

    Zhang, Chi; Tong, Li; Zeng, Ying; Jiang, Jingfang; Bu, Haibing; Yan, Bin; Li, Jianxin

    2015-01-01

    Electroencephalogram (EEG) is susceptible to various nonneural physiological artifacts. Automatic artifact removal from EEG data remains a key challenge for extracting relevant information from brain activities. To adapt to variable subjects and EEG acquisition environments, this paper presents an automatic online artifact removal method based on a priori artifact information. The combination of discrete wavelet transform and independent component analysis (ICA), wavelet-ICA, was utilized to separate artifact components. The artifact components were then automatically identified using a priori artifact information, which was acquired in advance. Subsequently, signal reconstruction without artifact components was performed to obtain artifact-free signals. The results showed that, using this automatic online artifact removal method, there were statistically significant improvements in the classification accuracies in both experiments, namely, motor imagery and emotion recognition.
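
    A rough sketch of the unmix-flag-reconstruct portion of the pipeline, on synthetic data: ICA separates components, components matching the a priori artifact information are zeroed (here a simple template correlation stands in for the paper's identification step), and the signals are reconstructed. The wavelet stage is omitted for brevity.

    ```python
    # Synthetic-data sketch of ICA-based artifact removal guided by a priori
    # artifact information. Not a clinical implementation.
    import numpy as np
    from sklearn.decomposition import FastICA

    rng = np.random.default_rng(0)
    n_channels, n_samples = 8, 2000
    eeg = rng.standard_normal((n_samples, n_channels))        # stand-in recording
    template = np.sin(np.linspace(0, 20 * np.pi, n_samples))  # a priori artifact info
    eeg[:, 0] += 3.0 * template                               # contaminate one channel

    ica = FastICA(n_components=n_channels, random_state=0)
    sources = ica.fit_transform(eeg)                          # (samples, components)

    # Flag components by correlation with the a priori artifact template.
    corr = np.array([abs(np.corrcoef(sources[:, k], template)[0, 1])
                     for k in range(n_channels)])
    sources[:, corr > 0.5] = 0.0                              # zero flagged components
    clean = ica.inverse_transform(sources)                    # reconstructed signals

    print("components removed:", int((corr > 0.5).sum()))
    ```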

  15. Automatic Artifact Removal from Electroencephalogram Data Based on A Priori Artifact Information

    PubMed Central

    Zhang, Chi; Tong, Li; Zeng, Ying; Jiang, Jingfang; Bu, Haibing; Li, Jianxin

    2015-01-01

    Electroencephalogram (EEG) is susceptible to various nonneural physiological artifacts. Automatic artifact removal from EEG data remains a key challenge for extracting relevant information from brain activities. To adapt to variable subjects and EEG acquisition environments, this paper presents an automatic online artifact removal method based on a priori artifact information. The combination of discrete wavelet transform and independent component analysis (ICA), wavelet-ICA, was utilized to separate artifact components. The artifact components were then automatically identified using a priori artifact information, which was acquired in advance. Subsequently, signal reconstruction without artifact components was performed to obtain artifact-free signals. The results showed that, using this automatic online artifact removal method, there were statistically significant improvements in the classification accuracies in both experiments, namely, motor imagery and emotion recognition. PMID:26380294

  16. THE NEW ONLINE METADATA EDITOR FOR GENERATING STRUCTURED METADATA

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Devarakonda, Ranjeet; Shrestha, Biva; Palanisamy, Giri

    Nobody is better suited to describe data than the scientist who created it. This description of the data is called metadata. In general terms, metadata represents the who, what, when, where, why and how of the dataset [1]. eXtensible Markup Language (XML) is the preferred output format for metadata, as it makes the metadata portable and, more importantly, suitable for system discoverability. The newly developed ORNL Metadata Editor (OME) is a Web-based tool that allows users to create and maintain XML files containing key information, or metadata, about their research. Metadata include information about the specific projects, parameters, time periods, and locations associated with the data. Such information helps put the research findings in context. In addition, the metadata produced using OME will allow other researchers to find these data via metadata clearinghouses like Mercury [2][4]. OME is part of ORNL's Mercury software fleet [2][3]. It was jointly developed to support projects funded by the United States Geological Survey (USGS), U.S. Department of Energy (DOE), National Aeronautics and Space Administration (NASA) and National Oceanic and Atmospheric Administration (NOAA). OME's architecture provides a customizable interface to support project-specific requirements. Using this new architecture, the ORNL team developed OME instances for USGS's Core Science Analytics, Synthesis, and Libraries (CSAS&L), DOE's Next Generation Ecosystem Experiments (NGEE) and Atmospheric Radiation Measurement (ARM) Program, and the international Surface Ocean Carbon Dioxide ATlas (SOCAT). Researchers simply use the ORNL Metadata Editor to enter relevant metadata into a Web-based form. From the information on the form, the Metadata Editor can create an XML file on the server where the editor is installed or on the user's personal computer. Researchers can also use the ORNL Metadata Editor to modify existing XML metadata files. As an example, an NGEE Arctic scientist uses OME to register their datasets with the NGEE data archive, which allows the archive to publish these datasets via a data search portal (http://ngee.ornl.gov/data). The highly descriptive metadata created using OME allow the archive to enable advanced data search options using keyword, geo-spatial, temporal and ontology filters. Similarly, the ARM OME allows scientists or principal investigators (PIs) to submit their data products to the ARM data archive. How would OME help big data centers like the Oak Ridge National Laboratory Distributed Active Archive Center (ORNL DAAC)? The ORNL DAAC is one of NASA's Earth Observing System Data and Information System (EOSDIS) data centers managed by the Earth Science Data and Information System (ESDIS) Project. The ORNL DAAC archives data produced by NASA's Terrestrial Ecology Program. The DAAC provides data and information relevant to biogeochemical dynamics, ecological data, and environmental processes, critical for understanding the dynamics relating to the biological, geological, and chemical components of the Earth's environment. The data produced, archived and analyzed are typically at a scale of multiple petabytes, which makes discoverability very challenging. Without proper metadata associated with the data, it is difficult to find the data you are looking for, and equally difficult to use and understand the data.
OME will allow data centers like NGEE and the ORNL DAAC to produce meaningful, high-quality, standards-based, descriptive information about their data products, in turn helping with data discoverability and interoperability. Useful links: USGS OME: http://mercury.ornl.gov/OME/ NGEE OME: http://ngee-arctic.ornl.gov/ngeemetadata/ ARM OME: http://archive2.ornl.gov/armome/ Contact: Ranjeet Devarakonda (devarakondar@ornl.gov) References: [1] Federal Geographic Data Committee. Content standard for digital geospatial metadata. Federal Geographic Data Committee, 1998. [2] Devarakonda, Ranjeet, et al. "Mercury: reusable metadata management, data discovery and access system." Earth Science Informatics 3.1-2 (2010): 87-94. [3] Wilson, B. E., Palanisamy, G., Devarakonda, R., Rhyne, B. T., Lindsley, C., & Green, J. (2010). Mercury Toolset for Spatiotemporal Metadata. [4] Pouchard, L. C., Branstetter, M. L., Cook, R. B., Devarakonda, R., Green, J., Palanisamy, G., ... & Noy, N. F. (2013). A Linked Science investigation: enhancing climate change data discovery with semantic technologies. Earth Science Informatics, 6(3), 175-185.
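
    At its core, the workflow described above turns web-form fields into an XML metadata file. A toy illustration with the Python standard library follows; the element names are generic, not OME's actual schema.

    ```python
    # Toy illustration of form-to-XML metadata generation. Element names are
    # generic placeholders, not OME's actual schema.
    import xml.etree.ElementTree as ET

    form = {  # hypothetical form fields
        "title": "Soil respiration at an example NGEE Arctic site",
        "investigator": "A. Scientist",
        "temporal_coverage": "2014-06-01/2014-09-30",
        "location": "Barrow, Alaska",
    }

    root = ET.Element("metadata")
    for field_name, value in form.items():
        ET.SubElement(root, field_name).text = value

    ET.ElementTree(root).write("record.xml", encoding="utf-8", xml_declaration=True)
    ```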

  17. Toward automatic finite element analysis

    NASA Technical Reports Server (NTRS)

    Kela, Ajay; Perucchio, Renato; Voelcker, Herbert

    1987-01-01

    Two problems must be solved if the finite element method is to become a reliable and affordable blackbox engineering tool. Finite element meshes must be generated automatically from computer aided design databases and mesh analysis must be made self-adaptive. The experimental system described solves both problems in 2-D through spatial and analytical substructuring techniques that are now being extended into 3-D.

  18. A Bacterial Analysis Platform: An Integrated System for Analysing Bacterial Whole Genome Sequencing Data for Clinical Diagnostics and Surveillance.

    PubMed

    Thomsen, Martin Christen Frølund; Ahrenfeldt, Johanne; Cisneros, Jose Luis Bellod; Jurtz, Vanessa; Larsen, Mette Voldby; Hasman, Henrik; Aarestrup, Frank Møller; Lund, Ole

    2016-01-01

    Recent advances in whole genome sequencing have made the technology available for routine use in microbiological laboratories. However, a major obstacle to using this technology is the availability of simple and automatic bioinformatics tools. Based on previously published and already available web-based tools, we developed a single pipeline for batch uploading of whole genome sequencing data from multiple bacterial isolates. The pipeline will automatically identify the bacterial species and, if applicable, assemble the genome and identify the multilocus sequence type, plasmids, virulence genes and antimicrobial resistance genes. A short printable report for each sample will be provided, and an Excel spreadsheet containing all the metadata and a summary of the results for all submitted samples can be downloaded. The pipeline was benchmarked using datasets previously used to test the individual services. The reported results enable a rapid overview of the major results, and comparison with the previously found results showed that the platform is reliable and able to correctly predict the species and find most of the expected genes automatically. In conclusion, a combined bioinformatics platform was developed and made publicly available, providing easy-to-use automated analysis of bacterial whole genome sequencing data. The platform may be of immediate relevance as a guide for investigators using whole genome sequencing for clinical diagnostics and surveillance. The platform is freely available at: https://cge.cbs.dtu.dk/services/CGEpipeline-1.1 and it is the intention that it will continue to be expanded with new features as these become available.

  19. Automatic Incubator-type Temperature Control System for Brain Hypothermia Treatment

    NASA Astrophysics Data System (ADS)

    Gaohua, Lu; Wakamatsu, Hidetoshi

    An automatic air-cooling incubator is proposed to replace the manual water-cooling blanket in controlling brain tissue temperature during brain hypothermia treatment. Its feasibility is discussed theoretically as follows: First, an adult patient with the cooling incubator is modeled as a linear dynamical patient-incubator biothermal system. The patient is represented by an 18-compartment structure and described by its state equations. The air-cooling incubator provides almost the same cooling effect as the water-cooling blanket if a light breeze of around 3 m/s is circulated in the incubator. Then, in order to control the brain temperature automatically, an adaptive-optimal control algorithm is adopted, with the patient-blanket therapeutic system considered as a reference model. Finally, the brain temperature of the patient-incubator biothermal system is controlled to follow the given reference temperature course, for which the adaptive algorithm is confirmed to be useful under unknown environmental changes and/or metabolic rate changes of the patient in the incubating system. Thus, the present work supports the development of the automatic air-cooling incubator for better temperature regulation in brain hypothermia treatment in the ICU.

  20. Functional relationships among monitoring performance: Subjective report of thought process and compromising states of awareness

    NASA Technical Reports Server (NTRS)

    Freeman, Frederick

    1995-01-01

    A biocybernetic system for use in adaptive automation was evaluated using EEG indices based on the beta, alpha, and theta bandwidths. Subjects performed a compensatory tracking task while their EEG was recorded and one of three engagement indices was derived: beta/(alpha + theta), beta/alpha, or 1/alpha. The task was switched between manual and automatic modes as a function of the subjects' level of engagement and whether they were under a positive or negative feedback condition. It was hypothesized that negative feedback would produce more switches between manual and automatic modes, and that the beta/(alpha + theta) index would produce the strongest effect. The results confirmed these hypotheses. There were no systematic changes in these effects over three 16-minute trials. Tracking performance was found to be better under negative feedback. An analysis of the different EEG bands under positive and negative feedback in manual and automatic modes found more beta power in the positive feedback/manual condition and less in the positive feedback/automatic condition. The opposite effect was observed for alpha and theta power. The implications of biocybernetic systems for adaptive automation are discussed.
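
    The engagement indices above are simple ratios of EEG band powers. A minimal sketch follows, with made-up band-power values; the switching rule is schematic, since the direction of the mode switch depends on the feedback condition.

    ```python
    # Sketch of the three engagement indices from band-power estimates.
    # Band-power values and the threshold are made up for illustration.
    def engagement_indices(beta: float, alpha: float, theta: float) -> dict:
        return {
            "beta/(alpha+theta)": beta / (alpha + theta),
            "beta/alpha": beta / alpha,
            "1/alpha": 1.0 / alpha,
        }

    idx = engagement_indices(beta=12.0, alpha=8.0, theta=6.0)

    # Schematic biocybernetic loop: switch task mode when the chosen index
    # crosses a threshold (direction depends on the feedback condition).
    mode = "manual" if idx["beta/(alpha+theta)"] < 1.0 else "automatic"
    print(idx, mode)
    ```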

  1. Stop the Bleeding: the Development of a Tool to Streamline NASA Earth Science Metadata Curation Efforts

    NASA Astrophysics Data System (ADS)

    le Roux, J.; Baker, A.; Caltagirone, S.; Bugbee, K.

    2017-12-01

    The Common Metadata Repository (CMR) is a high-performance, high-quality repository for Earth science metadata records, and serves as the primary way to search NASA's growing 17.5 petabytes of Earth science data holdings. Released in 2015, CMR has the capability to support several different metadata standards already being utilized by NASA's combined network of Earth science data providers, or Distributed Active Archive Centers (DAACs). The Analysis and Review of CMR (ARC) Team located at Marshall Space Flight Center is working to improve the quality of records already in CMR with the goal of making records optimal for search and discovery. This effort entails a combination of automated and manual review, where each NASA record in CMR is checked for completeness, accuracy, and consistency. This effort is highly collaborative in nature, requiring communication and transparency of findings amongst NASA personnel, DAACs, the CMR team and other metadata curation teams. Through the evolution of this project it has become apparent that there is a need to document and report findings, as well as track metadata improvements in a more efficient manner. The ARC team has collaborated with Element 84 in order to develop a metadata curation tool to meet these needs. In this presentation, we will provide an overview of this metadata curation tool and its current capabilities. Challenges and future plans for the tool will also be discussed.
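
    CMR exposes a public search API, so part of the automated review can be sketched as pulling records and flagging obviously incomplete fields. The completeness rule below is our own simplification, not the ARC team's actual rubric.

    ```python
    # Sketch: pull collection records from CMR's public search API and flag
    # records missing basic fields. The rule is a simplified illustration.
    import requests

    resp = requests.get(
        "https://cmr.earthdata.nasa.gov/search/collections.json",
        params={"keyword": "sea ice", "page_size": 5},
        timeout=30,
    )
    for entry in resp.json()["feed"]["entry"]:
        missing = [f for f in ("summary", "time_start") if not entry.get(f)]
        status = "OK" if not missing else "missing: " + ", ".join(missing)
        print(entry.get("title", "<untitled>"), "-", status)
    ```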

  2. Conflict Adaptation Depends on Task Structure

    ERIC Educational Resources Information Center

    Akcay, Caglar; Hazeltine, Eliot

    2008-01-01

    The dependence of the Simon effect on the correspondence of the previous trial can be explained by the conflict-monitoring theory, which holds that a control system adjusts automatic activation from irrelevant stimulus information (conflict adaptation) on the basis of the congruency of the previous trial. The authors report on 4 experiments…

  3. Convergence of an hp-Adaptive Finite Element Strategy in Two and Three Space-Dimensions

    NASA Astrophysics Data System (ADS)

    Bürg, Markus; Dörfler, Willy

    2010-09-01

    We show convergence of an automatic hp-adaptive refinement strategy for the finite element method on the elliptic boundary value problem. The strategy is a generalization of a refinement strategy proposed for one-dimensional situations to problems in two and three space-dimensions.

  4. Dynamic Learner Profiling and Automatic Learner Classification for Adaptive E-Learning Environment

    ERIC Educational Resources Information Center

    Premlatha, K. R.; Dharani, B.; Geetha, T. V.

    2016-01-01

    E-learning allows learners individually to learn "anywhere, anytime" and offers immediate access to specific information. However, learners have different behaviors, learning styles, attitudes, and aptitudes, which affect their learning process, and therefore learning environments need to adapt according to these differences, so as to…

  5. Social tagging in the life sciences: characterizing a new metadata resource for bioinformatics.

    PubMed

    Good, Benjamin M; Tennis, Joseph T; Wilkinson, Mark D

    2009-09-25

    Academic social tagging systems, such as Connotea and CiteULike, provide researchers with a means to organize personal collections of online references with keywords (tags) and to share these collections with others. One of the side-effects of the operation of these systems is the generation of large, publicly accessible metadata repositories describing the resources in the collections. In light of the well-known expansion of information in the life sciences and the need for metadata to enhance its value, these repositories present a potentially valuable new resource for application developers. Here we characterize the current contents of two scientifically relevant metadata repositories created through social tagging. This investigation helps to establish how such socially constructed metadata might be used as it stands currently and to suggest ways that new social tagging systems might be designed that would yield better aggregate products. We assessed the metadata that users of CiteULike and Connotea associated with citations in PubMed with the following metrics: coverage of the document space, density of metadata (tags) per document, rates of inter-annotator agreement, and rates of agreement with MeSH indexing. CiteULike and Connotea were very similar on all of the measurements. In comparison to PubMed, document coverage and per-document metadata density were much lower for the social tagging systems. Inter-annotator agreement within the social tagging systems and the agreement between the aggregated social tagging metadata and MeSH indexing was low though the latter could be increased through voting. The most promising uses of metadata from current academic social tagging repositories will be those that find ways to utilize the novel relationships between users, tags, and documents exposed through these systems. For more traditional kinds of indexing-based applications (such as keyword-based search) to benefit substantially from socially generated metadata in the life sciences, more documents need to be tagged and more tags are needed for each document. These issues may be addressed both by finding ways to attract more users to current systems and by creating new user interfaces that encourage more collectively useful individual tagging behaviour.
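
    Two of the metrics used above, per-document tag density and pairwise inter-annotator agreement, can be sketched on toy data as follows; the Jaccard overlap here is a simple stand-in, and the paper's exact agreement measure may differ.

    ```python
    # Toy computation of tag density and pairwise annotator agreement.
    from itertools import combinations

    tags_by_user = {  # user -> {document -> set of tags}; invented data
        "u1": {"pmid:1": {"genomics", "ontology"}, "pmid:2": {"proteomics"}},
        "u2": {"pmid:1": {"genomics", "curation"}},
    }

    docs = {d for user_tags in tags_by_user.values() for d in user_tags}
    density = {d: sum(len(u.get(d, set())) for u in tags_by_user.values())
               for d in docs}
    print("tags per document:", density)

    # Jaccard overlap of tag sets as a simple agreement stand-in.
    for (a, ta), (b, tb) in combinations(tags_by_user.items(), 2):
        for d in set(ta) & set(tb):
            jac = len(ta[d] & tb[d]) / len(ta[d] | tb[d])
            print(f"agreement({a},{b}) on {d}: {jac:.2f}")
    ```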

  6. An automatic method to detect and track the glottal gap from high speed videoendoscopic images.

    PubMed

    Andrade-Miranda, Gustavo; Godino-Llorente, Juan I; Moro-Velázquez, Laureano; Gómez-García, Jorge Andrés

    2015-10-29

    The image-based analysis of vocal fold vibration plays an important role in the diagnosis of voice disorders. The analysis is based not only on direct observation of the video sequences, but also on an objective characterization of the phonation process by means of features extracted from the recorded images. However, such analysis relies on a previous accurate identification of the glottal gap, which is the most challenging step for a further automatic assessment of the vocal folds' vibration. In this work, a complete framework to automatically segment and track the glottal area (or glottal gap) is proposed. The algorithm identifies a region of interest (ROI) that is adapted along time, and combines active contours and the watershed transform for the final delineation of the glottis; an automatic procedure for synthesizing different videokymograms (VKG) is also proposed. Thanks to the ROI implementation, the technique is robust to camera shifting, and objective tests proved the effectiveness and performance of the approach in the most challenging scenario, that is, when there is an inappropriate closure of the vocal folds. The novelty of the proposed algorithm lies in the use of temporal information to identify an adaptive ROI, and in the use of watershed merging combined with active contours for glottis delimitation. Additionally, an automatic procedure for synthesizing multiline VKG through identification of the glottal main axis is developed.

  7. Review of Medical Image Classification using the Adaptive Neuro-Fuzzy Inference System

    PubMed Central

    Hosseini, Monireh Sheikh; Zekri, Maryam

    2012-01-01

    Image classification is an issue that utilizes image processing, pattern recognition and classification methods. Automatic medical image classification is a progressive area in image classification, and it is expected to be more developed in the future. Because of this fact, automatic diagnosis can assist pathologists by providing second opinions and reducing their workload. This paper reviews the application of the adaptive neuro-fuzzy inference system (ANFIS) as a classifier in medical image classification during the past 16 years. ANFIS is a fuzzy inference system (FIS) implemented in the framework of an adaptive fuzzy neural network. It combines the explicit knowledge representation of an FIS with the learning power of artificial neural networks. The objective of ANFIS is to integrate the best features of fuzzy systems and neural networks. A brief comparison with other classifiers, main advantages and drawbacks of this classifier are investigated. PMID:23493054

  8. Autonomous beating rate adaptation in human stem cell-derived cardiomyocytes

    PubMed Central

    Eng, George; Lee, Benjamin W.; Protas, Lev; Gagliardi, Mark; Brown, Kristy; Kass, Robert S.; Keller, Gordon; Robinson, Richard B.; Vunjak-Novakovic, Gordana

    2016-01-01

    The therapeutic success of human stem cell-derived cardiomyocytes critically depends on their ability to respond to and integrate with the surrounding electromechanical environment. Currently, the immaturity of human cardiomyocytes derived from stem cells limits their utility for regenerative medicine and biological research. We hypothesize that biomimetic electrical signals regulate the intrinsic beating properties of cardiomyocytes. Here we show that electrical conditioning of human stem cell-derived cardiomyocytes in three-dimensional culture promotes cardiomyocyte maturation, alters their automaticity and enhances connexin expression. Cardiomyocytes adapt their autonomous beating rate to the frequency at which they were stimulated, an effect mediated by the emergence of a rapidly depolarizing cell population, and the expression of hERG. This rate-adaptive behaviour is long lasting and transferable to the surrounding cardiomyocytes. Thus, electrical conditioning may be used to promote cardiomyocyte maturation and establish their automaticity, with implications for cell-based reduction of arrhythmia during heart regeneration. PMID:26785135

  9. Modelling Errors in Automatic Speech Recognition for Dysarthric Speakers

    NASA Astrophysics Data System (ADS)

    Caballero Morales, Santiago Omar; Cox, Stephen J.

    2009-12-01

    Dysarthria is a motor speech disorder characterized by weakness, paralysis, or poor coordination of the muscles responsible for speech. Although automatic speech recognition (ASR) systems have been developed for disordered speech, factors such as low intelligibility and limited phonemic repertoire decrease speech recognition accuracy, making conventional speaker adaptation algorithms perform poorly on dysarthric speakers. In this work, rather than adapting the acoustic models, we model the errors made by the speaker and attempt to correct them. For this task, two techniques have been developed: (1) a set of "metamodels" that incorporate a model of the speaker's phonetic confusion matrix into the ASR process; (2) a cascade of weighted finite-state transducers at the confusion matrix, word, and language levels. Both techniques attempt to correct the errors made at the phonetic level and make use of a language model to find the best estimate of the correct word sequence. Our experiments show that both techniques outperform standard adaptation techniques.

  10. Automatic image assessment from facial attributes

    NASA Astrophysics Data System (ADS)

    Ptucha, Raymond; Kloosterman, David; Mittelstaedt, Brian; Loui, Alexander

    2013-03-01

    Personal consumer photography collections often contain photos captured by numerous devices stored both locally and via online services. The task of gathering, organizing, and assembling still and video assets in preparation for sharing with others can be quite challenging. Current commercial photobook applications are mostly manual-based, requiring significant user interaction. To assist the consumer in organizing these assets, we propose an automatic method to assign a fitness score to each asset, whereby the top scoring assets are used for product creation. Our method uses cues extracted from analyzing pixel data, metadata embedded in the file, as well as ancillary tags or online comments. When a face occurs in an image, its features have a dominating influence on both aesthetic and compositional properties of the displayed image. As such, this paper will emphasize the contributions faces have on affecting the overall fitness score of an image. To understand consumer preference, we conducted a psychophysical study that spanned 27 judges, 5,598 faces, and 2,550 images. Preferences on a per-face and per-image basis were independently gathered to train our classifiers. We describe how to use machine learning techniques to merge differing facial attributes into a single classifier. Our novel methods of facial weighting, fusion of facial attributes, and dimensionality reduction produce state-of-the-art results suitable for commercial applications.

  11. From Sky to Earth: Data Science Methodology Transfer

    NASA Astrophysics Data System (ADS)

    Mahabal, Ashish A.; Crichton, Daniel; Djorgovski, S. G.; Law, Emily; Hughes, John S.

    2017-06-01

    We describe here the parallels between astronomy and earth science datasets, their analyses, and the opportunities for methodology transfer from astroinformatics to geoinformatics. Using the example of hydrology, we emphasize how metadata and ontologies are crucial in such an undertaking. Using the infrastructure being designed for EarthCube - the Virtual Observatory for the earth sciences - we discuss essential steps for better transfer of tools and techniques in the future, e.g. domain adaptation. Finally, we point out that it is never a one-way process and there is enough for astroinformatics to learn from geoinformatics as well.

  12. Effortful versus automatic emotional processing in schizophrenia: Insights from a face-vignette task.

    PubMed

    Patrick, Regan E; Rastogi, Anuj; Christensen, Bruce K

    2015-01-01

    Adaptive emotional responding relies on dual automatic and effortful processing streams. Dual-stream models of schizophrenia (SCZ) posit a selective deficit in neural circuits that govern goal-directed, effortful processes versus reactive, automatic processes. This imbalance suggests that when patients are confronted with competing automatic and effortful emotional response cues, they will exhibit diminished effortful responding and intact, possibly elevated, automatic responding compared to controls. This prediction was evaluated using a modified version of the face-vignette task (FVT). Participants viewed emotional faces (automatic response cue) paired with vignettes (effortful response cue) that signalled a different emotion category and were instructed to discriminate the manifest emotion. Patients made less vignette and more face responses than controls. However, the relationship between group and FVT responding was moderated by IQ and reading comprehension ability. These results replicate and extend previous research and provide tentative support for abnormal conflict resolution between automatic and effortful emotional processing predicted by dual-stream models of SCZ.

  13. A Metadata Element Set for Project Documentation

    NASA Technical Reports Server (NTRS)

    Hodge, Gail; Templeton, Clay; Allen, Robert B.

    2003-01-01

    NASA Goddard Space Flight Center is a large engineering enterprise with many projects. We describe our efforts to develop standard metadata sets across project documentation, which we term the "Goddard Core". We also address broader issues for project management metadata.

  14. Home media server content management

    NASA Astrophysics Data System (ADS)

    Tokmakoff, Andrew A.; van Vliet, Harry

    2001-07-01

    With the advent of set-top boxes, the convergence of TV (broadcasting) and PC (Internet) is set to enter the home environment. Currently, a great deal of activity is occurring in developing standards (TV-Anytime Forum) and devices (TiVo) for local storage on Home Media Servers (HMS). These devices lie at the heart of convergence of the triad: communications/networks - content/media - computing/software. Besides massive storage capacity and being a communications 'gateway', the home media server is characterised by the ability to handle metadata and software that provides an easy to use on-screen interface and intelligent search/content handling facilities. In this paper, we describe a research prototype HMS that is being developed within the GigaCE project at the Telematica Instituut. Our prototype demonstrates advanced search and retrieval (video browsing), adaptive user profiling and an innovative 3D component of the Electronic Program Guide (EPG) which represents online presence. We discuss the use of MPEG-7 for representing metadata, the use of MPEG-21 working draft standards for content identification, description and rights expression, and the use of HMS peer-to-peer content distribution approaches. Finally, we outline explorative user behaviour experiments that aim to investigate the effectiveness of the prototype HMS during development.

  15. Antimicrobial Resistance Prediction in PATRIC and RAST.

    PubMed

    Davis, James J; Boisvert, Sébastien; Brettin, Thomas; Kenyon, Ronald W; Mao, Chunhong; Olson, Robert; Overbeek, Ross; Santerre, John; Shukla, Maulik; Wattam, Alice R; Will, Rebecca; Xia, Fangfang; Stevens, Rick

    2016-06-14

    The emergence and spread of antimicrobial resistance (AMR) mechanisms in bacterial pathogens, coupled with the dwindling number of effective antibiotics, has created a global health crisis. Being able to identify the genetic mechanisms of AMR and predict the resistance phenotypes of bacterial pathogens prior to culturing could inform clinical decision-making and improve reaction time. At PATRIC (http://patricbrc.org/), we have been collecting bacterial genomes with AMR metadata for several years. In order to advance phenotype prediction and the identification of genomic regions relating to AMR, we have updated the PATRIC FTP server to enable access to genomes that are binned by their AMR phenotypes, as well as metadata including minimum inhibitory concentrations. Using this infrastructure, we custom built AdaBoost (adaptive boosting) machine learning classifiers for identifying carbapenem resistance in Acinetobacter baumannii, methicillin resistance in Staphylococcus aureus, and beta-lactam and co-trimoxazole resistance in Streptococcus pneumoniae with accuracies ranging from 88-99%. We also did this for isoniazid, kanamycin, ofloxacin, rifampicin, and streptomycin resistance in Mycobacterium tuberculosis, achieving accuracies ranging from 71-88%. This set of classifiers has been used to provide an initial framework for species-specific AMR phenotype and genomic feature prediction in the RAST and PATRIC annotation services.
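
    A minimal sketch of the classifier-building step with scikit-learn's AdaBoostClassifier, on synthetic presence/absence features; PATRIC's real training data are the phenotype-binned genomes described above.

    ```python
    # Sketch of an AdaBoost AMR classifier over synthetic k-mer presence/absence
    # features. Data and the planted "resistance signal" are invented.
    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(42)
    X = rng.integers(0, 2, size=(300, 100))     # k-mer presence/absence matrix
    y = (X[:, :5].sum(axis=1) > 2).astype(int)  # synthetic resistance signal

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))
    ```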

  16. Metadata for WIS and WIGOS: GAW Profile of ISO19115 and Draft WIGOS Core Metadata Standard

    NASA Astrophysics Data System (ADS)

    Klausen, Jörg; Howe, Brian

    2014-05-01

    The World Meteorological Organization (WMO) Integrated Global Observing System (WIGOS) is a key WMO priority to underpin all WMO Programs and new initiatives such as the Global Framework for Climate Services (GFCS). The development of the WIGOS Operational Information Resource (WIR) is central to the WIGOS Framework Implementation Plan (WIGOS-IP). The WIR shall provide information on WIGOS and its observing components, as well as requirements of WMO application areas. An important aspect is the description of observational capabilities by way of structured metadata. The Global Atmosphere Watch (GAW) is the WMO program addressing the chemical composition and selected physical properties of the atmosphere. Observational data are collected and archived by GAW World Data Centres (WDCs) and related data centres. The Task Team on GAW WDCs (ET-WDC) has developed a profile of the ISO 19115 metadata standard that is compliant with the WMO Information System (WIS) specification for the WMO Core Metadata Profile v1.3. This profile is intended to harmonize certain aspects of the documentation of observations as well as the interoperability of the WDCs. The Inter-Commission Group on WIGOS (ICG-WIGOS) has established the Task Team on WIGOS Metadata (TT-WMD), with representation from all WMO Technical Commissions and the objective of defining the WIGOS Core Metadata. The result of this effort is a draft semantic standard comprising a set of metadata classes that are considered to be of critical importance for the interpretation of observations relevant to WIGOS. The purpose of the presentation is to acquaint the audience with the standard and to solicit informal feedback from experts in the various disciplines of meteorology and climatology. This feedback will help ET-WDC and TT-WMD refine the GAW metadata profile and the draft WIGOS metadata standard, thereby increasing their utility and acceptance.

  17. SIPSMetGen: It's Not Just For Aircraft Data and ECS Anymore.

    NASA Astrophysics Data System (ADS)

    Schwab, M.

    2015-12-01

    The SIPSMetGen utility, developed for the NASA EOSDIS project under the EED contract, simplified the creation of file-level metadata for the ECS system. The utility has been enhanced for ease of use, efficiency, speed, and increased flexibility. SIPSMetGen was originally created as a means of generating file-level spatial metadata for Operation IceBridge. The first version created only ODL metadata, specific for ingest into ECS. The core strength of the utility was, and continues to be, its ability to take complex shapes and patterns of data-collection point clouds from aircraft flights and simplify them to a relatively simple concave-hull geo-polygon. It has been found to be a useful and easy-to-use tool for creating file-level metadata for many other missions, both aircraft and satellite. While the original version was useful, it had its limitations. In 2014 Raytheon was tasked with making enhancements to SIPSMetGen. This resulted in a new version which can create ISO-compliant XML metadata; provides optimization and streamlining of the algorithm for creating the spatial metadata; achieves a quicker runtime with more consistent results; and can be configured to run multi-threaded on systems with multiple processors. The utility comes with a Java-based graphical user interface to aid in configuration and running of the utility. The enhanced SIPSMetGen allows more diverse data sets to be archived with file-level metadata. The advantage of archiving data with file-level metadata is that it makes it easier for data users and scientists to find relevant data. File-level metadata unlocks the power of existing archives and metadata repositories such as ECS and CMR, and of search and discovery utilities like Reverb and Earth Data Search. Current missions using SIPSMetGen include Aquarius, MEaSUREs, ARISE, and Nimbus.
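
    The utility's core step, reducing a flight's point cloud to a simple bounding geo-polygon, can be approximated with a convex hull as below. Note that SIPSMetGen computes a concave hull, which follows the flight line more tightly, so this is a simplified stand-in.

    ```python
    # Reduce a synthetic flight-track point cloud to a bounding polygon.
    # A convex hull stands in for the tool's tighter concave hull.
    import numpy as np
    from scipy.spatial import ConvexHull

    rng = np.random.default_rng(1)
    t = np.linspace(0, 4 * np.pi, 500)
    track = np.column_stack([t / np.pi - 2 + 0.1 * rng.standard_normal(500),
                             0.5 * np.sin(t)])   # lon/lat-like flight line

    hull = ConvexHull(track)
    polygon = track[hull.vertices]               # ordered boundary vertices
    print(f"{len(track)} points reduced to a {len(polygon)}-vertex polygon")
    ```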

  18. Building a high level sample processing and quality assessment model for biogeochemical measurements: a case study from the ocean acidification community

    NASA Astrophysics Data System (ADS)

    Thomas, R.; Connell, D.; Spears, T.; Leadbetter, A.; Burger, E. F.

    2016-12-01

    The scientific literature heavily features small-scale studies with the impact of the results extrapolated to regional/global importance. There are on-going initiatives (e.g. OA-ICC, GOA-ON, GEOTRACES, EMODNet Chemistry) aiming to assemble regional- to global-scale datasets that are available for trend or meta-analyses. Assessing the quality and comparability of these data requires information about the processing chain from "sampling to spreadsheet". This provenance information needs to be captured and readily available to assess data fitness for purpose. The NOAA Ocean Acidification metadata template was designed in consultation with domain experts for this reason; the core carbonate chemistry variables have 23-37 metadata fields each, and for scientists generating these datasets there can appear to be an ever-increasing amount of metadata expected to accompany a dataset. While this provenance metadata should be considered essential by those generating or using the data, for those discovering data there is a sliding scale between what is considered discovery metadata (title, abstract, contacts, etc.) versus usage metadata (methodology, environmental setup, lineage, etc.), the split depending on the intended use of the data. As part of the OA-ICC's activities, the metadata fields from the NOAA template relevant to the sample processing chain and QA criteria have been factored to develop profiles for, and extensions to, the OM-JSON encoding supported by the PROV ontology. While this work started with a focus on metadata specific to carbonate chemistry variables, the factorization could be applied within the O&M model across other disciplines such as trace metals or contaminants. In a linked data world, with a suitable high-level model for sample processing and QA available, tools and support can be provided to link reproducible units of metadata (e.g. the standard protocol for a variable as adopted by a community) and simplify the provision of metadata and subsequent discovery.
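
    A hedged sketch of the target encoding: a measurement carrying PROV-style processing-chain and QA metadata as structured JSON. The keys follow the spirit of O&M and PROV; they are not the actual profile under development.

    ```python
    # Illustrative observation record with PROV-style provenance and QA keys.
    # Key names and values are invented for illustration.
    import json

    observation = {
        "observedProperty": "pH_total_scale",
        "result": {"value": 8.05, "uom": "1"},
        "prov:wasGeneratedBy": {
            "prov:activity": "spectrophotometric_pH_analysis",
            "protocol": "https://example.org/protocols/ph-spec-v2",  # placeholder
            "qa": {"crm_batch": "Dickson CRM 186", "replicates": 3},
        },
        "prov:wasDerivedFrom": {"sample": "niskin_bottle_12", "depth_m": 10.0},
    }
    print(json.dumps(observation, indent=2))
    ```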

  19. Metadata Creation, Management and Search System for your Scientific Data

    NASA Astrophysics Data System (ADS)

    Devarakonda, R.; Palanisamy, G.

    2012-12-01

    The Mercury search system is a set of tools for creating, searching, and retrieving biogeochemical metadata. The Mercury toolset provides orders-of-magnitude improvements in search speed, support for any metadata format, integration with Google Maps for spatial queries, multi-faceted search, search suggestions, support for RSS (Really Simple Syndication) delivery of search results, and enhanced customization to meet the needs of the multiple projects that use Mercury. Mercury's metadata editor provides an easy way to create metadata, and Mercury's search interface provides a single portal to search for data and information contained in disparate data management systems, each of which may use any metadata format including FGDC, ISO-19115, Dublin Core, Darwin Core, DIF, ECHO, and EML. Mercury harvests metadata and key data from contributing project servers distributed around the world and builds a centralized index. The search interfaces then allow users to perform a variety of fielded, spatial, and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury is being used by more than 14 different projects across 4 federal agencies. It was originally developed for NASA, with continuing development funded by NASA, USGS, and DOE for a consortium of projects. Mercury won NASA's Earth Science Data Systems Software Reuse Award in 2008. References: R. Devarakonda, G. Palanisamy, B.E. Wilson, and J.M. Green, "Mercury: reusable metadata management, data discovery and access system", Earth Science Informatics, vol. 3, no. 1, pp. 87-94, May 2010. R. Devarakonda, G. Palanisamy, J.M. Green, B.E. Wilson, "Data sharing and retrieval using OAI-PMH", Earth Science Informatics, DOI: 10.1007/s12145-010-0073-0, 2010.
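
    Mercury's harvesting rests on OAI-PMH (see the Devarakonda et al. OAI-PMH reference above). A minimal harvest loop using the sickle client follows, with a placeholder endpoint standing in for a contributing project server.

    ```python
    # Sketch of an OAI-PMH harvest loop with the sickle client. The endpoint
    # URL is a placeholder for a contributing project server.
    from sickle import Sickle

    sickle = Sickle("https://example.org/oai")  # hypothetical provider endpoint
    for record in sickle.ListRecords(metadataPrefix="oai_dc"):
        meta = record.metadata                  # Dublin Core fields as lists
        print(meta.get("title", ["<untitled>"])[0])
    ```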

  20. Adaptation of aeronautical engines to high altitude flying

    NASA Technical Reports Server (NTRS)

    Kutzbach, K

    1923-01-01

    Issues and techniques relative to the adaptation of aircraft engines to high altitude flight are discussed. Covered here are the limits of engine output, modifications and characteristics of high altitude engines, the influence of air density on the proportions of fuel mixtures, methods of varying the proportions of fuel mixtures, the automatic prevention of fuel waste, and the design and application of air pressure regulators to high altitude flying. Summary: 1. Limits of engine output. 2. High altitude engines. 3. Influence of air density on proportions of mixture. 4. Methods of varying proportions of mixture. 5. Automatic prevention of fuel waste. 6. Design and application of air pressure regulators to high altitude flying.

  1. Lessons Learned in over Two Decades of GPS/GNSS Data Center Support

    NASA Astrophysics Data System (ADS)

    Boler, F. M.; Estey, L. H.; Meertens, C. M.; Maggert, D.

    2014-12-01

    The UNAVCO Data Center in Boulder, Colorado, curates, archives, and distributes geodesy data and products, mainly GPS/GNSS data from 3,000 permanent stations and 10,000 campaign sites around the globe. Although now having core support from NSF and NASA, the archive began around 1992 as a grass-roots effort of a few UNAVCO staff and community members to preserve data going back to 1986. Open access to this data is generally desired, but the Data Center in fact operates under an evolving suite of data access policies ranging from open access to nondisclosure for special cases. Key to processing this data is having the correct equipment metadata; reliably obtaining this metadata continues to be a challenge, in spite of modern cyberinfrastructure and tools, mostly due to human errors or lack of consistent operator training. New metadata problems surface when trying to design and publish modern Digital Object Identifiers for data sets where PIs, funding sources, and historical project names now need to be corrected and verified for data sets going back almost three decades. Originally, the data was GPS-only based on three signals on two carrier frequencies. Modern GNSS covers GPS modernization (three more signals and one additional carrier) as well as open signals and carriers of additional systems such as GLONASS, Galileo, BeiDou, and QZSS, requiring ongoing adaptive strategies to assess the quality of modern datasets. Also, new scientific uses of these data benefit from higher data rates than was needed for early tectonic applications. In addition, there has been a migration from episodic campaign sites (hence sparse data) to continuously operating stations (hence dense data) over the last two decades. All of these factors make it difficult to realistically plan even simple data center functions such as on-line storage capacity.

  2. ISA-TAB-Nano: a specification for sharing nanomaterial research data in spreadsheet-based format.

    PubMed

    Thomas, Dennis G; Gaheen, Sharon; Harper, Stacey L; Fritts, Martin; Klaessig, Fred; Hahn-Dantona, Elizabeth; Paik, David; Pan, Sue; Stafford, Grace A; Freund, Elaine T; Klemm, Juli D; Baker, Nathan A

    2013-01-14

    The high-throughput genomics communities have been successfully using standardized spreadsheet-based formats to capture and share data within labs and among public repositories. The nanomedicine community has yet to adopt similar standards to share the diverse and multi-dimensional types of data (including metadata) pertaining to the description and characterization of nanomaterials. Owing to the lack of standardization in representing and sharing nanomaterial data, most of the data currently shared via publications and data resources are incomplete, poorly integrated, and not suitable for meaningful interpretation and re-use. Specifically, in its current state, the data cannot be effectively utilized for the development of predictive models that will inform the rational design of nanomaterials. We have developed a specification called ISA-TAB-Nano, which comprises four spreadsheet-based file formats for representing and integrating various types of nanomaterial data. Three file formats (Investigation, Study, and Assay files) have been adapted from the established ISA-TAB specification, while the Material file format was developed de novo to more readily describe the complexity of nanomaterials and associated small molecules. In this paper, we discuss the main features of each file format and how to use them for sharing nanomaterial descriptions and assay metadata. The ISA-TAB-Nano file formats provide a general and flexible framework to record and integrate nanomaterial descriptions, assay data (metadata and endpoint measurements), and protocol information. Like ISA-TAB, ISA-TAB-Nano supports the use of ontology terms to promote standardized descriptions and to facilitate search and integration of the data. The ISA-TAB-Nano specification has been submitted as an ASTM work item to obtain community feedback and to provide a nanotechnology data-sharing standard for public development and adoption.
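
    Since all four ISA-TAB-Nano file formats are tab-delimited spreadsheets, reading one programmatically is straightforward. The sketch below is illustrative only: the file name and column headers are hypothetical examples, not the normative header set from the specification.

```python
# Illustrative only: reads a tab-delimited, ISA-TAB-style table with csv.DictReader.
# The column names below are hypothetical examples, not the normative
# ISA-TAB-Nano header set; consult the specification for the required fields.
import csv

def read_isatab_table(path):
    """Return a list of row dictionaries from a tab-delimited ISA-TAB-style file."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))

for row in read_isatab_table("s_material.txt"):  # hypothetical file name
    print(row.get("Material Name"), row.get("Characteristics[particle size]"))
```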

  3. ISA-TAB-Nano: A Specification for Sharing Nanomaterial Research Data in Spreadsheet-based Format

    PubMed Central

    2013-01-01

    Background and motivation: The high-throughput genomics communities have been successfully using standardized spreadsheet-based formats to capture and share data within labs and among public repositories. The nanomedicine community has yet to adopt similar standards to share the diverse and multi-dimensional types of data (including metadata) pertaining to the description and characterization of nanomaterials. Owing to the lack of standardization in representing and sharing nanomaterial data, most of the data currently shared via publications and data resources are incomplete, poorly integrated, and not suitable for meaningful interpretation and re-use. Specifically, in its current state, the data cannot be effectively utilized for the development of predictive models that will inform the rational design of nanomaterials. Results: We have developed a specification called ISA-TAB-Nano, which comprises four spreadsheet-based file formats for representing and integrating various types of nanomaterial data. Three file formats (Investigation, Study, and Assay files) have been adapted from the established ISA-TAB specification, while the Material file format was developed de novo to more readily describe the complexity of nanomaterials and associated small molecules. In this paper, we discuss the main features of each file format and how to use them for sharing nanomaterial descriptions and assay metadata. Conclusion: The ISA-TAB-Nano file formats provide a general and flexible framework to record and integrate nanomaterial descriptions, assay data (metadata and endpoint measurements), and protocol information. Like ISA-TAB, ISA-TAB-Nano supports the use of ontology terms to promote standardized descriptions and to facilitate search and integration of the data. The ISA-TAB-Nano specification has been submitted as an ASTM work item to obtain community feedback and to provide a nanotechnology data-sharing standard for public development and adoption. PMID:23311978

  4. Master Metadata Repository and Metadata-Management System

    NASA Technical Reports Server (NTRS)

    Armstrong, Edward; Reed, Nate; Zhang, Wen

    2007-01-01

    A master metadata repository (MMR) software system manages the storage and searching of metadata pertaining to data from national and international satellite sources of the Global Ocean Data Assimilation Experiment (GODAE) High Resolution Sea Surface Temperature Pilot Project [GHRSSTPP]. These sources produce a total of hundreds of data files daily, each file classified as one of more than ten data products representing global sea-surface temperatures. The MMR is a relational database wherein the metadata are divided into granule-level records [denoted file records (FRs)] for individual satellite files and collection-level records [denoted data set descriptions (DSDs)] that describe metadata common to all the files from a specific data product. FRs and DSDs adhere to the NASA Directory Interchange Format (DIF). The FRs and DSDs are contained in separate subdatabases linked by a common field. The MMR is configured in MySQL database software with custom Practical Extraction and Reporting Language (PERL) programs to validate and ingest the metadata records. The database contents are converted into the Federal Geographic Data Committee (FGDC) standard format by use of the Extensible Markup Language (XML). A Web interface enables users to search for availability of data from all sources.
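
    The two-level layout described above (collection-level DSDs linked to granule-level FRs by a common field) can be sketched in a few lines. Here sqlite3 stands in for the MySQL deployment, and the table and column names are hypothetical, not those of the actual MMR schema.

```python
# Sketch of the two-level metadata layout: collection-level DSD records and
# granule-level FR records linked by a shared product ID. sqlite3 stands in
# for MySQL; table and column names are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dsd (product_id TEXT PRIMARY KEY, title TEXT, producer TEXT);
CREATE TABLE fr  (file_name TEXT PRIMARY KEY, product_id TEXT,
                  start_time TEXT, stop_time TEXT,
                  FOREIGN KEY (product_id) REFERENCES dsd(product_id));
""")
con.execute("INSERT INTO dsd VALUES ('SST-L2P-A', 'Example SST product', 'Agency X')")
con.execute("INSERT INTO fr VALUES ('sst_20070101.nc', 'SST-L2P-A', "
            "'2007-01-01T00:00Z', '2007-01-01T06:00Z')")

# A search joins the granule and collection levels through the common field.
for row in con.execute("""SELECT fr.file_name, dsd.title FROM fr
                          JOIN dsd USING (product_id)"""):
    print(row)
```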

  5. WE-A-17A-06: Evaluation of an Automatic Interstitial Catheter Digitization Algorithm That Reduces Treatment Planning Time and Provides Means for Adaptive Re-Planning in HDR Brachytherapy of Gynecologic Cancers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dise, J; Liang, X; Lin, L

    Purpose: To evaluate an automatic interstitial catheter digitization algorithm that reduces treatment planning time and provides a means for adaptive re-planning in HDR brachytherapy of gynecologic cancers. Methods: The semi-automatic catheter digitization tool utilizes a region growing algorithm in conjunction with a spline model of the catheters. The CT images were first pre-processed to enhance the contrast between the catheters and soft tissue. Several seed locations were selected in each catheter for the region growing algorithm. The spline model of the catheters assisted in the region growing by preventing inter-catheter cross-over caused by air or metal artifacts. Source dwell positions from day one CT scans were applied to subsequent CTs and forward calculated using the automatically digitized catheter positions. This method was applied to 10 patients who had received HDR interstitial brachytherapy on an IRB-approved image-guided radiation therapy protocol. The prescribed dose was 18.75 or 20 Gy delivered in 5 fractions, twice daily, over 3 consecutive days. Dosimetric comparisons were made between automatic and manual digitization on day two CTs. Results: The region growing algorithm, assisted by the spline model of the catheters, was able to digitize all catheters. The difference between automatically and manually digitized positions was 0.8±0.3 mm. The digitization time ranged from 34 minutes to 43 minutes, with a mean digitization time of 37 minutes. The bulk of the time was spent on manual selection of initial seed positions and spline parameter adjustments. There was no significant difference in dosimetric parameters between the automatically and manually digitized plans. D90% to the CTV was 91.5±4.4% for the manual digitization versus 91.4±4.4% for the automatic digitization (p=0.56). Conclusion: A region growing algorithm was developed to semi-automatically digitize interstitial catheters in HDR brachytherapy using the Syed-Neblett template. This automatic digitization tool was shown to be accurate compared to manual digitization.
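
    As a rough illustration of the core idea, the sketch below grows a region from a seed voxel through bright, connected voxels of a toy CT volume. It is not the authors' implementation; in particular, the spline model that prevents inter-catheter cross-over is omitted, and the threshold and volume are synthetic.

```python
# A minimal region-growing sketch in the spirit of the abstract: grow from a
# user-picked seed through bright catheter voxels. The spline constraint used
# in the published method is omitted here.
import numpy as np
from collections import deque

def region_grow(volume, seed, threshold):
    """Return a boolean mask of voxels 6-connected to `seed` above `threshold`."""
    mask = np.zeros(volume.shape, dtype=bool)
    queue = deque([seed])
    while queue:
        z, y, x = queue.popleft()
        if mask[z, y, x] or volume[z, y, x] < threshold:
            continue
        mask[z, y, x] = True
        for dz, dy, dx in [(1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)]:
            nz, ny, nx = z + dz, y + dy, x + dx
            if (0 <= nz < volume.shape[0] and 0 <= ny < volume.shape[1]
                    and 0 <= nx < volume.shape[2]):
                queue.append((nz, ny, nx))
    return mask

ct = np.random.default_rng(0).normal(0, 50, (40, 64, 64))  # toy noise volume
ct[:, 32, 32] = 300.0                                      # a bright "catheter" track
catheter_mask = region_grow(ct, seed=(20, 32, 32), threshold=100.0)
print(catheter_mask.sum())  # voxels claimed as catheter
```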

  6. Binding of motion and colour is early and automatic.

    PubMed

    Blaser, Erik; Papathomas, Thomas; Vidnyánszky, Zoltán

    2005-04-01

    At what stages of the human visual hierarchy different features are bound together, and whether this binding requires attention, is still highly debated. We used a colour-contingent motion after-effect (CCMAE) to study the binding of colour and motion signals. The logic of our approach was as follows: if CCMAEs can be evoked by targeted adaptation of early motion processing stages, without allowing for feedback from higher motion integration stages, then this would support our hypothesis that colour and motion are bound automatically on the basis of spatiotemporally local information. Our results show for the first time that CCMAEs can be evoked by adaptation to a locally paired opposite-motion dot display, a stimulus that, importantly, is known to trigger direction-specific responses in the primary visual cortex yet results in strong inhibition of the directional responses in area MT of macaques as well as in area MT+ in humans and, indeed, is perceived only as motionless flicker. The magnitude of the CCMAE in the locally paired condition was not significantly different from control conditions where the different directions were spatiotemporally separated (i.e. not locally paired) and therefore perceived as two moving fields. These findings provide evidence that adaptation at an early, local motion stage, and only adaptation at this stage, underlies this CCMAE, which in turn implies that spatiotemporally coincident colour and motion signals are bound automatically, most probably as early as cortical area V1, even when the association between colour and motion is perceptually inaccessible.

  7. Phase coherence adaptive processor for automatic signal detection and identification

    NASA Astrophysics Data System (ADS)

    Wagstaff, Ronald A.

    2006-05-01

    A continuously adapting acoustic signal processor with an automatic detection/decision aid is presented. Its purpose is to preserve the signals of tactical interest and filter out other signals and noise. It utilizes single-sensor or beamformed spectral data and transforms the signal and noise phase angles into "aligned phase angles" (APA). The APA increase the phase temporal coherence of signals and leave the noise incoherent. Coherence thresholds are set which are representative of the type of source "threat vehicle" and the geographic area or volume in which it is operating. These thresholds separate signals based on the "quality" of their APA coherence. An example is presented in which signals from a submerged source in the ocean are preserved, while clutter signals from ships and noise are entirely eliminated. Furthermore, the "signals of interest" were identified by the processor's automatic detection aid. Similar performance is expected for air and ground vehicles. The processor's equations are formulated in such a manner that they can be tuned to eliminate noise and exploit signal, based on the "quality" of their APA temporal coherence. The mathematical formulation for this processor is presented, including the method by which the processor continuously self-adapts. Results show nearly complete elimination of noise, with only the selected category of signals remaining, and accompanying enhancements in spectral and spatial resolution. In most cases, the concept of signal-to-noise ratio loses significance, and "adaptive automated detection/decision aid" is more relevant.
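
    The paper's APA transform is specific to this processor and is not reproduced here, but the underlying screening idea, keeping only spectral bins whose phase stays aligned across time segments, can be sketched with a generic mean-resultant-length coherence measure. All parameters below are synthetic.

```python
# Generic phase-coherence screening in the spirit of the processor above:
# a bin whose FFT phase stays aligned across successive time segments is kept
# as signal; an incoherent bin is suppressed as noise. This is a stand-in
# metric, not the paper's APA formulation.
import numpy as np

def phase_coherence(segments):
    """Mean resultant length of FFT phase per bin over segments.
    1.0 = perfectly coherent phase, ~0 = incoherent (noise-like)."""
    phases = np.angle(np.fft.rfft(segments, axis=1))
    return np.abs(np.exp(1j * phases).mean(axis=0))

rng = np.random.default_rng(1)
fs, n, nseg = 1024, 1024, 32
t = np.arange(n) / fs
# A 100 Hz tone, phase-continuous across segments, buried in strong noise.
segs = np.stack([np.sin(2 * np.pi * 100 * (t + k * n / fs)) + rng.normal(0, 2, n)
                 for k in range(nseg)])
coh = phase_coherence(segs)
keep = coh > 0.5             # coherence threshold separating signal from noise
print(np.flatnonzero(keep))  # expect the 100 Hz bin to survive
```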

  8. Brady's Geothermal Field Nodal Seismometers Metadata

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lesley Parker

    Metadata for the nodal seismometer array deployed at the POROTOMO's Natural Laboratory in Brady Hot Spring, Nevada during the March 2016 testing. Metadata includes location and timing for each instrument as well as file lists of data to be uploaded in a separate submission.

  9. Extraction of CT dose information from DICOM metadata: automated Matlab-based approach.

    PubMed

    Dave, Jaydev K; Gingold, Eric L

    2013-01-01

    The purpose of this study was to extract exposure parameters and dose-relevant indexes of CT examinations from information embedded in DICOM metadata. DICOM dose report files were identified and retrieved from a PACS. An automated software program was used to extract, from these files, the exposure-relevant information contained in the structured elements of the DICOM metadata. Extracting information from DICOM metadata eliminated potential errors inherent in techniques based on optical character recognition, yielding 100% accuracy.
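
    The study used a Matlab program; the sketch below shows an analogous approach in Python with pydicom, recursively walking a dose structured report's content tree for numeric items. The file path is hypothetical, and real dose SRs vary in how deeply items are nested.

```python
# Analogous approach in Python with pydicom (the study itself used Matlab):
# walk the structured-report content tree and pull out numeric dose items.
import pydicom

def extract_numeric_items(dataset, wanted=("CTDIvol", "DLP")):
    """Yield (name, value, units) for numeric content items whose concept
    name matches one of `wanted`, anywhere in the SR content tree."""
    for item in getattr(dataset, "ContentSequence", []):
        name = (item.ConceptNameCodeSequence[0].CodeMeaning
                if "ConceptNameCodeSequence" in item else "")
        if item.get("ValueType") == "NUM" and any(w in name for w in wanted):
            mvs = item.MeasuredValueSequence[0]
            yield (name, float(mvs.NumericValue),
                   mvs.MeasurementUnitsCodeSequence[0].CodeValue)
        yield from extract_numeric_items(item, wanted)  # recurse into containers

ds = pydicom.dcmread("dose_report.dcm")  # hypothetical path to a dose SR
for name, value, units in extract_numeric_items(ds):
    print(f"{name}: {value} {units}")
```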

  10. Trends in the Evolution of the Public Web, 1998-2002; The Fedora Project: An Open-source Digital Object Repository Management System; State of the Dublin Core Metadata Initiative, April 2003; Preservation Metadata; How Many People Search the ERIC Database Each Day?

    ERIC Educational Resources Information Center

    O'Neill, Edward T.; Lavoie, Brian F.; Bennett, Rick; Staples, Thornton; Wayland, Ross; Payette, Sandra; Dekkers, Makx; Weibel, Stuart; Searle, Sam; Thompson, Dave; Rudner, Lawrence M.

    2003-01-01

    Includes five articles that examine key trends in the development of the public Web: size and growth, internationalization, and metadata usage; Flexible Extensible Digital Object and Repository Architecture (Fedora) for use in digital libraries; developments in the Dublin Core Metadata Initiative (DCMI); the National Library of New Zealand Te Puna…

  11. Update on the Center for Engineering Strong Motion Data

    NASA Astrophysics Data System (ADS)

    Haddadi, H. R.; Shakal, A. F.; Stephens, C. D.; Oppenheimer, D. H.; Huang, M.; Leith, W. S.; Parrish, J. G.; Savage, W. U.

    2010-12-01

    The U.S. Geological Survey (USGS) and the California Geological Survey (CGS) established the Center for Engineering Strong-Motion Data (CESMD, Center) to provide a single access point for earthquake strong-motion records and station metadata from the U.S. and international strong-motion programs. The Center has operational facilities in Sacramento and Menlo Park, California, to receive, process, and disseminate records through the CESMD web site at www.strongmotioncenter.org. The Center currently is in the process of transitioning the COSMOS Virtual Data Center (VDC) to integrate its functions with those of the CESMD for improved efficiency of operations, and to provide all users with a more convenient one-stop portal to both U.S. and important international strong-motion records. The Center is working with COSMOS and international and U.S. data providers to improve the completeness of site and station information, which is needed to most effectively employ the recorded data. The goal of all these and other new developments is to continually improve access by the earthquake engineering community to strong-motion data and metadata world-wide. The CESMD and its Virtual Data Center (VDC) provide tools to map earthquakes and recording stations, to search raw and processed data, to view time histories and spectral plots, to convert data file formats, and to download data and a variety of information. The VDC is now being upgraded to convert the strong-motion data files from different seismic networks into a common standard tagged format in order to facilitate importing earthquake records and station metadata into the CESMD database. An important new feature being developed is the automatic posting of Internet Quick Reports at the CESMD web site. This feature will allow users, and emergency responders in particular, to view strong-motion waveforms and download records within a few minutes after an earthquake occurs. Currently the CESMD and its Virtual Data Center provide selected strong-motion records from 17 countries. The Center has proved highly useful in providing data to scientists, engineers, policy makers, and emergency response teams around the world.

  12. Average glandular dose in paired digital mammography and digital breast tomosynthesis acquisitions in a population based screening program: effects of measuring breast density, air kerma and beam quality

    NASA Astrophysics Data System (ADS)

    Østerås, Bjørn Helge; Skaane, Per; Gullien, Randi; Martinsen, Anne Catrine Trægde

    2018-02-01

    The main purpose was to compare average glandular dose (AGD) for same-compression digital mammography (DM) and digital breast tomosynthesis (DBT) acquisitions in a population based screening program, with and without breast density stratification, as determined by automatically calculated breast density (Quantra™). Secondarily, to compare AGD estimates based on measured breast density, air kerma and half value layer (HVL) to DICOM metadata based estimates. AGD was estimated for 3819 women participating in the screening trial. All received craniocaudal and mediolateral oblique views of each breast with paired DM and DBT acquisitions. Exposure parameters were extracted from DICOM metadata. Air kerma and HVL were measured for all beam qualities used to acquire the mammograms. Volumetric breast density was estimated using Quantra™. AGD was estimated using the Dance model. AGD reported directly from the DICOM metadata was also assessed. Mean AGD was 1.74 and 2.10 mGy for DM and DBT, respectively. Mean DBT/DM AGD ratio was 1.24. For fatty breasts, mean AGD was 1.74 and 2.27 mGy for DM and DBT, respectively. For dense breasts, mean AGD was 1.73 and 1.79 mGy for DM and DBT, respectively. For breasts of similar thickness, dense breasts had higher AGD for DM and similar AGD for DBT. The DBT/DM dose ratio was substantially lower for dense compared to fatty breasts (1.08 versus 1.33). The average c-factor was 1.16. Using previously published polynomials to estimate glandularity from thickness underestimated the c-factor by 5.9% on average. Mean AGD error between estimates based on measurements (air kerma and HVL) versus DICOM header data was 3.8%, but for one mammography unit as high as 7.9%. Mean error of using the AGD value reported in the DICOM header was 10.7 and 13.3%, respectively. Thus, measurement of breast density, radiation dose and beam quality can substantially affect AGD estimates.
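
    The Dance model referenced above computes AGD as the product of the incident air kerma and three tabulated conversion factors, AGD = K·g·c·s. The factor values in the snippet below are placeholders for illustration, not values from the published tables (although c = 1.16 echoes the average c-factor reported in this abstract).

```python
# Dance et al. model: AGD = K * g * c * s. The factor values below are
# illustrative placeholders, not entries from the published tables.
def average_glandular_dose(K_mGy, g, c, s):
    """K_mGy = incident air kerma at the breast surface, g = kerma-to-dose
    factor for 50% glandularity, c = glandularity correction, s = spectrum
    correction."""
    return K_mGy * g * c * s

# Hypothetical example: 5 mGy entrance kerma with illustrative factor values.
print(average_glandular_dose(K_mGy=5.0, g=0.38, c=1.16, s=1.0))  # ~2.2 mGy
```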

  13. PDF text classification to leverage information extraction from publication reports.

    PubMed

    Bui, Duy Duc An; Del Fiol, Guilherme; Jonnalagadda, Siddhartha

    2016-06-01

    Data extraction from original study reports is a time-consuming, error-prone process in systematic review development. Information extraction (IE) systems have the potential to assist humans in the extraction task; however, the majority of IE systems were not designed to work on Portable Document Format (PDF) documents, an important and common extraction source for systematic reviews. In a PDF document, narrative content is often mixed with publication metadata or semi-structured text, which adds challenges for the underlying natural language processing algorithms. Our goal is to categorize PDF texts for strategic use by IE systems. We used an open-source tool to extract raw texts from a PDF document and developed a text classification algorithm that follows a multi-pass sieve framework to automatically classify PDF text snippets (for brevity, texts) into TITLE, ABSTRACT, BODYTEXT, SEMISTRUCTURE, and METADATA categories. To validate the algorithm, we developed a gold standard of PDF reports that were included in the development of previous systematic reviews by the Cochrane Collaboration. In a two-step procedure, we evaluated (1) classification performance, compared with a machine learning classifier, and (2) the effects of the algorithm on an IE system that extracts clinical outcome mentions. The multi-pass sieve algorithm achieved an accuracy of 92.6%, which was 9.7% (p<0.001) higher than the best performing machine learning classifier, which used a logistic regression algorithm. F-measure improvements were observed in the classification of TITLE (+15.6%), ABSTRACT (+54.2%), BODYTEXT (+3.7%), SEMISTRUCTURE (+34%), and METADATA (+14.2%). In addition, use of the algorithm to filter semi-structured texts and publication metadata improved performance of the outcome extraction system (F-measure +4.1%, p=0.002). It also reduced the number of sentences to be processed by 44.9% (p<0.001), which corresponds to a processing time reduction of 50% (p=0.005). The rule-based multi-pass sieve framework can be used effectively in categorizing texts extracted from PDF documents. Text classification is an important prerequisite step to leverage information extraction from PDF documents. Copyright © 2016 Elsevier Inc. All rights reserved.
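
    The multi-pass sieve idea, ordered rule passes in which each pass claims the snippets it is most confident about and hands the rest on, can be sketched as follows. The rules shown are toy stand-ins, not the classifiers developed in the paper.

```python
# Minimal multi-pass sieve sketch: ordered rule passes; the first pass whose
# rule fires claims the snippet, and unclaimed snippets default to BODYTEXT.
# These rules are toy stand-ins for the ones developed in the paper.
import re

def sieve_classify(text):
    passes = [
        ("METADATA",      lambda t: re.search(r"doi:|©|\bvol\.\s*\d+", t, re.I)),
        ("TITLE",         lambda t: len(t.split()) < 20 and t.istitle()),
        ("ABSTRACT",      lambda t: t.lower().startswith("abstract")),
        ("SEMISTRUCTURE", lambda t: t.count("\t") > 2 or t.count("|") > 2),
    ]
    for label, rule in passes:  # first matching pass wins
        if rule(text):
            return label
    return "BODYTEXT"

print(sieve_classify("Abstract: We studied..."))           # ABSTRACT
print(sieve_classify("doi:10.1000/xyz  © 2016 Elsevier"))  # METADATA
```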

  14. A data delivery system for IMOS, the Australian Integrated Marine Observing System

    NASA Astrophysics Data System (ADS)

    Proctor, R.; Roberts, K.; Ward, B. J.

    2010-09-01

    The Integrated Marine Observing System (IMOS, www.imos.org.au), an AUD 150 m, 7-year project (2007-2013), is a distributed set of equipment and data-information services which, among many applications, collectively contribute to meeting the needs of marine climate research in Australia. The observing system provides data in the open oceans around Australia out to a few thousand kilometres as well as the coastal oceans, through 11 facilities which effectively observe and measure the 4-dimensional ocean variability and the physical and biological response of coastal and shelf seas around Australia. Through a national science rationale, IMOS is organized as five regional nodes (Western Australia - WAIMOS, South Australia - SAIMOS, Tasmania - TASIMOS, New South Wales - NSWIMOS and Queensland - QIMOS) surrounded by an oceanic node (Blue Water and Climate). Operationally, IMOS is organized as 11 facilities (Argo Australia, Ships of Opportunity, Southern Ocean Automated Time Series Observations, Australian National Facility for Ocean Gliders, Autonomous Underwater Vehicle Facility, Australian National Mooring Network, Australian Coastal Ocean Radar Network, Australian Acoustic Tagging and Monitoring System, Facility for Automated Intelligent Monitoring of Marine Systems, eMarine Information Infrastructure and Satellite Remote Sensing) delivering data. IMOS data is freely available to the public. The data, a combination of near real-time and delayed mode, are made available to researchers through the electronic Marine Information Infrastructure (eMII). eMII utilises the Australian Academic Research Network (AARNET) to support a distributed database on OPeNDAP/THREDDS servers hosted by regional computing centres. IMOS instruments are described through the OGC specification SensorML and, wherever possible, data is in CF-compliant netCDF format. Metadata, conforming to standard ISO 19115, is automatically harvested from the netCDF files, and the metadata records are catalogued in the OGC GeoNetwork Metadata Entry and Search Tool (MEST). Data discovery, access and download occur via web services through the IMOS Ocean Portal (http://imos.aodn.org.au), and tools for the display and integration of near real-time data are in development.
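
    The harvesting step described above, lifting global attributes from CF-compliant netCDF files into catalogue records, might look like the following sketch. The attribute names follow common CF/ACDD usage and the file name is hypothetical; the actual eMII pipeline maps onto full ISO 19115 records.

```python
# Sketch of the harvest step: read a netCDF file's global attributes into a
# flat record that a cataloguing tool could map onto ISO 19115 fields.
# Attribute names follow common CF/ACDD usage; the file path is hypothetical.
from netCDF4 import Dataset

def harvest_netcdf_metadata(path):
    with Dataset(path) as nc:
        attrs = {name: getattr(nc, name) for name in nc.ncattrs()}
    return {
        "title":    attrs.get("title"),
        "abstract": attrs.get("summary") or attrs.get("abstract"),
        "west":     attrs.get("geospatial_lon_min"),
        "east":     attrs.get("geospatial_lon_max"),
        "south":    attrs.get("geospatial_lat_min"),
        "north":    attrs.get("geospatial_lat_max"),
    }

print(harvest_netcdf_metadata("IMOS_mooring_20100901.nc"))  # hypothetical file
```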

  15. Data publication and sharing using the SciDrive service

    NASA Astrophysics Data System (ADS)

    Mishin, Dmitry; Medvedev, D.; Szalay, A. S.; Plante, R. L.

    2014-01-01

    Despite recent years' progress in scientific data storage, the problem of a public storage and sharing system for relatively small scientific datasets remains. These are the collections forming the “long tail” of the power-law distribution of dataset sizes. The aggregated size of the long-tail data is comparable to the size of all data collections from large archives, and the value of the data is significant. The SciDrive project's main goal is to provide the scientific community with a place to reliably and freely store such data and to provide access to it for the broad scientific community. The primary target audience of the project is the astronomy community, and it will be extended to other fields. We aim to create a simple way of publishing a dataset, which can then be shared with other people. The data owner controls the permissions to modify and access the data and can assign a group of users or open the access to everyone. The data contained in the dataset will be automatically recognized by a background process. Known data formats will be extracted according to the user's settings. Currently, tabular data can be automatically extracted to the user's MyDB table, where the user can run SQL queries on the dataset and merge it with other public CasJobs resources. Other data formats can be processed using a set of plugins that upload the data or metadata to user-defined side services. The current implementation targets some of the data formats commonly used by the astronomy communities, including FITS, ASCII and Excel tables, TIFF images, and YT simulation data archives. Along with generic metadata, format-specific metadata is also processed. For example, basic information about celestial objects is extracted from FITS files and TIFF images, if present. A 100 TB implementation has just been put into production at Johns Hopkins University. The system features a public data storage REST service supporting the VOSpace 2.0 and Dropbox protocols, an HTML5 web portal, a command-line client, and a standalone Java client to synchronize a local folder with the remote storage. We use the VAO SSO (Single Sign On) service from NCSA for user authentication, which provides free registration for everyone.
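
    A minimal sketch of the background-recognition idea follows: inspect each uploaded file's signature and dispatch it to a format-specific extractor. The registry and handler names are hypothetical, not SciDrive's plugin API.

```python
# Sketch of signature-based format recognition with per-format handlers.
# The registry and handler names are hypothetical, not SciDrive's API.
MAGIC = {
    b"SIMPLE  =":  "fits",   # FITS primary header begins with the SIMPLE keyword
    b"II*\x00":    "tiff",   # little-endian TIFF
    b"MM\x00*":    "tiff",   # big-endian TIFF
    b"PK\x03\x04": "excel",  # xlsx is a zip container
}

def detect_format(path):
    with open(path, "rb") as fh:
        head = fh.read(16)
    for magic, fmt in MAGIC.items():
        if head.startswith(magic):
            return fmt
    return "ascii"           # fall back to plain-text table handling

handlers = {"fits":  lambda p: print("extract FITS header:", p),
            "tiff":  lambda p: print("extract TIFF tags:", p),
            "excel": lambda p: print("extract sheet metadata:", p),
            "ascii": lambda p: print("try tabular ingest:", p)}

path = "upload.fits"         # hypothetical upload
handlers[detect_format(path)](path)
```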

  16. Average glandular dose in paired digital mammography and digital breast tomosynthesis acquisitions in a population based screening program: effects of measuring breast density, air kerma and beam quality.

    PubMed

    Østerås, Bjørn Helge; Skaane, Per; Gullien, Randi; Martinsen, Anne Catrine Trægde

    2018-01-25

    The main purpose was to compare average glandular dose (AGD) for same-compression digital mammography (DM) and digital breast tomosynthesis (DBT) acquisitions in a population based screening program, with and without breast density stratification, as determined by automatically calculated breast density (Quantra™). Secondarily, to compare AGD estimates based on measured breast density, air kerma and half value layer (HVL) to DICOM metadata based estimates. AGD was estimated for 3819 women participating in the screening trial. All received craniocaudal and mediolateral oblique views of each breast with paired DM and DBT acquisitions. Exposure parameters were extracted from DICOM metadata. Air kerma and HVL were measured for all beam qualities used to acquire the mammograms. Volumetric breast density was estimated using Quantra™. AGD was estimated using the Dance model. AGD reported directly from the DICOM metadata was also assessed. Mean AGD was 1.74 and 2.10 mGy for DM and DBT, respectively. Mean DBT/DM AGD ratio was 1.24. For fatty breasts, mean AGD was 1.74 and 2.27 mGy for DM and DBT, respectively. For dense breasts, mean AGD was 1.73 and 1.79 mGy for DM and DBT, respectively. For breasts of similar thickness, dense breasts had higher AGD for DM and similar AGD for DBT. The DBT/DM dose ratio was substantially lower for dense compared to fatty breasts (1.08 versus 1.33). The average c-factor was 1.16. Using previously published polynomials to estimate glandularity from thickness underestimated the c-factor by 5.9% on average. Mean AGD error between estimates based on measurements (air kerma and HVL) versus DICOM header data was 3.8%, but for one mammography unit as high as 7.9%. Mean error of using the AGD value reported in the DICOM header was 10.7 and 13.3%, respectively. Thus, measurement of breast density, radiation dose and beam quality can substantially affect AGD estimates.

  17. The RBV metadata catalog

    NASA Astrophysics Data System (ADS)

    Andre, Francois; Fleury, Laurence; Gaillardet, Jerome; Nord, Guillaume

    2015-04-01

    RBV (Réseau des Bassins Versants) is a French initiative to consolidate the national efforts made by more than 15 elementary observatories funded by various research institutions (CNRS, INRA, IRD, IRSTEA, universities) that study river and drainage basins. The RBV Metadata Catalogue aims at giving a unified vision of the work produced by every observatory to both the members of the RBV network and any external person interested in this domain of research. Another goal is to share this information with other existing metadata portals. Metadata management is heterogeneous among observatories, ranging from absence to mature harvestable catalogues. Here, we explain the strategy used to design a state-of-the-art catalogue facing this situation. Main features are as follows:
    - Multiple input methods: metadata records in the catalogue can either be entered with the graphical user interface, harvested from an existing catalogue, or imported from an information system through simplified web services.
    - Hierarchical levels: metadata records may describe either an observatory, one of its experimental sites, or a single dataset produced by one instrument.
    - Multilingualism: metadata can be easily entered in several configurable languages.
    - Compliance with standards: the back-office part of the catalogue is based on a CSW metadata server (Geosource), which ensures ISO 19115 compatibility and the ability to be harvested (globally or partially). Ongoing tasks focus on the use of SKOS thesauri and SensorML descriptions of the sensors.
    - Ergonomics: the user interface is built with the GWT framework to offer a rich client application with fully Ajax-based navigation.
    - Source code sharing: the work has led to the development of reusable components which can be used to quickly create new metadata forms in other GWT applications.
    You can visit the catalogue (http://portailrbv.sedoo.fr/) or contact us by email at rbv@sedoo.fr.

  18. Traceability, reproducibility and wiki-exploration for “à-la-carte” reconstructions of genome-scale metabolic models

    PubMed Central

    Got, Jeanne; Cortés, María Paz; Maass, Alejandro

    2018-01-01

    Genome-scale metabolic models have become the tool of choice for the global analysis of microorganism metabolism, and their reconstruction has attained high standards of quality and reliability. Improvements in this area have been accompanied by the development of some major platforms and databases, and an explosion of individual bioinformatics methods. Consequently, many recent models result from “à la carte” pipelines, combining the use of platforms, individual tools, and biological expertise to enhance the quality of the reconstruction. Although very useful, introducing heterogeneous tools that hardly interact with each other causes a loss of traceability and reproducibility in the reconstruction process. This represents a real obstacle, especially when considering less studied species whose metabolic reconstruction can greatly benefit from comparison to good-quality models of related organisms. This work proposes an adaptable workspace, AuReMe, for sustainable reconstructions or improvements of genome-scale metabolic models involving personalized pipelines. At each step, relevant information related to the modifications brought to the model by a method is stored. This ensures that the process is reproducible and documented regardless of the combination of tools used. Additionally, the workspace establishes a way to browse metabolic models and their metadata through the automatic generation of ad hoc local wikis dedicated to monitoring and facilitating the process of reconstruction. AuReMe supports exploration and semantic queries based on RDF databases. We illustrate how this workspace allowed handling, in an integrated way, the metabolic reconstructions of non-model organisms such as an extremophile bacterium or eukaryotic algae. Among relevant applications, the latter reconstruction led to putative evolutionary insights into a metabolic pathway. PMID:29791443

  19. Effects of single cortisol administrations on human affect reviewed: Coping with stress through adaptive regulation of automatic cognitive processing.

    PubMed

    Putman, Peter; Roelofs, Karin

    2011-05-01

    The human stress hormone cortisol may facilitate effective coping after psychological stress. In apparent agreement, administration of cortisol has been demonstrated to reduce fear in response to stressors. For anxious patients with phobias or posttraumatic stress disorder, this has been ascribed to hypothetical inhibition of retrieval of traumatic memories. However, such stress-protective effects may also work via adaptive regulation of early cognitive processing of threatening information from the environment. This paper selectively reviews the available literature on effects of single cortisol administrations on affect and early cognitive processing of affectively significant information. The concluded working hypothesis is that the immediate effects of a high concentration of cortisol may facilitate stress-coping via inhibition of automatic processing of goal-irrelevant threatening information and through increased automatic approach-avoidance responses in early emotional processing. Limitations in the existing literature and suggestions for future directions are briefly discussed. Copyright © 2010 Elsevier Ltd. All rights reserved.

  20. Automatic knee cartilage delineation using inheritable segmentation

    NASA Astrophysics Data System (ADS)

    Dries, Sebastian P. M.; Pekar, Vladimir; Bystrov, Daniel; Heese, Harald S.; Blaffert, Thomas; Bos, Clemens; van Muiswinkel, Arianne M. C.

    2008-03-01

    We present a fully automatic method for segmentation of knee joint cartilage from fat-suppressed MRI. The method first applies 3-D model-based segmentation technology, which allows the femur, patella, and tibia to be reliably segmented by iterative adaptation of the model according to image gradients. Thin-plate-spline interpolation is used in the next step to position deformable cartilage models for each of the three bones with reference to the segmented bone models. After initialization, the cartilage models are fine-tuned by automatic iterative adaptation to the image data based on gray-value gradients. The method has been validated on a collection of 8 (3 left, 5 right) fat-suppressed datasets and demonstrated a sensitivity of 83±6% compared to manual segmentation on a per-voxel basis as the primary endpoint. Gross cartilage volume measurement yielded an average error of 9±7% as the secondary endpoint. Because cartilage is a thin structure, even small deviations in distance produce large errors on a per-voxel basis, making the primary endpoint a strict criterion.
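
    The initialization step, a thin-plate-spline mapping computed from corresponding bone landmarks and applied to cartilage model points, can be sketched with SciPy's RBF interpolator. The landmark coordinates below are synthetic; this is not the authors' implementation.

```python
# Sketch of TPS initialization: fit a thin-plate-spline mapping on bone
# landmark pairs (model space -> patient space) and apply it to cartilage
# model points. All coordinates here are synthetic.
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(2)
model_landmarks = rng.uniform(0, 100, (30, 3))                   # on the bone model
patient_landmarks = model_landmarks + rng.normal(0, 2, (30, 3))  # after segmentation

# One vector-valued TPS interpolant fit on the landmark correspondences.
tps = RBFInterpolator(model_landmarks, patient_landmarks,
                      kernel="thin_plate_spline", smoothing=0.0)

cartilage_model_pts = rng.uniform(0, 100, (500, 3))  # cartilage mesh vertices
cartilage_in_patient = tps(cartilage_model_pts)      # positioned for fine adjustment
print(cartilage_in_patient.shape)                    # (500, 3)
```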
