Tsay, Ming-Yueh; Wu, Tai-Luan; Tseng, Ling-Li
2017-01-01
This study examines the completeness and overlap of coverage in physics of six open access scholarly communication systems, including two search engines (Google Scholar and Microsoft Academic), two aggregate institutional repositories (OAIster and OpenDOAR), and two physics-related open sources (arXiv.org and Astrophysics Data System). The 2001-2013 Nobel Laureates in Physics served as the sample. Bibliographic records of their publications were retrieved and downloaded from each system, and a computer program was developed to perform the analytical tasks of sorting, comparison, elimination, aggregation and statistical calculations. Quantitative analyses and cross-referencing were performed to determine the completeness and overlap of the system coverage of the six open access systems. The results may enable scholars to select an appropriate open access system as an efficient scholarly communication channel, and academic institutions may build institutional repositories or independently create citation index systems in the future. Suggestions on indicators and tools for academic assessment are presented based on the comprehensiveness assessment of each system.
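The cross-referencing described above reduces to set operations over normalized bibliographic records. A minimal Python sketch of that idea, assuming an invented record layout and a title-plus-year matching key rather than the study's actual program, is:

    # Hypothetical sketch: completeness and pairwise overlap of record sets.
    from itertools import combinations

    def normalize(record):
        # Assumed matching key: lower-cased title plus publication year.
        return (record["title"].strip().lower(), record["year"])

    def coverage_stats(systems):
        # systems: dict mapping system name -> list of record dicts
        keys = {name: {normalize(r) for r in recs} for name, recs in systems.items()}
        union = set().union(*keys.values())
        completeness = {name: len(k) / len(union) for name, k in keys.items()}
        overlap = {(a, b): len(keys[a] & keys[b]) for a, b in combinations(keys, 2)}
        return completeness, overlap

Completeness here is each system's share of the pooled union of records, and overlap counts the records shared by each pair of systems.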
An ontology based information system for the management of institutional repository's collections
NASA Astrophysics Data System (ADS)
Tsolakidis, A.; Kakoulidis, P.; Skourlas, C.
2015-02-01
In this paper we discuss a simple methodological approach to creating and customizing institutional repositories for the domain of technological education. The open source software platform DSpace is proposed to build the repository application and provide access to digital resources including research papers, dissertations, administrative documents, educational material, etc. The use of OWL ontologies is also proposed for indexing and accessing the various heterogeneous items stored in the repository. Customization and operation of a platform for the selection and use of terms, or parts, of similar existing OWL ontologies is also described. This platform could be based on the open source software Protégé, which supports OWL, is widely used, and also supports visualization, SPARQL, etc. The combined use of the OWL platform and the DSpace repository forms a basis for creating customized ontologies, accommodating the semantic metadata of items and facilitating searching.
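As a hedged illustration of the kind of term lookup such a combination enables, the rdflib library can load an OWL file (for example one exported from Protégé) and answer a SPARQL query for class labels to use as index terms; the file name and RDF/XML serialization are assumptions, not details from the paper:

    # Hypothetical sketch: list labelled classes of an OWL ontology with rdflib.
    from rdflib import Graph

    g = Graph()
    g.parse("education_ontology.owl", format="xml")  # assumed RDF/XML export

    query = """
    SELECT ?cls ?label WHERE {
        ?cls a <http://www.w3.org/2002/07/owl#Class> ;
             <http://www.w3.org/2000/01/rdf-schema#label> ?label .
    }"""
    for cls, label in g.query(query):
        print(cls, label)  # candidate index terms for repository items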
Wu, Tai-luan; Tseng, Ling-li
2017-01-01
This study examines the completeness and overlap of coverage in physics of six open access scholarly communication systems, including two search engines (Google Scholar and Microsoft Academic), two aggregate institutional repositories (OAIster and OpenDOAR), and two physics-related open sources (arXiv.org and Astrophysics Data System). The 2001–2013 Nobel Laureates in Physics served as the sample. Bibliographic records of their publications were retrieved and downloaded from each system, and a computer program was developed to perform the analytical tasks of sorting, comparison, elimination, aggregation and statistical calculations. Quantitative analyses and cross-referencing were performed to determine the completeness and overlap of the system coverage of the six open access systems. The results may enable scholars to select an appropriate open access system as an efficient scholarly communication channel, and academic institutions may build institutional repositories or independently create citation index systems in the future. Suggestions on indicators and tools for academic assessment are presented based on the comprehensiveness assessment of each system. PMID:29267327
Zhang, Melvyn W B; Ho, Roger C M
2017-01-01
Dementia is an illness known to bring marked disability among elderly individuals. Patients living with dementia may also experience non-cognitive symptoms, including hallucinations, delusional beliefs, emotional lability, sexualized behaviours and aggression. According to the National Institute for Health and Care Excellence (NICE) guidelines, non-pharmacological techniques are typically the first-line option before adjuvant pharmacological options are considered. Reminiscence and music therapy are thus viable options. Lazar et al. [3] previously performed a systematic review of the use of technology to deliver reminiscence-based therapy to individuals living with dementia and highlighted that technology does have benefits in the delivery of reminiscence therapy. To date, however, there has been a paucity of M-health innovations in this area. In addition, most current innovations are not personalized for each person living with dementia. Prior research has highlighted the utility of open source repositories in bioinformatics studies. The authors explain how they tapped into and made use of an open source repository in the development of a personalized M-health reminiscence therapy innovation for patients living with dementia. The availability of open source code repositories has changed the way healthcare professionals and developers build smartphone applications today. Conventionally, a long iterative process is needed to develop a native application, mainly because of the need for native programming and coding, especially if the application requires interactive or personalizable features. Such repositories enable rapid and cost-effective application development. Moreover, developers are able to innovate further, as less time is spent in the iterative process.
A perspective on the proliferation risks of plutonium mines
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lyman, E.S.
1996-05-01
The program of geologic disposal of spent fuel and other plutonium-containing materials is increasingly becoming the target of criticism by individuals who argue that in the future, repositories may become low-cost sources of fissile material for nuclear weapons. This paper attempts to outline a consistent framework for analyzing the proliferation risks of these so-called "plutonium mines" and putting them into perspective. First, it is emphasized that the attractiveness of plutonium in a repository as a source of weapons material depends on its accessibility relative to other sources of fissile material. Then, the notion of a "material production standard" (MPS) is proposed: namely, that the proliferation risks posed by geologic disposal will be acceptable if one can demonstrate, under a number of reasonable scenarios, that the recovery of plutonium from a repository is likely to be as difficult as new production of fissile material. A preliminary analysis suggests that the range of circumstances under which current mined repository concepts would fail to meet this standard is fairly narrow. Nevertheless, a broad application of the MPS may impose severe restrictions on repository design. In this context, the relationship of repository design parameters to ease of recovery is discussed.
The Athabasca University eduSource Project: Building an Accessible Learning Object Repository
ERIC Educational Resources Information Center
Cleveland-Innes, Martha; McGreal, Rory; Anderson, Terry; Friesen, Norm; Ally, Mohamed; Tin, Tony; Graham, Rodger; Moisey, Susan; Petrinjak, Anita; Schafer, Steve
2005-01-01
Athabasca University--Canada's Open University (AU) made the commitment to put all of its courses online as part of its Strategic University Plan. In pursuit of this goal, AU participated in the eduSource project, a pan-Canadian effort to build the infrastructure for an interoperable network of learning object repositories. AU acted as a leader in…
ERIC Educational Resources Information Center
Pardos, Zachary A.; Whyte, Anthony; Kao, Kevin
2016-01-01
In this paper, we address issues of transparency, modularity, and privacy with the introduction of an open source, web-based data repository and analysis tool tailored to the Massive Open Online Course community. The tool integrates data request/authorization and distribution workflow features as well as provides a simple analytics module upload…
Busby, Ben; Lesko, Matthew; Federer, Lisa
2016-01-01
In genomics, bioinformatics and other areas of data science, gaps exist between extant public datasets and the open-source software tools built by the community to analyze similar data types. The purpose of biological data science hackathons is to assemble groups of genomics or bioinformatics professionals and software developers to rapidly prototype software to address these gaps. The only two rules for the NCBI-assisted hackathons run so far are that 1) data either must be housed in public data repositories or be deposited to such repositories shortly after the hackathon's conclusion, and 2) all software comprising the final pipeline must be open-source or open-use. Proposed topics, as well as suggested tools and approaches, are distributed to participants at the beginning of each hackathon and refined during the event. Software, scripts, and pipelines are developed and published on GitHub, a web service providing publicly available, free-usage tiers for collaborative software development. The code resulting from each hackathon is published at https://github.com/NCBI-Hackathons/ with separate directories or repositories for each team.
ACToR Chemical Structure processing using Open Source ChemInformatics Libraries (FutureToxII)
ACToR (Aggregated Computational Toxicology Resource) is a centralized database repository developed by the National Center for Computational Toxicology (NCCT) at the U.S. Environmental Protection Agency (EPA). Free and open source tools were used to compile toxicity data from ove...
Managing Digital Archives Using Open Source Software Tools
NASA Astrophysics Data System (ADS)
Barve, S.; Dongare, S.
2007-10-01
This paper describes the use of open source software tools such as MySQL and PHP for creating database-backed websites. Such websites offer many advantages over ones built from static HTML pages. This paper discusses how OSS tools are used and their benefits, and how, after the successful implementation of these tools, the library took the initiative to implement an institutional repository using the DSpace open source software.
Busby, Ben; Lesko, Matthew; Federer, Lisa
2016-01-01
In genomics, bioinformatics and other areas of data science, gaps exist between extant public datasets and the open-source software tools built by the community to analyze similar data types. The purpose of biological data science hackathons is to assemble groups of genomics or bioinformatics professionals and software developers to rapidly prototype software to address these gaps. The only two rules for the NCBI-assisted hackathons run so far are that 1) data either must be housed in public data repositories or be deposited to such repositories shortly after the hackathon’s conclusion, and 2) all software comprising the final pipeline must be open-source or open-use. Proposed topics, as well as suggested tools and approaches, are distributed to participants at the beginning of each hackathon and refined during the event. Software, scripts, and pipelines are developed and published on GitHub, a web service providing publicly available, free-usage tiers for collaborative software development. The code resulting from each hackathon is published at https://github.com/NCBI-Hackathons/ with separate directories or repositories for each team. PMID:27134733
Introduction to geospatial semantics and technology workshop handbook
Varanka, Dalia E.
2012-01-01
The workshop is a tutorial on introductory geospatial semantics with hands-on exercises using standard Web browsers. The workshop is divided into two sections, general semantics on the Web and specific examples of geospatial semantics using data from The National Map of the U.S. Geological Survey and the Open Ontology Repository. The general semantics section includes information and access to publicly available semantic archives. The specific session includes information on geospatial semantics with access to semantically enhanced data for hydrography, transportation, boundaries, and names. The Open Ontology Repository offers open-source ontologies for public use.
ERIC Educational Resources Information Center
Sutradhar, B.
2006-01-01
Purpose: To describe how an institutional repository (IR) was set up, using open source software, at the Indian Institute of Technology (IIT) in Kharagpur. Members of the IIT can publish their research documents in the IR for online access as well as digital preservation. Material in this IR includes instructional materials, records, data sets,…
DataUp: Helping manage and archive data within the researcher's workflow
NASA Astrophysics Data System (ADS)
Strasser, C.
2012-12-01
There are many barriers to data management and sharing among earth and environmental scientists; among the most significant are lacks of knowledge about best practices for data management, metadata standards, or appropriate data repositories for archiving and sharing data. We have developed an open-source add-in for Excel and an open source web application intended to help researchers overcome these barriers. DataUp helps scientists to (1) determine whether their file is CSV compatible, (2) generate metadata in a standard format, (3) retrieve an identifier to facilitate data citation, and (4) deposit their data into a repository. The researcher does not need a prior relationship with a data repository to use DataUp; the newly implemented ONEShare repository, a DataONE member node, is available for any researcher to archive and share their data. By meeting researchers where they already work, in spreadsheets, DataUp becomes part of the researcher's workflow and data management and sharing becomes easier. Future enhancement of DataUp will rely on members of the community adopting and adapting the DataUp tools to meet their unique needs, including connecting to analytical tools, adding new metadata schema, and expanding the list of connected data repositories. DataUp is a collaborative project between Microsoft Research Connections, the University of California's California Digital Library, the Gordon and Betty Moore Foundation, and DataONE.
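The first DataUp step, checking whether a spreadsheet is CSV-compatible, can be approximated in a few lines; the sketch below is an illustration under assumed rules (one header row, consistent column counts), not the add-in's actual logic:

    # Hypothetical sketch: a minimal CSV-compatibility check.
    import csv

    def is_csv_compatible(path):
        with open(path, newline="") as f:
            rows = list(csv.reader(f))
        if not rows:
            return False
        header_width = len(rows[0])
        # Every data row must have the same number of columns as the header.
        return all(len(row) == header_width for row in rows[1:])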
NASA Technical Reports Server (NTRS)
Teubert, Christopher; Sankararaman, Shankar; Cullo, Aiden
2017-01-01
Readme for the Random Variable Toolbox. GitHub is a Web-based Git version control repository hosting service. It is mostly used for computer code. It offers all of the distributed version control and source code management (SCM) functionality of Git as well as adding its own features. It provides access control and several collaboration features such as bug tracking, feature requests, task management, and wikis for every project.[3] GitHub offers plans for both private and free repositories on the same account,[4] which are commonly used to host open-source software projects.[5] As of April 2017, GitHub reports having almost 20 million users and 57 million repositories,[6] making it the largest host of source code in the world.[7] GitHub has a mascot called Octocat, a cat with five tentacles and a human-like face.
DSpace and customized controlled vocabularies
NASA Astrophysics Data System (ADS)
Skourlas, C.; Tsolakidis, A.; Kakoulidis, P.; Giannakopoulos, G.
2015-02-01
The open source platform DSpace could be defined as a repository application used to provide access to digital resources. DSpace is installed and used by more than 1000 organizations worldwide. A predefined taxonomy of keywords, called a Controlled Vocabulary, can be used for describing and accessing the information items stored in the repository. In this paper, we describe how users can create and customize their own vocabularies. Various heterogeneous items, such as research papers, videos, articles and educational material of the repository, can be indexed in order to provide advanced search functionality using new controlled vocabularies.
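DSpace controlled vocabularies are typically supplied as small XML taxonomy files; the sketch below builds one with Python's standard library. The node/isComposedBy element layout follows commonly documented DSpace examples, while the vocabulary terms themselves are invented for illustration:

    # Hypothetical sketch: generate a simple controlled-vocabulary XML file.
    import xml.etree.ElementTree as ET

    root = ET.Element("node", id="edu", label="Educational material")
    composed = ET.SubElement(root, "isComposedBy")
    ET.SubElement(composed, "node", id="edu-lab", label="Laboratory notes")
    ET.SubElement(composed, "node", id="edu-lect", label="Lecture slides")

    ET.ElementTree(root).write("my_vocabulary.xml", encoding="utf-8",
                               xml_declaration=True)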
Chełkowski, Tadeusz; Gloor, Peter; Jemielniak, Dariusz
2016-01-01
While researchers are becoming increasingly interested in studying the OSS phenomenon, there are still few studies analyzing larger samples of projects to investigate the structure of activities among OSS developers. The significant amount of information gathered in publicly available open-source software repositories and mailing-list archives offers an opportunity to analyze project structures and participant involvement. In this article, using commit data from 263 Apache project repositories (nearly all of them), we show that although OSS development is often described as collaborative, it in fact predominantly relies on radically solitary input and individual, non-collaborative contributions. We also show, in the first published study of this magnitude, that the engagement of contributors follows a power-law distribution.
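A minimal sketch of the kind of analysis behind such a finding, not the authors' code: count commits per author across repositories and inspect how concentrated the contributions are; the rank-ordered counts can then be examined on log-log axes for power-law-like behaviour.

    # Hypothetical sketch: concentration of commits per author.
    from collections import Counter

    def commit_concentration(commits):
        # commits: iterable of (repository, author) pairs
        per_author = Counter(author for _, author in commits)
        counts = sorted(per_author.values(), reverse=True)
        top_decile = counts[: max(1, len(counts) // 10)]
        share_top_decile = sum(top_decile) / sum(counts)
        return share_top_decile, counts  # counts can be plotted on log-log axes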
2016-01-01
While researchers are becoming increasingly interested in studying the OSS phenomenon, there are still few studies analyzing larger samples of projects to investigate the structure of activities among OSS developers. The significant amount of information gathered in publicly available open-source software repositories and mailing-list archives offers an opportunity to analyze project structures and participant involvement. In this article, using commit data from 263 Apache project repositories (nearly all of them), we show that although OSS development is often described as collaborative, it in fact predominantly relies on radically solitary input and individual, non-collaborative contributions. We also show, in the first published study of this magnitude, that the engagement of contributors follows a power-law distribution. PMID:27096157
OER Use in Intermediate Language Instruction: A Case Study
ERIC Educational Resources Information Center
Godwin-Jones, Robert
2017-01-01
This paper reports on a case study in the experimental use of Open Educational Resources (OERs) in intermediate level language instruction. The resources come from three sources: the instructor, the students, and open content repositories. The objective of this action research project was to provide student-centered learning materials, enhance…
Using FLOSS Project Metadata in the Undergraduate Classroom
NASA Astrophysics Data System (ADS)
Squire, Megan; Duvall, Shannon
This paper describes our efforts to use the large amounts of data available from public repositories of free, libre, and open source software (FLOSS) in our undergraduate classrooms to teach concepts that would have previously been taught using other types of data from other sources.
MetaboLights: An Open-Access Database Repository for Metabolomics Data.
Kale, Namrata S; Haug, Kenneth; Conesa, Pablo; Jayseelan, Kalaivani; Moreno, Pablo; Rocca-Serra, Philippe; Nainala, Venkata Chandrasekhar; Spicer, Rachel A; Williams, Mark; Li, Xuefei; Salek, Reza M; Griffin, Julian L; Steinbeck, Christoph
2016-03-24
MetaboLights is the first general purpose, open-access database repository for cross-platform and cross-species metabolomics research at the European Bioinformatics Institute (EMBL-EBI). Based upon the open-source ISA framework, MetaboLights provides Metabolomics Standard Initiative (MSI) compliant metadata and raw experimental data associated with metabolomics experiments. Users can upload their study datasets into the MetaboLights Repository. These studies are then automatically assigned a stable and unique identifier (e.g., MTBLS1) that can be used for publication reference. The MetaboLights Reference Layer associates metabolites with metabolomics studies in the archive and is extensively annotated with data fields such as structural and chemical information, NMR and MS spectra, target species, metabolic pathways, and reactions. The database is manually curated with no specific release schedules. MetaboLights is also recommended by journals for metabolomics data deposition. This unit provides a guide to using MetaboLights, downloading experimental data, and depositing metabolomics datasets using user-friendly submission tools. Copyright © 2016 John Wiley & Sons, Inc.
NASA Astrophysics Data System (ADS)
Massmann, J.; Nagel, T.; Bilke, L.; Böttcher, N.; Heusermann, S.; Fischer, T.; Kumar, V.; Schäfers, A.; Shao, H.; Vogel, P.; Wang, W.; Watanabe, N.; Ziefle, G.; Kolditz, O.
2016-12-01
As part of the German site selection process for a high-level nuclear waste repository, different repository concepts in the geological candidate formations rock salt, clay stone and crystalline rock are being discussed. An open assessment of these concepts using numerical simulations requires physical models capturing the individual particularities of each rock type and associated geotechnical barrier concept to a comparable level of sophistication. In a joint work group of the Helmholtz Centre for Environmental Research (UFZ) and the German Federal Institute for Geosciences and Natural Resources (BGR), scientists of the UFZ are developing and implementing multiphysical process models while BGR scientists apply them to large scale analyses. The advances in simulation methods for waste repositories are incorporated into the open-source code OpenGeoSys. Here, recent application-driven progress in this context is highlighted. A robust implementation of visco-plasticity with temperature-dependent properties into a framework for the thermo-mechanical analysis of rock salt will be shown. The model enables the simulation of heat transport along with its consequences on the elastic response as well as on primary and secondary creep or the occurrence of dilatancy in the repository near field. Transverse isotropy, non-isothermal hydraulic processes and their coupling to mechanical stresses are taken into account for the analysis of repositories in clay stone. These processes are also considered in the near field analyses of engineered barrier systems, including the swelling/shrinkage of the bentonite material. The temperature-dependent saturation evolution around the heat-emitting waste container is described by different multiphase flow formulations. For all mentioned applications, we illustrate the workflow from model development and implementation, over verification and validation, to repository-scale application simulations using methods of high performance computing.
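For context, a standard temperature-dependent power-law (Norton/BGRa-type) creep model of the kind commonly used for rock salt can be written in LaTeX as

    \dot{\varepsilon}_{\mathrm{cr}} = A \exp\!\left(-\frac{Q}{RT}\right) \left(\frac{\sigma_{\mathrm{eq}}}{\sigma^{*}}\right)^{n}

where \dot{\varepsilon}_{\mathrm{cr}} is the creep strain rate, A and n are material parameters, Q the activation energy, R the universal gas constant, T the absolute temperature, \sigma_{\mathrm{eq}} the equivalent (von Mises) stress and \sigma^{*} a reference stress. This is a generic textbook form, not necessarily the exact constitutive law implemented in OpenGeoSys.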
Open-source tools for data mining.
Zupan, Blaz; Demsar, Janez
2008-03-01
With a growing volume of biomedical databases and repositories, the need to develop a set of tools to address their analysis and support knowledge discovery is becoming acute. The data mining community has developed a substantial set of techniques for computational treatment of these data. In this article, we discuss the evolution of open-source toolboxes that data mining researchers and enthusiasts have developed over the span of a few decades and review several currently available open-source data mining suites. The approaches we review are diverse in data mining methods and user interfaces and also demonstrate that the field and its tools are ready to be fully exploited in biomedical research.
XNAT Central: Open sourcing imaging research data.
Herrick, Rick; Horton, William; Olsen, Timothy; McKay, Michael; Archie, Kevin A; Marcus, Daniel S
2016-01-01
XNAT Central is a publicly accessible medical imaging data repository based on the XNAT open-source imaging informatics platform. It hosts a wide variety of research imaging data sets. The primary motivation for creating XNAT Central was to provide a central repository to host and provide access to a wide variety of neuroimaging data. In this capacity, XNAT Central hosts a number of data sets from research labs and investigative efforts from around the world, including the OASIS Brains imaging studies, the NUSDAST study of schizophrenia, and more. Over time, XNAT Central has expanded to include imaging data from many different fields of research, including oncology, orthopedics, cardiology, and animal studies, but continues to emphasize neuroimaging data. Through the use of XNAT's DICOM metadata extraction capabilities, XNAT Central provides a searchable repository of imaging data that can be referenced by groups, labs, or individuals working in many different areas of research. The future development of XNAT Central will be geared towards greater ease of use as a reference library of heterogeneous neuroimaging data and associated synthetic data. It will also become a tool for making data available supporting published research and academic articles. Copyright © 2015 Elsevier Inc. All rights reserved.
Implementation of an OAIS Repository Using Free, Open Source Software
NASA Astrophysics Data System (ADS)
Flathers, E.; Gessler, P. E.; Seamon, E.
2015-12-01
The Northwest Knowledge Network (NKN) is a regional data repository located at the University of Idaho that focuses on the collection, curation, and distribution of research data. To support our home institution and others in the region, we offer services to researchers at all stages of the data lifecycle—from grant application and data management planning to data distribution and archive. In this role, we recognize the need to work closely with other data management efforts at partner institutions and agencies, as well as with larger aggregation efforts such as our state geospatial data clearinghouses, data.gov, DataONE, and others. In the past, one of our challenges with monolithic, prepackaged data management solutions is that customization can be difficult to implement and maintain, especially as new versions of the software are released that are incompatible with our local codebase. Our solution is to break the monolith up into its constituent parts, which offers us several advantages. First, any customizations that we make are likely to fall into areas that can be accessed through Application Program Interfaces (API) that are likely to remain stable over time, so our code stays compatible. Second, as components become obsolete or insufficient to meet new demands that arise, we can replace the individual components with minimal effect on the rest of the infrastructure, causing less disruption to operations. Other advantages include increased system reliability, staggered rollout of new features, enhanced compatibility with legacy systems, reduced dependence on a single software company as a point of failure, and the separation of development into manageable tasks. In this presentation, we describe our application of the Service Oriented Architecture (SOA) design paradigm to assemble a data repository that conforms to the Open Archival Information System (OAIS) Reference Model primarily using a collection of free and open-source software. We detail the design of the repository, based upon open standards to support interoperability with other institutions' systems and with future versions of our own software components. We also describe the implementation process, including our use of GitHub as a collaboration tool and code repository.
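The component-replacement argument rests on keeping each service behind a small, stable interface so that backends can be swapped with minimal disruption. A hedged Python sketch of that idea, with invented names rather than NKN's actual code:

    # Hypothetical sketch: a stable storage interface with swappable backends.
    from abc import ABC, abstractmethod

    class ArchivalStore(ABC):
        """Stable API consumed by the rest of the repository stack."""

        @abstractmethod
        def put(self, identifier: str, payload: bytes) -> None:
            ...

        @abstractmethod
        def get(self, identifier: str) -> bytes:
            ...

    class FilesystemStore(ArchivalStore):
        # One interchangeable backend; a Fedora- or cloud-based backend could replace it.
        def __init__(self, root):
            self.root = root

        def put(self, identifier, payload):
            with open(f"{self.root}/{identifier}", "wb") as f:
                f.write(payload)

        def get(self, identifier):
            with open(f"{self.root}/{identifier}", "rb") as f:
                return f.read()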
HEPData: a repository for high energy physics data
NASA Astrophysics Data System (ADS)
Maguire, Eamonn; Heinrich, Lukas; Watt, Graeme
2017-10-01
The Durham High Energy Physics Database (HEPData) has been built up over the past four decades as a unique open-access repository for scattering data from experimental particle physics papers. It comprises data points underlying several thousand publications. Over the last two years, the HEPData software has been completely rewritten using modern computing technologies as an overlay on the Invenio v3 digital library framework. The software is open source with the new site available at https://hepdata.net now replacing the previous site at http://hepdata.cedar.ac.uk. In this write-up, we describe the development of the new site and explain some of the advantages it offers over the previous platform.
A strategy to establish Food Safety Model Repositories.
Plaza-Rodríguez, C; Thoens, C; Falenski, A; Weiser, A A; Appel, B; Kaesbohrer, A; Filter, M
2015-07-02
Transferring the knowledge of predictive microbiology into real world food manufacturing applications is still a major challenge for the whole food safety modelling community. To facilitate this process, a strategy for creating open, community driven and web-based predictive microbial model repositories is proposed. These collaborative model resources could significantly improve the transfer of knowledge from research into commercial and governmental applications and also increase efficiency, transparency and usability of predictive models. To demonstrate the feasibility, predictive models of Salmonella in beef previously published in the scientific literature were re-implemented using an open source software tool called PMM-Lab. The models were made publicly available in a Food Safety Model Repository within the OpenML for Predictive Modelling in Food community project. Three different approaches were used to create new models in the model repositories: (1) all information relevant for model re-implementation is available in a scientific publication, (2) model parameters can be imported from tabular parameter collections and (3) models have to be generated from experimental data or primary model parameters. All three approaches were demonstrated in the paper. The sample Food Safety Model Repository is available via: http://sourceforge.net/projects/microbialmodelingexchange/files/models and the PMM-Lab software can be downloaded from http://sourceforge.net/projects/pmmlab/. This work also illustrates that a standardized information exchange format for predictive microbial models, as the key component of this strategy, could be established by adoption of resources from the Systems Biology domain. Copyright © 2015. Published by Elsevier B.V.
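For orientation, two standard predictive-microbiology building blocks of the sort re-implemented in such repositories, written in LaTeX, are an exponential-phase primary growth model and the Ratkowsky square-root secondary model relating the maximum specific growth rate to temperature:

    \log_{10} N(t) = \log_{10} N_{0} + \frac{\mu_{\max}\, t}{\ln 10}

    \sqrt{\mu_{\max}} = b\,(T - T_{\min})

Here N(t) is the cell concentration at time t, \mu_{\max} the maximum specific growth rate, b a fitted coefficient and T_{\min} the notional minimum growth temperature. These are generic textbook forms, not necessarily the specific Salmonella-in-beef models re-implemented in PMM-Lab.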
ERIC Educational Resources Information Center
O'Neill, Edward T.; Lavoie, Brian F.; Bennett, Rick; Staples, Thornton; Wayland, Ross; Payette, Sandra; Dekkers, Makx; Weibel, Stuart; Searle, Sam; Thompson, Dave; Rudner, Lawrence M.
2003-01-01
Includes five articles that examine key trends in the development of the public Web: size and growth, internationalization, and metadata usage; Flexible Extensible Digital Object and Repository Architecture (Fedora) for use in digital libraries; developments in the Dublin Core Metadata Initiative (DCMI); the National Library of New Zealand Te Puna…
Seven [Data] Habits of Highly Successful Researchers
NASA Astrophysics Data System (ADS)
Kinkade, D.; Shepherd, A.; Saito, M. A.; Wiebe, P. H.; Ake, H.; Biddle, M.; Copley, N. J.; Rauch, S.; Switzer, M. E.; York, A.
2017-12-01
Navigating the landscape of open science and data sharing can be daunting for the long-tail scientist. From satisfying funder requirements, and ensuring proper attribution for their work, to determining the best repository for data management and archiving, there are several facets to be considered. Yet, there is no single source of guidance for investigators who may be using multiple research funding models. What role can existing repositories play to help facilitate a more effective data sharing workflow? The Biological and Chemical Oceanographic Data Management Office (BCO-DMO) is a domain-specific repository occupying the niche between funder and investigator. The office works closely with its stakeholders to develop and provide guidance, services, and tools that assist researchers in meeting their data sharing needs, from determining whether BCO-DMO is the appropriate repository to manage an investigator's project data to ensuring that the investigator is able to fulfill funder requirements. The goal is to relieve the investigator of the more difficult aspects of data management and data sharing, while simultaneously educating them in better data management practices that will streamline the process of conducting open research in the future. This presentation will provide an overview of the BCO-DMO repository, highlighting some of the services and guidance the office provides to its community.
ROSA P : The National Transportation Library’s Repository and Open Science Access Portal
DOT National Transportation Integrated Search
2018-01-01
The National Transportation Library (NTL) was founded as an all-digital repository of US DOT research reports, technical publications and data products. NTL's primary public offering is ROSA P, the Repository and Open Science Access Portal. An open...
Shared Medical Imaging Repositories.
Lebre, Rui; Bastião, Luís; Costa, Carlos
2018-01-01
This article describes the implementation of a solution for the integration of the ownership concept and access control over medical imaging resources, making possible the centralization of multiple instances of repositories. The proposed architecture allows the association of permissions with repository resources and the delegation of rights to third entities. It includes a programmatic interface for management of the proposed services, made available through web services, with the ability to create, read, update and remove all components resulting from the architecture. The resulting work is a role-based access control mechanism that was integrated with the Dicoogle Open-Source Project. The solution has several application scenarios, for instance collaborative platforms for research and tele-radiology services deployed in the Cloud.
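A role-based access control check of the kind described reduces to a small permission lookup; the sketch below is an illustrative Python version with invented roles and actions, not the Dicoogle integration's actual implementation:

    # Hypothetical sketch: minimal role-based access control for imaging resources.
    ROLE_PERMISSIONS = {
        "researcher": {"read"},
        "radiologist": {"read", "update"},
        "repository_admin": {"create", "read", "update", "remove"},
    }

    def is_allowed(user_roles, action):
        return any(action in ROLE_PERMISSIONS.get(role, set()) for role in user_roles)

    # Example: a tele-radiology account may read a study but not remove it.
    assert is_allowed({"radiologist"}, "read")
    assert not is_allowed({"radiologist"}, "remove")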
Personal Spaces in Public Repositories as a Facilitator for Open Educational Resource Usage
ERIC Educational Resources Information Center
Cohen, Anat; Reisman, Sorel; Sperling, Barbra Bied
2015-01-01
Learning object repositories are a shared, open and public space; however, the possibility and ability of personal expression in an open, global, public space is crucial. The aim of this study is to explore personal spaces in a big learning object repository as a facilitator for adoption of Open Educational Resources (OER) into teaching practices…
DataMed - an open source discovery index for finding biomedical datasets.
Chen, Xiaoling; Gururaj, Anupama E; Ozyurt, Burak; Liu, Ruiling; Soysal, Ergin; Cohen, Trevor; Tiryaki, Firat; Li, Yueling; Zong, Nansu; Jiang, Min; Rogith, Deevakar; Salimi, Mandana; Kim, Hyeon-Eui; Rocca-Serra, Philippe; Gonzalez-Beltran, Alejandra; Farcas, Claudiu; Johnson, Todd; Margolis, Ron; Alter, George; Sansone, Susanna-Assunta; Fore, Ian M; Ohno-Machado, Lucila; Grethe, Jeffrey S; Xu, Hua
2018-01-13
Finding relevant datasets is important for promoting data reuse in the biomedical domain, but it is challenging given the volume and complexity of biomedical data. Here we describe the development of an open source biomedical data discovery system called DataMed, with the goal of promoting the building of additional data indexes in the biomedical domain. DataMed, which can efficiently index and search diverse types of biomedical datasets across repositories, is developed through the National Institutes of Health-funded biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium. It consists of 2 main components: (1) a data ingestion pipeline that collects and transforms original metadata information to a unified metadata model, called DatA Tag Suite (DATS), and (2) a search engine that finds relevant datasets based on user-entered queries. In addition to describing its architecture and techniques, we evaluated individual components within DataMed, including the accuracy of the ingestion pipeline, the prevalence of the DATS model across repositories, and the overall performance of the dataset retrieval engine. Our manual review shows that the ingestion pipeline could achieve an accuracy of 90% and core elements of DATS had varied frequency across repositories. On a manually curated benchmark dataset, the DataMed search engine achieved an inferred average precision of 0.2033 and a precision at 10 (P@10, the proportion of relevant results in the top 10 search results) of 0.6022, by implementing advanced natural language processing and terminology services. Currently, we have made the DataMed system publicly available as an open source package for the biomedical community. © The Author 2018. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
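The retrieval metrics quoted above have simple definitions; as a reminder (standard formulas, not the bioCADDIE evaluation code), precision at 10 is the fraction of relevant items among the first ten returned results:

    # Standard precision-at-k computation (illustrative only).
    def precision_at_k(ranked_ids, relevant_ids, k=10):
        top_k = ranked_ids[:k]
        return sum(1 for doc in top_k if doc in relevant_ids) / k

    # Example: 6 relevant documents in the top 10 gives P@10 = 0.6.
    ranked = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
    print(precision_at_k(ranked, {"d1", "d2", "d4", "d6", "d7", "d9"}))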
Eckes, Annemarie H.; Gubała, Tomasz; Nowakowski, Piotr; Szymczyszyn, Tomasz; Wells, Rachel; Irwin, Judith A.; Horro, Carlos; Hancock, John M.; King, Graham; Dyer, Sarah C.; Jurkowski, Wiktor
2017-01-01
The Brassica Information Portal (BIP) is a centralised repository for brassica phenotypic data. The site hosts trait data associated with brassica research and breeding experiments conducted on brassica crops, which are used as oilseeds, vegetables, livestock forage and fodder, and for biofuels. A key feature is the explicit management of meta-data describing the provenance and relationships between experimental plant materials, as well as trial design and trait descriptors. BIP is an open access and open source project, built on the schema of CropStoreDB, and as such can provide trait data management strategies for any crop data. A new user interface and programmatic submission/retrieval system helps to simplify data access for researchers, breeders and other end-users. BIP opens up the opportunity to apply integrative, cross-project analyses to data generated by the Brassica Research Community. Here, we present a short description of the current status of the repository. PMID:28529710
Open-source, community-driven microfluidics with Metafluidics.
Kong, David S; Thorsen, Todd A; Babb, Jonathan; Wick, Scott T; Gam, Jeremy J; Weiss, Ron; Carr, Peter A
2017-06-07
Microfluidic devices have the potential to automate and miniaturize biological experiments, but open-source sharing of device designs has lagged behind sharing of other resources such as software. Synthetic biologists have used microfluidics for DNA assembly, cell-free expression, and cell culture, but a combination of expense, device complexity, and reliance on custom set-ups hampers their widespread adoption. We present Metafluidics, an open-source, community-driven repository that hosts digital design files, assembly specifications, and open-source software to enable users to build, configure, and operate a microfluidic device. We use Metafluidics to share designs and fabrication instructions for both a microfluidic ring-mixer device and a 32-channel tabletop microfluidic controller. This device and controller are applied to build genetic circuits using standard DNA assembly methods including ligation, Gateway, Gibson, and Golden Gate. Metafluidics is intended to enable a broad community of engineers, DIY enthusiasts, and other nontraditional participants with limited fabrication skills to contribute to microfluidic research.
SATORI: a system for ontology-guided visual exploration of biomedical data repositories.
Lekschas, Fritz; Gehlenborg, Nils
2018-04-01
The ever-increasing number of biomedical datasets provides tremendous opportunities for re-use but current data repositories provide limited means of exploration apart from text-based search. Ontological metadata annotations provide context by semantically relating datasets. Visualizing this rich network of relationships can improve the explorability of large data repositories and help researchers find datasets of interest. We developed SATORI, an integrative search and visual exploration interface for the exploration of biomedical data repositories. The design is informed by a requirements analysis through a series of semi-structured interviews. We evaluated the implementation of SATORI in a field study on a real-world data collection. SATORI enables researchers to seamlessly search, browse and semantically query data repositories via two visualizations that are highly interconnected with a powerful search interface. SATORI is an open-source web application, which is freely available at http://satori.refinery-platform.org and integrated into the Refinery Platform. nils@hms.harvard.edu. Supplementary data are available at Bioinformatics online.
ERIC Educational Resources Information Center
Cullen, Rowena; Chawner, Brenda
2011-01-01
The Open Access movement of the past decade, and institutional repositories developed by universities and academic libraries as a part of that movement, have openly challenged the traditional scholarly communication system. This article examines the growth of repositories around the world, and summarizes a growing body of evidence of the response…
Availability and Accessibility in an Open Access Institutional Repository: A Case Study
ERIC Educational Resources Information Center
Lee, Jongwook; Burnett, Gary; Vandegrift, Micah; Baeg, Jung Hoon; Morris, Richard
2015-01-01
Introduction: This study explores the extent to which an institutional repository makes papers available and accessible on the open Web by using 170 journal articles housed in DigiNole Commons, the institutional repository at Florida State University. Method: To analyse the repository's impact on availability and accessibility, we conducted…
PAZAR: a framework for collection and dissemination of cis-regulatory sequence annotation
Portales-Casamar, Elodie; Kirov, Stefan; Lim, Jonathan; Lithwick, Stuart; Swanson, Magdalena I; Ticoll, Amy; Snoddy, Jay; Wasserman, Wyeth W
2007-01-01
PAZAR is an open-access and open-source database of transcription factor and regulatory sequence annotation with associated web interface and programming tools for data submission and extraction. Curated boutique data collections can be maintained and disseminated through the unified schema of the mall-like PAZAR repository. The Pleiades Promoter Project collection of brain-linked regulatory sequences is introduced to demonstrate the depth of annotation possible within PAZAR. PAZAR, located at http://www.pazar.info, is open for business. PMID:17916232
PAZAR: a framework for collection and dissemination of cis-regulatory sequence annotation.
Portales-Casamar, Elodie; Kirov, Stefan; Lim, Jonathan; Lithwick, Stuart; Swanson, Magdalena I; Ticoll, Amy; Snoddy, Jay; Wasserman, Wyeth W
2007-01-01
PAZAR is an open-access and open-source database of transcription factor and regulatory sequence annotation with associated web interface and programming tools for data submission and extraction. Curated boutique data collections can be maintained and disseminated through the unified schema of the mall-like PAZAR repository. The Pleiades Promoter Project collection of brain-linked regulatory sequences is introduced to demonstrate the depth of annotation possible within PAZAR. PAZAR, located at http://www.pazar.info, is open for business.
Automated Report Generation for Research Data Repositories: From i2b2 to PDF.
Thiemann, Volker S; Xu, Tingyan; Röhrig, Rainer; Majeed, Raphael W
2017-01-01
We developed an automated toolchain to generate reports of i2b2 data. It is based on free open source software and runs on a Java Application Server. It is successfully used in an ED registry project. The solution is highly configurable and portable to other projects based on i2b2 or compatible factual data sources.
17 CFR 45.6 - Legal entity identifiers
Code of Federal Regulations, 2013 CFR
2013-04-01
... applied to swap data repositories by part 49 of this chapter. (4) Open Source. The schema for the legal... Section 45.6 Commodity and Securities Exchanges COMMODITY FUTURES TRADING COMMISSION SWAP DATA... to the jurisdiction of the Commission shall be identified in all recordkeeping and all swap data...
17 CFR 45.6 - Legal entity identifiers
Code of Federal Regulations, 2012 CFR
2012-04-01
... applied to swap data repositories by part 49 of this chapter. (4) Open Source. The schema for the legal... Section 45.6 Commodity and Securities Exchanges COMMODITY FUTURES TRADING COMMISSION SWAP DATA... to the jurisdiction of the Commission shall be identified in all recordkeeping and all swap data...
AN OPEN-SOURCE COMMUNITY WEB SITE TO SUPPORT GROUND-WATER MODEL TESTING
A community wiki wiki web site has been created as a resource to support ground-water model development and testing. The Groundwater Gourmet wiki is a repository for user supplied analytical and numerical recipes, how-to's, and examples. Members are encouraged to submit analyti...
USDA-ARS?s Scientific Manuscript database
Seedlings from seven open-pollinated selections of Chinese wingnut (Pterocarya stenoptera) (WN) representing collections of the USDA-ARS National Clonal Germplasm Repository at Davis CA and the University of California at Davis were evaluated as rootstocks for resistance to Phytophthora cinnamomi an...
Dissemination of metabolomics results: role of MetaboLights and COSMOS.
Salek, Reza M; Haug, Kenneth; Steinbeck, Christoph
2013-05-17
With ever-increasing amounts of metabolomics data produced each year, there is an even greater need to disseminate data and knowledge produced in a standard and reproducible way. To assist with this, a general purpose, open source metabolomics repository, MetaboLights, was launched in 2012. To promote a community standard, which initially culminated in the Metabolomics Standards Initiative (MSI), the COordination of Standards in MetabOlomicS (COSMOS) initiative was introduced. COSMOS aims to link life science e-infrastructures within the worldwide metabolomics community as well as to develop and maintain open source exchange formats for raw and processed data, ensuring better flow of metabolomics information.
Using Linked Open Data and Semantic Integration to Search Across Geoscience Repositories
NASA Astrophysics Data System (ADS)
Mickle, A.; Raymond, L. M.; Shepherd, A.; Arko, R. A.; Carbotte, S. M.; Chandler, C. L.; Cheatham, M.; Fils, D.; Hitzler, P.; Janowicz, K.; Jones, M.; Krisnadhi, A.; Lehnert, K. A.; Narock, T.; Schildhauer, M.; Wiebe, P. H.
2014-12-01
The MBLWHOI Library is a partner in the OceanLink project, an NSF EarthCube Building Block, applying semantic technologies to enable knowledge discovery, sharing and integration. OceanLink is testing ontology design patterns that link together two data repositories, Rolling Deck to Repository (R2R) and the Biological and Chemical Oceanography Data Management Office (BCO-DMO); the MBLWHOI Library Institutional Repository (IR), Woods Hole Open Access Server (WHOAS); National Science Foundation (NSF) funded awards; and American Geophysical Union (AGU) conference presentations. The Library is collaborating with scientific users, data managers, DSpace engineers, experts in ontology design patterns, and user interface developers to make WHOAS, a DSpace repository, linked open data enabled. The goal is to allow searching across repositories without any of the information providers having to change how they manage their collections. The tools developed for DSpace will be made available to the community of users. There are 257 registered DSpace repositories in the United States and over 1700 worldwide. Outcomes include: integration of DSpace with the OpenRDF Sesame triple store to provide a SPARQL endpoint for the storage and query of RDF representations of DSpace resources; mapping of DSpace resources to the OceanLink ontology; and a DSpace "data" add-on to provide a resolvable linked open data representation of DSpace resources.
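Querying such a Sesame-backed SPARQL endpoint follows the standard SPARQL protocol over HTTP; the sketch below uses Python's requests library against a placeholder endpoint URL and property, not OceanLink's actual deployment:

    # Hypothetical sketch: run a SPARQL query against an HTTP endpoint.
    import requests

    ENDPOINT = "https://example.org/openrdf-sesame/repositories/whoas"  # placeholder

    query = """
    SELECT ?item ?title WHERE {
        ?item <http://purl.org/dc/terms/title> ?title .
    } LIMIT 10"""

    response = requests.get(
        ENDPOINT,
        params={"query": query},
        headers={"Accept": "application/sparql-results+json"},
    )
    for binding in response.json()["results"]["bindings"]:
        print(binding["item"]["value"], binding["title"]["value"])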
Microsoft Repository Version 2 and the Open Information Model.
ERIC Educational Resources Information Center
Bernstein, Philip A.; Bergstraesser, Thomas; Carlson, Jason; Pal, Shankar; Sanders, Paul; Shutt, David
1999-01-01
Describes the programming interface and implementation of the repository engine and the Open Information Model for Microsoft Repository, an object-oriented meta-data management facility that ships in Microsoft Visual Studio and Microsoft SQL Server. Discusses Microsoft's component object model, object manipulation, queries, and information…
Gpufit: An open-source toolkit for GPU-accelerated curve fitting.
Przybylski, Adrian; Thiel, Björn; Keller-Findeisen, Jan; Stock, Bernd; Bates, Mark
2017-11-16
We present a general purpose, open-source software library for estimation of non-linear parameters by the Levenberg-Marquardt algorithm. The software, Gpufit, runs on a Graphics Processing Unit (GPU) and executes computations in parallel, resulting in a significant gain in performance. We measured a speed increase of up to 42 times when comparing Gpufit with an identical CPU-based algorithm, with no loss of precision or accuracy. Gpufit is designed such that it is easily incorporated into existing applications or adapted for new ones. Multiple software interfaces, including to C, Python, and Matlab, ensure that Gpufit is accessible from most programming environments. The full source code is published as an open source software repository, making its function transparent to the user and facilitating future improvements and extensions. As a demonstration, we used Gpufit to accelerate an existing scientific image analysis package, yielding significantly improved processing times for super-resolution fluorescence microscopy datasets.
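For reference, the damped normal-equation step that a Levenberg-Marquardt fitter solves at each iteration can be written in LaTeX as

    \left( J^{\top} J + \lambda\, \operatorname{diag}(J^{\top} J) \right) \delta = J^{\top} \left( y - f(\beta) \right)

where J is the Jacobian of the model f with respect to the parameters \beta, y the measured data, \lambda the damping factor and \delta the update applied as \beta \leftarrow \beta + \delta. This is the generic textbook form of the algorithm, independent of Gpufit's internal implementation.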
Carmen Legaz-García, María Del; Miñarro-Giménez, José Antonio; Menárguez-Tortosa, Marcos; Fernández-Breis, Jesualdo Tomás
2016-06-03
Biomedical research usually requires combining large volumes of data from multiple heterogeneous sources, which makes the integrated exploitation of such data difficult. The Semantic Web paradigm offers a natural technological space for data integration and exploitation by generating content readable by machines. Linked Open Data is a Semantic Web initiative that promotes the publication and sharing of data in machine-readable semantic formats. We present an approach for the transformation and integration of heterogeneous biomedical data with the objective of generating open biomedical datasets in Semantic Web formats. The transformation of the data is based on the mappings between the entities of the data schema and the ontological infrastructure that provides the meaning to the content. Our approach permits different types of mappings and includes the possibility of defining complex transformation patterns. Once the mappings are defined, they can be automatically applied to datasets to generate logically consistent content, and the mappings can be reused in further transformation processes. The results of our research are (1) a common transformation and integration process for heterogeneous biomedical data; (2) the application of Linked Open Data principles to generate interoperable, open, biomedical datasets; (3) a software tool, called SWIT, that implements the approach. In this paper we also describe how we have applied SWIT in different biomedical scenarios and some lessons learned. We have presented an approach that is able to generate open biomedical repositories in Semantic Web formats. SWIT is able to apply the Linked Open Data principles in the generation of the datasets, thus allowing their content to be linked to external repositories and creating linked open datasets. SWIT datasets may contain data from multiple sources and schemas, thus becoming integrated datasets.
Open Scenario Study: IDA Open Scenario Repository User’s Manual
2010-01-01
Thomason (Study Co-Lead); Rabold, Zachary S. (Sub-Task Lead); Bajraktari, Ylli; Dubin, Rachel D.; Flythe, Mary Catherine. Open Scenario Study: IDA Open Scenario Repository User's Manual. The front matter comprises a preface and appendices (A: Identifying Scenario Components; B: Acronyms).
NASA Astrophysics Data System (ADS)
Downs, R. R.; Chen, R. S.; de Sherbinin, A. M.
2017-12-01
Growing recognition of the importance of sharing scientific data more widely and openly has refocused attention on the state of data repositories, including both discipline- or topic-oriented data centers and institutional repositories. Data creators often have several alternatives for depositing and disseminating their natural, social, health, or engineering science data. In selecting a repository for their data, data creators and other stakeholders such as their funding agencies may wish to consider the user community or communities served, the type and quality of data products already offered, and the degree of data stewardship and associated services provided. Some data repositories serve general communities, e.g., those in their host institution or region, whereas others tailor their services to particular scientific disciplines or topical areas. Some repositories are selective when acquiring data and conduct extensive curation and reviews to ensure that data products meet quality standards. Many repositories have secured credentials and established a track record for providing trustworthy, high quality data and services. The NASA Socioeconomic Data and Applications Center (SEDAC) serves users interested in human-environment interactions, including researchers, students, and applied users from diverse sectors. SEDAC is selective when choosing data for dissemination, conducting several reviews of data products and services prior to release. SEDAC works with data producers to continually improve the quality of its open data products and services. As a Distributed Active Archive Center (DAAC) of the NASA Earth Observing System Data and Information System, SEDAC is committed to improving the accessibility, interoperability, and usability of its data in conjunction with data available from other DAACs, as well as other relevant data sources. SEDAC is certified as a Regular Member of the International Council for Science World Data System (ICSU-WDS).
Business intelligence tools for radiology: creating a prototype model using open-source tools.
Prevedello, Luciano M; Andriole, Katherine P; Hanson, Richard; Kelly, Pauline; Khorasani, Ramin
2010-04-01
Digital radiology departments could benefit from the ability to integrate and visualize data (e.g. information reflecting complex workflow states) from all of their imaging and information management systems in one composite presentation view. Leveraging data warehousing tools developed in the business world may be one way to achieve this capability. The overall concept of managing the information available in such a data repository is known as Business Intelligence (BI). This paper describes the concepts used in Business Intelligence, their importance to modern Radiology, and the steps used in the creation of a prototype model of a data warehouse for BI using open-source tools.
Dissemination of metabolomics results: role of MetaboLights and COSMOS
2013-01-01
With ever-increasing amounts of metabolomics data produced each year, there is an even greater need to disseminate data and knowledge produced in a standard and reproducible way. To assist with this, a general purpose, open source metabolomics repository, MetaboLights, was launched in 2012. To promote a community standard, which initially culminated in the Metabolomics Standards Initiative (MSI), the COordination of Standards in MetabOlomicS (COSMOS) initiative was introduced. COSMOS aims to link life science e-infrastructures within the worldwide metabolomics community as well as to develop and maintain open source exchange formats for raw and processed data, ensuring better flow of metabolomics information. PMID:23683662
ERIC Educational Resources Information Center
Stanton, Kate Valentine; Liew, Chern Li
2011-01-01
Introduction: We examine doctoral students' awareness of and attitudes to open access forms of publication. Levels of awareness of open access and the concept of institutional repositories, publishing behaviour and perceptions of benefits and risks of open access publishing were explored. Method: Qualitative and quantitative data were collected…
The Situation of Open Access Institutional Repositories in Spain: 2009 Report
ERIC Educational Resources Information Center
Melero, Remedios; Abadal, Ernest; Abad, Francisca; Rodriguez-Gairin, Josep Manel
2009-01-01
Introduction: The DRIVER I project drew up a detailed report of European repositories based on data gathered in a survey in which Spain's participation was very low. This created a highly distorted image of the implementation of repositories in Spain. This study aims to analyse the current state of Spanish open-access institutional repositories…
The Privacy and Security Implications of Open Data in Healthcare.
Kobayashi, Shinji; Kane, Thomas B; Paton, Chris
2018-04-22
The International Medical Informatics Association (IMIA) Open Source Working Group (OSWG) initiated a group discussion to examine current privacy and security issues in the open data movement in the healthcare domain from the perspective of the OSWG membership. Working group members independently reviewed the recent academic and grey literature and sampled a number of current large-scale open data projects to inform the working group discussion. This paper presents an overview of open data repositories and a series of short case reports to highlight relevant issues present in the recent literature concerning the adoption of open approaches to sharing healthcare datasets. Important themes that emerged included data standardisation, the inter-connected nature of the open source and open data movements, and how publishing open data can impact on the ethics, security, and privacy of informatics projects. The open data and open source movements in healthcare share many common philosophies and approaches, including developing international collaborations across multiple organisations and domains of expertise. Both movements aim to reduce the costs of advancing scientific research and improving healthcare provision for people around the world by adopting open intellectual property licence agreements and codes of practice. Implications of the increased adoption of open data in healthcare include the need to balance the security and privacy challenges of opening data sources with the potential benefits of open data for improving research and healthcare delivery. Georg Thieme Verlag KG Stuttgart.
Automated population of an i2b2 clinical data warehouse from an openEHR-based data repository.
Haarbrandt, Birger; Tute, Erik; Marschollek, Michael
2016-10-01
Detailed Clinical Model (DCM) approaches have recently seen wider adoption. More specifically, openEHR-based application systems are now used in production in several countries, serving diverse fields of application such as health information exchange, clinical registries and electronic medical record systems. However, approaches to efficiently provide openEHR data to researchers for secondary use have not yet been investigated or established. We developed an approach to automatically load openEHR data instances into the open source clinical data warehouse i2b2. We evaluated query capabilities and the performance of this approach in the context of the Hanover Medical School Translational Research Framework (HaMSTR), an openEHR-based data repository. Automated creation of i2b2 ontologies from archetypes and templates and the integration of openEHR data instances from 903 patients of a paediatric intensive care unit have been achieved. In total, it took an average of ∼2527 s to create 2,311,624 facts from 141,917 XML documents. Using the imported data, we conducted sample queries to compare the performance with two openEHR systems and to investigate whether this representation of data can support cohort identification and record-level data extraction. We found the automated population of an i2b2 clinical data warehouse to be a feasible approach to make openEHR data instances available for secondary use. Such an approach can facilitate timely provision of clinical data to researchers. It complements analytics based on the Archetype Query Language by allowing querying of both legacy clinical data sources and openEHR data instances at the same time and by providing an easy-to-use query interface. However, due to different levels of expressiveness in the data models, not all semantics could be preserved during the ETL process. Copyright © 2016 Elsevier Inc. All rights reserved.
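As an illustration of the kind of mapping such an ETL process performs, the sketch below flattens one openEHR data point into a row shaped like the i2b2 observation fact table. This is a minimal sketch only; the archetype path, identifiers and coding scheme are hypothetical and are not taken from the HaMSTR implementation.

    from datetime import datetime

    def openehr_value_to_i2b2_fact(patient_num, encounter_num, archetype_path,
                                   value, units, start_time):
        """Flatten one openEHR data point into a dict shaped like an i2b2
        OBSERVATION_FACT row. The concept code is derived from the archetype
        path so the ontology cell and fact table stay consistent (scheme invented)."""
        return {
            "ENCOUNTER_NUM": encounter_num,
            "PATIENT_NUM": patient_num,
            "CONCEPT_CD": "OPENEHR:" + archetype_path,   # hypothetical coding scheme
            "START_DATE": start_time,
            "VALTYPE_CD": "N",                            # numeric value
            "NVAL_NUM": value,
            "UNITS_CD": units,
        }

    # Example: a heart-rate observation from a (hypothetical) vital-signs template
    fact = openehr_value_to_i2b2_fact(
        patient_num=42,
        encounter_num=1001,
        archetype_path="openEHR-EHR-OBSERVATION.pulse.v1/data/events/data/items/rate",
        value=118,
        units="/min",
        start_time=datetime(2015, 3, 2, 14, 30),
    )
    print(fact["CONCEPT_CD"], fact["NVAL_NUM"], fact["UNITS_CD"])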
Challenges of the Open Source Component Marketplace in the Industry
NASA Astrophysics Data System (ADS)
Ayala, Claudia; Hauge, Øyvind; Conradi, Reidar; Franch, Xavier; Li, Jingyue; Velle, Ketil Sandanger
The reuse of Open Source Software components available on the Internet is playing a major role in the development of Component Based Software Systems. Nevertheless, the special nature of the OSS marketplace has taken the “classical” concept of software reuse based on centralized repositories to a completely different arena based on massive reuse over the Internet. In this paper we provide an overview of the current state of the OSS marketplace, and report preliminary findings about how companies interact with this marketplace to reuse OSS components. These data were gathered from interviews in software companies in Spain and Norway. Based on these results we identify some challenges aimed at improving the industrial reuse of OSS components.
Construction of a nasopharyngeal carcinoma 2D/MS repository with Open Source XML database--Xindice.
Li, Feng; Li, Maoyu; Xiao, Zhiqiang; Zhang, Pengfei; Li, Jianling; Chen, Zhuchu
2006-01-11
Many proteomics initiatives require integration of all information with uniform criteria, from collection of samples and data display to publication of experimental results. The integration and exchange of these data, which differ in format and structure, poses a great challenge. XML technology shows promise for handling this task because of its simplicity and flexibility. Nasopharyngeal carcinoma (NPC) is one of the most common cancers in southern China and Southeast Asia, with marked geographic and racial differences in incidence. Although some cancer proteome databases now exist, there is still no NPC proteome database. The raw NPC proteome experiment data were captured into one XML document with the Human Proteome Markup Language (HUP-ML) editor and imported into the native XML database Xindice. The 2D/MS repository of the NPC proteome was constructed with Apache, PHP and Xindice to provide access to the database via the Internet. On our website, two query methods, keyword query and click query, are provided for accessing the entries of the NPC proteome database. Our 2D/MS repository can be used to share the raw NPC proteomics data generated from gel-based proteomics experiments. The database, as well as the PHP source code for constructing users' own proteome repositories, can be accessed at http://www.xyproteomics.org/.
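To illustrate the kind of keyword query such an XML repository supports, the sketch below searches a tiny stand-in document with Python's standard library rather than the Xindice/PHP stack described above; the element and attribute names are invented, since the HUP-ML structure is not reproduced here.

    import xml.etree.ElementTree as ET

    # A tiny stand-in for an HUP-ML-style experiment record (element names invented).
    doc = ET.fromstring("""
    <experiment>
      <sample tissue="nasopharynx">NPC biopsy</sample>
      <spot id="117">
        <protein name="Keratin 8"/>
        <peptide sequence="LSELEAALQR"/>
      </spot>
    </experiment>
    """)

    # Keyword query: find spots whose identified protein name contains a search term.
    term = "Keratin"
    for spot in doc.findall(".//spot"):
        protein = spot.find("protein")
        if protein is not None and term.lower() in protein.get("name", "").lower():
            print("spot", spot.get("id"), "->", protein.get("name"))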
NASA Astrophysics Data System (ADS)
Strasser, C.; Borda, S.; Cruse, P.; Kunze, J.
2013-12-01
There are many barriers to data management and sharing among earth and environmental scientists; among the most significant are a lack of knowledge about best practices for data management, metadata standards, or appropriate data repositories for archiving and sharing data. Last year we developed an open source web application, DataUp, to help researchers overcome these barriers. DataUp helps scientists to (1) determine whether their file is CSV compatible, (2) generate metadata in a standard format, (3) retrieve an identifier to facilitate data citation, and (4) deposit their data into a repository. With funding from the NSF via a supplemental grant to the DataONE project, we are working to improve upon DataUp. Our main goal for DataUp 2.0 is to ensure organizations and repositories are able to adopt and adapt DataUp to meet their unique needs, including connecting to analytical tools, adding new metadata schema, and expanding the list of connected data repositories. DataUp is a collaborative project between the California Digital Library, DataONE, the San Diego Supercomputing Center, and Microsoft Research Connections.
The MIMIC Code Repository: enabling reproducibility in critical care research.
Johnson, Alistair Ew; Stone, David J; Celi, Leo A; Pollard, Tom J
2018-01-01
Lack of reproducibility in medical studies is a barrier to the generation of a robust knowledge base to support clinical decision-making. In this paper we outline the Medical Information Mart for Intensive Care (MIMIC) Code Repository, a centralized code base for generating reproducible studies on an openly available critical care dataset. Code is provided to load the data into a relational structure, create extractions of the data, and reproduce entire analysis plans including research studies. Concepts extracted include severity of illness scores, comorbid status, administrative definitions of sepsis, physiologic criteria for sepsis, organ failure scores, treatment administration, and more. Executable documents are used for tutorials and reproduce published studies end-to-end, providing a template for future researchers to replicate. The repository's issue tracker enables community discussion about the data and concepts, allowing users to collaboratively improve the resource. The centralized repository provides a platform for users of the data to interact directly with the data generators, facilitating greater understanding of the data. It also provides a location for the community to collaborate on necessary concepts for research progress and share them with a larger audience. Consistent application of the same code for underlying concepts is a key step in ensuring that research studies on the MIMIC database are comparable and reproducible. By providing open source code alongside the freely accessible MIMIC-III database, we enable end-to-end reproducible analysis of electronic health records. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association.
Questions of Quality in Repositories of Open Educational Resources: A Literature Review
ERIC Educational Resources Information Center
Atenas, Javiera; Havemann, Leo
2014-01-01
Open educational resources (OER) are teaching and learning materials which are freely available and openly licensed. Repositories of OER (ROER) are platforms that host and facilitate access to these resources. ROER should not just be designed to store this content--in keeping with the aims of the OER movement, they should support educators in…
Repositories of Open Educational Resources: An Assessment of Reuse and Educational Aspects
ERIC Educational Resources Information Center
Santos-Hermosa, Gema; Ferran-Ferrer, Núria; Abadal, Ernest
2017-01-01
This article provides an overview of the current state of repositories of open educational resources (ROER) in higher education at international level. It analyses a series of educational indicators to determine whether ROER can meet the specific needs of the education context, and to clarify understanding of the reuse of open educational…
mdFoam+: Advanced molecular dynamics in OpenFOAM
NASA Astrophysics Data System (ADS)
Longshaw, S. M.; Borg, M. K.; Ramisetti, S. B.; Zhang, J.; Lockerby, D. A.; Emerson, D. R.; Reese, J. M.
2018-03-01
This paper introduces mdFoam+, which is an MPI parallelised molecular dynamics (MD) solver implemented entirely within the OpenFOAM software framework. It is open-source and released under the same GNU General Public License (GPL) as OpenFOAM. The source code is released as a publicly open software repository that includes detailed documentation and tutorial cases. Since mdFoam+ is designed entirely within the OpenFOAM C++ object-oriented framework, it inherits a number of key features. The code is designed for extensibility and flexibility, so it is aimed first and foremost as an MD research tool, in which new models and test cases can be developed and tested rapidly. Implementing mdFoam+ in OpenFOAM also enables easier development of hybrid methods that couple MD with continuum-based solvers. Setting up MD cases follows the standard OpenFOAM format, as mdFoam+ also relies upon the OpenFOAM dictionary-based directory structure. This ensures that useful pre- and post-processing capabilities provided by OpenFOAM remain available even though the fully Lagrangian nature of an MD simulation is not typical of most OpenFOAM applications. Results show that mdFoam+ compares well to another well-known MD code (e.g. LAMMPS) in terms of benchmark problems, although it also has additional functionality that does not exist in other open-source MD codes.
17 CFR 49.22 - Chief compliance officer.
Code of Federal Regulations, 2014 CFR
2014-04-01
... that the registered swap data repository provide fair and open access as set forth in § 49.27 of this...) SWAP DATA REPOSITORIES § 49.22 Chief compliance officer. (a) Definition of Board of Directors. For... data repository, or for those swap data repositories whose organizational structure does not include a...
Identifying Tensions in the Use of Open Licenses in OER Repositories
ERIC Educational Resources Information Center
Amiel, Tel; Soares, Tiago Chagas
2016-01-01
We present an analysis of 50 repositories for educational content conducted through an "audit system" that helped us classify these repositories, their software systems, promoters, and how they communicated their licensing practices. We randomly accessed five resources from each repository to investigate the alignment of licensing…
The Waste Isolation Pilot Plant transuranic waste repository: A sleeping beauty
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eriksson, L.G.
On May 13, 1998, crowning a 24-year United States Department of Energy effort, the US Environmental Protection Agency certified that the deep geological repository for safe disposal of long-lived, transuranic radioactive waste proposed by the DOE at the Waste Isolation Pilot Plant site in New Mexico complied with all applicable environmental radiation protection standards and compliance criteria. Pursuant to the applicable law, the WIPP Land Withdrawal Act of 1992, as amended in 1997, at the decision of the secretary of energy, the WIPP repository could open 30 calendar days after receiving the EPA certification. The secretary of energy announced on May 13, 1998, that he intended to open the WIPP TRUW repository by June 14, 1998. However, at the end of 1998, the opening of the WIPP TRUW repository remains hostage to time-consuming, hazardous-waste-permitting procedures by the state of New Mexico Environment Department and two legal actions. Based on the EPA-verified high safety and the demonstrated risk reduction to both current and future generations offered by the WIPP TRUW repository, it is concluded that the WIPP TRUW repository is a sleeping beauty that will awake, perhaps in stages, and begin its important mission in 1999.
Breytenbach, Amelia; Lourens, Antoinette; Marsh, Susan
2013-04-26
The history of veterinary science in South Africa can only be appreciated, studied, researched and passed on to coming generations if historical sources are readily available. In most countries, material and sources with historical value are often difficult to locate, dispersed over a large area and not part of the conventional book and journal literature. The Faculty of Veterinary Science of the University of Pretoria and its library has access to a large collection of historical sources. The collection consists of photographs, photographic slides, documents, proceedings, posters, audio-visual material, postcards and other memorabilia. Other institutions in the country are also approached if relevant sources are identified in their collections. The University of Pretoria's institutional repository, UPSpace, was launched in 2006. This provided the Jotello F. Soga Library with the opportunity to fill the repository with relevant digitised collections of diverse heritage and learning resources that can contribute to the long-term preservation and accessibility of historical veterinary sources. These collections are available for use not only by historians and researchers in South Africa but also elsewhere in Africa and the rest of the world. Important historical collections such as the Arnold Theiler collection, the Jotello F. Soga collection and collections of the Onderstepoort Journal of Veterinary Research and the Journal of the South African Veterinary Association are highlighted. The benefits of an open access digital repository, the importance of collaboration across the veterinary community and other prerequisites for the sustainability of a digitisation project and the importance of metadata to enhance accessibility are covered.
The open research system: a web-based metadata and data repository for collaborative research
Charles M. Schweik; Alexander Stepanov; J. Morgan Grove
2005-01-01
Beginning in 1999, a web-based metadata and data repository we call the "open research system" (ORS) was designed and built to assist geographically distributed scientific research teams. The purpose of this innovation was to promote the open sharing of data within and across organizational lines and across geographic distances. As the use of the system...
"I've Never Heard of It Before": Awareness of Open Access at a Small Liberal Arts University
ERIC Educational Resources Information Center
Kocken, Gregory J.; Wical, Stephanie H.
2013-01-01
Small colleges and universities, often late adopters of institutional repositories and open access initiatives, face challenges that have not fully been explored in the professional literature. In an effort to gauge the level of awareness of open access and institutional repositories at the University of Wisconsin-Eau Claire (UWEC), the authors of…
Demonstrating the Open Data Repository's Data Publisher: The CheMin Database
NASA Astrophysics Data System (ADS)
Stone, N.; Lafuente, B.; Bristow, T.; Pires, A.; Keller, R. M.; Downs, R. T.; Blake, D.; Dateo, C. E.; Fonda, M.
2018-04-01
The Open Data Repository's Data Publisher aims to provide an easy-to-use software tool that will allow researchers to create and publish database templates and related data. The CheMin Database developed using this framework is shown as an example.
[The subject repositories of strategy of the Open Access initiative].
Soares Guimarães, M C; da Silva, C H; Horsth Noronha, I
2012-11-01
Subject repositories are defined as collections of digital objects resulting from research in a specific disciplinary field, and they still occupy a restricted space on the discussion agenda of the open access movement compared with the breadth of discussion devoted to institutional repositories. Although subject repositories have come to prominence in the field, especially through the success of initiatives such as arXiv, PubMed and E-prints, the literature on the subject is recognized as very limited. Despite their roots in library and information science, and their focus on the management of disciplinary collections (subject-area literature), there is little information available about the development and management of subject repositories. This text offers a brief overview of the topic and presents the potential of developing subject repositories as a way to strengthen the open access initiative.
Repositories for Research: Southampton's Evolving Role in the Knowledge Cycle
ERIC Educational Resources Information Center
Simpson, Pauline; Hey, Jessie
2006-01-01
Purpose: To provide an overview of how open access (OA) repositories have grown to take a premier place in the e-research knowledge cycle and offer Southampton's route from project to sustainable institutional repository. Design/methodology/approach: The evolution of institutional repositories and OA is outlined raising questions of multiplicity…
The Open Data Repository's Data Publisher
NASA Technical Reports Server (NTRS)
Stone, N.; Lafuente, B.; Downs, R. T.; Blake, D.; Bristow, T.; Fonda, M.; Pires, A.
2015-01-01
Data management and data publication are becoming increasingly important components of researchers' workflows. The complexity of managing data, publishing data online, and archiving data has not decreased significantly even as computing access and power have greatly increased. The Open Data Repository's Data Publisher software strives to make data archiving, management, and publication a standard part of a researcher's workflow using simple, web-based tools and commodity server hardware. The publication engine allows for uploading, searching, and display of data with graphing capabilities and downloadable files. Access is controlled through a robust permissions system that can control publication at the field level and can be granted to the general public or protected so that only registered users at various permission levels receive access. Data Publisher also allows researchers to subscribe to meta-data standards through a plugin system, embargo data publication at their discretion, and collaborate with other researchers through various levels of data sharing. As the software matures, semantic data standards will be implemented to facilitate machine reading of data and each database will provide a REST application programming interface for programmatic access. Additionally, a citation system will allow snapshots of any data set to be archived and cited for publication while the data itself can remain living and continuously evolve beyond the snapshot date. The software runs on a traditional LAMP (Linux, Apache, MySQL, PHP) server and is available on GitHub (http://github.com/opendatarepository) under a GPLv2 open source license. The goal of the Open Data Repository is to lower the cost and training barrier to entry so that any researcher can easily publish their data and ensure it is archived for posterity.
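As a sketch of how a REST interface to such a repository might be consumed once it is available, the snippet below fetches one published record as JSON. The endpoint URL and JSON layout are hypothetical and are not the actual Data Publisher API.

    import json
    import urllib.request

    # Hypothetical endpoint for a published record; the real API may differ.
    url = "https://example.org/odr/api/v1/databases/minerals/records/123"

    with urllib.request.urlopen(url) as response:   # network access required
        record = json.loads(response.read().decode("utf-8"))

    # Field names are illustrative only: print each published field and value.
    for field, value in record.get("fields", {}).items():
        print(f"{field}: {value}")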
Scharm, Martin; Wolkenhauer, Olaf; Waltemath, Dagmar
2016-02-15
Repositories support the reuse of models and ensure transparency about results in publications linked to those models. With thousands of models available in repositories, such as the BioModels database or the Physiome Model Repository, a framework to track the differences between models and their versions is essential to compare and combine models. Difference detection not only allows users to study the history of models but also helps in the detection of errors and inconsistencies. Existing repositories lack algorithms to track a model's development over time. Focusing on SBML and CellML, we present an algorithm to accurately detect and describe differences between coexisting versions of a model with respect to (i) the models' encoding, (ii) the structure of biological networks and (iii) mathematical expressions. This algorithm is implemented in a comprehensive and open source library called BiVeS. BiVeS helps to identify and characterize changes in computational models and thereby contributes to the documentation of a model's history. Our work facilitates the reuse and extension of existing models and supports collaborative modelling. Finally, it contributes to better reproducibility of modelling results and to the challenge of model provenance. The workflow described in this article is implemented in BiVeS. BiVeS is freely available as source code and binary from sems.uni-rostock.de. The web interface BudHat demonstrates the capabilities of BiVeS at budhat.sems.uni-rostock.de. © The Author 2015. Published by Oxford University Press.
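To illustrate the kind of structural difference such a tool reports, the sketch below compares two versions of a toy model and lists added and removed species. This is a generic Python sketch, not the BiVeS Java API, and the SBML-like fragments are invented.

    import xml.etree.ElementTree as ET

    v1 = ET.fromstring("<model><species id='A'/><species id='B'/></model>")
    v2 = ET.fromstring("<model><species id='B'/><species id='C'/></model>")

    def species_ids(model):
        """Collect the ids of all species elements in a model fragment."""
        return {s.get("id") for s in model.findall("species")}

    added = species_ids(v2) - species_ids(v1)
    removed = species_ids(v1) - species_ids(v2)
    print("added species:", sorted(added))      # ['C']
    print("removed species:", sorted(removed))  # ['A']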
17 CFR 49.27 - Access and fees.
Code of Federal Regulations, 2013 CFR
2013-04-01
... 49.27 Commodity and Securities Exchanges COMMODITY FUTURES TRADING COMMISSION SWAP DATA REPOSITORIES § 49.27 Access and fees. (a) Fair, open and equal access. (1) A registered swap data repository..., swap dealers, major swap participants and any other counterparties, on a fair, open and equal basis...
17 CFR 49.27 - Access and fees.
Code of Federal Regulations, 2014 CFR
2014-04-01
... 49.27 Commodity and Securities Exchanges COMMODITY FUTURES TRADING COMMISSION (CONTINUED) SWAP DATA REPOSITORIES § 49.27 Access and fees. (a) Fair, open and equal access. (1) A registered swap data repository..., swap dealers, major swap participants and any other counterparties, on a fair, open and equal basis...
17 CFR 49.27 - Access and fees.
Code of Federal Regulations, 2012 CFR
2012-04-01
... 49.27 Commodity and Securities Exchanges COMMODITY FUTURES TRADING COMMISSION SWAP DATA REPOSITORIES § 49.27 Access and fees. (a) Fair, open and equal access. (1) A registered swap data repository..., swap dealers, major swap participants and any other counterparties, on a fair, open and equal basis...
A standard-enabled workflow for synthetic biology.
Myers, Chris J; Beal, Jacob; Gorochowski, Thomas E; Kuwahara, Hiroyuki; Madsen, Curtis; McLaughlin, James Alastair; Mısırlı, Göksel; Nguyen, Tramy; Oberortner, Ernst; Samineni, Meher; Wipat, Anil; Zhang, Michael; Zundel, Zach
2017-06-15
A synthetic biology workflow is composed of data repositories that provide information about genetic parts, sequence-level design tools to compose these parts into circuits, visualization tools to depict these designs, genetic design tools to select parts to create systems, and modeling and simulation tools to evaluate alternative design choices. Data standards enable the ready exchange of information within such a workflow, allowing repositories and tools to be connected from a diversity of sources. The present paper describes one such workflow that utilizes, among others, the Synthetic Biology Open Language (SBOL) to describe genetic designs, the Systems Biology Markup Language to model these designs, and SBOL Visual to visualize these designs. We describe how a standard-enabled workflow can be used to produce types of design information, including multiple repositories and software tools exchanging information using a variety of data standards. Recently, the ACS Synthetic Biology journal has recommended the use of SBOL in their publications. © 2017 The Author(s); published by Portland Press Limited on behalf of the Biochemical Society.
Extensible Probabilistic Repository Technology (XPRT)
2004-10-01
projects, such as Centaurus and the Evidence Data Base (EDB), others were fabricated, such as INS and FED, while others contain data from the open... [simulated data sources include: Google (Web Report, unlimited, SOAP API); BBC News (News, unlimited, Web RSS 1.0); Centaurus (Person Demographics, 204,402 people from 240 countries)] ...objects of the domain ontology map to the various simulated data-sources. For example, the PersonDemographics are stored in the Centaurus database, while
Research Students and the Loughborough Institutional Repository
ERIC Educational Resources Information Center
Pickton, Margaret; McKnight, Cliff
2006-01-01
This article investigates the potential role for research students in an institutional repository (IR). Face-to-face interviews with 34 research students at Loughborough University were carried out. Using a mixture of closed and open questions, the interviews explored the students' experiences and opinions of publishing, open access and the…
Pedagogical Framing of OER--The Case of Language Teaching
ERIC Educational Resources Information Center
Bradley, Linda; Vigmo, Sylvi
2016-01-01
This study investigates what characterises teachers' pedagogical design of OER [Open Educational Resources], and potential affordances and constraints in pedagogical design in an open education practice, when contributing to a Swedish repository Lektion.se. The teachers' framing of the OER shared on the repository included the analyses of a…
Open Science CBS Neuroimaging Repository: Sharing ultra-high-field MR images of the brain.
Tardif, Christine Lucas; Schäfer, Andreas; Trampel, Robert; Villringer, Arno; Turner, Robert; Bazin, Pierre-Louis
2016-01-01
Magnetic resonance imaging at ultra high field opens the door to quantitative brain imaging at sub-millimeter isotropic resolutions. However, novel image processing tools to analyze these new rich datasets are lacking. In this article, we introduce the Open Science CBS Neuroimaging Repository: a unique repository of high-resolution and quantitative images acquired at 7 T. The motivation for this project is to increase interest for high-resolution and quantitative imaging and stimulate the development of image processing tools developed specifically for high-field data. Our growing repository currently includes datasets from MP2RAGE and multi-echo FLASH sequences from 28 and 20 healthy subjects respectively. These datasets represent the current state-of-the-art in in-vivo relaxometry at 7 T, and are now fully available to the entire neuroimaging community. Copyright © 2015 Elsevier Inc. All rights reserved.
Next-Generation Search Engines for Information Retrieval
DOE Office of Scientific and Technical Information (OSTI.GOV)
Devarakonda, Ranjeet; Hook, Leslie A; Palanisamy, Giri
In recent years, there have been significant advancements in the areas of scientific data management and retrieval techniques, particularly in terms of standards and protocols for archiving data and metadata. Scientific data are rich and spread across different places. In order to integrate these pieces together, a data archive and associated metadata should be generated. Data should be stored in a retrievable format and, more importantly, in a format that will continue to be accessible as technology changes, such as XML. While general-purpose search engines (such as Google or Bing) are useful for finding many things on the Internet, they are often of limited usefulness for locating Earth Science data relevant (for example) to a specific spatiotemporal extent. By contrast, tools that search repositories of structured metadata can locate relevant datasets with fairly high precision, but the search is limited to that particular repository. Federated searches (such as Z39.50) have been used, but can be slow and their comprehensiveness can be limited by downtime in any search partner. An alternative approach to improve comprehensiveness is for a repository to harvest metadata from other repositories, possibly with limits based on subject matter or access permissions. Searches through harvested metadata can be extremely responsive, and the search tool can be customized with semantic augmentation appropriate to the community of practice being served. One such system is Mercury, a metadata harvesting, data discovery, and access system built for researchers to search, share and obtain spatiotemporal data used across a range of climate and ecological sciences. Mercury is an open-source toolset; its backend is built on Java, and its search capability is supported by popular open source search libraries such as SOLR and LUCENE. Mercury harvests structured metadata and key data from several data-providing servers around the world and builds a centralized index. The harvested files are indexed consistently against the SOLR search API, so that Mercury can offer simple, fielded, spatial and temporal searches across projects spanning land, atmosphere, and ocean ecology. Mercury also provides data sharing capabilities using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). In this paper we discuss best practices for archiving data and metadata, new searching techniques, and efficient ways of data retrieval and information display.
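As a rough illustration of how an index built with SOLR is typically queried over HTTP, the sketch below assumes a standard Solr "select" handler; the host, core name and field names are placeholders and are not Mercury's actual deployment.

    import json
    import urllib.parse
    import urllib.request

    # Hypothetical Solr core holding harvested metadata records.
    base = "http://localhost:8983/solr/mercury/select"
    params = {
        "q": "keywords:soil_moisture",                     # fielded search (field name invented)
        "fq": "start_date:[2000-01-01T00:00:00Z TO *]",    # temporal filter
        "rows": 10,
        "wt": "json",
    }

    with urllib.request.urlopen(base + "?" + urllib.parse.urlencode(params)) as resp:
        results = json.loads(resp.read().decode("utf-8"))

    for doc in results["response"]["docs"]:
        print(doc.get("title"), doc.get("id"))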
NASA Astrophysics Data System (ADS)
Kwon, N.; Gentle, J.; Pierce, S. A.
2015-12-01
Software code developed for research is often used for a relatively short period of time before it is abandoned, lost, or becomes outdated. This unintentional abandonment of code is a valid problem in the 21st century scientific process, hindering widespread reusability and increasing the effort needed to develop research software. Potentially important assets, these legacy codes may be resurrected and documented digitally for long-term reuse, often with modest effort. Furthermore, the revived code may be openly accessible in a public repository for researchers to reuse or improve. For this study, the research team has begun to revive the codebase for Groundwater Decision Support System (GWDSS), originally developed for participatory decision making to aid urban planning and groundwater management, though it may serve multiple use cases beyond those originally envisioned. GWDSS was designed as a java-based wrapper with loosely federated commercial and open source components. If successfully revitalized, GWDSS will be useful for both practical applications as a teaching tool and case study for groundwater management, as well as informing theoretical research. Using the knowledge-sharing approaches documented by the NSF-funded Ontosoft project, digital documentation of GWDSS is underway, from conception to development, deployment, characterization, integration, composition, and dissemination through open source communities and geosciences modeling frameworks. Information assets, documentation, and examples are shared using open platforms for data sharing and assigned digital object identifiers. Two instances of GWDSS version 3.0 are being created: 1) a virtual machine instance for the original case study to serve as a live demonstration of the decision support tool, assuring the original version is usable, and 2) an open version of the codebase, executable installation files, and developer guide available via an open repository, assuring the source for the application is accessible with version control and potential for new branch developments. Finally, metadata about the software has been completed within the OntoSoft portal to provide descriptive curation, make GWDSS searchable, and complete documentation of the scientific software lifecycle.
Rahman, Mahabubur; Watabe, Hiroshi
2018-05-01
Molecular imaging serves as an important tool for researchers and clinicians to visualize and investigate complex biochemical phenomena using specialized instruments; these instruments are either used individually or in combination with targeted imaging agents to obtain images related to specific diseases with high sensitivity, specificity, and signal-to-noise ratios. However, molecular imaging, which is a multidisciplinary research field, faces several challenges, including the integration of imaging informatics with bioinformatics and medical informatics, requirement of reliable and robust image analysis algorithms, effective quality control of imaging facilities, and those related to individualized disease mapping, data sharing, software architecture, and knowledge management. As a cost-effective and open-source approach to address these challenges related to molecular imaging, we develop a flexible, transparent, and secure infrastructure, named MIRA, which stands for Molecular Imaging Repository and Analysis, primarily using the Python programming language, and a MySQL relational database system deployed on a Linux server. MIRA is designed with a centralized image archiving infrastructure and information database so that a multicenter collaborative informatics platform can be built. The capability of dealing with metadata, image file format normalization, and storing and viewing different types of documents and multimedia files make MIRA considerably flexible. With features like logging, auditing, commenting, sharing, and searching, MIRA is useful as an Electronic Laboratory Notebook for effective knowledge management. In addition, the centralized approach for MIRA facilitates on-the-fly access to all its features remotely through any web browser. Furthermore, the open-source approach provides the opportunity for sustainable continued development. MIRA offers an infrastructure that can be used as cross-boundary collaborative MI research platform for the rapid achievement in cancer diagnosis and therapeutics. Copyright © 2018 Elsevier Ltd. All rights reserved.
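A minimal sketch of the kind of metadata bookkeeping such an archive performs is shown below, using Python's built-in sqlite3 module as a stand-in for the MySQL backend described above; the table layout and field names are invented and are not MIRA's actual schema.

    import sqlite3

    conn = sqlite3.connect(":memory:")   # stand-in for the MySQL server
    conn.execute("""
        CREATE TABLE image_metadata (
            id INTEGER PRIMARY KEY,
            study_id TEXT,
            modality TEXT,
            tracer TEXT,
            acquired_on TEXT,
            file_path TEXT
        )
    """)
    conn.execute(
        "INSERT INTO image_metadata (study_id, modality, tracer, acquired_on, file_path) "
        "VALUES (?, ?, ?, ?, ?)",
        ("STUDY-001", "PET", "FDG", "2018-05-01", "/archive/study001/scan01.nii"),
    )
    for row in conn.execute("SELECT study_id, modality, tracer FROM image_metadata"):
        print(row)
    conn.close()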
Repository Profiles for Atmospheric and Climate Sciences: Capabilities and Trends in Data Services
NASA Astrophysics Data System (ADS)
Hou, C. Y.; Thompson, C. A.; Palmer, C. L.
2014-12-01
As digital research data proliferate and expectations for open access escalate, the landscape of data repositories is becoming more complex. For example, DataBib currently identifies 980 data repositories across the disciplines, with 117 categorized under Geosciences. In atmospheric and climate sciences, there are great expectations for the integration and reuse of data for advancing science. To realize this potential, resources are needed that explicate the range of repository options available for locating and depositing open data, their conditions of access and use, and the services and tools they provide. This study profiled 38 open digital repositories in the atmospheric and climate sciences, analyzing each on 55 criteria through content analysis of their websites. The results provide a systematic way to assess and compare capabilities, services, and institutional characteristics and identify trends across repositories. Selected results from the more detailed outcomes to be presented: Most repositories offer guidance on data format(s) for submission and dissemination. 42% offer authorization-free access. More than half use some type of data identification system such as DOIs. Nearly half offer some data processing, with a similar number providing software or tools. 78.9% request that users cite or acknowledge datasets used and the data center. Only 21.1% recommend specific metadata standards, such as ISO 19115 or Dublin Core, with more than half utilizing a customized metadata scheme. Information was rarely provided on repository certification and accreditation and uneven for transfer of rights and data security. Few provided policy information on preservation, migration, reappraisal, disposal, or long-term sustainability. As repository use increases, it will be important for institutions to make their procedures and policies explicit, to build trust with user communities and improve efficiencies in data sharing. Resources such as repository profiles will be essential for scientists to weigh options and understand trends in data services across the evolving network of repositories.
NATIONAL GEOSCIENCE DATA REPOSITORY SYSTEM PHASE III: IMPLEMENTATION AND OPERATION ON THE REPOSITORY
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marcus Milling
2001-10-01
The NGDRS has attained 72% of its targeted goal for cores and cuttings transfers; over 12 million linear feet of cores and cuttings, in addition to large numbers of paleontological samples, are now available for public use. Additionally, large-scale transfers of seismic data have been evaluated, but based on the recommendation of the NGDRS steering committee, cores have been given priority because of the vast scale of the seismic data problem relative to the available funding. The rapidly changing industry conditions have required that the primary core and cuttings preservation strategy evolve as well. Additionally, the NGDRS clearinghouse is evaluating the viability of transferring seismic data covering the western shelf of the Florida Gulf Coast. AGI remained actively involved in assisting the National Research Council with background materials and presentations for their panel convened to study the data preservation issue. A final report of the panel is expected in early 2002. GeoTrek has been ported to Linux and MySQL, ensuring a purely open-source version of the software. This effort is key in ensuring long-term viability of the software so that it can continue basic operation regardless of specific funding levels. Work has commenced on a major revision of GeoTrek, using the open-source MapServer project and its related MapScript language. This effort will address a number of key technology issues that appear to be rising for 2002, including the discontinuation of the use of Java in future Microsoft operating systems. Discussions have been held regarding establishing potential new public data repositories, with hope for final determination in 2002.
NATIONAL GEOSCIENCE DATA REPOSITORY SYSTEM PHASE III: IMPLEMENTATION AND OPERATION OF THE REPOSITORY
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marcus Milling
2003-04-01
The NGDRS has facilitated the transfer to the public sector of 85% of the cores, cuttings, and other data identified as available. Over 12 million linear feet of cores and cuttings, in addition to large numbers of paleontological samples, are now available for public use. To date, with industry contributions for program operations and data transfers, the NGDRS project has realized a 6.5 to 1 return on investment to Department of Energy funds. Large-scale transfers of seismic data have been evaluated, but based on the recommendation of the NGDRS steering committee, cores have been given priority because of the vast scale of the seismic data problem relative to the available funding. The rapidly changing industry conditions have required that the primary core and cuttings preservation strategy evolve as well. Additionally, the NGDRS clearinghouse is evaluating the viability of transferring seismic data covering the western shelf of the Florida Gulf Coast. AGI remains actively involved in working to realize the vision of the National Research Council's report on geoscience data preservation. GeoTrek has been ported to Linux and MySQL, ensuring a purely open-source version of the software. This effort is key in ensuring long-term viability of the software so that it can continue basic operation regardless of specific funding levels. Work has commenced on a major revision of GeoTrek, using the open-source MapServer project and its related MapScript language. This effort will address a number of key technology issues that appear to be rising for 2002, including the discontinuation of the use of Java in future Microsoft operating systems. Discussions have been held regarding establishing potential new public data repositories, with hope for final determination in 2002.
NATIONAL GEOSCIENCE DATA REPOSITORY SYSTEM PHASE III: IMPLEMENTATION AND OPERATION OF THE REPOSITORY
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marcus Milling
2002-10-01
The NGDRS has facilitated the transfer to the public sector of 85% of the cores, cuttings, and other data identified as available. Over 12 million linear feet of cores and cuttings, in addition to large numbers of paleontological samples, are now available for public use. To date, with industry contributions for program operations and data transfers, the NGDRS project has realized a 6.5 to 1 return on investment to Department of Energy funds. Large-scale transfers of seismic data have been evaluated, but based on the recommendation of the NGDRS steering committee, cores have been given priority because of the vast scale of the seismic data problem relative to the available funding. The rapidly changing industry conditions have required that the primary core and cuttings preservation strategy evolve as well. Additionally, the NGDRS clearinghouse is evaluating the viability of transferring seismic data covering the western shelf of the Florida Gulf Coast. AGI remains actively involved in working to realize the vision of the National Research Council's report on geoscience data preservation. GeoTrek has been ported to Linux and MySQL, ensuring a purely open-source version of the software. This effort is key in ensuring long-term viability of the software so that it can continue basic operation regardless of specific funding levels. Work has commenced on a major revision of GeoTrek, using the open-source MapServer project and its related MapScript language. This effort will address a number of key technology issues that appear to be rising for 2002, including the discontinuation of the use of Java in future Microsoft operating systems. Discussions have been held regarding establishing potential new public data repositories, with hope for final determination in 2002.
An open repositories network development for medical teaching resources.
Soula, Gérard; Darmoni, Stefan; Le Beux, Pierre; Renard, Jean-Marie; Dahamna, Badisse; Fieschi, Marius
2010-01-01
The lack of interoperability between repositories of heterogeneous and geographically widespread data is an obstacle to the diffusion, sharing and reutilization of those data. We present the development of an open repositories network taking into account both the syntactic and semantic interoperability of the different repositories and based on international standards in this field. The network is used by the medical community in France for the diffusion and sharing of digital teaching resources. The syntactic interoperability of the repositories is managed using the OAI-PMH protocol for the exchange of metadata describing the resources. Semantic interoperability is based, on one hand, on the LOM standard for the description of resources and on MESH for the indexing of the latter and, on the other hand, on semantic interoperability management designed to optimize compliance with standards and the quality of the metadata.
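To show what an OAI-PMH metadata exchange looks like in practice, the minimal harvesting sketch below uses only the Python standard library; the repository base URL is a placeholder, while the verb, parameter and namespace names follow the OAI-PMH 2.0 specification.

    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    # Any OAI-PMH-compliant repository exposes the same verbs; the base URL is a placeholder.
    base = "https://repository.example.org/oai"
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}

    with urllib.request.urlopen(base + "?" + urllib.parse.urlencode(params)) as resp:
        tree = ET.fromstring(resp.read())

    ns = {
        "oai": "http://www.openarchives.org/OAI/2.0/",
        "dc": "http://purl.org/dc/elements/1.1/",
    }
    for record in tree.findall(".//oai:record", ns):
        title = record.find(".//dc:title", ns)
        print(title.text if title is not None else "(no title)")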
Unwin, Ian; Jansen-van der Vliet, Martine; Westenbrink, Susanne; Presser, Karl; Infanger, Esther; Porubska, Janka; Roe, Mark; Finglas, Paul
2016-02-15
The EuroFIR Document and Data Repositories are being developed as accessible collections of source documents, including grey literature, and the food composition data reported in them. These Repositories will contain source information available to food composition database compilers when selecting their nutritional data. The Document Repository was implemented as searchable bibliographic records in the Europe PubMed Central database, which links to the documents online. The Data Repository will contain original data from source documents in the Document Repository. Testing confirmed the FoodCASE food database management system as a suitable tool for the input, documentation and quality assessment of Data Repository information. Data management requirements for the input and documentation of reported analytical results were established, including record identification and method documentation specifications. Document access and data preparation using the Repositories will provide information resources for compilers, eliminating duplicated work and supporting unambiguous referencing of data contributing to their compiled data. Copyright © 2014 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Lafuente, B.; Stone, N.; Bristow, T.; Keller, R. M.; Blake, D. F.; Downs, R. T.; Pires, A.; Dateo, C. E.; Fonda, M.
2017-12-01
In development for nearly four years, the Open Data Repository's (ODR) Data Publisher software has become a useful tool for researchers' data needs. Data Publisher facilitates the creation of customized databases with flexible permission sets that allow researchers to share data collaboratively while improving data discovery and maintaining ownership rights. The open source software provides an end-to-end solution from collection to final repository publication. A web-based interface allows researchers to enter data, view data, and conduct analysis using any programming language supported by JupyterHub (http://www.jupyterhub.org). This toolset makes it possible for a researcher to store and manipulate their data in the cloud from any internet capable device. Data can be embargoed in the system until a date selected by the researcher. For instance, open publication can be set to a date that coincides with publication of data analysis in a third party journal. In conjunction with teams at NASA Ames and the University of Arizona, a number of pilot studies are being conducted to guide the software development so that it allows them to publish and share their data. These pilots include (1) the Astrobiology Habitable Environments Database (AHED), a central searchable repository designed to promote and facilitate the integration and sharing of all the data generated by the diverse disciplines in astrobiology; (2) a database containing the raw and derived data products from the CheMin instrument on the MSL rover Curiosity (http://odr.io/CheMin), featuring a versatile graphing system, instructions and analytical tools to process the data, and a capability to download data in different formats; and (3) the Mineral Evolution project, which by correlating the diversity of mineral species with their ages, localities, and other measurable properties aims to understand how the episodes of planetary accretion and differentiation, plate tectonics, and origin of life lead to a selective evolution of mineral species through changes in temperature, pressure, and composition. Ongoing development will complete integration of third party meta-data standards and publishing data to the semantic web. This project is supported by the Science-Enabling Research Activity (SERA) and NASA NNX11AP82A, MSL.
The Open Spectral Database: an open platform for sharing and searching spectral data.
Chalk, Stuart J
2016-01-01
A number of websites make spectral data available for download (typically as JCAMP-DX text files), and one (ChemSpider) also allows users to contribute spectral files. As a result, searching and retrieving such spectral data can be time consuming, and the data can be difficult to reuse if they are compressed in the JCAMP-DX file. What is needed is a single resource that allows submission of JCAMP-DX files, export of the raw data in multiple formats, searching based on multiple chemical identifiers, and is open in terms of license and access. To address these issues, a new online resource called the Open Spectral Database (OSDB) http://osdb.info/ has been developed and is now available. Built using open source tools, using open code (hosted on GitHub), providing open data, and open to community input about design and functionality, the OSDB is available for anyone to submit spectral data, making it searchable and available to the scientific community. This paper details the concept and coding, internal architecture, export formats, Representational State Transfer (REST) Application Programming Interface and options for submission of data. The OSDB website went live in November 2015. Concurrently, the GitHub repository was made available at https://github.com/stuchalk/OSDB/, and is open for collaborators to join the project, submit issues, and contribute code. The combination of a scripting environment (PHPStorm), a PHP framework (CakePHP), a relational database (MySQL) and a code repository (GitHub) provides all the capabilities needed to easily develop REST-based websites for ingestion, curation and exposure of open chemical data to the community at all levels. It is hoped this software stack (or equivalent ones in other scripting languages) will be leveraged to make more chemical data available for both humans and computers.
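Because JCAMP-DX files are plain text made of "##LABEL=value" records, a first pass at reading one can be done in a few lines of Python. The sketch below only collects header labels and stops at the data block; real files may use compressed ordinate formats that need a full parser, and the example path is a placeholder.

    def read_jcamp_header(path):
        """Collect ##LABEL=value records from a JCAMP-DX file until the data block starts."""
        header = {}
        with open(path, encoding="ascii", errors="replace") as fh:
            for line in fh:
                line = line.strip()
                if line.startswith("##"):
                    label, _, value = line[2:].partition("=")
                    label = label.strip().upper()
                    header[label] = value.strip()
                    if label in ("XYDATA", "XYPOINTS", "PEAK TABLE"):
                        break   # tabular data follows; stop reading header labels
        return header

    # Example usage (the file path is a placeholder):
    # info = read_jcamp_header("spectrum.jdx")
    # print(info.get("TITLE"), info.get("XUNITS"), info.get("YUNITS"))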
Temperature-package power correlations for open-mode geologic disposal concepts.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hardin, Ernest.
2013-02-01
Logistical simulation of spent nuclear fuel (SNF) management in the U.S. combines storage, transportation and disposal elements to evaluate schedule, cost and other resources needed for all major operations leading to final geologic disposal. Geologic repository reference options are associated with limits on waste package thermal power output at emplacement, in order to meet limits on peak temperature for certain key engineered and natural barriers. These package power limits are used in logistical simulation software such as CALVIN, as threshold requirements that must be met by means of decay storage or SNF blending in waste packages, before emplacement in a repository. Geologic repository reference options include enclosed modes developed for crystalline rock, clay or shale, and salt. In addition, a further need has been addressed for open modes in which SNF can be emplaced in a repository, then ventilated for decades or longer to remove heat, prior to permanent repository closure. For each open mode disposal concept there are specified durations for surface decay storage (prior to emplacement), repository ventilation, and repository closure operations. This study simulates those steps for several timing cases, and for SNF with three fuel-burnup characteristics, to develop package power limits at which waste packages can be emplaced without exceeding specified temperature limits many years later after permanent closure. The results are presented in the form of correlations that span a range of package power and peak postclosure temperature, for each open-mode disposal concept, and for each timing case. Given a particular temperature limit value, the corresponding package power limit for each case can be selected for use in CALVIN and similar tools.
KiT: a MATLAB package for kinetochore tracking.
Armond, Jonathan W; Vladimirou, Elina; McAinsh, Andrew D; Burroughs, Nigel J
2016-06-15
During mitosis, chromosomes are attached to the mitotic spindle via large protein complexes called kinetochores. The motion of kinetochores throughout mitosis is intricate and automated quantitative tracking of their motion has already revealed many surprising facets of their behaviour. Here, we present 'KiT' (Kinetochore Tracking)-an easy-to-use, open-source software package for tracking kinetochores from live-cell fluorescent movies. KiT supports 2D, 3D and multi-colour movies, quantification of fluorescence, integrated deconvolution, parallel execution and multiple algorithms for particle localization. KiT is free, open-source software implemented in MATLAB and runs on all MATLAB supported platforms. KiT can be downloaded as a package from http://www.mechanochemistry.org/mcainsh/software.php The source repository is available at https://bitbucket.org/jarmond/kit and under continuing development. Supplementary data are available at Bioinformatics online. jonathan.armond@warwick.ac.uk. © The Author 2016. Published by Oxford University Press.
NASA Astrophysics Data System (ADS)
Keane, C. M.; Tahirkheli, S.
2017-12-01
Data repositories, especially in the geosciences, have been focused on the management of large quantities of born-digital data and facilitating its discovery and use. Unfortunately, born-digital data, even with its immense scale today, represents only the most recent data acquisitions, leaving a large proportion of the historical data record of the science "out in the cold." Additionally, the data record in the peer-reviewed literature, whether captured directly in the literature or through the journal data archive, represents only a fraction of the reliable data collected in the geosciences. Federal and state agencies, state surveys, and private companies, collect vast amounts of geoscience information and data that is not only reliable and robust, but often the only data representative of specific spatial and temporal conditions. Likewise, even some academic publications, such as senior theses, are unique sources of data, but generally do not have wide discoverability nor guarantees of longevity. As more of these `grey' sources of information and data are born-digital, they become increasingly at risk for permanent loss, not to mention poor discoverability. Numerous studies have shown that grey literature across all disciplines, including geosciences, disappears at a rate of about 8% per year. AGI has been working to develop systems to both improve the discoverability and the preservation of the geoscience grey literature by coupling several open source platforms from the information science community. We will detail the rationale, the technical and legal frameworks for these systems, and the long-term strategies for improving access, use, and stability of these critical data sources.
Do Open Source LMSs Support Personalization? A Comparative Evaluation
NASA Astrophysics Data System (ADS)
Kerkiri, Tania; Paleologou, Angela-Maria
A number of parameters that support the LMSs' capabilities for content personalization are presented and substantiated. These parameters constitute critical criteria for an exhaustive investigation of the personalization capabilities of the most popular open source LMSs. Results are comparatively shown and commented upon, thus highlighting a course of conduct for the implementation of new personalization methodologies for these LMSs, aligned with their existing infrastructure, to maintain support of the numerous educational institutions entrusting a major part of their curricula to them. Meanwhile, new capabilities arise from a more efficient description of the existing resources (especially when organized into widely available repositories) that lead to qualitatively advanced learner-oriented courses which would ideally meet the challenge of combining personification of demand and personalization of thematic content at once.
Usage Patterns of Open Genomic Data
ERIC Educational Resources Information Center
Xia, Jingfeng; Liu, Ying
2013-01-01
This paper uses Genome Expression Omnibus (GEO), a data repository in biomedical sciences, to examine the usage patterns of open data repositories. It attempts to identify the degree of recognition of data reuse value and understand how e-science has impacted a large-scale scholarship. By analyzing a list of 1,211 publications that cite GEO data…
Past and Present Scenario of Open Access Movement in India
ERIC Educational Resources Information Center
Sawant, Sarika
2013-01-01
This paper gives an overview of open archives developed during the period of 2004-2012 in India including institutional, subject, cross repositories, etc. It also depicts the actions taken by the Indian government in response to the OA developments. The paper highlights that there are one cross institutional repository, three cross institutional…
Pennington, Jeffrey W; Ruth, Byron; Italia, Michael J; Miller, Jeffrey; Wrazien, Stacey; Loutrel, Jennifer G; Crenshaw, E Bryan; White, Peter S
2014-01-01
Biomedical researchers share a common challenge of making complex data understandable and accessible as they seek inherent relationships between attributes in disparate data types. Data discovery in this context is limited by a lack of query systems that efficiently show relationships between individual variables, but without the need to navigate underlying data models. We have addressed this need by developing Harvest, an open-source framework of modular components, and using it for the rapid development and deployment of custom data discovery software applications. Harvest incorporates visualizations of highly dimensional data in a web-based interface that promotes rapid exploration and export of any type of biomedical information, without exposing researchers to underlying data models. We evaluated Harvest with two cases: clinical data from pediatric cardiology and demonstration data from the OpenMRS project. Harvest's architecture and public open-source code offer a set of rapid application development tools to build data discovery applications for domain-specific biomedical data repositories. All resources, including the OpenMRS demonstration, can be found at http://harvest.research.chop.edu PMID:24131510
[Self-archiving of biomedical papers in open access repositories].
Abad-García, M Francisca; Melero, Remedios; Abadal, Ernest; González-Teruel, Aurora
2010-04-01
Open-access literature is digital, online, free of charge, and free of most copyright and licensing restrictions. Self-archiving, or the deposit of scholarly outputs in institutional repositories (the open-access green route), is increasingly present in the activities of the scientific community. Besides the benefits of open access for the visibility and dissemination of science, funding agencies increasingly require papers and other types of documents to be deposited in repositories. In the biomedical environment this is even more relevant because of the impact scientific literature can have on public health. However, to make self-archiving feasible, authors should be aware of its meaning and of the terms under which they are allowed to archive their works. Tools such as Sherpa/RoMEO and DULCINEA (directories of the copyright licences of scientific journals at different levels) help authors find out which rights they retain when they publish a paper and whether self-archiving is permitted. PubMed Central and its British and Canadian counterparts are the main thematic repositories for the biomedical fields. In Spain there is no repository of a similar nature, but most universities and the CSIC have already created their own institutional repositories. Greater visibility of research results, and their consequent greater and earlier citation, is one of the most frequently cited advantages of open access, but the removal of economic barriers to information access is also a benefit that breaks down borders between groups.
LightWAVE: Waveform and Annotation Viewing and Editing in a Web Browser.
Moody, George B
2013-09-01
This paper describes LightWAVE, recently-developed open-source software for viewing ECGs and other physiologic waveforms and associated annotations (event markers). It supports efficient interactive creation and modification of annotations, capabilities that are essential for building new collections of physiologic signals and time series for research. LightWAVE is constructed of components that interact in simple ways, making it straightforward to enhance or replace any of them. The back end (server) is a common gateway interface (CGI) application written in C for speed and efficiency. It retrieves data from its data repository (PhysioNet's open-access PhysioBank archives by default, or any set of files or web pages structured as in PhysioBank) and delivers them in response to requests generated by the front end. The front end (client) is a web application written in JavaScript. It runs within any modern web browser and does not require installation on the user's computer, tablet, or phone. Finally, LightWAVE's scribe is a tiny CGI application written in Perl, which records the user's edits in annotation files. LightWAVE's data repository, back end, and front end can be located on the same computer or on separate computers. The data repository may be split across multiple computers. For compatibility with the standard browser security model, the front end and the scribe must be loaded from the same domain.
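LightWAVE's default data source is PhysioBank; for readers who want to pull the same open-access records programmatically outside the browser, the sketch below uses the wfdb Python package (a separate PhysioNet tool, not part of LightWAVE) to read a record and its annotations. The record name and database directory are illustrative choices.

```python
import wfdb

# Read the first 10 seconds of signal and annotations for a PhysioBank record.
# 'mitdb' (MIT-BIH Arrhythmia Database) and record '100' are illustrative only.
record = wfdb.rdrecord("100", pn_dir="mitdb", sampto=3600)
ann = wfdb.rdann("100", "atr", pn_dir="mitdb", sampto=3600)

print("sampling frequency:", record.fs, "Hz")
print("signal shape:", record.p_signal.shape)        # samples x channels
print("first annotation symbols:", ann.symbol[:5])   # event markers
```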
NASA Astrophysics Data System (ADS)
Maeda, Takuto; Takemura, Shunsuke; Furumura, Takashi
2017-07-01
We have developed an open-source software package, Open-source Seismic Wave Propagation Code (OpenSWPC), for parallel numerical simulations of seismic wave propagation in 3D and 2D (P-SV and SH) viscoelastic media based on the finite difference method at local to regional scales. This code is equipped with a frequency-independent attenuation model based on the generalized Zener body and an efficient perfectly matched layer for the absorbing boundary condition. A hybrid-style programming model using OpenMP and the Message Passing Interface (MPI) is adopted for efficient parallel computation. OpenSWPC has wide applicability for seismological studies and great portability, allowing excellent performance on systems ranging from PC clusters to supercomputers. Without modifying the code, users can conduct seismic wave propagation simulations using their own velocity structure models and the necessary source representations by specifying them in an input parameter file. The code has various modes for different types of velocity structure model input and different source representations, such as single force, moment tensor and plane-wave incidence, which can easily be selected via the input parameters. Widely used binary data formats, the Network Common Data Form (NetCDF) and the Seismic Analysis Code (SAC), are adopted for the input of the heterogeneous structure model and the outputs of the simulation results, so users can easily handle the input/output datasets. All codes are written in Fortran 2003 and are available with detailed documents in a public repository.
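Since the abstract notes that simulation outputs are written in NetCDF, a minimal way to inspect such a file from Python is sketched below; the file name and variable names are assumptions, not documented OpenSWPC output conventions.

```python
from netCDF4 import Dataset

# 'swpc_output.nc' is a placeholder for an OpenSWPC NetCDF output file.
with Dataset("swpc_output.nc") as ds:
    print("dimensions:", {name: len(dim) for name, dim in ds.dimensions.items()})
    print("variables: ", list(ds.variables.keys()))

    # Read one variable into a NumPy array if it exists (the name is hypothetical).
    if "Vz" in ds.variables:
        vz = ds.variables["Vz"][:]
        print("Vz shape:", vz.shape)
```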
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moussa, Jonathan E.
2013-05-13
This piece of software is a new feature implemented inside an existing open-source library. Specifically, it is a new implementation of a density functional (HSE, short for Heyd-Scuseria-Ernzerhof) for a repository of density functionals, the libxc library. It fixes some numerical problems with existing implementations, as outlined in a scientific paper recently submitted for publication. Density functionals are components of electronic structure simulations, which model properties of electrons inside molecules and crystals.
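As a rough illustration of how such a functional is evaluated through libxc, the sketch below uses the pylibxc Python bindings; the density values are made up, and whether "hyb_gga_xc_hse06" is the identifier this particular implementation registers under is an assumption on my part.

```python
import numpy as np
import pylibxc

# Illustrative electron densities and reduced gradients (sigma = |grad rho|^2).
rho = np.array([0.10, 0.20, 0.30])
sigma = np.array([0.01, 0.02, 0.05])

# HSE06 is a hybrid GGA; libxc evaluates its semilocal part, while the
# exact-exchange fraction is handled by the calling electronic-structure code.
func = pylibxc.LibXCFunctional("hyb_gga_xc_hse06", "unpolarized")
out = func.compute({"rho": rho, "sigma": sigma})

print("energy density per particle:", out["zk"].ravel())
print("d(energy)/d(rho):          ", out["vrho"].ravel())
```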
Ontological Modeling of Educational Resources: A Proposed Implementation for Greek Schools
ERIC Educational Resources Information Center
Poulakakis, Yannis; Vassilakis, Kostas; Kalogiannakis, Michail; Panagiotakis, Spyros
2017-01-01
In eLearning context searching for suitable educational material is still a challenging issue. During the last two decades, various digital repositories, such as Learning Object Repositories, institutional repositories and latterly Open Educational Resources, have been developed to accommodate collections of learning material that can be used for…
Social network of PESCA (Open Source Platform for eHealth).
Sanchez, Carlos L; Romero-Cuevas, Miguel; Lopez, Diego M; Lorca, Julio; Alcazar, Francisco J; Ruiz, Sergio; Mercado, Carmen; Garcia-Fortea, Pedro
2008-01-01
Information and Communication Technologies (ICTs) are revolutionizing how healthcare systems deliver top-quality care to citizens. In this context, Open Source Software (OSS) has proven to be an important strategy for spreading the use of ICTs. Several human and technological barriers to adopting OSS in healthcare have been identified. Human barriers include user acceptance, limited support, technical skill, awareness and resistance to change, while technological barriers include the need for open standards, heterogeneous OSS developed without normalization and metrics, the lack of initiatives to evaluate existing health OSS, and the need for quality control and functional validation. The goals of the PESCA project are to create a platform of interoperable modules to evaluate, classify and validate good practices in health OSS. Furthermore, a normalization platform will provide interoperable solutions in the fields of healthcare services, health surveillance, health literature, and health education, knowledge and research. Within the platform, the first goal to achieve is the setup of the collaborative work infrastructure. The platform is being organized as a social network that evaluates existing open source tools for eHealth in five scopes: open source software, quality, pedagogical aspects, security and privacy, and internationalization (I18N). In the meantime, the knowledge collected from the networking will form a Good Practice Repository on eHealth promoting the effective use of ICT on behalf of the citizen's health.
A proposed application programming interface for a physical volume repository
NASA Technical Reports Server (NTRS)
Jones, Merritt; Williams, Joel; Wrenn, Richard
1996-01-01
The IEEE Storage System Standards Working Group (SSSWG) has developed the Reference Model for Open Storage Systems Interconnection, Mass Storage System Reference Model Version 5. This document provides the framework for a series of standards for application and user interfaces to open storage systems. More recently, the SSSWG has been developing Application Programming Interfaces (APIs) for the individual components defined by the model. The API for the Physical Volume Repository is the most fully developed, but work is also being done on APIs for the Physical Volume Library and for the Mover. The SSSWG meets every other month, and meetings are open to all interested parties. The Physical Volume Repository (PVR) is responsible for managing the storage of removable media cartridges and for mounting and dismounting these cartridges onto drives. This document describes a model which defines a Physical Volume Repository, and gives a brief summary of the Application Programming Interface (API) which the IEEE Storage Systems Standards Working Group (SSSWG) is proposing as the standard interface for the PVR.
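The abstract does not reproduce the proposed API itself; purely as a hypothetical sketch of the kind of interface the PVR responsibilities imply (cartridge inventory, mount, dismount), one might express it as follows. All names and signatures are illustrative and are not taken from the SSSWG specification.

```python
from abc import ABC, abstractmethod


class PhysicalVolumeRepository(ABC):
    """Hypothetical Python rendering of a PVR-style interface; the real
    proposed API is defined by the IEEE SSSWG document, not by this sketch."""

    @abstractmethod
    def inventory(self) -> list[str]:
        """Return the identifiers of all cartridges managed by the repository."""

    @abstractmethod
    def mount(self, cartridge_id: str, drive_id: str) -> None:
        """Move a removable-media cartridge from its storage slot onto a drive."""

    @abstractmethod
    def dismount(self, cartridge_id: str, drive_id: str) -> None:
        """Return a cartridge from the drive to its storage slot."""
```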
76 FR 53454 - Privacy Act System of Records
Federal Register 2010, 2011, 2012, 2013, 2014
2011-08-26
... statutory responsibilities of the OIG; and Acting as a repository and source for information necessary to... in matters relating to the statutory responsibilities of the OIG; and 7. Acting as a repository and.... Acting as a repository and source for information necessary to fulfill the reporting requirements of the...
The NCAR Digital Asset Services Hub (DASH): Implementing Unified Data Discovery and Access
NASA Astrophysics Data System (ADS)
Stott, D.; Worley, S. J.; Hou, C. Y.; Nienhouse, E.
2017-12-01
The National Center for Atmospheric Research (NCAR) Directorate created the Data Stewardship Engineering Team (DSET) to plan and implement an integrated single entry point for uniform digital asset discovery and access across the organization in order to improve the efficiency of access, reduce the costs, and establish the foundation for interoperability with other federated systems. This effort supports new policies included in federal funding mandates, NSF data management requirements, and journal citation recommendations. An inventory during the early planning stage identified diverse asset types across the organization that included publications, datasets, metadata, models, images, and software tools and code. The NCAR Digital Asset Services Hub (DASH) is being developed and phased in this year to improve the quality of users' experiences in finding and using these assets. DASH serves to provide engagement, training, search, and support through the following four nodes. DASH Metadata: DASH provides resources for creating and cataloging metadata to the NCAR Dialect, a subset of ISO 19115. NMDEdit, an editor based on a European open source application, has been configured for manual entry of NCAR metadata. CKAN, an open source data portal platform, harvests these XML records (along with records output directly from databases) from a Web Accessible Folder (WAF) on GitHub for validation. DASH Search: The NCAR Dialect metadata drives cross-organization search and discovery through CKAN, which provides the display interface of search results. DASH search will establish interoperability by facilitating metadata sharing with other federated systems. DASH Consulting: The DASH Data Curation & Stewardship Coordinator assists with Data Management (DM) Plan preparation and advises on Digital Object Identifiers. The coordinator arranges training sessions on the DASH metadata tools and DM planning, and provides one-on-one assistance as requested. DASH Repository: A repository is under development for NCAR datasets currently not in existing lab-managed archives. The DASH repository will be under NCAR governance and meet Trustworthy Repositories Audit & Certification (TRAC) requirements. This poster will highlight the processes, lessons learned, and current status of the DASH effort at NCAR.
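Because DASH's search layer is built on CKAN, its catalog could in principle be queried through CKAN's standard action API. The sketch below uses the ckanapi Python client; the endpoint URL and query string are placeholders, since the abstract does not give DASH's public address.

```python
from ckanapi import RemoteCKAN

# Placeholder endpoint; substitute the actual DASH/CKAN portal address.
dash = RemoteCKAN("https://dash.example.ucar.edu", user_agent="dash-search-demo/0.1")

# CKAN's package_search action performs full-text search over dataset metadata.
results = dash.action.package_search(q="sea surface temperature", rows=5)
print("matching datasets:", results["count"])
for pkg in results["results"]:
    print("-", pkg["name"], ":", pkg.get("title", ""))
```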
Core Certification of Data Repositories: Trustworthiness and Long-Term Stewardship
NASA Astrophysics Data System (ADS)
de Sherbinin, A. M.; Mokrane, M.; Hugo, W.; Sorvari, S.; Harrison, S.
2017-12-01
Scientific integrity and norms dictate that data created and used by scientists should be managed, curated, and archived in trustworthy data repositories, thus ensuring that science is verifiable and reproducible while preserving the initial investment in collecting data. Research stakeholders, including researchers, science funders, librarians, and publishers, must also be able to establish the trustworthiness of the data repositories they use, to confirm that the data they submit and use remain useful and meaningful in the long term. Data repositories are increasingly recognized as a key element of the global research infrastructure, and establishing their trustworthiness is recognized as a prerequisite for efficient scientific research and data sharing. The Core Trustworthy Data Repository Requirements are a set of universal requirements for certification of data repositories at the core level (see: https://goo.gl/PYsygW). They were developed by the ICSU World Data System (WDS: www.icsu-wds.org) and the Data Seal of Approval (DSA: www.datasealofapproval.org), the two authoritative organizations responsible for developing and implementing this standard, which will be further developed under the CoreTrustSeal branding. CoreTrustSeal certification of data repositories involves a minimally intensive process whereby repositories supply evidence that they are sustainable and trustworthy. Repositories conduct a self-assessment which is then reviewed by community peers. Based on this review, CoreTrustSeal certification is granted by the CoreTrustSeal Standards and Certification Board. Certification helps data communities (producers, repositories, and consumers) to improve the quality and transparency of their processes, and to increase awareness of and compliance with established standards. This presentation will introduce the CoreTrustSeal certification requirements for repositories and offer an opportunity to discuss ways to improve the contribution of certified data repositories to sustaining open data for open scientific research.
iAnn: an event sharing platform for the life sciences.
Jimenez, Rafael C; Albar, Juan P; Bhak, Jong; Blatter, Marie-Claude; Blicher, Thomas; Brazas, Michelle D; Brooksbank, Cath; Budd, Aidan; De Las Rivas, Javier; Dreyer, Jacqueline; van Driel, Marc A; Dunn, Michael J; Fernandes, Pedro L; van Gelder, Celia W G; Hermjakob, Henning; Ioannidis, Vassilios; Judge, David P; Kahlem, Pascal; Korpelainen, Eija; Kraus, Hans-Joachim; Loveland, Jane; Mayer, Christine; McDowall, Jennifer; Moran, Federico; Mulder, Nicola; Nyronen, Tommi; Rother, Kristian; Salazar, Gustavo A; Schneider, Reinhard; Via, Allegra; Villaveces, Jose M; Yu, Ping; Schneider, Maria V; Attwood, Teresa K; Corpas, Manuel
2013-08-01
We present iAnn, an open source community-driven platform for dissemination of life science events, such as courses, conferences and workshops. iAnn allows automatic visualisation and integration of customised event reports. A central repository lies at the core of the platform: curators add submitted events, and these are subsequently accessed via web services. Thus, once an iAnn widget is incorporated into a website, it permanently shows timely relevant information as if it were native to the remote site. At the same time, announcements submitted to the repository are automatically disseminated to all portals that query the system. To facilitate the visualization of announcements, iAnn provides powerful filtering options and views, integrated in Google Maps and Google Calendar. All iAnn widgets are freely available. Availability: http://iann.pro/iannviewer. Contact: manuel.corpas@tgac.ac.uk.
Khushi, Matloob; Edwards, Georgina; de Marcos, Diego Alonso; Carpenter, Jane E; Graham, J Dinny; Clarke, Christine L
2013-02-12
Virtual microscopy includes digitisation of histology slides and the use of computer technologies for complex investigation of diseases such as cancer. However, automated image analysis, or website publishing of such digital images, is hampered by their large file sizes. We have developed two Java based open source tools: Snapshot Creator and NDPI-Splitter. Snapshot Creator converts a portion of a large digital slide into a desired quality JPEG image. The image is linked to the patient's clinical and treatment information in a customised open source cancer data management software (Caisis) in use at the Australian Breast Cancer Tissue Bank (ABCTB) and then published on the ABCTB website (http://www.abctb.org.au) using Deep Zoom open source technology. Using the ABCTB online search engine, digital images can be searched by defining various criteria such as cancer type, or biomarkers expressed. NDPI-Splitter splits a large image file into smaller sections of TIFF images so that they can be easily analysed by image analysis software such as Metamorph or Matlab. NDPI-Splitter also has the capacity to filter out empty images. Snapshot Creator and NDPI-Splitter are novel open source Java tools. They convert digital slides into files of smaller size for further processing. In conjunction with other open source tools such as Deep Zoom and Caisis, this suite of tools is used for the management and archiving of digital microscopy images, enabling digitised images to be explored and zoomed online. Our online image repository also has the capacity to be used as a teaching resource. These tools also enable large files to be sectioned for image analysis. The virtual slide(s) for this article can be found here: http://www.diagnosticpathology.diagnomx.eu/vs/5330903258483934.
The Open Data Repository's Data Publisher
NASA Astrophysics Data System (ADS)
Stone, N.; Lafuente, B.; Downs, R. T.; Bristow, T.; Blake, D. F.; Fonda, M.; Pires, A.
2015-12-01
Data management and data publication are becoming increasingly important components of research workflows. The complexity of managing data, publishing data online, and archiving data has not decreased significantly even as computing access and power has greatly increased. The Open Data Repository's Data Publisher software (http://www.opendatarepository.org) strives to make data archiving, management, and publication a standard part of a researcher's workflow using simple, web-based tools and commodity server hardware. The publication engine allows for uploading, searching, and display of data with graphing capabilities and downloadable files. Access is controlled through a robust permissions system that can control publication at the field level and can be granted to the general public or protected so that only registered users at various permission levels receive access. Data Publisher also allows researchers to subscribe to meta-data standards through a plugin system, embargo data publication at their discretion, and collaborate with other researchers through various levels of data sharing. As the software matures, semantic data standards will be implemented to facilitate machine reading of data and each database will provide a REST application programming interface for programmatic access. Additionally, a citation system will allow snapshots of any data set to be archived and cited for publication while the data itself can remain living and continuously evolve beyond the snapshot date. The software runs on a traditional LAMP (Linux, Apache, MySQL, PHP) server and is available on GitHub (http://github.com/opendatarepository) under a GPLv2 open source license. The goal of the Open Data Repository is to lower the cost and training barrier to entry so that any researcher can easily publish their data and ensure it is archived for posterity. We gratefully acknowledge the support for this study by the Science-Enabling Research Activity (SERA), and NASA NNX11AP82A, Mars Science Laboratory Investigations and University of Arizona Geosciences.
ERIC Educational Resources Information Center
Olivier, Elsabe
2007-01-01
There is much speculation that the development of institutional repositories will impact on or even change the traditional scholarly communication process. The purpose of this conversation is to introduce the reader to the use of and response to institutional repositories which were initiated by the Open Access Initiative. The concept of…
Continuous integration for concurrent MOOSE framework and application development on GitHub
Slaughter, Andrew E.; Peterson, John W.; Gaston, Derek R.; ...
2015-11-20
For the past several years, Idaho National Laboratory's MOOSE framework team has employed modern software engineering techniques (continuous integration, joint application/framework source code repositories, automated regression testing, etc.) in developing closed-source multiphysics simulation software (Gaston et al., Journal of Open Research Software vol. 2, article e10, 2014). In March 2014, the MOOSE framework was released under an open source license on GitHub, significantly expanding and diversifying the pool of current active and potential future contributors on the project. Despite this recent growth, the same philosophy of concurrent framework and application development continues to guide the project's development roadmap. Several specific practices, including techniques for managing multiple repositories, conducting automated regression testing, and implementing a cascading build process, are discussed in this short paper. Furthermore, special attention is given to describing the manner in which these practices naturally synergize with the GitHub API and GitHub-specific features such as issue tracking, Pull Requests, and project forks.
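A cascading build that spans framework and application repositories typically reports its results back through the GitHub commit status API so that Pull Requests show downstream results. The sketch below is only an illustration of that general pattern using the public GitHub REST statuses endpoint; the owner, repository, commit SHA, token, and context names are placeholders, not MOOSE project infrastructure.

```python
import requests

# Placeholders: substitute a real repository, commit SHA, and access token.
OWNER, REPO, SHA = "example-org", "example-app", "abc1234def5678"
HEADERS = {
    "Authorization": "token YOUR_GITHUB_TOKEN",
    "Accept": "application/vnd.github+json",
}

def report_status(state: str, context: str, description: str) -> None:
    """Attach a CI status (pending/success/failure) to a commit."""
    url = f"https://api.github.com/repos/{OWNER}/{REPO}/statuses/{SHA}"
    payload = {"state": state, "context": context, "description": description}
    requests.post(url, json=payload, headers=HEADERS, timeout=30).raise_for_status()

# One status per downstream application rebuilt against the updated framework.
report_status("pending", "ci/cascade/example-app", "Rebuilding against new framework")
report_status("success", "ci/cascade/example-app", "Application tests passed")
```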
Rocca-Serra, Philippe; Brandizi, Marco; Maguire, Eamonn; Sklyar, Nataliya; Taylor, Chris; Begley, Kimberly; Field, Dawn; Harris, Stephen; Hide, Winston; Hofmann, Oliver; Neumann, Steffen; Sterk, Peter; Tong, Weida; Sansone, Susanna-Assunta
2010-01-01
Summary: The first open source software suite for experimentalists and curators that (i) assists in the annotation and local management of experimental metadata from high-throughput studies employing one or a combination of omics and other technologies; (ii) empowers users to uptake community-defined checklists and ontologies; and (iii) facilitates submission to international public repositories. Availability and Implementation: Software, documentation, case studies and implementations at http://www.isa-tools.org. Contact: isatools@googlegroups.com. PMID:20679334
NASA's Big Earth Data Initiative Accomplishments
NASA Technical Reports Server (NTRS)
Klene, Stephan A.; Pauli, Elisheva; Pressley, Natalie N.; Cechini, Matthew F.; McInerney, Mark
2017-01-01
The goal of NASA's effort for BEDI is to improve the usability, discoverability, and accessibility of Earth Observation data in support of societal benefit areas. Accomplishments: In support of BEDI goals, datasets have been entered into the Common Metadata Repository (CMR), made available via the Open-source Project for a Network Data Access Protocol (OPeNDAP), and registered with Digital Object Identifiers (DOIs); to support fast visualization, many layers have been added to the Global Imagery Browse Services (GIBS).
NASA's Big Earth Data Initiative Accomplishments
NASA Astrophysics Data System (ADS)
Klene, S. A.; Pauli, E.; Pressley, N. N.; Cechini, M. F.; McInerney, M.
2017-12-01
The goal of NASA's effort for BEDI is to improve the usability, discoverability, and accessibility of Earth Observation data in support of societal benefit areas. Accomplishments: In support of BEDI goals, datasets have been entered into the Common Metadata Repository (CMR), made available via the Open-source Project for a Network Data Access Protocol (OPeNDAP), and registered with Digital Object Identifiers (DOIs); to support fast visualization, many layers have been added to the Global Imagery Browse Services (GIBS).
Case for retrievable high-level nuclear waste disposal
Roseboom, Eugene H.
1994-01-01
Plans for the nation's first high-level nuclear waste repository have called for permanently closing and sealing the repository soon after it is filled. However, the hydrologic environment of the proposed site at Yucca Mountain, Nevada, should allow the repository to be kept open and the waste retrievable indefinitely. This would allow direct monitoring of the repository and maintain the options for future generations to improve upon the disposal methods or use the uranium in the spent fuel as an energy resource.
Crowdsourcing Content to Promote Community and Collection Development in Public Libraries
ERIC Educational Resources Information Center
Carr, Melissa Eleftherion
2013-01-01
With the Poetry Center at San Francisco State University, the author has begun to build an open-access digital repository for poetry chapbooks. The repository is essentially a chapbook exchange: a place for poets to share their current works. Users are invited to share their chapbooks via upload and as such gain access to the chapbook repository.…
Vanfretti, Luigi; Olsen, Svein H; Arava, V S Narasimham; Laera, Giuseppe; Bidadfar, Ali; Rabuzin, Tin; Jakobsen, Sigurd H; Lavenius, Jan; Baudette, Maxime; Gómez-López, Francisco J
2017-04-01
This article presents an open data repository, the methodology used to generate it, and the associated data processing software developed to consolidate an hourly snapshot historical data set for the year 2015 into an equivalent Nordic power grid model (aka Nordic 44). The consolidation was achieved by matching the model's physical response with respect to historical power flow records in the bidding regions of the Nordic grid that are available from the Nordic electricity market agent, Nord Pool. The model is made available in the form of CIM v14, Modelica and PSS/E (Siemens PTI) files. The Nordic 44 model in Modelica and PSS/E was first presented in the paper titled "iTesla Power Systems Library (iPSL): A Modelica library for phasor time-domain simulations" (Vanfretti et al., 2016) [1] for a single snapshot. In the digital repository being made available with the submission of this paper (SmarTSLab_Nordic44 Repository at Github, 2016) [2], a total of 8760 snapshots (for the year 2015) are provided that can be used to initialize and execute dynamic simulations using tools compatible with CIM v14, the Modelica language and the proprietary PSS/E tool. The Python scripts used to generate the snapshots (processed data) are also available, with all the data, in the GitHub repository (SmarTSLab_Nordic44 Repository at Github, 2016) [2]. This Nordic 44 equivalent model was also used in the iTesla project (iTesla) [3] to carry out simulations within a dynamic security assessment toolset (iTesla, 2016) [4], and has been further enhanced during the ITEA3 OpenCPS project (iTEA3) [5]. The raw and processed data and the output models utilized within the iTesla platform (iTesla, 2016) [4] are also available in the repository. The CIM and Modelica snapshots of the "Nordic 44" model for the year 2015 are available in a Zenodo repository.
DOT National Transportation Integrated Search
2017-01-01
The National Transportation Library's (NTL) Repository and Open Science Portal (ROSA P) : is a digital library for transportation, including U. S. Department of Transportation : sponsored research results and technical publications, other documents a...
The NIH BD2K center for big data in translational genomics
Paten, Benedict; Diekhans, Mark; Druker, Brian J; Friend, Stephen; Guinney, Justin; Gassner, Nadine; Guttman, Mitchell; James Kent, W; Mantey, Patrick; Margolin, Adam A; Massie, Matt; Novak, Adam M; Nothaft, Frank; Pachter, Lior; Patterson, David; Smuga-Otto, Maciej; Stuart, Joshua M; Van’t Veer, Laura; Haussler, David
2015-01-01
The world’s genomics data will never be stored in a single repository – rather, it will be distributed among many sites in many countries. No one site will have enough data to explain genotype to phenotype relationships in rare diseases; therefore, sites must share data. To accomplish this, the genetics community must forge common standards and protocols to make sharing and computing data among many sites a seamless activity. Through the Global Alliance for Genomics and Health, we are pioneering the development of shared application programming interfaces (APIs) to connect the world’s genome repositories. In parallel, we are developing an open source software stack (ADAM) that uses these APIs. This combination will create a cohesive genome informatics ecosystem. Using containers, we are facilitating the deployment of this software in a diverse array of environments. Through benchmarking efforts and big data driver projects, we are ensuring ADAM’s performance and utility. PMID:26174866
phylo-node: A molecular phylogenetic toolkit using Node.js.
O'Halloran, Damien M
2017-01-01
Node.js is an open-source and cross-platform environment that provides a JavaScript codebase for back-end server-side applications. JavaScript has been used to develop very fast and user-friendly front-end tools for bioinformatic and phylogenetic analyses. However, no such toolkits are available using Node.js to conduct comprehensive molecular phylogenetic analysis. To address this problem, I have developed phylo-node, a stable and scalable toolkit built on Node.js that allows the user to perform diverse molecular and phylogenetic tasks. phylo-node can execute the analysis and process the resulting outputs from a suite of software options that provides tools for read processing and genome alignment, sequence retrieval, multiple sequence alignment, primer design, evolutionary modeling, and phylogeny reconstruction. Furthermore, phylo-node enables the user to deploy server dependent applications, and also provides simple integration and interoperation with other Node modules and languages using Node inheritance patterns, and a customized piping module to support the production of diverse pipelines. phylo-node is open-source and freely available to all users without sign-up or login requirements. All source code and user guidelines are openly available at the GitHub repository: https://github.com/dohalloran/phylo-node.
NASA Astrophysics Data System (ADS)
Jones, M. B.; Vieglais, D.; Cruse, P.; Chodacki, J.; Budden, A. E.; Fenner, M.; Lowenberg, D.; Abrams, S.
2017-12-01
Research data are fundamental to the success of the academic enterprise, and yet the practice of citing data in academic and applied works is not widespread among researchers. Researchers need credit for their contributions, yet current citation infrastructure focuses primarily on citations to research literature. Some citation indexing systems even systematically exclude citations to data from their corpus. The Making Data Count (MDC) project will enable measuring the impact of research data much as is currently done with publications, the primary vehicle for scholarly credit and accountability. The MDC team (including the California Digital Library, COUNTER, DataCite, and DataONE) is working to publish a new COUNTER recommendation on data usage statistics; launch a DataCite-hosted MDC service for aggregated DLM based on the open-source Lagotto platform; and build tools for data repository and discovery services to easily integrate with the new MDC service. In providing such data-level metrics (DLM), the MDC project augments existing measures of scholarly success and so offers an important incentive promoting open data principles and quality research data through adoption of research data management best practices.
ERIC Educational Resources Information Center
Meece, Stephanie; Robinson, Amy; Gramstadt, Marie-Therese
2017-01-01
Open access institutional repositories can be ill-equipped to manage the complexity of research outputs from departments of fine arts, media, drama, music, cultural heritage, and the creative arts in general. The U.K.-based Kultur project was funded to create a flexible multimedia repository model using EPrints software. The project launched the…
[Open availability of articles and raw research data in Spanish pediatrics journals].
Aleixandre-Benavent, R; Vidal-Infer, A; Alonso-Arroyo, A; González de Dios, J; Ferrer-Sapena, A; Peset, F
2015-01-01
Open access to publications and raw research data allows their re-use and enhances the advancement of science. The aim of this paper is to identify these practices in Spanish pediatrics journals. We reviewed the authors' instructions of 13 Spanish pediatrics journals, identifying their open access and deposit policies. Eight journals allow open access without restriction, and five provide information on the ability to re-use and deposit data in repositories or websites. Most of the journals provide open access, but do not promote the deposit of additional material or articles in repositories or websites. Copyright © 2013 Asociación Española de Pediatría. Published by Elsevier España. All rights reserved.
ADVANTG Shielding Analysis for Closure Operations in an Open-Mode Repository
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bevill, Aaron M; Radulescu, Georgeta; Scaglione, John M
2013-01-01
Open-mode repository concepts could require worker entry into access drifts after placement of fuel casks in order to perform activities related to backfill, plug emplacement, routine maintenance, or performance confirmation. An ideal emplacement-drift shielding configuration would minimize dose to workers while maximizing airflow through the emplacement drifts. This paper presents a preliminary investigation of the feasibility and effectiveness of radiation shielding concepts that could be employed to facilitate worker operations in an open-mode repository. The repository model for this study includes pressurized-water reactor fuel assemblies (60 GWd/MTU burnup, 40-year post-irradiation cooldown) in packages of 32 assemblies. The closest fuel packages are 5 meters from dosimetry voxels in the access drift. The unshielded dose to workers in the access drift is 73.7 rem/hour. Prior work suggests that open-mode repository concepts similar to this one would require 15 m3/s of ventilation airflow. Shielding concepts considered here include partial concrete plugs, labyrinthine shields, and stainless steel photon attenuator grids. Maximum dose to workers in the access drift was estimated for each shielding concept using MCNP5 with variance reduction parameters generated by ADVANTG. Because airflow through the shielding is important for open-mode repositories, a semi-empirical estimate of the head loss due to each shielding configuration was also calculated. Airflow and shielding performance vary widely among the proposed shielding configurations. Although the partial plug configuration had the best airflow performance, it allowed dose rates 1,500 times greater than the specified target. Labyrinthine shielding concepts yield doses on the order of 1 mrem/hour with configurations that impose 3 to 11 J/kg head loss. Adding a 1 cm lead lining to the airflow channels of labyrinthine designs further reduces the worker dose by 65% to 95%. Photon-attenuator concepts may reduce worker dose to as low as 29 mrem/hour with head loss on the order of 1.9 J/kg.
Analogues as a check of predicted drift stability at Yucca Mountain, Nevada
Stuckless, J.S.
2006-01-01
Calculations made by the U.S. Department of Energy's Yucca Mountain Project as part of the licensing of a proposed geologic repository in southwestern Nevada for the disposal of high-level radioactive waste predict that emplacement tunnels will remain open with little collapse long after ground support has disintegrated. This conclusion includes the effects of anticipated seismic events. Natural analogues cannot provide a quantitative test of this conclusion, but they can provide a reasonableness test by examining naturally occurring and anthropogenic examples of the stability of subterranean openings. Available data from a variety of sources, combined with limited observations by the author, show that natural underground openings tend to resist collapse for millions of years and that anthropogenic subterranean openings have remained open from before recorded history through today. This stability holds even in seismically active areas. In fact, the archaeological record is heavily skewed toward preservation of underground structures relative to those found at the surface.
Two new promising cultivars of mango for Florida
USDA-ARS?s Scientific Manuscript database
Mango cultivars are mostly the result of random selections from open pollinated chance seedlings of indigenous or introduced germplasm. The National Germplasm Repository (genebank) at the Subtropical Horticulture Research Station (SHRS) in Miami, Florida is an important mango germplasm repository an...
Opening Transportation Data for Innovation : Getting Our Public Access Bits in a Row.
DOT National Transportation Integrated Search
2017-01-10
The legislative mandate for the National Transportation Library (NTL) includes direction to serve as the central repository for transportation information and a portal to federal transportation data. This mandate means that NTLs Repository and Ope...
A performance goal-based seismic design philosophy for waste repository facilities
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hossain, Q.A.
1994-12-31
A performance goal-based seismic design philosophy, compatible with DOE's present natural phenomena hazards mitigation and "graded approach" philosophy, has been proposed for high-level nuclear waste repository facilities. The rationale, evolution, and the desirable features of this method have been described. Why and how the method should and can be applied to the design of a repository facility are also discussed.
Metadata mapping and reuse in caBIG.
Kunz, Isaac; Lin, Ming-Chin; Frey, Lewis
2009-02-05
This paper proposes that interoperability across biomedical databases can be improved by utilizing a repository of Common Data Elements (CDEs), UML model class-attributes, and simple lexical algorithms to facilitate the building of domain models. This is examined in the context of an existing system, the National Cancer Institute (NCI)'s cancer Biomedical Informatics Grid (caBIG). The goal is to demonstrate the deployment of open source tools that can be used to effectively map models and enable the reuse of existing information objects and CDEs in the development of new models for translational research applications. This effort is intended to help developers reuse appropriate CDEs to enable interoperability of their systems when developing within the caBIG framework or other frameworks that use metadata repositories. The Dice (di-gram) and Dynamic algorithms are compared, and both have similar performance in matching UML model class-attributes to CDE class object-property pairs. With the algorithms used, the baselines for automatically finding the matches are reasonable for the data models examined. This suggests that automatic mapping of UML models and CDEs is feasible within the caBIG framework and potentially any framework that uses a metadata repository. This work opens up the possibility of using mapping algorithms to reduce the cost and time required to map local data models to a reference data model such as those used within caBIG. This effort contributes to facilitating the development of interoperable systems within caBIG as well as other metadata frameworks. Such efforts are critical to address the need to develop systems to handle enormous amounts of diverse data that can be leveraged from new biomedical methodologies.
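The Dice (di-gram) matcher referred to above can be illustrated with a short self-contained sketch: the Dice coefficient over character bigrams scores how similar two names are, which is the kind of lexical signal used to pair UML class-attributes with CDE object-property names. This is a generic, simplified rendering (set-based rather than multiset-based bigrams), not the paper's exact implementation, and the example names are hypothetical.

```python
def bigrams(text: str) -> set[str]:
    """Character bigrams of a lower-cased string with whitespace removed."""
    s = "".join(text.lower().split())
    return {s[i:i + 2] for i in range(len(s) - 1)}

def dice(a: str, b: str) -> float:
    """Dice coefficient over character bigrams: 2*|A & B| / (|A| + |B|)."""
    ba, bb = bigrams(a), bigrams(b)
    if not ba or not bb:
        return 0.0
    return 2 * len(ba & bb) / (len(ba) + len(bb))

# Hypothetical UML attribute name vs. CDE object-property pair.
print(dice("patientBirthDate", "Patient Birth Date"))  # -> 1.0 after normalization
print(dice("tumorSize", "specimenWeight"))             # -> a much lower score
```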
PRay - A graphical user interface for interactive visualization and modification of rayinvr models
NASA Astrophysics Data System (ADS)
Fromm, T.
2016-01-01
PRay is a graphical user interface for interactive displaying and editing of velocity models for seismic refraction. It is optimized for editing rayinvr models but can also be used as a dynamic viewer for ray tracing results from other software. The main features are the graphical editing of nodes and fast adjusting of the display (stations and phases). It can be extended by user-defined shell scripts and links to phase picking software. PRay is open source software written in the scripting language Perl, runs on Unix-like operating systems including Mac OS X and provides a version controlled source code repository for community development (https://sourceforge.net/projects/pray-plot-rayinvr/).
The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository.
Clark, Kenneth; Vendt, Bruce; Smith, Kirk; Freymann, John; Kirby, Justin; Koppel, Paul; Moore, Stephen; Phillips, Stanley; Maffitt, David; Pringle, Michael; Tarbox, Lawrence; Prior, Fred
2013-12-01
The National Institutes of Health have placed significant emphasis on sharing of research data to support secondary research. Investigators have been encouraged to publish their clinical and imaging data as part of fulfilling their grant obligations. Realizing it was not sufficient to merely ask investigators to publish their collection of imaging and clinical data, the National Cancer Institute (NCI) created the open source National Biomedical Image Archive software package as a mechanism for centralized hosting of cancer related imaging. NCI has contracted with Washington University in Saint Louis to create The Cancer Imaging Archive (TCIA)-an open-source, open-access information resource to support research, development, and educational initiatives utilizing advanced medical imaging of cancer. In its first year of operation, TCIA accumulated 23 collections (3.3 million images). Operating and maintaining a high-availability image archive is a complex challenge involving varied archive-specific resources and driven by the needs of both image submitters and image consumers. Quality archives of any type (traditional library, PubMed, refereed journals) require management and customer service. This paper describes the management tasks and user support model for TCIA.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ferrada, J.J.
This report compiles preliminary information that supports the premise that a repository is needed in Latin America and analyzes the nuclear situation (mainly in Argentina and Brazil) in terms of nuclear capabilities, inventories, and regional spent-fuel repositories. The report is based on several sources and summarizes (1) the nuclear capabilities in Latin America, establishing the framework for the need of a permanent repository; (2) the International Atomic Energy Agency (IAEA) approach for a regional spent-fuel repository and the support that international institutions are lending to this issue; (3) the current situation in Argentina, in order to analyze the Argentinean willingness to find a location for a deep geological repository; and (4) the issues involved in selecting a location for the repository, identifying a potential location. This report then draws conclusions based on an analysis of this information. The focus of this report is mainly on spent fuel and does not elaborate on other radiological waste sources.
NASA Astrophysics Data System (ADS)
Tudose, Alexandru; Terstyansky, Gabor; Kacsuk, Peter; Winter, Stephen
Grid Application Repositories vary greatly in terms of access interface, security system, implementation technology, communication protocols and repository model. This diversity has become a significant limitation in terms of interoperability and inter-repository access. This paper presents the Grid Application Meta-Repository System (GAMRS) as a solution that offers better options for the management of Grid applications. GAMRS proposes a generic repository architecture, which allows any Grid Application Repository (GAR) to be connected to the system independent of their underlying technology. It also presents applications in a uniform manner and makes applications from all connected repositories visible to web search engines, OGSI/WSRF Grid Services and other OAI (Open Archive Initiative)-compliant repositories. GAMRS can also function as a repository in its own right and can store applications under a new repository model. With the help of this model, applications can be presented as embedded in virtual machines (VM) and therefore they can be run in their native environments and can easily be deployed on virtualized infrastructures allowing interoperability with new generation technologies such as cloud computing, application-on-demand, automatic service/application deployments and automatic VM generation.
OnEarth: An Open Source Solution for Efficiently Serving High-Resolution Mapped Image Products
NASA Astrophysics Data System (ADS)
Thompson, C. K.; Plesea, L.; Hall, J. R.; Roberts, J. T.; Cechini, M. F.; Schmaltz, J. E.; Alarcon, C.; Huang, T.; McGann, J. M.; Chang, G.; Boller, R. A.; Ilavajhala, S.; Murphy, K. J.; Bingham, A. W.
2013-12-01
This presentation introduces OnEarth, a server side software package originally developed at the Jet Propulsion Laboratory (JPL), that facilitates network-based, minimum-latency geolocated image access independent of image size or spatial resolution. The key component in this package is the Meta Raster Format (MRF), a specialized raster file extension to the Geospatial Data Abstraction Library (GDAL) consisting of an internal indexed pyramid of image tiles. Imagery to be served is converted to the MRF format and made accessible online via an expandable set of server modules handling requests in several common protocols, including the Open Geospatial Consortium (OGC) compliant Web Map Tile Service (WMTS) as well as Tiled WMS and Keyhole Markup Language (KML). OnEarth has recently transitioned to open source status and is maintained and actively developed as part of GIBS (Global Imagery Browse Services), a collaborative project between JPL and Goddard Space Flight Center (GSFC). The primary function of GIBS is to enhance and streamline the data discovery process and to support near real-time (NRT) applications via the expeditious ingestion and serving of full-resolution imagery representing science products from across the NASA Earth Science spectrum. Open source software solutions are leveraged where possible in order to utilize existing available technologies, reduce development time, and enlist wider community participation. We will discuss some of the factors and decision points in transitioning OnEarth to a suitable open source paradigm, including repository and licensing agreement decision points, institutional hurdles, and perceived benefits. We will also provide examples illustrating how OnEarth is integrated within GIBS and other applications.
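Because imagery served by OnEarth/GIBS is exposed through OGC WMTS, a standard client library can fetch tiles directly. The sketch below uses OWSLib from Python; the GIBS endpoint, layer name, tile matrix set, and tile indices are taken as assumptions for illustration and should be checked against the current GIBS documentation.

```python
from owslib.wmts import WebMapTileService

# Assumed GIBS WMTS endpoint (EPSG:4326 "best" projection group).
wmts = WebMapTileService(
    "https://gibs.earthdata.nasa.gov/wmts/epsg4326/best/wmts.cgi")

# Layer, tile matrix set, and tile indices below are illustrative values.
tile = wmts.gettile(
    layer="MODIS_Terra_CorrectedReflectance_TrueColor",
    tilematrixset="250m",
    tilematrix="2",
    row=1,
    column=2,
    format="image/jpeg")

with open("tile.jpg", "wb") as fh:
    fh.write(tile.read())
```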
Code of Federal Regulations, 2014 CFR
2014-04-01
... data repository. (10) Position. The term “position” means the gross and net notional amounts of open... Commodity and Securities Exchanges COMMODITY FUTURES TRADING COMMISSION (CONTINUED) SWAP DATA REPOSITORIES... directly, or indirectly, controls, is controlled by, or is under common control with, the swap data...
Schober, Daniel; Jacob, Daniel; Wilson, Michael; Cruz, Joseph A; Marcu, Ana; Grant, Jason R; Moing, Annick; Deborde, Catherine; de Figueiredo, Luis F; Haug, Kenneth; Rocca-Serra, Philippe; Easton, John; Ebbels, Timothy M D; Hao, Jie; Ludwig, Christian; Günther, Ulrich L; Rosato, Antonio; Klein, Matthias S; Lewis, Ian A; Luchinat, Claudio; Jones, Andrew R; Grauslys, Arturas; Larralde, Martin; Yokochi, Masashi; Kobayashi, Naohiro; Porzel, Andrea; Griffin, Julian L; Viant, Mark R; Wishart, David S; Steinbeck, Christoph; Salek, Reza M; Neumann, Steffen
2018-01-02
NMR is a widely used analytical technique with a growing number of repositories available. As a result, demands for a vendor-agnostic, open data format for long-term archiving of NMR data have emerged with the aim to ease and encourage sharing, comparison, and reuse of NMR data. Here we present nmrML, an open XML-based exchange and storage format for NMR spectral data. The nmrML format is intended to be fully compatible with existing NMR data for chemical, biochemical, and metabolomics experiments. nmrML can capture raw NMR data, spectral data acquisition parameters, and where available spectral metadata, such as chemical structures associated with spectral assignments. The nmrML format is compatible with pure-compound NMR data for reference spectral libraries as well as NMR data from complex biomixtures, i.e., metabolomics experiments. To facilitate format conversions, we provide nmrML converters for Bruker, JEOL and Agilent/Varian vendor formats. In addition, easy-to-use Web-based spectral viewing, processing, and spectral assignment tools that read and write nmrML have been developed. Software libraries and Web services for data validation are available for tool developers and end-users. The nmrML format has already been adopted for capturing and disseminating NMR data for small molecules by several open source data processing tools and metabolomics reference spectral libraries, e.g., serving as storage format for the MetaboLights data repository. The nmrML open access data standard has been endorsed by the Metabolomics Standards Initiative (MSI), and we here encourage user participation and feedback to increase usability and make it a successful standard.
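For readers who want a first look at an nmrML file from code, the sketch below simply lists the element names present using Python's standard library; the file name is a placeholder, and the authoritative element structure is defined by the nmrML schema rather than anything assumed here.

```python
import xml.etree.ElementTree as ET

# Placeholder file name; any nmrML document exported by the converters
# mentioned above (Bruker, JEOL, Agilent/Varian) could be inspected this way.
tree = ET.parse("example.nmrML")
root = tree.getroot()

# Collect distinct element names, stripping XML namespaces, to get a quick
# overview of the document's structure.
tags = sorted({elem.tag.rsplit("}", 1)[-1] for elem in root.iter()})
print("\n".join(tags))
```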
2017-08-01
This large repository of climate model results for North America (Wang and Kotamarthi 2013, 2014, 2015) is stored in Network Common Data Form (NetCDF...Network Common Data Form (NetCDF). UCAR/Unidata Program Center, Boulder, CO. Available at: http://www.unidata.ucar.edu/software/netcdf. Accessed on 6/20...emissions diverge from each other regarding fossil fuel use, technology, and other socioeconomic factors. As a result, the estimated emissions for each of
2013-01-01
Background Virtual microscopy includes digitisation of histology slides and the use of computer technologies for complex investigation of diseases such as cancer. However, automated image analysis, or website publishing of such digital images, is hampered by their large file sizes. Results We have developed two Java based open source tools: Snapshot Creator and NDPI-Splitter. Snapshot Creator converts a portion of a large digital slide into a desired quality JPEG image. The image is linked to the patient’s clinical and treatment information in a customised open source cancer data management software (Caisis) in use at the Australian Breast Cancer Tissue Bank (ABCTB) and then published on the ABCTB website (http://www.abctb.org.au) using Deep Zoom open source technology. Using the ABCTB online search engine, digital images can be searched by defining various criteria such as cancer type, or biomarkers expressed. NDPI-Splitter splits a large image file into smaller sections of TIFF images so that they can be easily analysed by image analysis software such as Metamorph or Matlab. NDPI-Splitter also has the capacity to filter out empty images. Conclusions Snapshot Creator and NDPI-Splitter are novel open source Java tools. They convert digital slides into files of smaller size for further processing. In conjunction with other open source tools such as Deep Zoom and Caisis, this suite of tools is used for the management and archiving of digital microscopy images, enabling digitised images to be explored and zoomed online. Our online image repository also has the capacity to be used as a teaching resource. These tools also enable large files to be sectioned for image analysis. Virtual Slides The virtual slide(s) for this article can be found here: http://www.diagnosticpathology.diagnomx.eu/vs/5330903258483934 PMID:23402499
ACToR: Aggregated Computational Toxicology Resource (T) ...
The EPA Aggregated Computational Toxicology Resource (ACToR) is a set of databases compiling information on chemicals in the environment from a large number of public and in-house EPA sources. ACToR has 3 main goals: (1) to serve as a repository of public toxicology information on chemicals of interest to the EPA, and in particular to be a central source for the testing data on all chemicals regulated by all EPA programs; (2) to be a source of in vivo training data sets for building in vitro to in vivo computational models; (3) to serve as a central source of chemical structure and identity information for the ToxCast™ and Tox21 programs. There are 4 main databases, all linked through a common set of chemical information and a common structure linking chemicals to assay data: the public ACToR system (available at http://actor.epa.gov); the ToxMiner database holding ToxCast and Tox21 data, along with results from statistical analyses on these data; the Tox21 chemical repository, which is managing the ordering and sample tracking process for the larger Tox21 project; and the public version of ToxRefDB. The public ACToR system contains information on ~500K compounds with toxicology, exposure and chemical property information from >400 public sources. The web site is visited by ~1,000 unique users per month and generates ~1,000 page requests per day on average. The databases are built on open source technology, which has allowed us to export them to a number of col
A Semantically Enabled Metadata Repository for Solar Irradiance Data Products
NASA Astrophysics Data System (ADS)
Wilson, A.; Cox, M.; Lindholm, D. M.; Nadiadi, I.; Traver, T.
2014-12-01
The Laboratory for Atmospheric and Space Physics, LASP, has been conducting research in atmospheric and space science for over 60 years, and providing the associated data products to the public. LASP has a long history, in particular, of making space-based measurements of the solar irradiance, which serves as crucial input to several areas of scientific research, including solar-terrestrial interactions, atmospheric science, and climate. LISIRD, the LASP Interactive Solar Irradiance Data Center, serves these datasets to the public, including solar spectral irradiance (SSI) and total solar irradiance (TSI) data. The LASP extended metadata repository, LEMR, is a database of information about the datasets served by LASP, such as parameters, uncertainties, temporal and spectral ranges, current version, alerts, etc. It serves as the definitive, single source of truth for that information. The database is populated with information garnered via web forms and automated processes. Dataset owners keep the information current and verified for datasets under their purview. This information can be pulled dynamically for many purposes. Web sites such as LISIRD can include this information in web page content as it is rendered, ensuring users get current, accurate information. It can also be pulled to create metadata records in various metadata formats, such as SPASE (for heliophysics) and ISO 19115. Once these records are made available to the appropriate registries, our data will be discoverable by users coming in via those organizations. The database is implemented as an RDF triplestore, a collection of subject-predicate-object data entities identifiable with URIs. This capability, coupled with SPARQL over HTTP read access, enables semantic queries over the repository contents. To create the repository we leveraged VIVO, an open source semantic web application, to manage and create new ontologies and populate repository content. A variety of ontologies were used in creating the triplestore, including ontologies that came with VIVO, such as FOAF. Also, the W3C DCAT ontology was integrated and extended to describe properties of our data products that we needed to capture, such as spectral range. The presentation will describe the architecture, ontology issues, and tools used to create LEMR and plans for its evolution.
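Since LEMR exposes SPARQL over HTTP, a client can query dataset descriptions directly. The sketch below uses the SPARQLWrapper Python library; the endpoint URL is a placeholder (the abstract does not give LEMR's public address), and DCAT/Dublin Core terms are used only because the abstract states that DCAT was integrated and extended.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Placeholder endpoint; substitute the actual LEMR SPARQL service URL.
sparql = SPARQLWrapper("https://lasp.example.edu/lemr/sparql")
sparql.setQuery("""
    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct:  <http://purl.org/dc/terms/>
    SELECT ?dataset ?title WHERE {
        ?dataset a dcat:Dataset ;
                 dct:title ?title .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["dataset"]["value"], "-", row["title"]["value"])
```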
BioPortal: An Open-Source Community-Based Ontology Repository
NASA Astrophysics Data System (ADS)
Noy, N.; NCBO Team
2011-12-01
Advances in computing power and new computational techniques have changed the way researchers approach science. In many fields, one of the most fruitful approaches has been to use semantically aware software to break down the barriers among disparate domains, systems, data sources, and technologies. Such software facilitates data aggregation, improves search, and ultimately allows the detection of new associations that were previously not detectable. Achieving these analyses requires software systems that take advantage of the semantics and that can intelligently negotiate domains and knowledge sources, identifying commonality across systems that use different and conflicting vocabularies, while understanding apparent differences that may be concealed by the use of superficially similar terms. An ontology, a semantically rich vocabulary for a domain of interest, is the cornerstone of software for bridging systems, domains, and resources. However, as ontologies become the foundation of all semantic technologies in e-science, we must develop an infrastructure for sharing ontologies, finding and evaluating them, integrating and mapping among them, and using ontologies in applications that help scientists process their data. BioPortal [1] is an open-source on-line community-based ontology repository that has been used as a critical component of semantic infrastructure in several domains, including biomedicine and bio-geochemical data. BioPortal uses Web 2.0-style social approaches to bring structure and order to the collection of biomedical ontologies. It enables users to provide and discuss a wide array of knowledge components, from submitting the ontologies themselves, to commenting on and discussing classes in the ontologies, to reviewing ontologies in the context of their own ontology-based projects, to creating mappings between overlapping ontologies and discussing and critiquing the mappings. Critically, it provides web-service access to all its content, enabling its integration in semantically enriched applications. [1] Noy, N.F., Shah, N.H., et al., BioPortal: ontologies and integrated data resources at the click of a mouse. Nucleic Acids Res, 2009. 37(Web Server issue): p. W170-3.
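The web-service access mentioned above can be illustrated with a short, hedged sketch of an ontology-term search against BioPortal's REST interface. The host, path, "apikey" parameter, and response fields follow BioPortal's commonly documented API but should be treated as assumptions here rather than as definitions from the abstract.

```python
import requests

def search_terms(query: str, api_key: str) -> list[str]:
    """Search an ontology repository's REST API and return preferred labels."""
    resp = requests.get(
        "https://data.bioontology.org/search",   # assumed endpoint
        params={"q": query, "apikey": api_key},  # assumed parameter names
        timeout=30,
    )
    resp.raise_for_status()
    return [hit.get("prefLabel") for hit in resp.json().get("collection", [])]

# Example (requires a valid key): search_terms("melanoma", "YOUR-API-KEY")
```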
Scaling an expert system data mart: more facilities in real-time.
McNamee, L A; Launsby, B D; Frisse, M E; Lehmann, R; Ebker, K
1998-01-01
Clinical Data Repositories are being rapidly adopted by large healthcare organizations as a method of centralizing and unifying clinical data currently stored in diverse and isolated information systems. Once stored in a clinical data repository, healthcare organizations seek to use this centralized data to analyze, interpret, and influence clinical care, quality, and outcomes. A recent trend in the repository field has been the adoption of data marts--specialized subsets of enterprise-wide data taken from a larger repository designed specifically to answer highly focused questions. A data mart exploits the data stored in the repository, but can use unique structures or summary statistics generated specifically for an area of study. Thus, data marts benefit from the existence of a repository, are less general than a repository, but provide more effective and efficient support for an enterprise-wide data analysis task. In previous work, we described the use of batch processing for populating data marts directly from legacy systems. In this paper, we describe an architecture that uses both primary data sources and an evolving enterprise-wide clinical data repository to create real-time data sources for a clinical data mart to support highly specialized clinical expert systems.
ERIC Educational Resources Information Center
Kansa, Sarah Whitcher; Kansa, Eric C.
2007-01-01
This article presents the challenges and rewards of sharing research content through a discussion of Open Context, a new open access data publication system for field sciences and museum collections. Open Context is the first data repository of its kind, allowing self-publication of research data, community commentary through tagging, and clear…
Motivations of Faculty Self-Archiving in Institutional Repositories
ERIC Educational Resources Information Center
Kim, Jihyun
2011-01-01
Professors contribute to Institutional Repositories (IRs) to make their materials widely accessible in keeping with the benefits of Open Access. However, universities' commitment to IRs depends on building trust with faculty and solving copyright concerns. Digital preservation and copyright management in IRs should be strengthened to increase…
Wide-Open: Accelerating public data release by automating detection of overdue datasets
Poon, Hoifung; Howe, Bill
2017-01-01
Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data. To address this problem, we developed a general approach to automatically identify datasets overdue for public release by applying text mining to identify dataset references in published articles and parse query results from repositories to determine if the datasets remain private. We demonstrate the effectiveness of this approach on 2 popular National Center for Biotechnology Information (NCBI) repositories: Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). Our Wide-Open system identified a large number of overdue datasets, which spurred administrators to respond directly by releasing 400 datasets in one week. PMID:28594819
Wide-Open: Accelerating public data release by automating detection of overdue datasets.
Grechkin, Maxim; Poon, Hoifung; Howe, Bill
2017-06-01
Open data is a vital pillar of open science and a key enabler for reproducibility, data reuse, and novel discoveries. Enforcement of open-data policies, however, largely relies on manual efforts, which invariably lag behind the increasingly automated generation of biological data. To address this problem, we developed a general approach to automatically identify datasets overdue for public release by applying text mining to identify dataset references in published articles and parse query results from repositories to determine if the datasets remain private. We demonstrate the effectiveness of this approach on 2 popular National Center for Biotechnology Information (NCBI) repositories: Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA). Our Wide-Open system identified a large number of overdue datasets, which spurred administrators to respond directly by releasing 400 datasets in one week.
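The two steps the Wide-Open abstract describes, text-mining dataset references from articles and then checking repository status, can be sketched as follows. This is an illustration rather than the authors' code; the GEO accession pattern and the NCBI E-utilities call are assumptions about one reasonable way to perform the check.

```python
import re
import requests

ACCESSION_RE = re.compile(r"\bGSE\d{3,}\b")   # GEO Series accessions

def find_accessions(article_text: str) -> set[str]:
    """Extract candidate GEO accessions mentioned in an article."""
    return set(ACCESSION_RE.findall(article_text))

def is_public_in_geo(accession: str) -> bool:
    """Return True if the accession is findable in NCBI's GEO DataSets index."""
    resp = requests.get(
        "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
        params={"db": "gds", "term": f"{accession}[ACCN]", "retmode": "json"},
        timeout=30,
    )
    resp.raise_for_status()
    return int(resp.json()["esearchresult"]["count"]) > 0

if __name__ == "__main__":
    text = "Raw data are available under accession GSE63525."
    for acc in find_accessions(text):
        status = "public" if is_public_in_geo(acc) else "possibly still private"
        print(acc, status)
```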
Moutsatsos, Ioannis K; Hossain, Imtiaz; Agarinis, Claudia; Harbinski, Fred; Abraham, Yann; Dobler, Luc; Zhang, Xian; Wilson, Christopher J; Jenkins, Jeremy L; Holway, Nicholas; Tallarico, John; Parker, Christian N
2017-03-01
High-throughput screening generates large volumes of heterogeneous data that require a diverse set of computational tools for management, processing, and analysis. Building integrated, scalable, and robust computational workflows for such applications is challenging but highly valuable. Scientific data integration and pipelining facilitate standardized data processing, collaboration, and reuse of best practices. We describe how Jenkins-CI, an "off-the-shelf," open-source, continuous integration system, is used to build pipelines for processing images and associated data from high-content screening (HCS). Jenkins-CI provides numerous plugins for standard compute tasks, and its design allows the quick integration of external scientific applications. Using Jenkins-CI, we integrated CellProfiler, an open-source image-processing platform, with various HCS utilities and a high-performance Linux cluster. The platform is web-accessible, facilitates access and sharing of high-performance compute resources, and automates previously cumbersome data and image-processing tasks. Imaging pipelines developed using the desktop CellProfiler client can be managed and shared through a centralized Jenkins-CI repository. Pipelines and managed data are annotated to facilitate collaboration and reuse. Limitations with Jenkins-CI (primarily around the user interface) were addressed through the selection of helper plugins from the Jenkins-CI community.
Moutsatsos, Ioannis K.; Hossain, Imtiaz; Agarinis, Claudia; Harbinski, Fred; Abraham, Yann; Dobler, Luc; Zhang, Xian; Wilson, Christopher J.; Jenkins, Jeremy L.; Holway, Nicholas; Tallarico, John; Parker, Christian N.
2016-01-01
High-throughput screening generates large volumes of heterogeneous data that require a diverse set of computational tools for management, processing, and analysis. Building integrated, scalable, and robust computational workflows for such applications is challenging but highly valuable. Scientific data integration and pipelining facilitate standardized data processing, collaboration, and reuse of best practices. We describe how Jenkins-CI, an “off-the-shelf,” open-source, continuous integration system, is used to build pipelines for processing images and associated data from high-content screening (HCS). Jenkins-CI provides numerous plugins for standard compute tasks, and its design allows the quick integration of external scientific applications. Using Jenkins-CI, we integrated CellProfiler, an open-source image-processing platform, with various HCS utilities and a high-performance Linux cluster. The platform is web-accessible, facilitates access and sharing of high-performance compute resources, and automates previously cumbersome data and image-processing tasks. Imaging pipelines developed using the desktop CellProfiler client can be managed and shared through a centralized Jenkins-CI repository. Pipelines and managed data are annotated to facilitate collaboration and reuse. Limitations with Jenkins-CI (primarily around the user interface) were addressed through the selection of helper plugins from the Jenkins-CI community. PMID:27899692
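A Jenkins-CI build step of the kind described above typically wraps a headless CellProfiler run. The sketch below shows one way such a step could be scripted; the command-line flags (-c, -r, -p, -i, -o) follow CellProfiler's commonly documented batch interface but are assumptions here, so the installed version's help output should be checked before relying on them.

```python
import subprocess
from pathlib import Path

def run_pipeline(pipeline: Path, images: Path, output: Path) -> int:
    """Run a CellProfiler pipeline without the GUI and return its exit code."""
    output.mkdir(parents=True, exist_ok=True)
    cmd = [
        "cellprofiler",
        "-c",                 # assumed: run headless (no GUI)
        "-r",                 # assumed: run the pipeline immediately
        "-p", str(pipeline),  # pipeline file exported from the desktop client
        "-i", str(images),    # input image directory
        "-o", str(output),    # output directory for measurements
    ]
    return subprocess.run(cmd, check=False).returncode

# A Jenkins job could call this as a build step and archive the output directory.
```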
NASA Astrophysics Data System (ADS)
Schildhauer, M.; Jones, M. B.; Jones, C. S.; Tao, J.
2017-12-01
The opportunities for synthesis science to advance understanding of the environment have never been greater. Challenges remain, however, with regards to preserving data in discoverable and re-usable formats, to inform new integrative analyses, and support reproducible science. In this talk I will describe one promising solution for data preservation, discovery, and re-use: the Knowledge Network for Biocomplexity, or KNB. The KNB (http://knb.ecoinformatics.org) has been providing a reliable data repository for ecological and environmental researchers for over 15 years. The KNB is a distributed, open-source, web-enabled data repository based upon a formal metadata standard, EML, that is endorsed by several major ecological institutions including the LTER Network and NCEAS. A KNB server, also called a "Metacat", can be set up on very modest hardware, typically within a few hours, requires no expensive or proprietary software, and only moderate systems administration expertise. A tiered architecture allows KNB servers (or "Metacats") to communicate with other KNB servers, to afford greater operational reliability, higher performance, and reductions in potential data loss. The KNB is a strong member of the DataONE "Data Observation Network for Earth" (http://dataone.org) system, which confederates over 35 significant earth science data repositories (and still growing) from around the world through an open and powerful API. DataONE provides for integrated search over member repository holdings that incorporate features based on W3C-compliant semantics through annotations with OWL/RDF vocabularies such as PROV and the Environment Ontology, ENVO. The KNB and DataONE frameworks have given rise to an Open Science software development community that is actively building tools based on software that scientists already use, such as MATLAB and R. These tools can be used to both contribute data to, and operate upon data within the KNB and DataONE systems. An active User Community within DataONE assists with prioritizing future features of the framework, and provides for peer-to-peer assistance through chat-rooms and email lists. The challenge of achieving long-term sustainable funding for both the KNB and DataONE are still being addressed, and may stimulate discussion towards the end of my talk, time permitting.
CDinFusion – Submission-Ready, On-Line Integration of Sequence and Contextual Data
Hankeln, Wolfgang; Wendel, Norma Johanna; Gerken, Jan; Waldmann, Jost; Buttigieg, Pier Luigi; Kostadinov, Ivaylo; Kottmann, Renzo; Yilmaz, Pelin; Glöckner, Frank Oliver
2011-01-01
State of the art (DNA) sequencing methods applied in “Omics” studies grant insight into the ‘blueprints’ of organisms from all domains of life. Sequencing is carried out around the globe and the data is submitted to the public repositories of the International Nucleotide Sequence Database Collaboration. However, the context in which these studies are conducted often gets lost, because experimental data, as well as information about the environment, are rarely submitted along with the sequence data. If these contextual data or metadata are missing, key opportunities for comparison and analysis across studies and habitats are hampered or even impossible. To address this problem, the Genomic Standards Consortium (GSC) promotes checklists and standards to better describe our sequence data collection and to promote the capturing, exchange and integration of sequence data with contextual data. In a recent community effort the GSC has developed a series of recommendations for contextual data that should be submitted along with sequence data. To help the scientific community significantly enhance the quality and quantity of contextual data in the public sequence data repositories, specialized software tools are needed. In this work we present CDinFusion, a web-based tool to integrate contextual and sequence data in (Multi)FASTA format prior to submission. The tool is open source and available under the Lesser GNU Public License 3. A public installation is hosted and maintained at the Max Planck Institute for Marine Microbiology at http://www.megx.net/cdinfusion. The tool may also be installed locally using the open source code available at http://code.google.com/p/cdinfusion. PMID:21935468
Patiny, Luc; Zasso, Michaël; Kostro, Daniel; Bernal, Andrés; Castillo, Andrés M; Bolaños, Alejandro; Asencio, Miguel A; Pellet, Norman; Todd, Matthew; Schloerer, Nils; Kuhn, Stefan; Holmes, Elaine; Javor, Sacha; Wist, Julien
2017-10-05
NMR is a mature technique that is well established and adopted in a wide range of research facilities from laboratories to hospitals. This accounts for large amounts of valuable experimental data that may be readily exported into a standard and open format. Yet the publication of these data faces an important issue: raw data are not made available; instead, the information is slimmed down into a string of characters (the list of peaks). Although historical limitations of technology explain this practice, it is not acceptable in the era of the Internet. The idea of modernizing the strategy for sharing NMR data is not new, and some repositories exist, but sharing raw data is still not an established practice. Here, we present a powerful toolbox built on recent technologies that runs inside the browser and provides a means to store, share, analyse, and interact with original NMR data. Stored spectra can be streamlined into the publication pipeline, to improve the revision process for instance. The set of tools is still basic but is intended to be extended. The project is open source under the Massachusetts Institute of Technology (MIT) licence. Copyright © 2017 John Wiley & Sons, Ltd.
Preservation Health Check: Monitoring Threats to Digital Repository Content
ERIC Educational Resources Information Center
Kool, Wouter; van der Werf, Titia; Lavoie, Brian
2014-01-01
The Preservation Health Check (PHC) project, undertaken as a joint effort by Open Planets Foundation (OPF) and OCLC Research, aims to evaluate the usefulness of the preservation metadata created and maintained by operational repositories for assessing basic preservation properties. The PHC project seeks to develop an implementable logic to support…
A Shared Infrastructure for Federated Search Across Distributed Scientific Metadata Catalogs
NASA Astrophysics Data System (ADS)
Reed, S. A.; Truslove, I.; Billingsley, B. W.; Grauch, A.; Harper, D.; Kovarik, J.; Lopez, L.; Liu, M.; Brandt, M.
2013-12-01
The vast amount of science metadata can be overwhelming and highly complex. Comprehensive analysis and sharing of metadata is difficult since institutions often publish to their own repositories. There are many disjoint standards used for publishing scientific data, making it difficult to discover and share information from different sources. Services that publish metadata catalogs often have different protocols, formats, and semantics. The research community is limited by the exclusivity of separate metadata catalogs and thus it is desirable to have federated search interfaces capable of unified search queries across multiple sources. Aggregation of metadata catalogs also enables users to critique metadata more rigorously. With these motivations in mind, the National Snow and Ice Data Center (NSIDC) and Advanced Cooperative Arctic Data and Information Service (ACADIS) implemented two search interfaces for the community. Both the NSIDC Search and ACADIS Arctic Data Explorer (ADE) use a common infrastructure which keeps maintenance costs low. The search clients are designed to make OpenSearch requests against Solr, an open source search platform. Solr applies indexes to specific fields of the metadata, which in this instance optimizes queries containing keywords, spatial bounds and temporal ranges. NSIDC metadata is reused by both search interfaces, but the ADE also brokers additional sources. Users can quickly find relevant metadata with minimal effort, which ultimately lowers research costs. This presentation will highlight the reuse of data and code between NSIDC and ACADIS, discuss challenges and milestones for each project, and identify the creation and use of open source libraries.
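A minimal sketch of the kind of Solr select query the search clients above might issue once an OpenSearch request has been translated into keyword and temporal constraints. The core name and field names (text, temporal) are hypothetical; the real NSIDC/ADE schemas will differ.

```python
import requests

SOLR_SELECT = "http://localhost:8983/solr/metadata/select"  # assumed core URL

def search(keyword: str, start: str, end: str, rows: int = 10) -> list[dict]:
    """Keyword search restricted to a temporal range, returned as JSON docs."""
    params = {
        "q": f"text:{keyword}",
        "fq": f"temporal:[{start} TO {end}]",  # hypothetical indexed field
        "rows": rows,
        "wt": "json",
    }
    resp = requests.get(SOLR_SELECT, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["response"]["docs"]

# e.g. search("sea ice extent", "2010-01-01T00:00:00Z", "2012-12-31T23:59:59Z")
```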
Goloborodko, Anton A; Levitsky, Lev I; Ivanov, Mark V; Gorshkov, Mikhail V
2013-02-01
Pyteomics is a cross-platform, open-source Python library providing a rich set of tools for MS-based proteomics. It provides modules for reading LC-MS/MS data, search engine output, protein sequence databases, theoretical prediction of retention times, electrochemical properties of polypeptides, mass and m/z calculations, and sequence parsing. Pyteomics is available under Apache license; release versions are available at the Python Package Index http://pypi.python.org/pyteomics, the source code repository at http://hg.theorchromo.ru/pyteomics, documentation at http://packages.python.org/pyteomics. Pyteomics.biolccc documentation is available at http://packages.python.org/pyteomics.biolccc/. Questions on installation and usage can be addressed to pyteomics mailing list: pyteomics@googlegroups.com.
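A short usage sketch of the Pyteomics modules mentioned in the abstract (mass and m/z calculation, sequence parsing). The calls reflect the library's classic interface; the current documentation should be consulted for the up-to-date API.

```python
from pyteomics import mass, parser

peptide = "PEPTIDE"

# Monoisotopic mass and a charge-2 m/z value for the peptide.
m = mass.calculate_mass(sequence=peptide)
mz = mass.calculate_mass(sequence=peptide, charge=2)

# Split the sequence into residue tokens.
residues = parser.parse(peptide)

print(f"{peptide}: M = {m:.4f} Da, [M+2H]2+ = {mz:.4f}, residues = {residues}")
```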
76 FR 81950 - Privacy Act; System of Records
Federal Register 2010, 2011, 2012, 2013, 2014
2011-12-29
... ``Consolidated Data Repository'' (09-90-1000). This system of records is being amended to include records... Repository'' (SORN 09-90-1000). OIG is adding record sources to the system. This system fulfills our..., and investigations of the Medicare and Medicaid programs. SYSTEM NAME: Consolidated Data Repository...
10 CFR 60.22 - Filing and distribution of application.
Code of Federal Regulations, 2010 CFR
2010-01-01
... GEOLOGIC REPOSITORIES Licenses License Applications § 60.22 Filing and distribution of application. (a) An application for a construction authorization for a high-level radioactive waste repository at a geologic repository operations area, and an application for a license to receive and possess source, special nuclear...
Metadata mapping and reuse in caBIG™
Kunz, Isaac; Lin, Ming-Chin; Frey, Lewis
2009-01-01
Background This paper proposes that interoperability across biomedical databases can be improved by utilizing a repository of Common Data Elements (CDEs), UML model class-attributes and simple lexical algorithms to facilitate the building of domain models. This is examined in the context of an existing system, the National Cancer Institute (NCI)'s cancer Biomedical Informatics Grid (caBIG™). The goal is to demonstrate the deployment of open source tools that can be used to effectively map models and enable the reuse of existing information objects and CDEs in the development of new models for translational research applications. This effort is intended to help developers reuse appropriate CDEs to enable interoperability of their systems when developing within the caBIG™ framework or other frameworks that use metadata repositories. Results The Dice (di-grams) and Dynamic algorithms are compared, and both algorithms have similar performance in matching UML model class-attributes to CDE class object-property pairs. With the algorithms used, the baselines for automatically finding the matches are reasonable for the data models examined. This suggests that automatic mapping of UML models and CDEs is feasible within the caBIG™ framework and potentially any framework that uses a metadata repository. Conclusion This work opens up the possibility of using mapping algorithms to reduce the cost and time required to map local data models to a reference data model such as those used within caBIG™. This effort contributes to facilitating the development of interoperable systems within caBIG™ as well as other metadata frameworks. Such efforts are critical to address the need to develop systems to handle enormous amounts of diverse data that can be leveraged from new biomedical methodologies. PMID:19208192
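The di-gram (character bigram) Dice similarity idea behind the matching described above can be illustrated with a generic sketch; this is not the caBIG™ code, and the attribute and candidate names below are invented.

```python
def bigrams(text: str) -> set[str]:
    """Character bigrams of a lower-cased string."""
    t = text.lower()
    return {t[i:i + 2] for i in range(len(t) - 1)}

def dice(a: str, b: str) -> float:
    """Dice coefficient over character bigrams: 2|A and B| / (|A| + |B|)."""
    ba, bb = bigrams(a), bigrams(b)
    if not ba or not bb:
        return 0.0
    return 2 * len(ba & bb) / (len(ba) + len(bb))

# Match a UML class-attribute name against candidate CDE object-property pairs.
attribute = "patientBirthDate"
candidates = ["Patient Birth Date", "Specimen Collection Date", "Patient Gender"]
best = max(candidates, key=lambda c: dice(attribute, c))
print(best, round(dice(attribute, best), 3))
```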
New Author Choice for Open Access
NASA Astrophysics Data System (ADS)
2007-04-01
AGU journals now offer authors the opportunity to make their articles open for others to read for free. Authors choosing this option pay a fee based on article length and number of figures; these charges are designed to offset the potential loss of subscription income. This new option, called Author Choice, provides: • Unlimited access to the article for all readers from the moment of publication. • Permission to deposit the PDF version in institutional repositories, so long as the repository accepts AGU copyright permissions. • Continued copyright protection to prevent unauthorized uses of the author's work.
NOAA's Data Catalog and the Federal Open Data Policy
NASA Astrophysics Data System (ADS)
Wengren, M. J.; de la Beaujardiere, J.
2014-12-01
The 2013 Open Data Policy Presidential Directive requires Federal agencies to create and maintain a 'public data listing' that includes all agency data that is currently or will be made publicly-available in the future. The directive requires the use of machine-readable and open formats that make use of 'common core' and extensible metadata formats according to the best practices published in an online repository called 'Project Open Data', to use open licenses where possible, and to adhere to existing metadata and other technology standards to promote interoperability. In order to meet the requirements of the Open Data Policy, the National Oceanic and Atmospheric Administration (NOAA) has implemented an online data catalog that combines metadata from all subsidiary NOAA metadata catalogs into a single master inventory. The NOAA Data Catalog is available to the public for search and discovery, providing access to the NOAA master data inventory through multiple means, including web-based text search, OGC CS-W endpoint, as well as a native Application Programming Interface (API) for programmatic query. It generates on a daily basis the Project Open Data JavaScript Object Notation (JSON) file required for compliance with the Presidential directive. The Data Catalog is based on the open source Comprehensive Knowledge Archive Network (CKAN) software and runs on the Amazon Federal GeoCloud. This presentation will cover topics including mappings of existing metadata in standard formats (FGDC-CSDGM and ISO 19115 XML) to the Project Open Data JSON metadata schema, representation of metadata elements within the catalog, and compatible metadata sources used to feed the catalog to include Web Accessible Folder (WAF), Catalog Services for the Web (CS-W), and Esri ArcGIS.com. It will also discuss related open source technologies that can be used together to build a spatial data infrastructure compliant with the Open Data Policy.
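Since the catalog is CKAN-based, programmatic query can be illustrated with the standard CKAN Action API (package_search). The catalog URL below is an assumption for illustration and may not match the live NOAA deployment.

```python
import requests

CATALOG = "https://data.noaa.gov/api/3/action/package_search"  # assumed URL

def find_datasets(query: str, rows: int = 5) -> list[str]:
    """Return titles of datasets matching a free-text query via CKAN's Action API."""
    resp = requests.get(CATALOG, params={"q": query, "rows": rows}, timeout=30)
    resp.raise_for_status()
    payload = resp.json()
    if not payload.get("success"):
        raise RuntimeError("CKAN query failed")
    return [pkg["title"] for pkg in payload["result"]["results"]]

if __name__ == "__main__":
    for title in find_datasets("sea surface temperature"):
        print(title)
```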
Automating RPM Creation from a Source Code Repository
2012-02-01
[Report excerpt, recovered from garbled extraction: fragments of an RPM .spec file for building a package from a source code repository, showing %pre, %prep, %setup, and %build sections (./autogen.sh ; ./configure --with-db=/apps/db --with-libpq=/apps/postgres ; make) and install steps such as rm -rf $RPM_BUILD_ROOT, umask 0077, and mkdir -p $RPM_BUILD_ROOT/usr/local/bin.]
Semantic Web repositories for genomics data using the eXframe platform.
Merrill, Emily; Corlosquet, Stéphane; Ciccarese, Paolo; Clark, Tim; Das, Sudeshna
2014-01-01
With the advent of inexpensive assay technologies, there has been an unprecedented growth in genomics data as well as the number of databases in which it is stored. In these databases, sample annotation using ontologies and controlled vocabularies is becoming more common. However, the annotation is rarely available as Linked Data, in a machine-readable format, or for standardized queries using SPARQL. This makes large-scale reuse, or integration with other knowledge bases, very difficult. To address this challenge, we have developed the second generation of our eXframe platform, a reusable framework for creating online repositories of genomics experiments. This second generation model now publishes Semantic Web data. To accomplish this, we created an experiment model that covers provenance, citations, external links, assays, biomaterials used in the experiment, and the data collected during the process. The elements of our model are mapped to classes and properties from various established biomedical ontologies. Resource Description Framework (RDF) data is automatically produced using these mappings and indexed in an RDF store with a built-in SPARQL Protocol and RDF Query Language (SPARQL) endpoint. Using the open-source eXframe software, institutions and laboratories can create Semantic Web repositories of their experiments, integrate them with heterogeneous resources and make them interoperable with the vast Semantic Web of biomedical knowledge.
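A toy sketch of the pattern described above: experiment metadata expressed as RDF and queried with SPARQL, here using the rdflib library. The vocabulary terms are invented for illustration and are not eXframe's actual ontology mappings.

```python
from rdflib import Graph, Literal, Namespace, RDF, URIRef

EX = Namespace("http://example.org/exframe/")  # hypothetical vocabulary

g = Graph()
exp = URIRef(EX["experiment/1"])
g.add((exp, RDF.type, EX.GenomicsExperiment))
g.add((exp, EX.usesBiomaterial, Literal("induced pluripotent stem cells")))
g.add((exp, EX.assayType, Literal("RNA-seq")))

# SPARQL query over the in-memory graph; a deployed store would expose the
# same query through its SPARQL endpoint.
results = g.query(
    """
    PREFIX ex: <http://example.org/exframe/>
    SELECT ?exp ?assay WHERE {
        ?exp a ex:GenomicsExperiment ;
             ex:assayType ?assay .
    }
    """
)
for row in results:
    print(row.exp, row.assay)
```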
Eliciting Disease Data from Wikipedia Articles.
Fairchild, Geoffrey; Del Valle, Sara Y; De Silva, Lalindra; Segre, Alberto M
2015-05-01
Traditional disease surveillance systems suffer from several disadvantages, including reporting lags and antiquated technology, that have caused a movement towards internet-based disease surveillance systems. Internet systems are particularly attractive for disease outbreaks because they can provide data in near real-time and can be verified by individuals around the globe. However, most existing systems have focused on disease monitoring and do not provide a data repository for policy makers or researchers. In order to fill this gap, we analyzed Wikipedia article content. We demonstrate how a named-entity recognizer can be trained to tag case counts, death counts, and hospitalization counts in the article narrative that achieves an F1 score of 0.753. We also show, using the 2014 West African Ebola virus disease epidemic article as a case study, that there are detailed time series data that are consistently updated that closely align with ground truth data. We argue that Wikipedia can be used to create the first community-driven open-source emerging disease detection, monitoring, and repository system.
Eliciting Disease Data from Wikipedia Articles
Fairchild, Geoffrey; Del Valle, Sara Y.; De Silva, Lalindra; Segre, Alberto M.
2017-01-01
Traditional disease surveillance systems suffer from several disadvantages, including reporting lags and antiquated technology, that have caused a movement towards internet-based disease surveillance systems. Internet systems are particularly attractive for disease outbreaks because they can provide data in near real-time and can be verified by individuals around the globe. However, most existing systems have focused on disease monitoring and do not provide a data repository for policy makers or researchers. In order to fill this gap, we analyzed Wikipedia article content. We demonstrate how a named-entity recognizer can be trained to tag case counts, death counts, and hospitalization counts in the article narrative that achieves an F1 score of 0.753. We also show, using the 2014 West African Ebola virus disease epidemic article as a case study, that there are detailed time series data that are consistently updated that closely align with ground truth data. We argue that Wikipedia can be used to create the first community-driven open-source emerging disease detection, monitoring, and repository system. PMID:28721308
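The raw input for the named-entity tagging described above is article text, which can be pulled through the public MediaWiki API. The sketch below uses the standard query/revisions module; the article title is just an example, and the response layout corresponds to the default (formatversion 1) JSON output.

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

def fetch_wikitext(title: str) -> str:
    """Return the current wikitext of a Wikipedia article."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "format": "json",
    }
    resp = requests.get(API, params=params, timeout=30)
    resp.raise_for_status()
    pages = resp.json()["query"]["pages"]
    page = next(iter(pages.values()))
    return page["revisions"][0]["slots"]["main"]["*"]

text = fetch_wikitext("Western African Ebola virus epidemic")
print(text[:200])
```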
10 CFR 60.41 - Standards for issuance of a license.
Code of Federal Regulations, 2010 CFR
2010-01-01
... REPOSITORIES Licenses License Issuance and Amendment § 60.41 Standards for issuance of a license. A license to receive and possess source, special nuclear, or byproduct material at a geologic repository operations area may be issued by the Commission upon finding that: (a) Construction of the geologic repository...
10 CFR 60.44 - Changes, tests, and experiments.
Code of Federal Regulations, 2010 CFR
2010-01-01
... REPOSITORIES Licenses License Issuance and Amendment § 60.44 Changes, tests, and experiments. (a)(1) Following authorization to receive and possess source, special nuclear, or byproduct material at a geologic repository operations area, the DOE may (i) make changes in the geologic repository operations area as described in the...
Case Study: Applying OpenEHR Archetypes to a Clinical Data Repository in a Chinese Hospital.
Min, Lingtong; Wang, Li; Lu, Xudong; Duan, Huilong
2015-01-01
openEHR is a flexible and scalable modeling methodology for clinical information and has been widely adopted in Europe and Australia. Due to differences in clinical process and management, there are few research projects involving openEHR in China. To investigate the feasibility of the openEHR methodology for clinical information modelling in China, this paper carries out a case study applying openEHR archetypes to a Clinical Data Repository (CDR) in a Chinese hospital. The results show that a set of 26 archetypes covers all the concepts used in the CDR. Of these, 9 (34.6%) are reused without change, 10 are modified and/or extended, and 7 are newly defined. The reasons for modification, extension, and new definition are discussed, including the granularity of archetypes, metadata-level versus data-level modelling, and the representation of relationships between archetypes.
[The Open Access Initiative (OAI) in the scientific literature].
Sánchez-Martín, Francisco M; Millán Rodríguez, Félix; Villavicencio Mavrich, Humberto
2009-01-01
According to the Budapest declaration, the Open Access Initiative (OAI) is defined as an editorial model in which access to the scientific journal literature and its use are free. The free flow of information enabled by the Internet has been the basis of this initiative. The Bethesda and Berlin declarations, supported by several international agencies, propose to require researchers to deposit copies of all published articles in a self-archive or an Open Access repository, and encourage researchers to publish their research papers in Open Access journals. This paper reviews the key aspects of the OAI, with their strengths and controversial points, and discusses the position of databases, search engines and repositories of biomedical information, as well as the attitude of scientists, publishers and journals. So far the journal Actas Urológicas Españolas (Act Urol Esp) offers its contents in Open Access online in Spanish and English.
OpenID Connect as a security service in cloud-based medical imaging systems.
Ma, Weina; Sartipi, Kamran; Sharghigoorabi, Hassan; Koff, David; Bak, Peter
2016-04-01
The evolution of cloud computing is driving the next generation of medical imaging systems. However, privacy and security concerns have consistently been regarded as the major obstacles to the adoption of cloud computing by healthcare domains. OpenID Connect, which combines OpenID and OAuth, is an emerging representational state transfer-based federated identity solution. It is one of the most adopted open standards, has the potential to become the de facto standard for securing cloud computing and mobile applications, and has been called the "Kerberos of the cloud." We introduce OpenID Connect as an authentication and authorization service in cloud-based diagnostic imaging (DI) systems, and propose enhancements that allow this technology to be incorporated within distributed enterprise environments. The objective of this study is to offer solutions for secure sharing of medical images among a diagnostic imaging repository (DI-r) and heterogeneous picture archiving and communication systems (PACS), as well as Web-based and mobile clients in the cloud ecosystem. The main objective is to use the open-source OpenID Connect single sign-on and authorization service in a user-centric manner, while deployment of the DI-r and PACS to private or community clouds should provide security levels equivalent to the traditional computing model.
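The first step an OpenID Connect relying party (for example, a DI-r or PACS client) performs is fetching the provider's discovery document from the standard /.well-known/openid-configuration path; this is a minimal sketch of that step, with a hypothetical issuer URL.

```python
import requests

ISSUER = "https://idp.hospital.example.org"  # hypothetical OpenID Provider

def discover(issuer: str) -> dict:
    """Fetch the OpenID Connect discovery document for an issuer."""
    resp = requests.get(f"{issuer}/.well-known/openid-configuration", timeout=30)
    resp.raise_for_status()
    return resp.json()

config = discover(ISSUER)
print("authorization endpoint:", config["authorization_endpoint"])
print("token endpoint:        ", config["token_endpoint"])
```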
NCIP has migrated 132 repositories from the NCI subversion repository to our public NCIP GitHub channel with the goal of facilitating third party contributions to the existing code base. Within the GitHub environment, we are advocating use of the GitHub “fork and pull” model.
Usability Evaluation of a Research Repository and Collaboration Web Site
ERIC Educational Resources Information Center
Zhang, Tao; Maron, Deborah J.; Charles, Christopher C.
2013-01-01
This article reports results from an empirical usability evaluation of Human-Animal Bond Research Initiative Central as part of the effort to develop an open access research repository and collaboration platform for human-animal bond researchers. By repurposing and altering key features of the original HUBzero system, Human-Animal Bond Research…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gentil-Beccot, Anne; Mele, Salvatore; /CERN
Contemporary scholarly discourse follows many alternative routes in addition to the three-century old tradition of publication in peer-reviewed journals. The field of High-Energy Physics (HEP) has explored alternative communication strategies for decades, initially via the mass mailing of paper copies of preliminary manuscripts, then via the inception of the first online repositories and digital libraries. This field is uniquely placed to answer recurrent questions raised by the current trends in scholarly communication: is there an advantage for scientists to make their work available through repositories, often in preliminary form? Is there an advantage to publishing in Open Access journals? Do scientists still read journals or do they use digital repositories? The analysis of citation data demonstrates that free and immediate online dissemination of preprints creates an immense citation advantage in HEP, whereas publication in Open Access journals presents no discernible advantage. In addition, the analysis of clickstreams in the leading digital library of the field shows that HEP scientists seldom read journals, preferring preprints instead.
MetExploreViz: web component for interactive metabolic network visualization.
Chazalviel, Maxime; Frainay, Clément; Poupin, Nathalie; Vinson, Florence; Merlet, Benjamin; Gloaguen, Yoann; Cottret, Ludovic; Jourdan, Fabien
2017-09-15
MetExploreViz is an open source web component that can be easily embedded in any web site. It provides features dedicated to the visualization of metabolic networks and pathways and thus offers a flexible solution to analyze omics data in a biochemical context. Documentation and a link to the Git code repository (GPL 3.0 license) are available at this URL: http://metexplore.toulouse.inra.fr/metexploreViz/doc/. A tutorial is available at this URL. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Canary: An NLP Platform for Clinicians and Researchers.
Malmasi, Shervin; Sandor, Nicolae L; Hosomura, Naoshi; Goldberg, Matt; Skentzos, Stephen; Turchin, Alexander
2017-05-03
Information Extraction methods can help discover critical knowledge buried in the vast repositories of unstructured clinical data. However, these methods are underutilized in clinical research, potentially due to the absence of free software geared towards clinicians with little technical expertise. The skills required for developing/using such software constitute a major barrier for medical researchers wishing to employ these methods. To address this, we have developed Canary, a free and open-source solution designed for users without natural language processing (NLP) or software engineering experience. It was designed to be fast and work out of the box via a user-friendly graphical interface.
Long-term Science Data Curation Using a Digital Object Model and Open-Source Frameworks
NASA Astrophysics Data System (ADS)
Pan, J.; Lenhardt, W.; Wilson, B. E.; Palanisamy, G.; Cook, R. B.
2010-12-01
Scientific digital content, including Earth Science observations and model output, has become more heterogeneous in format and more distributed across the Internet. In addition, data and metadata are becoming necessarily linked internally and externally on the Web. As a result, such content has become more difficult for providers to manage and preserve and for users to locate, understand, and consume. Specifically, it is increasingly harder to deliver relevant metadata and data processing lineage information along with the actual content consistently. Readme files, data quality information, production provenance, and other descriptive metadata are often separated in the storage level as well as in the data search and retrieval interfaces available to a user. Critical archival metadata, such as auditing trails and integrity checks, are often even more difficult for users to access, if they exist at all. We investigate the use of several open-source software frameworks to address these challenges. We use Fedora Commons Framework and its digital object abstraction as the repository, Drupal CMS as the user-interface, and the Islandora module as the connector from Drupal to Fedora Repository. With the digital object model, metadata of data description and data provenance can be associated with data content in a formal manner, so are external references and other arbitrary auxiliary information. Changes are formally audited on an object, and digital contents are versioned and have checksums automatically computed. Further, relationships among objects are formally expressed with RDF triples. Data replication, recovery, metadata export are supported with standard protocols, such as OAI-PMH. We provide a tentative comparative analysis of the chosen software stack with the Open Archival Information System (OAIS) reference model, along with our initial results with the existing terrestrial ecology data collections at NASA’s ORNL Distributed Active Archive Center for Biogeochemical Dynamics (ORNL DAAC).
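Data replication and metadata export with standard protocols such as OAI-PMH, as mentioned above, amount to harvesting records from an endpoint. The sketch below shows a generic Dublin Core harvest; the endpoint URL is hypothetical, while the verb, metadataPrefix, and resumptionToken mechanics are standard OAI-PMH.

```python
import requests
import xml.etree.ElementTree as ET

OAI = "https://daac.example.org/oai/request"   # hypothetical endpoint
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def harvest(endpoint: str):
    """Yield record identifiers from an OAI-PMH endpoint, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    while True:
        root = ET.fromstring(requests.get(endpoint, params=params, timeout=60).text)
        for rec in root.iterfind(".//oai:record", NS):
            ident = rec.find(".//oai:identifier", NS)
            if ident is not None:
                yield ident.text
        token = root.find(".//oai:resumptionToken", NS)
        if token is None or not (token.text or "").strip():
            break
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

# for identifier in harvest(OAI): print(identifier)
```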
The Role of Semantics in Open-World, Integrative, Collaborative Science Data Platforms
NASA Astrophysics Data System (ADS)
Fox, Peter; Chen, Yanning; Wang, Han; West, Patrick; Erickson, John; Ma, Marshall
2014-05-01
As collaborative science spreads into more and more Earth and space science fields, both participants and funders are expressing stronger needs for highly functional data and information capabilities. Characteristics include a) easy to use, b) highly integrated, c) leverage investments, d) accommodate rapid technical change, and e) do not incur undue expense or time to build or maintain - these are not a small set of requirements. Based on our accumulated experience over the last ~ decade and several key technical approaches, we adapt, extend, and integrate several open source applications and frameworks to handle major portions of functionality for these platforms. This includes: an object-type repository, collaboration tools, identity management, all within a portal managing diverse content and applications. In this contribution, we present our methods and results of information models, adaptation, integration and evolution of a networked data science architecture based on several open source technologies (Drupal, VIVO, the Comprehensive Knowledge Archive Network (CKAN), and the Global Handle System (GHS)). In particular we present the Deep Carbon Observatory - a platform for international science collaboration. We present and discuss key functional and non-functional attributes, and discuss the general applicability of the platform.
Explicit B-spline regularization in diffeomorphic image registration
Tustison, Nicholas J.; Avants, Brian B.
2013-01-01
Diffeomorphic mappings are central to image registration due largely to their topological properties and success in providing biologically plausible solutions to deformation and morphological estimation problems. Popular diffeomorphic image registration algorithms include those characterized by time-varying and constant velocity fields, and symmetrical considerations. Prior information in the form of regularization is used to enforce transform plausibility taking the form of physics-based constraints or through some approximation thereof, e.g., Gaussian smoothing of the vector fields [a la Thirion's Demons (Thirion, 1998)]. In the context of the original Demons' framework, the so-called directly manipulated free-form deformation (DMFFD) (Tustison et al., 2009) can be viewed as a smoothing alternative in which explicit regularization is achieved through fast B-spline approximation. This characterization can be used to provide B-spline “flavored” diffeomorphic image registration solutions with several advantages. Implementation is open source and available through the Insight Toolkit and our Advanced Normalization Tools (ANTs) repository. A thorough comparative evaluation with the well-known SyN algorithm (Avants et al., 2008), implemented within the same framework, and its B-spline analog is performed using open labeled brain data and open source evaluation tools. PMID:24409140
Generation of openEHR Test Datasets for Benchmarking.
El Helou, Samar; Karvonen, Tuukka; Yamamoto, Goshiro; Kume, Naoto; Kobayashi, Shinji; Kondo, Eiji; Hiragi, Shusuke; Okamoto, Kazuya; Tamura, Hiroshi; Kuroda, Tomohiro
2017-01-01
openEHR is a widely used EHR specification. Given its technology-independent nature, different approaches for implementing openEHR data repositories exist. Public openEHR datasets are needed to conduct benchmark analyses over different implementations. To address their current unavailability, we propose a method for generating openEHR test datasets that can be publicly shared and used.
Gold or green: the debate on open access policies.
Abadal, Ernest
2013-09-01
The movement for open access to science seeks to achieve unrestricted and free access to academic publications on the Internet. To this end, two mechanisms have been established: the gold road, in which scientific journals are openly accessible, and the green road, in which publications are self-archived in repositories. The publication of the Finch Report in 2012, advocating exclusively the adoption of the gold road, generated a debate as to whether either of the two options should be prioritized. The recommendations of the Finch Report stirred controversy among academics specialized in open access issues, who felt that the role played by repositories was not adequately considered and that the gold road places the burden of publishing costs largely on authors. The Finch Report's conclusions are compatible with the characteristics of science communication in the UK and they could surely also be applied to the (few) countries with a powerful publishing industry and substantial research funding. In Spain, both the current national legislation and the existing rules at universities largely advocate the green road. This is directly related to the structure of scientific communication in Spain, where many journals have little commercial significance, the system of charging a fee to authors has not been adopted, and there is a good repository infrastructure. As for open access policies, the performance of the scientific communication system in each country should be carefully analyzed to determine the most suitable open access strategy.
Enabling FAIR and Open Data - The Importance of Communities on Influencing Change
NASA Astrophysics Data System (ADS)
Stall, S.; Lehnert, K.; Robinson, E.; Parsons, M. A.; Hanson, B.; Cutcher-Gershenfeld, J.; Nosek, B.
2017-12-01
Our research ecosystem is diverse and dependent on many interacting stakeholders that influence and support the process of science. These include funders, institutions, libraries, publishers, researchers, data managers, repositories, archives and communities. Process improvement in this ecosystem thus usually needs support from more than one of these many stakeholders. For example, mandates for open data extend across this ecosystem. Solutions require these stakeholders to come together and agree upon improvements. Recently, the value of FAIR and Open Data has encouraged funders to sponsor discussions with tangible agreements that include the steps needed to move the ecosystem towards results. Work by many of these stakeholders over the past years has developed pilot efforts that are ready to be scaled with broader engagement. A partnership of the AGU, Earth Science Information Partners (ESIP), Research Data Alliance (RDA), Center for Open Science, and key publishers including Science, Nature, and the Proceedings of the National Academy of Science (PNAS) have agreed to work together to develop integrated processes, leveraging these pilots, to make FAIR and open data the default for Earth and space science publications. This effort will build on the work of COPDESS.org, ESIP, RDA, the scientific journals, and domain repositories to ensure that well documented data, preserved in a repository with community agreed-upon metadata, and supporting persistent identifiers becomes part of the expected research products submitted in support of each publication.
Virtual Labs (Science Gateways) as platforms for Free and Open Source Science
NASA Astrophysics Data System (ADS)
Lescinsky, David; Car, Nicholas; Fraser, Ryan; Friedrich, Carsten; Kemp, Carina; Squire, Geoffrey
2016-04-01
The Free and Open Source Software (FOSS) movement promotes community engagement in software development, as well as provides access to a range of sophisticated technologies that would be prohibitively expensive if obtained commercially. However, as geoinformatics and eResearch tools and services become more dispersed, it becomes more complicated to identify and interface between the many required components. Virtual Laboratories (VLs, also known as Science Gateways) simplify the management and coordination of these components by providing a platform linking many, if not all, of the steps in particular scientific processes. These enable scientists to focus on their science, rather than the underlying supporting technologies. We describe a modular, open source, VL infrastructure that can be reconfigured to create VLs for a wide range of disciplines. Development of this infrastructure has been led by CSIRO in collaboration with Geoscience Australia and the National Computational Infrastructure (NCI) with support from the National eResearch Collaboration Tools and Resources (NeCTAR) and the Australian National Data Service (ANDS). Initially, the infrastructure was developed to support the Virtual Geophysical Laboratory (VGL), and has subsequently been repurposed to create the Virtual Hazards Impact and Risk Laboratory (VHIRL) and the reconfigured Australian National Virtual Geophysics Laboratory (ANVGL). During each step of development, new capabilities and services have been added and/or enhanced. We plan on continuing to follow this model using a shared, community code base. The VL platform facilitates transparent and reproducible science by providing access to both the data and methodologies used during scientific investigations. This is further enhanced by the ability to set up and run investigations using computational resources accessed through the VL. Data is accessed using registries pointing to catalogues within public data repositories (notably including the NCI National Environmental Research Data Interoperability Platform), or by uploading data directly from user supplied addresses or files. Similarly, scientific software is accessed through registries pointing to software repositories (e.g., GitHub). Runs are configured by using or modifying default templates designed by subject matter experts. After the appropriate computational resources are identified by the user, Virtual Machines (VMs) are spun up and jobs are submitted to service providers (currently the NeCTAR public cloud or Amazon Web Services). Following completion of the jobs the results can be reviewed and downloaded if desired. By providing a unified platform for science, the VL infrastructure enables sophisticated provenance capture and management. The source of input data (including both collection and queries), user information, software information (version and configuration details) and output information are all captured and managed as a VL resource which can be linked to output data sets. This provenance resource provides a mechanism for publication and citation for Free and Open Source Science.
ArrayWiki: an enabling technology for sharing public microarray data repositories and meta-analyses
Stokes, Todd H; Torrance, JT; Li, Henry; Wang, May D
2008-01-01
Background A survey of microarray databases reveals that most of the repository contents and data models are heterogeneous (i.e., data obtained from different chip manufacturers), and that the repositories provide only basic biological keywords linking to PubMed. As a result, it is difficult to find datasets using research context or analysis parameters information beyond a few keywords. For example, to reduce the "curse-of-dimension" problem in microarray analysis, the number of samples is often increased by merging array data from different datasets. Knowing chip data parameters such as pre-processing steps (e.g., normalization, artefact removal, etc), and knowing any previous biological validation of the dataset is essential due to the heterogeneity of the data. However, most of the microarray repositories do not have meta-data information in the first place, and do not have a mechanism to add or insert this information. Thus, there is a critical need to create "intelligent" microarray repositories that (1) enable update of meta-data with the raw array data, and (2) provide standardized archiving protocols to minimize bias from the raw data sources. Results To address the problems discussed, we have developed a community maintained system called ArrayWiki that unites disparate meta-data of microarray meta-experiments from multiple primary sources with four key features. First, ArrayWiki provides a user-friendly knowledge management interface in addition to a programmable interface using standards developed by Wikipedia. Second, ArrayWiki includes automated quality control processes (caCORRECT) and novel visualization methods (BioPNG, Gel Plots), which provide extra information about data quality unavailable in other microarray repositories. Third, it provides a user-curation capability through the familiar Wiki interface. Fourth, ArrayWiki provides users with simple text-based searches across all experiment meta-data, and exposes data to search engine crawlers (Semantic Agents) such as Google to further enhance data discovery. Conclusions Microarray data and meta information in ArrayWiki are distributed and visualized using a novel and compact data storage format, BioPNG. Also, they are open to the research community for curation, modification, and contribution. By making a small investment of time to learn the syntax and structure common to all sites running MediaWiki software, domain scientists and practitioners can all contribute to make better use of microarray technologies in research and medical practices. ArrayWiki is available at . PMID:18541053
NASA Astrophysics Data System (ADS)
Nagaraj, M. N.; Manjunath, M.; Savanur, K. P.; Sheshadri, G.
2010-10-01
With the introduction of information technology (IT) and its applications, libraries have started looking for ways to promote their institutes' research output. At the Raman Research Institute (RRI), we have showcased research output such as research papers, newspaper clippings, annual reports, technical reports, and the entire collection of C.V. Raman through the RRI digital repository, using DSpace. Recently, we have added doctoral dissertations to the repository and have made them accessible with the author's permission. In this paper, we describe the challenges and problems encountered in this project. The various stages including policy decisions, the scanning process, getting permissions, metadata standards and other related issues are described. We conclude by making a plea to other institutions also to make their theses available open-access so that this valuable information resource is accessible to all.
Oceanotron, Scalable Server for Marine Observations
NASA Astrophysics Data System (ADS)
Loubrieu, T.; Bregent, S.; Blower, J. D.; Griffiths, G.
2013-12-01
Ifremer, the French marine institute, is deeply involved in data management for different ocean in-situ observation programs (ARGO, OceanSites, GOSUD, ...) and other European programs aiming at networking ocean in-situ observation data repositories (myOcean, seaDataNet, Emodnet). To capitalize on the effort of implementing advanced data dissemination services (visualization, download with subsetting) for these programs and, generally speaking, for water-column observation repositories, Ifremer decided to develop the oceanotron server (2010). Knowing the diversity of data repository formats (RDBMS, netCDF, ODV, ...) and the temperamental nature of the standard interoperability interface profiles (OGC/WMS, OGC/WFS, OGC/SOS, OPeNDAP, ...), the server is designed to manage plugins: - StorageUnits: which enable reading specific data repository formats (netCDF/OceanSites, RDBMS schema, ODV binary format). - FrontDesks: which get external requests and send results for interoperable protocols (OGC/WMS, OGC/SOS, OPeNDAP). In between, a third type of plugin may be inserted: - TransformationUnits: which enable ocean-business-related transformations of the features (for example conversion of vertical coordinates from pressure in dB to meters under the sea surface). The server is released under an open-source license so that partners can develop their own plugins. Within the myOcean project, the University of Reading has plugged in a WMS implementation as an oceanotron frontdesk. The modules are connected together by sharing the same information model for marine observations (or sampling features: vertical profiles, point series and trajectories), dataset metadata and queries. The shared information model is based on the OGC/Observations & Measurements and Unidata/Common Data Model initiatives. The model is implemented in Java (http://www.ifremer.fr/isi/oceanotron/javadoc/). This inner-interoperability level makes it possible to capitalize on ocean business expertise in software development without being tied to specific data formats or protocols. Oceanotron is deployed at seven European data centres for marine in-situ observations within myOcean. While additional extensions are still being developed, to promote new collaborative initiatives, work is now being done on continuous and distributed integration (Jenkins, Maven), shared reference documentation (Alfresco), and code and release dissemination (SourceForge, GitHub).
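The plugin contract described above (StorageUnits reading a repository format, FrontDesks exposing a protocol, both exchanging a shared observation model) can be illustrated with a small sketch. Oceanotron itself is written in Java; this Python rendering and all class and method names are invented for illustration only.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable

@dataclass
class VerticalProfile:
    """Simplified shared information model for one sampling feature."""
    station: str
    depths_m: list[float]
    values: list[float]

class StorageUnit(ABC):
    @abstractmethod
    def read(self, query: dict) -> Iterable[VerticalProfile]: ...

class FrontDesk(ABC):
    @abstractmethod
    def handle(self, request: dict, storage: StorageUnit) -> bytes: ...

class NetCDFStorageUnit(StorageUnit):
    def read(self, query: dict) -> Iterable[VerticalProfile]:
        # A real plugin would open netCDF/OceanSites files here.
        yield VerticalProfile("DEMO", [0.0, 10.0], [14.2, 13.8])

class CSVFrontDesk(FrontDesk):
    def handle(self, request: dict, storage: StorageUnit) -> bytes:
        rows = ["station,depth_m,value"]
        for p in storage.read(request):
            rows += [f"{p.station},{d},{v}" for d, v in zip(p.depths_m, p.values)]
        return "\n".join(rows).encode()

print(CSVFrontDesk().handle({}, NetCDFStorageUnit()).decode())
```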
Exploring a New Model for Preprint Server: A Case Study of CSPO
ERIC Educational Resources Information Center
Hu, Changping; Zhang, Yaokun; Chen, Guo
2010-01-01
This paper describes the introduction of an open-access preprint server in China covering 43 disciplines. The system includes mandatory deposit for state-funded research and reports on the repository and its effectiveness and outlines a novel process of peer-review of preprints in the repository, which can be incorporated into the established…
ERIC Educational Resources Information Center
Seaman, David M.
2017-01-01
Libraries often engage in services that require collaboration across stakeholder boundaries to be successful. Institutional repositories (IRs) are a good example of such a service. IRs are an infrastructure to preserve intellectual assets within a university or college, and to provide an open access showcase for that institution's research,…
ERIC Educational Resources Information Center
Coughlan, Tony; Perryman, Leigh-Anne
2011-01-01
This article explores the relationship between academic disciplines' representation in the United Kingdom Open University's (OU) OpenLearn open educational resources (OER) repository and in the OU's fee-paying curriculum. Becher's (1989) typology was used to subdivide the OpenLearn and OU fee-paying curriculum content into four disciplinary…
An Open-source Community Web Site To Support Ground-Water Model Testing
NASA Astrophysics Data System (ADS)
Kraemer, S. R.; Bakker, M.; Craig, J. R.
2007-12-01
A community wiki wiki web site has been created as a resource to support ground-water model development and testing. The Groundwater Gourmet wiki is a repository for user-supplied analytical and numerical recipes, howtos, and examples. Members are encouraged to submit analytical solutions, including source code and documentation. A diversity of code snippets is sought in a variety of languages, including Fortran, C, C++, Matlab and Python. In the spirit of a wiki, all contributions may be edited and altered by other users, and open source licensing is promoted. Community-accepted contributions are graduated into the library of analytic solutions and organized into either a Strack (Groundwater Mechanics, 1989) or Bruggeman (Analytical Solutions of Geohydrological Problems, 1999) classification. The examples section of the wiki is meant to include laboratory experiments (e.g., Hele-Shaw), classical benchmark problems (e.g., the Henry Problem), and controlled field experiments (e.g., the Borden landfill and Cape Cod tracer tests). Although this work was reviewed by EPA and approved for publication, it may not necessarily reflect official Agency policy. Mention of trade names or commercial products does not constitute endorsement or recommendation for use.
Taking advantage of continuity of care documents to populate a research repository.
Klann, Jeffrey G; Mendis, Michael; Phillips, Lori C; Goodson, Alyssa P; Rocha, Beatriz H; Goldberg, Howard S; Wattanasin, Nich; Murphy, Shawn N
2015-03-01
Clinical data warehouses have accelerated clinical research, but even with available open source tools, there is a high barrier to entry due to the complexity of normalizing and importing data. The Office of the National Coordinator for Health Information Technology's Meaningful Use Incentive Program now requires that electronic health record systems produce standardized consolidated clinical document architecture (C-CDA) documents. Here, we leverage this data source to create a low volume standards based import pipeline for the Informatics for Integrating Biology and the Bedside (i2b2) clinical research platform. We validate this approach by creating a small repository at Partners Healthcare automatically from C-CDA documents. We designed an i2b2 extension to import C-CDAs into i2b2. It is extensible to other sites with variances in C-CDA format without requiring custom code. We also designed new ontology structures for querying the imported data. We implemented our methodology at Partners Healthcare, where we developed an adapter to retrieve C-CDAs from Enterprise Services. Our current implementation supports demographics, encounters, problems, and medications. We imported approximately 17 000 clinical observations on 145 patients into i2b2 in about 24 min. We were able to perform i2b2 cohort finding queries and view patient information through SMART apps on the imported data. This low volume import approach can serve small practices with local access to C-CDAs and will allow patient registries to import patient supplied C-CDAs. These components will soon be available open source on the i2b2 wiki. Our approach will lower barriers to entry in implementing i2b2 where informatics expertise or data access are limited. © The Author 2014. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Apollo: a community resource for genome annotation editing
Lee, Ed; Harris, Nomi; Gibson, Mark; Chetty, Raymond; Lewis, Suzanna
2009-01-01
Summary: Apollo is a genome annotation-editing tool with an easy to use graphical interface. It is a component of the GMOD project, with ongoing development driven by the community. Recent additions to the software include support for the generic feature format version 3 (GFF3), continuous transcriptome data, a full Chado database interface, integration with remote services for on-the-fly BLAST and Primer BLAST analyses, graphical interfaces for configuring user preferences and full undo of all edit operations. Apollo's user community continues to grow, including its use as an educational tool for college and high-school students. Availability: Apollo is a Java application distributed under a free and open source license. Installers for Windows, Linux, Unix, Solaris and Mac OS X are available at http://apollo.berkeleybop.org, and the source code is available from the SourceForge CVS repository at http://gmod.cvs.sourceforge.net/gmod/apollo. Contact: elee@berkeleybop.org PMID:19439563
Apollo: a community resource for genome annotation editing.
Lee, Ed; Harris, Nomi; Gibson, Mark; Chetty, Raymond; Lewis, Suzanna
2009-07-15
Apollo is a genome annotation-editing tool with an easy to use graphical interface. It is a component of the GMOD project, with ongoing development driven by the community. Recent additions to the software include support for the generic feature format version 3 (GFF3), continuous transcriptome data, a full Chado database interface, integration with remote services for on-the-fly BLAST and Primer BLAST analyses, graphical interfaces for configuring user preferences and full undo of all edit operations. Apollo's user community continues to grow, including its use as an educational tool for college and high-school students. Apollo is a Java application distributed under a free and open source license. Installers for Windows, Linux, Unix, Solaris and Mac OS X are available at http://apollo.berkeleybop.org, and the source code is available from the SourceForge CVS repository at http://gmod.cvs.sourceforge.net/gmod/apollo.
Software Attribution for Geoscience Applications in the Computational Infrastructure for Geodynamics
NASA Astrophysics Data System (ADS)
Hwang, L.; Dumit, J.; Fish, A.; Soito, L.; Kellogg, L. H.; Smith, M.
2015-12-01
Scientific software is largely developed by individual scientists and represents a significant intellectual contribution to the field. As the scientific culture and funding agencies move towards an expectation that software be open-source, there is a corresponding need for mechanisms to cite software, both to provide credit and recognition to developers, and to aid in discoverability of software and scientific reproducibility. We assess the geodynamic modeling community's current citation practices by examining more than 300 predominantly self-reported publications from the past 5 years that utilize scientific software available through the Computational Infrastructure for Geodynamics (CIG). Preliminary results indicate that authors cite and attribute software by citing (in rank order) peer-reviewed scientific publications, a user's manual, and/or a paper describing the software code. Attributions may be found directly in the text, in acknowledgements, in figure captions, or in footnotes. What is considered citable varies widely. Citations predominantly lack software version numbers or persistent identifiers to find the software package. Versioning may be implied through reference to a versioned user manual. Authors sometimes report code features used and whether they have modified the code. As an open-source community, CIG requests that researchers contribute their modifications to the repository. However, such modifications may not be contributed back to a repository code branch, decreasing the chances of discoverability and reproducibility. Survey results through CIG's Software Attribution for Geoscience Applications (SAGA) project suggest that lack of knowledge, tools, and workflows to cite codes are barriers to effectively implementing the emerging citation norms. Attributions generated on demand on software landing pages and a prototype extensible plug-in that automatically generates attributions in codes are the first steps towards reproducibility.
IMPLEMENTATION AND OPERATION OF THE REPOSITORY
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marcus Milling
2003-10-01
The NGDRS has facilitated the transfer to the public sector of 85% of the cores, cuttings, and other data identified as available. Over 12 million linear feet of cores and cuttings, in addition to large numbers of paleontological samples, are now available for public use. To date, with industry contributions for program operations and data transfers, the NGDRS project has realized a 6.5 to 1 return on investment to Department of Energy funds. Large-scale transfers of seismic data have been evaluated, but based on the recommendation of the NGDRS steering committee, cores have been given priority because of the vast scale of the seismic data problem relative to the available funding. The rapidly changing industry conditions have required that the primary core and cuttings preservation strategy evolve as well. Additionally, the NGDRS clearinghouse is evaluating the viability of transferring seismic data covering the western shelf of the Florida Gulf Coast. AGI remains actively involved in working to realize the vision of the National Research Council's report on geoscience data preservation. GeoTrek has been ported to Linux and MySQL, ensuring a purely open-source version of the software. This effort is key to ensuring the long-term viability of the software so that it can continue basic operation regardless of specific funding levels. Work is under way on a major revision of GeoTrek, using the open-source MapServer project and its related MapScript language. This effort will address a number of key technology issues that appear to be arising for 2003, including the discontinuation of the use of Java in future Microsoft operating systems. The recent donation of BPAmoco's Houston core facility to the Texas Bureau of Economic Geology has provided substantial short-term relief of the space constraints for public repository space.
BioBlend: automating pipeline analyses within Galaxy and CloudMan.
Sloggett, Clare; Goonasekera, Nuwan; Afgan, Enis
2013-07-01
We present BioBlend, a unified API in a high-level language (python) that wraps the functionality of Galaxy and CloudMan APIs. BioBlend makes it easy for bioinformaticians to automate end-to-end large data analysis, from scratch, in a way that is highly accessible to collaborators, by allowing them to both provide the required infrastructure and automate complex analyses over large datasets within the familiar Galaxy environment. http://bioblend.readthedocs.org/. Automated installation of BioBlend is available via PyPI (e.g. pip install bioblend). Alternatively, the source code is available from the GitHub repository (https://github.com/afgane/bioblend) under the MIT open source license. The library has been tested and is working on Linux, Macintosh and Windows-based systems.
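To give a sense of what automating an analysis through BioBlend looks like, the sketch below connects to a Galaxy server, lists the user's histories, and uploads a dataset into a fresh history. The server URL, API key and file name are placeholders; the calls shown (GalaxyInstance, get_histories, create_history, upload_file) are part of BioBlend's Galaxy API, but consult the current documentation for exact behaviour.

```python
# Minimal BioBlend sketch (assumes a reachable Galaxy server, a valid API key,
# and a local input file; the URL, key and file name below are placeholders).
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://usegalaxy.org", key="YOUR_API_KEY")

# List existing histories and create a new one for an analysis run.
for history in gi.histories.get_histories():
    print(history["id"], history["name"])

new_history = gi.histories.create_history(name="bioblend-demo")

# Upload a local file into the new history as the starting dataset.
upload = gi.tools.upload_file("reads.fastq", new_history["id"])
print(upload["outputs"][0]["id"])
```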
The discounting model selector: Statistical software for delay discounting applications.
Gilroy, Shawn P; Franck, Christopher T; Hantula, Donald A
2017-05-01
Original, open-source computer software was developed and validated against established delay discounting methods in the literature. The software executed approximate Bayesian model selection methods on user-supplied temporal discounting data and computed the effective delay 50 (ED50) from the best-performing model. The software was custom-designed to enable behavior analysts to conveniently apply recent statistical methods to temporal discounting data with the aid of a graphical user interface (GUI). The results of independent validation of the approximate Bayesian model selection methods indicated that the program provided results identical to those of the original source paper and its methods. Monte Carlo simulation (n = 50,000) confirmed that the true model was selected most often in each setting. Simulation code and data for this study were posted to an online repository for use by other researchers. The model selection approach was applied to three existing delay discounting data sets from the literature in addition to the data from the source paper. Comparisons of model-selected ED50 values were consistent with traditional indices of discounting. Conceptual issues related to the development and use of computer software by behavior analysts and the opportunities afforded by free and open-source software are discussed, and a review of possible expansions of this software is provided. © 2017 Society for the Experimental Analysis of Behavior.
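For orientation, the sketch below fits the simplest member of the model family such tools compare, Mazur's hyperbolic discounting model, and derives ED50 from its fitted rate parameter (for the hyperbola, subjective value halves when kD = 1, so ED50 = 1/k). This is an illustration only, not the approximate Bayesian model selection implemented in the Discounting Model Selector, and the indifference-point data are invented.

```python
# Illustrative sketch only: fits Mazur's hyperbolic discounting model
# V = A / (1 + k * D) to indifference points and reports ED50 = 1/k.
# The data below are made up for demonstration purposes.
import numpy as np
from scipy.optimize import curve_fit

delays = np.array([1, 7, 30, 90, 180, 365], dtype=float)   # delay in days
values = np.array([0.95, 0.85, 0.60, 0.40, 0.30, 0.20])    # fraction of amount A

def hyperbolic(delay, k):
    return 1.0 / (1.0 + k * delay)

(k_hat,), _ = curve_fit(hyperbolic, delays, values, p0=[0.01])
print(f"k = {k_hat:.4f}, ED50 = {1.0 / k_hat:.1f} days")
```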
dsmcFoam+: An OpenFOAM based direct simulation Monte Carlo solver
NASA Astrophysics Data System (ADS)
White, C.; Borg, M. K.; Scanlon, T. J.; Longshaw, S. M.; John, B.; Emerson, D. R.; Reese, J. M.
2018-03-01
dsmcFoam+ is a direct simulation Monte Carlo (DSMC) solver for rarefied gas dynamics, implemented within the OpenFOAM software framework, and parallelised with MPI. It is open-source and released under the GNU General Public License in a publicly available software repository that includes detailed documentation and tutorial DSMC gas flow cases. This release of the code includes many features not found in standard dsmcFoam, such as molecular vibrational and electronic energy modes, chemical reactions, and subsonic pressure boundary conditions. Since dsmcFoam+ is designed entirely within OpenFOAM's C++ object-oriented framework, it benefits from a number of key features: the code emphasises extensibility and flexibility so it is aimed first and foremost as a research tool for DSMC, allowing new models and test cases to be developed and tested rapidly. All DSMC cases are as straightforward as setting up any standard OpenFOAM case, as dsmcFoam+ relies upon the standard OpenFOAM dictionary based directory structure. This ensures that useful pre- and post-processing capabilities provided by OpenFOAM remain available even though the fully Lagrangian nature of a DSMC simulation is not typical of most OpenFOAM applications. We show that dsmcFoam+ compares well to other well-known DSMC codes and to analytical solutions in terms of benchmark results.
Creation of Reusable Open Textbooks: Insights from the Connexions Repository
ERIC Educational Resources Information Center
Rodriguez-Solano, Carlos; Sánchez-Alonso, Salvador; Sicilia, Miguel-Angel
2015-01-01
Open textbook initiatives have appeared as an alternative to traditional publishing. These initiatives for the production of alternatively copyrighted educational resources provide a way of sharing materials through the Web. While the open model of peer-produced materials enables the global reuse of textbooks, the combination of fragments to…
NASA Astrophysics Data System (ADS)
Burgasser, Adam
The NASA Infrared Telescope Facility's (IRTF) SpeX spectrograph has been an essential tool in the discovery and characterization of ultracool dwarf (UCD) stars, brown dwarfs and exoplanets. Over ten years of SpeX data have been collected on these sources, and a repository of low-resolution (R ~100) SpeX prism spectra has been maintained by the PI at the SpeX Prism Spectral Libraries website since 2008. As the largest existing collection of NIR UCD spectra, this repository has facilitated a broad range of investigations in UCD, exoplanet, Galactic and extragalactic science, contributing to over 100 publications in the past 6 years. However, this repository remains highly incomplete, has not been uniformly calibrated, lacks sufficient contextual data for observations and sources, and most importantly provides no data visualization or analysis tools for the user. To fully realize the scientific potential of these data for community research, we propose a two-year program to (1) calibrate and expand existing repository and archival data, and make it virtual-observatory compliant; (2) serve the data through a searchable web archive with basic visualization tools; and (3) develop and distribute an open-source, Python-based analysis toolkit for users to analyze the data. These resources will be generated through an innovative, student-centered research model, with undergraduate and graduate students building and validating the analysis tools through carefully designed coding challenges and research validation activities. The resulting data archive, the SpeX Prism Library, will be a legacy resource for IRTF and SpeX, and will facilitate numerous investigations using current and future NASA capabilities. These include deep/wide surveys of UCDs to measure Galactic structure and chemical evolution, and probe UCD populations in satellite galaxies (e.g., JWST, WFIRST); characterization of directly imaged exoplanet spectra (e.g., FINESSE), and development of low-temperature theoretical models of UCD and exoplanet atmospheres. Our program will also serve to validate the IRTF data archive during its development, by reducing and disseminating non-proprietary archival observations of UCDs to the community. The proposed program directly addresses NASA's strategic goals of exploring the origin and evolution of stars and planets that make up our universe, and discovering and studying planets around other stars.
Data Sharing in Astrobiology: the Astrobiology Habitable Environments Database (AHED)
NASA Astrophysics Data System (ADS)
Bristow, T.; Lafuente Valverde, B.; Keller, R.; Stone, N.; Downs, R. T.; Blake, D. F.; Fonda, M.; Pires, A.
2016-12-01
Astrobiology is a multidisciplinary area of scientific research focused on studying the origins of life on Earth and the conditions under which life might have emerged elsewhere in the universe. The understanding of complex questions in astrobiology requires integration and analysis of data spanning a range of disciplines including biology, chemistry, geology, astronomy and planetary science. However, the lack of a centralized repository makes it difficult for astrobiology teams to share data and benefit from resultant synergies. Moreover, in recent years, federal agencies are requiring that results of any federally funded scientific research must be available and useful for the public and the science community. Astrobiology, as any other scientific discipline, needs to respond to these mandates. The Astrobiology Habitable Environments Database (AHED) is a central, high quality, long-term searchable repository designed to help the community by promoting the integration and sharing of all the data generated by these diverse disciplines. AHED provides public and open-access to astrobiology-related research data through a user-managed web portal implemented using the open-source software The Open Data Repository's (ODR) Data Publisher [1]. ODR-DP provides a user-friendly interface that research teams or individual scientists can use to design, populate and manage their own databases or laboratory notebooks according to the characteristics of their data. AHED is then a collection of databases housed in the ODR framework that store information about samples, along with associated measurements, analyses, and contextual information about field sites where samples were collected, the instruments or equipment used for analysis, and people and institutions involved in their collection. Advanced graphics are implemented together with advanced online tools for data analysis (e.g. R, MATLAB, Project Jupyter-http://jupyter.org). A permissions system will be put in place so that as data are being actively collected and interpreted, they will remain proprietary. A citation system will allow research data to be used and appropriately referenced by other researchers after the data are made public. This project is supported by SERA and NASA NNX11AP82A, MSL. [1] Stone et al. (2016) AGU, submitted.
Semantic Web repositories for genomics data using the eXframe platform
2014-01-01
Background: With the advent of inexpensive assay technologies, there has been unprecedented growth in genomics data as well as in the number of databases in which it is stored. In these databases, sample annotation using ontologies and controlled vocabularies is becoming more common. However, the annotation is rarely available as Linked Data, in a machine-readable format, or for standardized queries using SPARQL. This makes large-scale reuse, or integration with other knowledge bases, very difficult. Methods: To address this challenge, we have developed the second generation of our eXframe platform, a reusable framework for creating online repositories of genomics experiments. This second-generation model now publishes Semantic Web data. To accomplish this, we created an experiment model that covers provenance, citations, external links, assays, biomaterials used in the experiment, and the data collected during the process. The elements of our model are mapped to classes and properties from various established biomedical ontologies. Resource Description Framework (RDF) data is automatically produced using these mappings and indexed in an RDF store with a built-in SPARQL Protocol and RDF Query Language (SPARQL) endpoint. Conclusions: Using the open-source eXframe software, institutions and laboratories can create Semantic Web repositories of their experiments, integrate them with heterogeneous resources and make them interoperable with the vast Semantic Web of biomedical knowledge. PMID:25093072
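The practical payoff of exposing a SPARQL endpoint is that any repository built this way can be queried with standard tooling. The sketch below issues a simple title query with SPARQLWrapper; the endpoint URL and the use of Dublin Core terms are assumptions for illustration, since an actual eXframe deployment would publish its own endpoint and ontology mappings.

```python
# Sketch of querying a Semantic Web experiment repository's SPARQL endpoint.
# The endpoint URL and vocabulary choice are hypothetical placeholders.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://example.org/exframe/sparql")
endpoint.setQuery("""
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT ?experiment ?title WHERE {
        ?experiment dcterms:title ?title .
    } LIMIT 10
""")
endpoint.setReturnFormat(JSON)

# Print each experiment URI with its title from the JSON result bindings.
for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["experiment"]["value"], "-", row["title"]["value"])
```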
Linking Big and Small Data Across the Social, Engineering, and Earth Sciences
NASA Astrophysics Data System (ADS)
Chen, R. S.; de Sherbinin, A. M.; Levy, M. A.; Downs, R. R.
2014-12-01
The challenges of sustainable development cut across the social, health, ecological, engineering, and Earth sciences, across a wide range of spatial and temporal scales, and across the spectrum from basic to applied research and decision making. The rapidly increasing availability of data and information in digital form from a variety of data repositories, networks, and other sources provides new opportunities to link and integrate both traditional data holdings as well as emerging "big data" resources in ways that enable interdisciplinary research and facilitate the use of objective scientific data and information in society. Taking advantage of these opportunities not only requires improved technical and scientific data interoperability across disciplines, scales, and data types, but also concerted efforts to bridge gaps and barriers between key communities, institutions, and networks. Given the long time perspectives required in planning sustainable approaches to development, it is also imperative to address user requirements for long-term data continuity and stewardship by trustworthy repositories. We report here on lessons learned by CIESIN working on a range of sustainable development issues to integrate data across multiple repositories and networks. This includes CIESIN's roles in developing policy-relevant climate and environmental indicators, soil data for African agriculture, and exposure and risk measures for hazards, disease, and conflict, as well as CIESIN's participation in a range of national and international initiatives related both to sustainable development and to open data access, interoperability, and stewardship.
Dynamic federations: storage aggregation using open tools and protocols
NASA Astrophysics Data System (ADS)
Furano, Fabrizio; Brito da Rocha, Ricardo; Devresse, Adrien; Keeble, Oliver; Álvarez Ayllón, Alejandro; Fuhrmann, Patrick
2012-12-01
A number of storage elements now offer standard protocol interfaces like NFS 4.1/pNFS and WebDAV, for access to their data repositories, in line with the standardization effort of the European Middleware Initiative (EMI). Also the LCG FileCatalogue (LFC) can offer such features. Here we report on work that seeks to exploit the federation potential of these protocols and build a system that offers a unique view of the storage and metadata ensemble and the possibility of integration of other compatible resources such as those from cloud providers. The challenge, here undertaken by the providers of dCache and DPM, and pragmatically open to other Grid and Cloud storage solutions, is to build such a system while being able to accommodate name translations from existing catalogues (e.g. LFCs), experiment-based metadata catalogues, or stateless algorithmic name translations, also known as “trivial file catalogues”. Such so-called storage federations of standard protocols-based storage elements give a unique view of their content, thus promoting simplicity in accessing the data they contain and offering new possibilities for resilience and data placement strategies. The goal is to consider HTTP and NFS4.1-based storage elements and metadata catalogues and make them able to cooperate through an architecture that properly feeds the redirection mechanisms that they are based upon, thus giving the functionalities of a “loosely coupled” storage federation. One of the key requirements is to use standard clients (provided by OS'es or open source distributions, e.g. Web browsers) to access an already aggregated system; this approach is quite different from aggregating the repositories at the client side through some wrapper API, like for instance GFAL, or by developing new custom clients. Other technical challenges that will determine the success of this initiative include performance, latency and scalability, and the ability to create worldwide storage federations that are able to redirect clients to repositories that they can efficiently access, for instance trying to choose the endpoints that are closer or applying other criteria. We believe that the features of a loosely coupled federation of open-protocols-based storage elements will open many possibilities of evolving the current computing models without disrupting them, and, at the same time, will be able to operate with the existing infrastructures, follow their evolution path and add storage centers that can be acquired as a third-party service.
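One way to picture the "standard clients" requirement is a plain WebDAV listing request against the federated namespace: the client needs nothing beyond an ordinary HTTP library, and the federation's redirects steer it towards a suitable replica. The sketch below is illustrative only; the federation URL is a placeholder, not an endpoint from the paper.

```python
# Sketch of listing a federated HTTP/WebDAV namespace with a standard client
# library; the URL is a placeholder. requests follows the redirects issued by
# the federation, so no federation-specific client code is needed.
import requests
import xml.etree.ElementTree as ET

url = "https://federation.example.org/myexperiment/data/"
resp = requests.request("PROPFIND", url, headers={"Depth": "1"})
resp.raise_for_status()

# A WebDAV PROPFIND answer is a multistatus XML document; list its entries.
ns = {"d": "DAV:"}
for entry in ET.fromstring(resp.content).findall("d:response/d:href", ns):
    print(entry.text)
```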
DOE Office of Scientific and Technical Information (OSTI.GOV)
Forsberg, C.; Miller, W.F.
2013-07-01
The historical repository siting strategy in the United States has been a top-down approach driven by federal government decision making, but it has been a failure. This policy has led to fuel cycle facilities being distributed among different states. The U.S. government is now considering an alternative repository siting strategy based on voluntary agreements with state governments. If that occurs, state governments become key decision makers. They have different priorities, and those priorities may change the characteristics of the repository and the fuel cycle. State government priorities, when considering hosting a repository, are safety, financial incentives and jobs. It follows that states will demand that a repository be the center of the back end of the fuel cycle as a condition of hosting it. For example, states will push for collocation of transportation services, safeguards training, and navy/private SNF (Spent Nuclear Fuel) inspection at the repository site. Such activities would more than double local employment relative to what was planned for the Yucca Mountain-type repository. States may demand (1) the right to take future title to the SNF, so that if recycling became economic the reprocessing plant would be built at the repository site, and (2) the right to a certain fraction of the repository capacity for foreign SNF. That would open the future option of leasing fuel to foreign utilities with disposal of the SNF in the repository, but with the state-government condition that the front-end fuel-cycle enrichment and fuel fabrication facilities be located in that state.
WIRM: An Open Source Toolkit for Building Biomedical Web Applications
Jakobovits, Rex M.; Rosse, Cornelius; Brinkley, James F.
2002-01-01
This article describes an innovative software toolkit that allows the creation of web applications that facilitate the acquisition, integration, and dissemination of multimedia biomedical data over the web, thereby reducing the cost of knowledge sharing. There is a lack of high-level web application development tools suitable for use by researchers, clinicians, and educators who are not skilled programmers. Our Web Interfacing Repository Manager (WIRM) is a software toolkit that reduces the complexity of building custom biomedical web applications. WIRM’s visual modeling tools enable domain experts to describe the structure of their knowledge, from which WIRM automatically generates full-featured, customizable content management systems. PMID:12386108
OpenID Connect as a security service in cloud-based medical imaging systems
Ma, Weina; Sartipi, Kamran; Sharghigoorabi, Hassan; Koff, David; Bak, Peter
2016-01-01
The evolution of cloud computing is driving the next generation of medical imaging systems. However, privacy and security concerns have consistently been regarded as the major obstacles to the adoption of cloud computing in healthcare domains. OpenID Connect, which combines OpenID and OAuth, is an emerging representational state transfer-based federated identity solution. It is one of the most widely adopted open standards and may become the de facto standard for securing cloud computing and mobile applications; it has been called the "Kerberos of the cloud." We introduce OpenID Connect as an authentication and authorization service in cloud-based diagnostic imaging (DI) systems, and propose enhancements that allow this technology to be incorporated within distributed enterprise environments. The objective of this study is to offer solutions for the secure sharing of medical images among a diagnostic imaging repository (DI-r) and heterogeneous picture archiving and communication systems (PACS), as well as Web-based and mobile clients, in the cloud ecosystem. The main objective is to use the OpenID Connect open-source single sign-on and authorization service in a user-centric manner, while ensuring that deploying the DI-r and PACS to private or community clouds provides security levels equivalent to those of the traditional computing model. PMID:27340682
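At the protocol level, the piece that makes this work is the standard OAuth 2.0/OpenID Connect authorization-code exchange: after the user authenticates at the identity provider, the imaging gateway trades the returned code for an ID token (identity) and an access token (authorization). The sketch below shows that single step with placeholder endpoints and credentials; it is not taken from the paper's implementation.

```python
# Sketch of the OpenID Connect authorization-code token exchange that a DI-r
# or PACS gateway would perform after the user authenticates; the provider
# URL, client credentials and authorization code are placeholders.
import requests

token_endpoint = "https://openid-provider.example.org/token"
payload = {
    "grant_type": "authorization_code",
    "code": "AUTH_CODE_FROM_REDIRECT",
    "redirect_uri": "https://di-r.example.org/callback",
    "client_id": "dir-client",
    "client_secret": "CLIENT_SECRET",
}

resp = requests.post(token_endpoint, data=payload)
resp.raise_for_status()
tokens = resp.json()

# The ID token asserts the user's identity; the access token authorizes calls
# to the imaging services. Both must be validated before they are trusted.
print(tokens["id_token"][:40], "...")
print(tokens["access_token"][:40], "...")
```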
ASK-LDT 2.0: A Web-Based Graphical Tool for Authoring Learning Designs
ERIC Educational Resources Information Center
Zervas, Panagiotis; Fragkos, Konstantinos; Sampson, Demetrios G.
2013-01-01
During the last decade, Open Educational Resources (OERs) have gained increased attention for their potential to support open access, sharing and reuse of digital educational resources. Therefore, a large amount of digital educational resources have become available worldwide through web-based open access repositories which are referred to as…
Frame of Reference: Open Access Starts with You
ERIC Educational Resources Information Center
Goetsch, Lori A.
2010-01-01
Federal legislation now requires the deposit of some taxpayer-funded research in "open-access" repositories--that is, sites where scholarship and research are made freely available over the Internet. The institutions whose faculty produce the research have begun to see the benefit of open-access publication as well. From the perspective of faculty…
78 FR 28111 - Making Open and Machine Readable the New Default for Government Information
Federal Register 2010, 2011, 2012, 2013, 2014
2013-05-14
... warning systems, location-based applications, precision farming tools, and much more, improving Americans... repository of tools and best practices to assist agencies in integrating the Open Data Policy into their... needed to ensure it remains a resource to facilitate the adoption of open data practices. (b) Within 90...
The Experiment Factory: Standardizing Behavioral Experiments.
Sochat, Vanessa V; Eisenberg, Ian W; Enkavi, A Zeynep; Li, Jamie; Bissett, Patrick G; Poldrack, Russell A
2016-01-01
The administration of behavioral and experimental paradigms for psychology research is hindered by lack of a coordinated effort to develop and deploy standardized paradigms. While several frameworks (Mason and Suri, 2011; McDonnell et al., 2012; de Leeuw, 2015; Lange et al., 2015) have provided infrastructure and methods for individual research groups to develop paradigms, missing is a coordinated effort to develop paradigms linked with a system to easily deploy them. This disorganization leads to redundancy in development, divergent implementations of conceptually identical tasks, disorganized and error-prone code lacking documentation, and difficulty in replication. The ongoing reproducibility crisis in psychology and neuroscience research (Baker, 2015; Open Science Collaboration, 2015) highlights the urgency of this challenge: reproducible research in behavioral psychology is conditional on deployment of equivalent experiments. A large, accessible repository of experiments for researchers to develop collaboratively is most efficiently accomplished through an open source framework. Here we present the Experiment Factory, an open source framework for the development and deployment of web-based experiments. The modular infrastructure includes experiments, virtual machines for local or cloud deployment, and an application to drive these components and provide developers with functions and tools for further extension. We release this infrastructure with a deployment (http://www.expfactory.org) that researchers are currently using to run a set of over 80 standardized web-based experiments on Amazon Mechanical Turk. By providing open source tools for both deployment and development, this novel infrastructure holds promise to bring reproducibility to the administration of experiments, and accelerate scientific progress by providing a shared community resource of psychological paradigms.
The Experiment Factory: Standardizing Behavioral Experiments
Sochat, Vanessa V.; Eisenberg, Ian W.; Enkavi, A. Zeynep; Li, Jamie; Bissett, Patrick G.; Poldrack, Russell A.
2016-01-01
The administration of behavioral and experimental paradigms for psychology research is hindered by lack of a coordinated effort to develop and deploy standardized paradigms. While several frameworks (Mason and Suri, 2011; McDonnell et al., 2012; de Leeuw, 2015; Lange et al., 2015) have provided infrastructure and methods for individual research groups to develop paradigms, missing is a coordinated effort to develop paradigms linked with a system to easily deploy them. This disorganization leads to redundancy in development, divergent implementations of conceptually identical tasks, disorganized and error-prone code lacking documentation, and difficulty in replication. The ongoing reproducibility crisis in psychology and neuroscience research (Baker, 2015; Open Science Collaboration, 2015) highlights the urgency of this challenge: reproducible research in behavioral psychology is conditional on deployment of equivalent experiments. A large, accessible repository of experiments for researchers to develop collaboratively is most efficiently accomplished through an open source framework. Here we present the Experiment Factory, an open source framework for the development and deployment of web-based experiments. The modular infrastructure includes experiments, virtual machines for local or cloud deployment, and an application to drive these components and provide developers with functions and tools for further extension. We release this infrastructure with a deployment (http://www.expfactory.org) that researchers are currently using to run a set of over 80 standardized web-based experiments on Amazon Mechanical Turk. By providing open source tools for both deployment and development, this novel infrastructure holds promise to bring reproducibility to the administration of experiments, and accelerate scientific progress by providing a shared community resource of psychological paradigms. PMID:27199843
The NIH BD2K center for big data in translational genomics.
Paten, Benedict; Diekhans, Mark; Druker, Brian J; Friend, Stephen; Guinney, Justin; Gassner, Nadine; Guttman, Mitchell; Kent, W James; Mantey, Patrick; Margolin, Adam A; Massie, Matt; Novak, Adam M; Nothaft, Frank; Pachter, Lior; Patterson, David; Smuga-Otto, Maciej; Stuart, Joshua M; Van't Veer, Laura; Wold, Barbara; Haussler, David
2015-11-01
The world's genomics data will never be stored in a single repository - rather, it will be distributed among many sites in many countries. No one site will have enough data to explain genotype to phenotype relationships in rare diseases; therefore, sites must share data. To accomplish this, the genetics community must forge common standards and protocols to make sharing and computing data among many sites a seamless activity. Through the Global Alliance for Genomics and Health, we are pioneering the development of shared application programming interfaces (APIs) to connect the world's genome repositories. In parallel, we are developing an open source software stack (ADAM) that uses these APIs. This combination will create a cohesive genome informatics ecosystem. Using containers, we are facilitating the deployment of this software in a diverse array of environments. Through benchmarking efforts and big data driver projects, we are ensuring ADAM's performance and utility. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Building Energy Management Open Source Software
DOE Office of Scientific and Technical Information (OSTI.GOV)
This is the repository for Building Energy Management Open Source Software (BEMOSS), which is an open source operating system that is engineered to improve sensing and control of equipment in small- and medium-sized commercial buildings. BEMOSS offers the following key features: (1) Open source, open architecture – BEMOSS is an open source operating system that is built upon VOLTTRON – a distributed agent platform developed by Pacific Northwest National Laboratory (PNNL). BEMOSS was designed to make it easy for hardware manufacturers to seamlessly interface their devices with BEMOSS. Software developers can also contribute to adding additional BEMOSS functionalities and applications. (2) Plug & play – BEMOSS was designed to automatically discover supported load controllers (including smart thermostats, VAV/RTUs, lighting load controllers and plug load controllers) in commercial buildings. (3) Interoperability – BEMOSS was designed to work with load control devices from different manufacturers that operate on different communication technologies and data exchange protocols. (4) Cost effectiveness – Implementation of BEMOSS is deemed to be cost-effective as it was built upon a robust open source platform that can operate on a low-cost single-board computer, such as Odroid. This feature could contribute to its rapid deployment in small- or medium-sized commercial buildings. (5) Scalability and ease of deployment – With its multi-node architecture, BEMOSS provides a distributed architecture where load controllers in a multi-floor and high occupancy building could be monitored and controlled by multiple single-board computers hosting BEMOSS. This makes it possible for a building engineer to deploy BEMOSS in one zone of a building, be comfortable with its operation, and later on expand the deployment to the entire building to make it more energy efficient. (6) Ability to provide local and remote monitoring – BEMOSS provides both local and remote monitoring ability with role-based access control. (7) Security – In addition to built-in security features provided by VOLTTRON, BEMOSS provides enhanced security features, including the BEMOSS discovery approval process, encrypted core-to-node communication, a thermostat anti-tampering feature and many more. (8) Support from the Advisory Committee – BEMOSS was developed in consultation with an advisory committee from the beginning of the project. The BEMOSS advisory committee comprises representatives from 22 organizations from government and industry.
Use of a Knowledge Management System in Waste Management Projects
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gruendler, D.; Boetsch, W.U.; Holzhauer, U.
2006-07-01
In Germany, the knowledge management system 'WasteInfo' on waste management and disposal issues has been developed and implemented. The beneficiaries of 'WasteInfo' are official decision makers who have access to a large information pool. The information pool is fed by experts, so-called authors. This means compiling information, evaluating it, and assigning appropriate properties (metadata) to it. The knowledge management system 'WasteInfo' was introduced at WM04, and its operation at WM05. The present contribution describes the additional advantage of the KMS when used as a tool for dealing with waste management projects. This specific aspect is demonstrated using a project concerning a comparative analysis of the implementation of repositories in six countries that use nuclear power: The information in 'WasteInfo' is assigned to categories and structured according to its origin and type of publication. To use 'WasteInfo' as a tool for processing the projects, a suitable set of categories has to be developed for each project. Apart from technical and scientific aspects, the selected project deals with repository strategies and policies in various countries, with the roles of applicants and authorities in licensing procedures, with safety philosophy and with socio-economic concerns. This new point of view has to be modelled in the categories. Similarly, new sources of information such as local and regional dailies or particular web sites have to be taken into consideration. In this way 'WasteInfo' represents an open document which reflects the current status of the respective repository policy in several countries. Information of particular relevance for German repository planning is marked and may thereby influence the German strategy. (authors)
ERIC Educational Resources Information Center
Santos-Hermosa, Gema
2014-01-01
The study presented here aims to gather useful information on the use, re-reuse and sharing of resources in Education and also the influence of repositories, to better understand the perspective of individual practitioners and suggest future areas of debate for researchers. Open Resources: Influence on Learners and Educators (ORIOLE) project, was…
gemcWeb: A Cloud Based Nuclear Physics Simulation Software
NASA Astrophysics Data System (ADS)
Markelon, Sam
2017-09-01
gemcWeb allows users to run nuclear physics simulations from the web. Because it is completely device agnostic, scientists can run simulations from anywhere with an Internet connection. With a full user system, gemcWeb allows users to revisit and revise their projects, and to share configurations and results with collaborators. gemcWeb is based on the simulation software gemc, which is in turn based on standard Geant4. gemcWeb requires no C++, gemc, or Geant4 knowledge. A simple but powerful GUI allows users to configure their project from geometries and configurations stored on the deployment server. Simulations are then run on the server, with results posted to the user and then securely stored. Python-based and open-source, the main version of gemcWeb is hosted internally at Jefferson National Laboratory and used by the CLAS12 and Electron-Ion Collider Project groups. However, as the software is open-source and hosted in a GitHub repository, an instance can be deployed on the open web or on any institution's intranet. An instance can be configured to host experiments specific to an institution, and the code base can be modified by any individual or group. Special thanks to: Maurizio Ungaro, Ph.D., creator of gemc; Markus Diefenthaler, Ph.D., advisor; and Kyungseon Joo, Ph.D., advisor.
NASA Technical Reports Server (NTRS)
Hanley, Lionel
1989-01-01
The Ada Software Repository is a public-domain collection of Ada software and information. The Ada Software Repository is one of several repositories located on the SIMTEL20 Defense Data Network host computer at White Sands Missile Range, and has been available to any host computer on the network since 26 November 1984. This repository provides a free source for Ada programs and information. The Ada Software Repository is divided into several subdirectories. These directories are organized by topic, and their names and a brief overview of their topics are provided. The Ada Software Repository on SIMTEL20 serves two basic roles: to promote the exchange and use (reusability) of Ada programs and tools (including components) and to promote Ada education.
BIRS - Bioterrorism Information Retrieval System.
Tewari, Ashish Kumar; Rashi; Wadhwa, Gulshan; Sharma, Sanjeev Kumar; Jain, Chakresh Kumar
2013-01-01
Bioterrorism is the intentional use of pathogenic strains of microbes to spread terror in a population. There is a definite need to promote research on the development of vaccines, therapeutics and diagnostic methods as part of preparedness for any future bioterror attack. BIRS is an open-access database of collective information on the organisms related to bioterrorism. The database architecture uses current open-source technology, viz. PHP ver. 5.3.19, MySQL and the IIS server on the Windows platform. The database stores information on the literature, generic information and unique pathways of about 10 microorganisms involved in bioterrorism. This may serve as a collective repository to accelerate drug discovery and vaccine design against such bioterrorist agents (microbes). The available data have been validated against various online resources and by literature mining in order to provide the user with a comprehensive information system. The database is freely available at http://www.bioterrorism.biowaves.org.
Protocols for Scholarly Communication
NASA Astrophysics Data System (ADS)
Pepe, A.; Yeomans, J.
2007-10-01
CERN, the European Organization for Nuclear Research, has operated an institutional preprint repository for more than 10 years. The repository contains over 850,000 records of which more than 450,000 are full-text OA preprints, mostly in the field of particle physics, and it is integrated with the library's holdings of books, conference proceedings, journals and other grey literature. In order to encourage effective propagation and open access to scholarly material, CERN is implementing a range of innovative library services into its document repository: automatic keywording, reference extraction, collaborative management tools and bibliometric tools. Some of these services, such as user reviewing and automatic metadata extraction, could make up an interesting testbed for future publishing solutions and certainly provide an exciting environment for e-science possibilities. The future protocol for scientific communication should guide authors naturally towards OA publication, and CERN wants to help reach a full open access publishing environment for the particle physics community and related sciences in the next few years.
Connecting the pieces: Using ORCIDs to improve research impact and repositories.
Baessa, Mohamed; Lery, Thibaut; Grenz, Daryl; Vijayakumar, J K
2015-01-01
Quantitative data are crucial in the assessment of research impact in the academic world. However, as a young university created in 2009, King Abdullah University of Science and Technology (KAUST) needs to aggregate bibliometrics from researchers coming from diverse origins, not necessarily with the proper affiliations. In this context, the University launched an institutional repository in September 2012 with the objective of creating a home for the intellectual outputs of KAUST researchers. Later, the university adopted the first mandated institutional open access policy in the Arab region, effective June 31, 2014. Several projects were then initiated in order to accurately identify the research being done by KAUST authors and bring it into the repository in accordance with the open access policy. Integration with ORCID has been a key element in this process and the best way to ensure data quality for researchers' scientific contributions. It included the systematic inclusion and creation, if necessary, of ORCID identifiers in the existing repository system, an institutional membership in ORCID, and the creation of dedicated integration tools. In addition, and in cooperation with the Office of Research Evaluation, the Library worked on implementing a Current Research Information System (CRIS) as a standardized common resource to monitor KAUST research outputs. We will present our findings about the CRIS implementation, the ORCID API, and the repository statistics, as well as our approach to conducting the assessment of research impact in terms of usage by the global research community.
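In practice, matching repository records to authors relies on calls to the ORCID public API; the sketch below retrieves a researcher's works summary so that titles and identifiers can be compared against repository metadata. The ORCID iD is the example identifier from ORCID's own documentation, the field names follow the public v3.0 API, and the paper's actual integration tooling is not shown here.

```python
# Sketch of pulling an author's works from the ORCID public API so they can be
# matched against repository records; the iD below is ORCID's documentation
# example, and response fields follow the public v3.0 API.
import requests

orcid_id = "0000-0002-1825-0097"
resp = requests.get(
    f"https://pub.orcid.org/v3.0/{orcid_id}/works",
    headers={"Accept": "application/json"},
)
resp.raise_for_status()

# Each "group" collects versions of the same work; print its first summary.
for group in resp.json().get("group", []):
    summary = group["work-summary"][0]
    title = summary["title"]["title"]["value"]
    print(summary.get("put-code"), title)
```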
Source term evaluation model for high-level radioactive waste repository with decay chain build-up.
Chopra, Manish; Sunny, Faby; Oza, R B
2016-09-18
A source term model based on the two-component leach flux concept is developed for a high-level radioactive waste repository. The long-lived radionuclides associated with high-level waste may give rise to a build-up of activity because of radioactive decay chains. The ingrowth of progeny is incorporated in the model using the Bateman decay chain build-up equations. The model is applied to different radionuclides present in the high-level radioactive waste, which form part of decay chains (4n to 4n + 3 series), and the activity of the parent and daughter radionuclides leaching out of the waste matrix is estimated. Two cases are considered: one in which only the parent is initially present in the waste and another in which the daughters are also initially present in the waste matrix. The incorporation of in situ production of daughter radionuclides in the source term is important for realistic estimates. It is shown that the inclusion of decay chain build-up is essential to avoid underestimation in the radiological impact assessment of the repository. The model can be a useful tool for evaluating the source term of the radionuclide transport models used for the radiological impact assessment of high-level radioactive waste repositories.
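For reference, the ingrowth referred to here is governed by the standard Bateman solution; for the case in which only the parent is present at t = 0, the amount of the n-th chain member follows the textbook expression below, shown for orientation rather than as the exact formulation used in the paper.

```latex
% Bateman solution for the n-th member of a decay chain when only the
% parent N_1 is present at t = 0; \lambda_i are the decay constants.
N_n(t) = N_1(0) \left( \prod_{i=1}^{n-1} \lambda_i \right)
         \sum_{i=1}^{n} \frac{e^{-\lambda_i t}}
         {\prod_{\substack{j=1 \\ j \neq i}}^{n} \left( \lambda_j - \lambda_i \right)}
```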
Repository contributions to Rubus research
USDA-ARS?s Scientific Manuscript database
The USDA National Plant Germplasm System is a nation-wide source for global genetic resources. The National Clonal Germplasm Repository (NCGR) in Corvallis, OR, maintains crops and crop wild relatives for the Willamette Valley including pear, raspberry and blackberry, strawberry, blueberry, gooseber...
The state and profile of open source software projects in health and medical informatics.
Janamanchi, Balaji; Katsamakas, Evangelos; Raghupathi, Wullianallur; Gao, Wei
2009-07-01
Little has been published about the application profiles and development patterns of open source software (OSS) in health and medical informatics. This study explores these issues with an analysis of health and medical informatics related OSS projects on SourceForge, a large repository of open source projects. A search was conducted on the SourceForge website during the period from May 1 to 15, 2007, to identify health and medical informatics OSS projects. This search resulted in a sample of 174 projects. A Java-based parser was written to extract data for several of the key variables of each project. Several visually descriptive statistics were generated to analyze the profiles of the OSS projects. Many of the projects have sponsors, implying a growing interest in OSS among organizations. Sponsorship, we discovered, has a significant impact on project success metrics. Nearly two-thirds of the projects have a restrictive license type. Restrictive licensing may indicate tighter control over the development process. Our sample includes a wide range of projects that are at various stages of development (status). Projects targeted towards the advanced end user are primarily focused on bio-informatics, data formats, database and medical science applications. We conclude that there exists an active and thriving OSS development community that is focusing on health and medical informatics. A wide range of OSS applications are in development, from bio-informatics to hospital information systems. A profile of OSS in health and medical informatics emerges that is distinct and unique to the health care field. Future research can focus on OSS acceptance and diffusion and impact on cost, efficiency and quality of health care.
OERScout Technology Framework: A Novel Approach to Open Educational Resources Search
ERIC Educational Resources Information Center
Abeywardena, Ishan Sudeera; Chan, Chee Seng; Tham, Choy Yoong
2013-01-01
The open educational resources (OER) movement has gained momentum in the past few years. With this new drive towards making knowledge open and accessible, a large number of OER repositories have been established and made available online throughout the world. However, the inability of existing search engines such as Google, Yahoo!, and Bing to…
NASA Astrophysics Data System (ADS)
Lippincott, M.; Lewis, E. S.; Gehrke, G. E.; Wise, A.; Pyle, S.; Sinatra, V.; Bland, G.; Bydlowski, D.; Henry, A.; Gilberts, P. A.
2016-12-01
Community groups are interested in low-cost sensors to monitor their environment. However, many new commercial sensors are unknown devices without peer-reviewed evaluations of data quality or pathways to regulatory acceptance, and the time to achieve these outcomes may be beyond a community's patience and attention. Rather than developing a device from scratch or validating a new commercial product, a workflow is presented whereby existing technologies, especially those that are out of patent, are replicated through open online collaboration between communities affected by environmental pollution, volunteers, academic institutions, and existing open hardware and open source software projects. Technology case studies will be presented, focusing primarily on a passive PM monitor based on the UNC Passive Monitor. Stages of the project will be detailed moving from identifying community needs, reviewing existing technology, partnership development, technology replication, IP review and licensing, data quality assurance (in process), and field evaluation with community partners (in process), with special attention to partnership development and technology review. We have leveraged open hardware and open source software to lower the cost and access barriers of existing technologies for PM10-2.5 and other atmospheric measures that have already been validated through peer review. Existing validation of and regulatory familiarity with a technology enables a rapid pathway towards collecting data, shortening the time it takes for communities to leverage data in environmental management decisions. Online collaboration requires rigorous documentation that aids in spreading research methods and promoting deep engagement by interested community researchers outside academia. At the same time, careful choice of technology and the use of small-scale fabrication through laser cutting, 3D printing, and open, shared repositories of plans and software enables educational engagement that broadens a project's reach.
PGP repository: a plant phenomics and genomics data publication infrastructure
Arend, Daniel; Junker, Astrid; Scholz, Uwe; Schüler, Danuta; Wylie, Juliane; Lange, Matthias
2016-01-01
Plant genomics and phenomics represent the most promising tools for accelerating yield gains and overcoming emerging crop productivity bottlenecks. However, accessing this wealth of plant diversity requires the characterization of this material using state-of-the-art genomic, phenomic and molecular technologies and the release of subsequent research data via a long-term stable, open-access portal. Although several international consortia and public resource centres offer services for plant research data management, valuable digital assets remain unpublished and thus inaccessible to the scientific community. Recently, the Leibniz Institute of Plant Genetics and Crop Plant Research and the German Plant Phenotyping Network have jointly initiated the Plant Genomics and Phenomics Research Data Repository (PGP) as an infrastructure to comprehensively publish plant research data. This covers in particular cross-domain datasets that are not published in central repositories because of their volume or unsupported data scope, such as image collections from plant phenotyping and microscopy, unfinished genomes, genotyping data, visualizations of morphological plant models, data from mass spectrometry, as well as software and documents. The repository is hosted at the Leibniz Institute of Plant Genetics and Crop Plant Research using e!DAL as software infrastructure and a Hierarchical Storage Management System as the data archival backend. A newly developed data submission tool was made available for the consortium that features a high level of automation to lower the barriers of data publication. After an internal review process, data are published as citable digital object identifiers and a core set of technical metadata is registered at DataCite. The e!DAL-embedded Web frontend generates a landing page for each dataset and supports interactive exploration. PGP is registered as a research data repository at BioSharing.org, re3data.org and OpenAIRE as a valid EU Horizon 2020 open data archive. These features, together with the programmatic interface and the support of standard metadata formats, enable PGP to fulfil the FAIR data principles: findable, accessible, interoperable, reusable. Database URL: http://edal.ipk-gatersleben.de/repos/pgp/ PMID:27087305
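As an illustration of the "core set of technical metadata" registered at DataCite for each published dataset, here is a minimal sketch of such a record assembled as JSON; the field names follow the general DataCite schema (identifier, creators, titles, publisher, publicationYear, resourceType), but the values and the exact serialization expected by the registration service are assumptions.

```python
import json

# A minimal, DataCite-style core metadata record for a published dataset.
# Field names follow the general DataCite schema; values are placeholders.
record = {
    "identifier": {"identifierType": "DOI", "identifier": "10.5447/EXAMPLE/0001"},
    "creators": [{"creatorName": "Doe, Jane"}],
    "titles": [{"title": "Example plant phenotyping image collection"}],
    "publisher": "Example Research Data Repository",
    "publicationYear": "2016",
    "resourceType": {"resourceTypeGeneral": "Dataset", "resourceType": "Image collection"},
}

# In practice this record would be submitted to the DataCite registration
# service; here we only serialize it for inspection.
print(json.dumps(record, indent=2))
```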
Ellouze, Afef Samet; Bouaziz, Rafik; Ghorbel, Hanen
2016-10-01
Integrating a semantic dimension into clinical archetypes is necessary when modeling medical records. It enables semantic interoperability, allows semantic activities to be applied to clinical data, and provides a higher design quality for Electronic Medical Record (EMR) systems. However, to obtain these advantages, designers need to use archetypes that cover the semantic features of the clinical concepts involved in their specific applications. In fact, most archetypes filed within open repositories are expressed in the Archetype Definition Language (ADL), which defines only the syntactic structure of clinical concepts and therefore weakens semantic activities on EMR content in the semantic web environment. This paper focuses on the modeling of an EMR prototype for infants affected by Cerebral Palsy (CP), using the dual-model approach and integrating semantic web technologies. Such modeling supports better delivery of quality of care and ensures semantic interoperability between the information systems of all involved therapies. First, the data to be documented are identified and collected from the involved therapies. Subsequently, the data are analyzed and arranged into archetypes expressed in accordance with ADL. During this step, open archetype repositories are explored in order to find suitable archetypes. Then, ADL archetypes are transformed into archetypes expressed in OWL-DL (Ontology Web Language - Description Logic). Finally, we construct an ontological source related to these archetypes, enabling their annotation to facilitate data extraction and making it possible to exercise semantic activities on such archetypes. The result is the integration of the semantic dimension into an EMR modeled in accordance with the archetype approach. The feasibility of our solution is shown through the development of a prototype, named "CP-SMS", which ensures semantic exploitation of the CP EMR. This prototype provides the following features: (i) creation of CP EMR instances and their checking against a knowledge base constructed through interviews with domain experts, (ii) translation of the initial CP ADL archetypes into CP OWL-DL archetypes, (iii) creation of an ontological source that can be used to annotate the obtained archetypes, and (iv) enrichment of the ontological source and integration of semantic relations, fueling the ontology with new concepts, ensuring consistency and eliminating ambiguity between concepts. The degree of semantic interoperability that can be reached between EMR systems depends strongly on the quality of the archetypes used. Thus, integrating the semantic dimension into the archetype modeling process is crucial. By creating an ontological source and annotating archetypes, we create a supportive platform ensuring semantic interoperability between archetype-based EMR systems. Copyright © 2016. Published by Elsevier Inc.
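As a minimal sketch of the kind of ADL-to-OWL-DL translation step described above, the following Python snippet uses the rdflib library to declare an archetype-derived clinical concept as an OWL class with a label; the namespace, class names, and label are hypothetical and do not reflect the authors' actual archetypes.

```python
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF, RDFS, OWL

# Hypothetical namespace for archetype-derived classes.
EMR = Namespace("http://example.org/cp-emr#")

g = Graph()
g.bind("emr", EMR)

# Declare an OWL class for a clinical concept taken from an ADL archetype,
# with a human-readable label that can later be used for annotation.
g.add((EMR.Observation, RDF.type, OWL.Class))
g.add((EMR.BloodPressureObservation, RDF.type, OWL.Class))
g.add((EMR.BloodPressureObservation, RDFS.subClassOf, EMR.Observation))
g.add((EMR.BloodPressureObservation, RDFS.label,
       Literal("Blood pressure observation", lang="en")))

print(g.serialize(format="turtle"))
```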
JHelioviewer: Open-Source Software for Discovery and Image Access in the Petabyte Age (Invited)
NASA Astrophysics Data System (ADS)
Mueller, D.; Dimitoglou, G.; Langenberg, M.; Pagel, S.; Dau, A.; Nuhn, M.; Garcia Ortiz, J. P.; Dietert, H.; Schmidt, L.; Hughitt, V. K.; Ireland, J.; Fleck, B.
2010-12-01
The unprecedented torrent of data returned by the Solar Dynamics Observatory is both a blessing and a barrier: a blessing for making available data with significantly higher spatial and temporal resolution, but a barrier for scientists to access, browse and analyze them. With such staggering data volume, the data is bound to be accessible only from a few repositories and users will have to deal with data sets effectively immobile and practically difficult to download. From a scientist's perspective this poses three challenges: accessing, browsing and finding interesting data while avoiding the proverbial search for a needle in a haystack. To address these challenges, we have developed JHelioviewer, an open-source visualization software that lets users browse large data volumes both as still images and movies. We did so by deploying an efficient image encoding, storage, and dissemination solution using the JPEG 2000 standard. This solution enables users to access remote images at different resolution levels as a single data stream. Users can view, manipulate, pan, zoom, and overlay JPEG 2000 compressed data quickly, without severe network bandwidth penalties. Besides viewing data, the browser provides third-party metadata and event catalog integration to quickly locate data of interest, as well as an interface to the Virtual Solar Observatory to download science-quality data. As part of the Helioviewer Project, JHelioviewer offers intuitive ways to browse large amounts of heterogeneous data remotely and provides an extensible and customizable open-source platform for the scientific community.
ERIC Educational Resources Information Center
Mozelius, Peter; Hettiarachchi, Enosha
2012-01-01
This paper describes the iterative development process of a Learning Object Repository (LOR), named eNOSHA. Discussions on a project for a LOR started at the e-Learning Centre (eLC) at The University of Colombo, School of Computing (UCSC) in 2007. The eLC has during the last decade been developing learning content for a nationwide e-learning…
ACToR Chemical Structure processing using Open Source ...
ACToR (Aggregated Computational Toxicology Resource) is a centralized database repository developed by the National Center for Computational Toxicology (NCCT) at the U.S. Environmental Protection Agency (EPA). Free and open source tools were used to compile toxicity data from over 1,950 public sources. ACToR contains chemical structure information and toxicological data for over 558,000 unique chemicals. The database primarily includes data from NCCT research programs, in vivo toxicity data from ToxRef, human exposure data from ExpoCast, high-throughput screening data from ToxCast and high quality chemical structure information from the EPA DSSTox program. The DSSTox database is a chemical structure inventory for the NCCT programs and currently has about 16,000 unique structures. Data from PubChem, ChemSpider, USDA, FDA, NIH and several other public data sources are also included. ACToR has been a resource for various international and national research groups. Most of our recent efforts on ACToR have focused on improving the structural identifiers and physico-chemical properties of the chemicals in the database. Organizing this huge collection of data and improving the chemical structure quality of the database have posed some major challenges. Workflows have been developed to process structures, calculate chemical properties and identify relationships between CAS numbers. The structure processing workflow integrates web services (PubChem and NIH NCI Cactus) to d
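The abstract mentions workflows that integrate the PubChem and NIH NCI Cactus web services for structure processing. Below is a minimal sketch of one such lookup, resolving a chemical name to a SMILES string via the Cactus structure resolver; the URL pattern reflects the resolver's commonly documented form and is an assumption here that should be checked against the current service documentation, and error handling is kept deliberately simple.

```python
import requests

def name_to_smiles(name: str) -> str | None:
    """Resolve a chemical name to SMILES via the NCI Cactus resolver.

    The URL pattern below is the resolver's commonly documented form and is
    an assumption; verify it against the current service documentation.
    """
    url = f"https://cactus.nci.nih.gov/chemical/structure/{name}/smiles"
    resp = requests.get(url, timeout=30)
    if resp.status_code == 200:
        return resp.text.strip()
    return None

print(name_to_smiles("caffeine"))
```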
Analyzing Hidden Semantics in Social Bookmarking of Open Educational Resources
NASA Astrophysics Data System (ADS)
Minguillón, Julià
Web 2.0 services such as social bookmarking allow users to manage and share the links they find interesting, adding their own tags for describing them. This is especially interesting in the field of open educational resources, as Delicious offers a simple way to bridge the institutional point of view (i.e. learning object repositories) with the individual one (i.e. personal collections), thus promoting the discovery and sharing of such resources by other users. In this paper we propose a methodology for analyzing such tags in order to discover hidden semantics (i.e. taxonomies and vocabularies) that can be used to improve descriptions of learning objects and make learning object repositories more visible and discoverable. We propose the use of a simple statistical analysis tool such as principal component analysis to discover which tags create clusters that can be semantically interpreted. We will compare the obtained results with a collection of resources related to open educational resources, in order to better understand the real needs of people searching for such resources.
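As a minimal sketch of the proposed analysis, the snippet below applies principal component analysis to a binary resource-by-tag matrix and inspects which tags load together on the leading components; the tiny matrix and tag names are invented for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA

# Rows = bookmarked resources, columns = tags (1 if a resource carries the tag).
tags = ["oer", "repository", "physics", "opencourseware", "metadata"]
X = np.array([
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
])

pca = PCA(n_components=2)
pca.fit(X)

# Tags with large loadings on the same component tend to co-occur and may
# form a semantically interpretable cluster.
for i, component in enumerate(pca.components_):
    loading = sorted(zip(tags, component), key=lambda t: abs(t[1]), reverse=True)
    print(f"PC{i + 1}:", [(t, round(w, 2)) for t, w in loading])
```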
NASA Astrophysics Data System (ADS)
Zschocke, Thomas; Beniest, Jan
The Consultative Group on International Agricultural Research (CGIAR) has established a digital repository to share its teaching and learning resources along with descriptive educational information based on the IEEE Learning Object Metadata (LOM) standard. As a critical component of any digital repository, quality metadata are essential not only to enable users to find the resources they require more easily, but also for the operation and interoperability of the repository itself. Studies show that repositories have difficulties in obtaining good quality metadata from their contributors, especially when this process involves many different stakeholders, as is the case with the CGIAR as an international organization. To address this issue the CGIAR began investigating the Open ECBCheck as well as the ISO/IEC 19796-1 standard to establish quality protocols for its training. The paper highlights the implications and challenges posed by strengthening the metadata creation workflow for disseminating learning objects of the CGIAR.
Open access: changing global science publishing.
Gasparyan, Armen Yuri; Ayvazyan, Lilit; Kitas, George D
2013-08-01
The article reflects on open access as a strategy of changing the quality of science communication globally. Successful examples of open-access journals are presented to highlight implications of archiving in open digital repositories for the quality and citability of research output. Advantages and downsides of gold, green, and hybrid models of open access operating in diverse scientific environments are described. It is assumed that open access is a global trend which influences the workflow in scholarly journals, changing their quality, credibility, and indexability.
The Tropical and Subtropical Germplasm Repositories of The National Germplasm System
USDA-ARS?s Scientific Manuscript database
Germplasm collections are viewed as a source of genetic diversity to support crop improvement and agricultural research, and germplasm conservation efforts. The United States Department of Agriculture's National Plant Germplasm Repository System (NPGS) is responsible for administering plant genetic ...
Studying the laws of software evolution in a long-lived FLOSS project.
Gonzalez-Barahona, Jesus M; Robles, Gregorio; Herraiz, Israel; Ortega, Felipe
2014-07-01
Some free, open-source software projects have been around for quite a long time, the longest-lived ones dating from the early 1980s. For some of them, detailed information about their evolution is available in source code management systems tracking all their code changes for periods of more than 15 years. This paper examines in detail the evolution of one such project, glibc, with the main aim of understanding how it evolved and how it matched Lehman's laws of software evolution. As a result, we have developed a methodology for studying the evolution of such long-lived projects based on the information in their source code management repository, described in detail several aspects of the history of glibc, including some activity and size metrics, and found that some of the laws of software evolution may not hold in this case. © 2013 The Authors. Journal of Software: Evolution and Process published by John Wiley & Sons Ltd.
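As a minimal sketch of one activity metric used in this style of repository-mining study, the snippet below counts commits per calendar year from a local clone of a project's source code management repository; the clone path is a placeholder, and nothing beyond standard git is assumed about glibc's actual repository layout.

```python
import subprocess
from collections import Counter

def commits_per_year(repo_path: str) -> Counter:
    """Count commits per calendar year from a local git clone."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", "--pretty=format:%ad", "--date=format:%Y"],
        capture_output=True, text=True, check=True,
    ).stdout
    return Counter(out.splitlines())

# Placeholder path to a local clone; replace with an actual checkout.
for year, n in sorted(commits_per_year("/path/to/glibc").items()):
    print(year, n)
```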
Studying the laws of software evolution in a long-lived FLOSS project
Gonzalez-Barahona, Jesus M; Robles, Gregorio; Herraiz, Israel; Ortega, Felipe
2014-01-01
Some free, open-source software projects have been around for quite a long time, the longest-lived ones dating from the early 1980s. For some of them, detailed information about their evolution is available in source code management systems tracking all their code changes for periods of more than 15 years. This paper examines in detail the evolution of one such project, glibc, with the main aim of understanding how it evolved and how it matched Lehman's laws of software evolution. As a result, we have developed a methodology for studying the evolution of such long-lived projects based on the information in their source code management repository, described in detail several aspects of the history of glibc, including some activity and size metrics, and found that some of the laws of software evolution may not hold in this case. © 2013 The Authors. Journal of Software: Evolution and Process published by John Wiley & Sons Ltd. PMID:25893093
Adams, Justin W; Olah, Angela; McCurry, Matthew R; Potze, Stephany
2015-01-01
Nearly a century of paleontological excavation and analysis from the cave deposits of the Cradle of Humankind UNESCO World Heritage Site in northeastern South Africa underlies much of our understanding of the evolutionary history of hominins, other primates and other mammal lineages in the late Pliocene and early Pleistocene of Africa. As one of few designated fossil repositories, the Plio-Pleistocene Palaeontology Section of the Ditsong National Museum of Natural History (DNMNH; the former Transvaal Museum) curates much of the mammalian faunas recovered from the fossil-rich deposits of major South African hominin-bearing localities, including the holotype and paratype specimens of many primate, carnivore, and other mammal species (Orders Primates, Carnivora, Artiodactyla, Eulipotyphla, Hyracoidea, Lagomorpha, Perissodactyla, and Proboscidea). Here we describe an open-access digital archive of high-resolution, full-color three-dimensional (3D) surface meshes of all 89 non-hominin holotype, paratype and significant mammalian specimens curated in the Plio-Pleistocene Section vault. Surface meshes were generated using a commercial surface scanner (Artec Spider, Artec Group, Luxembourg), are provided in formats that can be opened in both open-source and commercial software, and can be readily downloaded either via an online data repository (MorphoSource) or via direct request from the DNMNH. In addition to providing surface meshes for each specimen, we also provide tomographic data (both computerized tomography [CT] and microfocus [microCT]) for a subset of these fossil specimens. This archive of the DNMNH Plio-Pleistocene collections represents the first research-quality 3D datasets of African mammal fossils to be made openly available. This simultaneously provides the paleontological community with essential baseline information (e.g., updated listing and 3D record of specimens in their current state of preservation) and serves as a single resource of high-resolution digital data that improves collections accessibility, reduces unnecessary duplication of efforts by researchers, and encourages ongoing imaging-based paleobiological research across a range of South African non-hominin fossil faunas. Because the types, paratypes, and key specimens include globally-distributed mammal taxa, this digital archive not only provides 3D morphological data on taxa fundamental to Neogene and Quaternary South African palaeontology, but also lineages critical to research on African, other Old World, and New World paleocommunities. With such a broader impact of the DNMNH 3D data, we hope that establishing open access to this digital archive will encourage other researchers and institutions to provide similar resources that increase accessibility to paleontological collections and support advanced paleobiological analyses.
ImgLib2--generic image processing in Java.
Pietzsch, Tobias; Preibisch, Stephan; Tomancák, Pavel; Saalfeld, Stephan
2012-11-15
ImgLib2 is an open-source Java library for n-dimensional data representation and manipulation with focus on image processing. It aims at minimizing code duplication by cleanly separating pixel-algebra, data access and data representation in memory. Algorithms can be implemented for classes of pixel types and generic access patterns by which they become independent of the specific dimensionality, pixel type and data representation. ImgLib2 illustrates that an elegant high-level programming interface can be achieved without sacrificing performance. It provides efficient implementations of common data types, storage layouts and algorithms. It is the data model underlying ImageJ2, the KNIME Image Processing toolbox and an increasing number of Fiji-Plugins. ImgLib2 is licensed under BSD. Documentation and source code are available at http://imglib2.net and in a public repository at https://github.com/imagej/imglib. Supplementary data are available at Bioinformatics Online. saalfeld@mpi-cbg.de
The Victor C++ library for protein representation and advanced manipulation.
Hirsh, Layla; Piovesan, Damiano; Giollo, Manuel; Ferrari, Carlo; Tosatto, Silvio C E
2015-04-01
Protein sequence and structure representation and manipulation require dedicated software libraries to support methods of increasing complexity. Here, we describe the VIrtual Construction TOol for pRoteins (Victor) C++ library, an open source platform dedicated to enabling inexperienced users to develop advanced tools and to gathering contributions from the community. The provided application examples cover statistical energy potentials, profile-profile sequence alignments and ab initio loop modeling. Victor was used over the last 15 years in several publications and optimized for efficiency. It is provided as a GitHub repository with source files and unit tests, plus extensive online documentation, including a Wiki with help files and tutorials, examples and Doxygen documentation. The C++ library and online documentation, distributed under a GPL license, are available from URL: http://protein.bio.unipd.it/victor/. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Can Data Repositories Help Find Effective Treatments for Complex Diseases?
Farber, Gregory K.
2016-01-01
There are many challenges to developing treatments for complex diseases. This review explores the question of whether it is possible to imagine a data repository that would increase the pace of understanding complex diseases sufficiently well to facilitate the development of effective treatments. First, consideration is given to the amount of data that might be needed for such a data repository and whether the existing data storage infrastructure is sufficient. Several successful data repositories are then examined to see if they have common characteristics. An area of science where attempts to develop a data infrastructure have been unsuccessful is then described, to see what lessons could be learned for a data repository devoted to complex disease. Then, a variety of issues related to sharing data are discussed. In some of these areas, it is reasonably clear how to move forward. In other areas, there are significant open questions that need to be addressed by all data repositories. Using that baseline information, the question of whether data archives can be effective in understanding a complex disease is explored. The major goal of such a data archive is likely to be identifying biomarkers that define sub-populations of the disease. PMID:27018167
Can data repositories help find effective treatments for complex diseases?
Farber, Gregory K
2017-05-01
There are many challenges to developing treatments for complex diseases. This review explores the question of whether it is possible to imagine a data repository that would increase the pace of understanding complex diseases sufficiently well to facilitate the development of effective treatments. First, consideration is given to the amount of data that might be needed for such a data repository and whether the existing data storage infrastructure is sufficient. Several successful data repositories are then examined to see if they have common characteristics. An area of science where attempts to develop a data infrastructure have been unsuccessful is then described, to see what lessons could be learned for a data repository devoted to complex disease. Then, a variety of issues related to sharing data are discussed. In some of these areas, it is reasonably clear how to move forward. In other areas, there are significant open questions that need to be addressed by all data repositories. Using that baseline information, the question of whether data archives can be effective in understanding a complex disease is explored. The major goal of such a data archive is likely to be identifying biomarkers that define sub-populations of the disease. Published by Elsevier Ltd.
Rolling Deck to Repository (R2R): Standards and Semantics for Open Access to Research Data
NASA Astrophysics Data System (ADS)
Arko, Robert; Carbotte, Suzanne; Chandler, Cynthia; Smith, Shawn; Stocks, Karen
2015-04-01
In recent years, a growing number of funding agencies and professional societies have issued policies calling for open access to research data. The Rolling Deck to Repository (R2R) program is working to ensure open access to the environmental sensor data routinely acquired by the U.S. academic research fleet. Currently 25 vessels deliver 7 terabytes of data to R2R each year, acquired from a suite of geophysical, oceanographic, meteorological, and navigational sensors on over 400 cruises worldwide. R2R is working to ensure these data are preserved in trusted repositories, discoverable via standard protocols, and adequately documented for reuse. R2R maintains a master catalog of cruises for the U.S. academic research fleet, currently holding essential documentation for over 3,800 expeditions including vessel and cruise identifiers, start/end dates and ports, project titles and funding awards, science parties, dataset inventories with instrument types and file formats, data quality assessments, and links to related content at other repositories. A Digital Object Identifier (DOI) is published for 1) each cruise, 2) each original field sensor dataset, 3) each post-field data product such as quality-controlled shiptrack navigation produced by the R2R program, and 4) each document such as a cruise report submitted by the science party. Scientists are linked to personal identifiers, such as the Open Researcher and Contributor ID (ORCID), where known. Using standard global identifiers such as DOIs and ORCIDs facilitates linking with journal publications and generation of citation metrics. Since its inception, the R2R program has worked in close collaboration with other data repositories in the development of shared semantics for oceanographic research. The R2R cruise catalog uses community-standard terms and definitions hosted by the NERC Vocabulary Server, and publishes ISO metadata records for each cruise that use community-standard profiles developed with the NOAA Data Centers and the EU SeaDataNet project. R2R is a partner in the Ocean Data Interoperability Platform (ODIP), working to strengthen links among regional and national data systems, as well as a lead partner in the EarthCube "GeoLink" project, developing a standard set of ontology design patterns for publishing research data using Semantic Web protocols.
BioAcoustica: a free and open repository and analysis platform for bioacoustics
Baker, Edward; Price, Ben W.; Rycroft, S. D.; Smith, Vincent S.
2015-01-01
We describe an online open repository and analysis platform, BioAcoustica (http://bio.acousti.ca), for recordings of wildlife sounds. Recordings can be annotated using a crowdsourced approach, allowing voice introductions and sections with extraneous noise to be removed from analyses. This system is based on the Scratchpads virtual research environment, the BioVeL portal and the Taverna workflow management tool, which allows for analysis of recordings using a grid computing service. At present the analyses include spectrograms, oscillograms and dominant frequency analysis. Further analyses can be integrated to meet the needs of specific researchers or projects. Researchers can upload and annotate their recordings to supplement traditional publication. Database URL: http://bio.acousti.ca PMID:26055102
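As a minimal sketch of the dominant-frequency analysis named above, the snippet below computes a spectrogram with SciPy and picks the frequency bin with the highest average power, using a synthetic signal in place of a real wildlife recording.

```python
import numpy as np
from scipy.signal import spectrogram

# Synthetic stand-in for a recording: a 4 kHz tone plus noise, 1 s at 44.1 kHz.
fs = 44100
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 4000 * t) + 0.1 * np.random.randn(fs)

# Compute the spectrogram and take the frequency bin with the highest
# average power as the dominant frequency.
freqs, times, Sxx = spectrogram(x, fs=fs, nperseg=1024)
dominant = freqs[np.argmax(Sxx.mean(axis=1))]
print(f"Dominant frequency: {dominant:.0f} Hz")
```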
Lopetegui, Marcelo A; Lara, Barbara A; Yen, Po-Yin; Çatalyürek, Ümit V; Payne, Philip R O
2015-01-01
Multiple choice questions play an important role in training and evaluating biomedical science students. However, the resource intensive nature of question generation limits their open availability, reducing their contribution to evaluation purposes mainly. Although applied-knowledge questions require a complex formulation process, the creation of concrete-knowledge questions (i.e., definitions, associations) could be assisted by the use of informatics methods. We envisioned a novel and simple algorithm that exploits validated knowledge repositories and generates concrete-knowledge questions by leveraging concepts' relationships. In this manuscript we present the development and validation of a prototype which successfully produced meaningful concrete-knowledge questions, opening new applications for existing knowledge repositories, potentially benefiting students of all biomedical sciences disciplines.
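As a minimal sketch of the kind of algorithm envisioned, the snippet below generates a definition-style multiple choice question by pairing a concept with its definition and drawing distractors from related concepts; the small knowledge table is invented and stands in for a validated knowledge repository.

```python
import random

# Toy stand-in for a validated knowledge repository: concept -> definition.
definitions = {
    "Hypertension": "Persistently elevated arterial blood pressure.",
    "Hypotension": "Abnormally low arterial blood pressure.",
    "Tachycardia": "Resting heart rate above the normal range.",
    "Bradycardia": "Resting heart rate below the normal range.",
}

def make_question(target: str, n_distractors: int = 3) -> dict:
    """Build a concrete-knowledge question: match a definition to its concept."""
    others = [c for c in definitions if c != target]
    options = random.sample(others, n_distractors) + [target]
    random.shuffle(options)
    return {
        "stem": f"Which concept is defined as: '{definitions[target]}'",
        "options": options,
        "answer": target,
    }

print(make_question("Tachycardia"))
```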
A collection of open source applications for mass spectrometry data mining.
Gallardo, Óscar; Ovelleiro, David; Gay, Marina; Carrascal, Montserrat; Abian, Joaquin
2014-10-01
We present several bioinformatics applications for the identification and quantification of phosphoproteome components by MS. These applications include a front-end graphical user interface that combines several Thermo RAW to MASCOT™ Generic Format extractors (EasierMgf), two graphical user interfaces for the search engines OMSSA and SEQUEST (OmssaGui and SequestGui), and three applications, one for the management of databases in FASTA format (FastaTools), another for the integration of search results from up to three search engines (Integrator), and another one for the visualization of mass spectra and their corresponding database search results (JsonVisor). These applications were developed to solve some of the common problems found in proteomic and phosphoproteomic data analysis and were integrated into the workflow for data processing and feeding of our LymPHOS database. The applications were designed modularly and can be used standalone. These tools are written in the Perl and Python programming languages and are supported on Windows platforms. They are all released under an Open Source Software license and can be freely downloaded from our software repository hosted at GoogleCode. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
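Several of the tools above produce or consume MASCOT Generic Format (MGF) files. Below is a minimal sketch of reading MGF spectrum blocks (BEGIN IONS ... END IONS with key=value headers followed by m/z-intensity pairs); the format handling is deliberately simplified and the sample text is invented, so it illustrates the file structure rather than reproducing any of the tools listed.

```python
def parse_mgf(text: str) -> list[dict]:
    """Parse a simplified MGF string into a list of spectra."""
    spectra, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            spectra.append(current)
            current = None
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current["params"][key] = value
        elif current is not None and line:
            mz, intensity = map(float, line.split()[:2])
            current["peaks"].append((mz, intensity))
    return spectra

sample = """BEGIN IONS
TITLE=example spectrum
PEPMASS=445.12
CHARGE=2+
129.10 1500.0
175.12 2300.0
END IONS"""
print(parse_mgf(sample))
```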
Metadata management for high content screening in OMERO
Li, Simon; Besson, Sébastien; Blackburn, Colin; Carroll, Mark; Ferguson, Richard K.; Flynn, Helen; Gillen, Kenneth; Leigh, Roger; Lindner, Dominik; Linkert, Melissa; Moore, William J.; Ramalingam, Balaji; Rozbicki, Emil; Rustici, Gabriella; Tarkowska, Aleksandra; Walczysko, Petr; Williams, Eleanor; Allan, Chris; Burel, Jean-Marie; Moore, Josh; Swedlow, Jason R.
2016-01-01
High content screening (HCS) experiments create a classic data management challenge—multiple, large sets of heterogeneous structured and unstructured data, that must be integrated and linked to produce a set of “final” results. These different data include images, reagents, protocols, analytic output, and phenotypes, all of which must be stored, linked and made accessible for users, scientists, collaborators and where appropriate the wider community. The OME Consortium has built several open source tools for managing, linking and sharing these different types of data. The OME Data Model is a metadata specification that supports the image data and metadata recorded in HCS experiments. Bio-Formats is a Java library that reads recorded image data and metadata and includes support for several HCS screening systems. OMERO is an enterprise data management application that integrates image data, experimental and analytic metadata and makes them accessible for visualization, mining, sharing and downstream analysis. We discuss how Bio-Formats and OMERO handle these different data types, and how they can be used to integrate, link and share HCS experiments in facilities and public data repositories. OME specifications and software are open source and are available at https://www.openmicroscopy.org. PMID:26476368
Metadata management for high content screening in OMERO.
Li, Simon; Besson, Sébastien; Blackburn, Colin; Carroll, Mark; Ferguson, Richard K; Flynn, Helen; Gillen, Kenneth; Leigh, Roger; Lindner, Dominik; Linkert, Melissa; Moore, William J; Ramalingam, Balaji; Rozbicki, Emil; Rustici, Gabriella; Tarkowska, Aleksandra; Walczysko, Petr; Williams, Eleanor; Allan, Chris; Burel, Jean-Marie; Moore, Josh; Swedlow, Jason R
2016-03-01
High content screening (HCS) experiments create a classic data management challenge-multiple, large sets of heterogeneous structured and unstructured data, that must be integrated and linked to produce a set of "final" results. These different data include images, reagents, protocols, analytic output, and phenotypes, all of which must be stored, linked and made accessible for users, scientists, collaborators and where appropriate the wider community. The OME Consortium has built several open source tools for managing, linking and sharing these different types of data. The OME Data Model is a metadata specification that supports the image data and metadata recorded in HCS experiments. Bio-Formats is a Java library that reads recorded image data and metadata and includes support for several HCS screening systems. OMERO is an enterprise data management application that integrates image data, experimental and analytic metadata and makes them accessible for visualization, mining, sharing and downstream analysis. We discuss how Bio-Formats and OMERO handle these different data types, and how they can be used to integrate, link and share HCS experiments in facilities and public data repositories. OME specifications and software are open source and are available at https://www.openmicroscopy.org. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Okamoto, Karen
2013-01-01
Open access textbooks (OATs) and educational resources (OERs) are being lauded as a viable alternative to costly print textbooks. Some academic libraries are joining the OER movement by creating guides to open repositories. Others are promoting OATs and OERs, reviewing them, and even helping to create them. This article analyzes how academic…
Basic repository source term and data sheet report: Lavender Canyon
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
1988-01-01
This report is one of a series describing studies undertaken in support of the US Department of Energy Civilian Radioactive Waste Management (CRWM) Program. This study contains the derivation of values for environmental source terms and resources consumed for a CRWM repository. Estimates include heavy construction equipment; support equipment; shaft-sinking equipment; transportation equipment; and consumption of fuel, water, electricity, and natural gas. Data are presented for construction and operation at an assumed site in Lavender Canyon, Utah. 3 refs; 6 tabs.
McKinney, Bill; Meyer, Peter A.; Crosas, Mercè; Sliz, Piotr
2016-01-01
Access to experimental X-ray diffraction image data is important for validation and reproduction of macromolecular models and indispensable for the development of structural biology processing methods. In response to the evolving needs of the structural biology community, we recently established a diffraction data publication system, the Structural Biology Data Grid (SBDG, data.sbgrid.org), to preserve primary experimental datasets supporting scientific publications. All datasets published through the SBDG are freely available to the research community under a public domain dedication license, with metadata compliant with the DataCite Schema (schema.datacite.org). A proof-of-concept study demonstrated community interest and utility. Publication of large datasets is a challenge shared by several fields, and the SBDG has begun collaborating with the Institute for Quantitative Social Science at Harvard University to extend the Dataverse (dataverse.org) open-source data repository system to structural biology datasets. Several extensions are necessary to support the size and metadata requirements for structural biology datasets. In this paper, we describe one such extension—functionality supporting preservation of filesystem structure within Dataverse—which is essential for both in-place computation and supporting non-http data transfers. PMID:27862010
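As an illustration of the filesystem-structure preservation idea mentioned above, here is a minimal sketch that records each file's dataset-relative path and size so the original directory layout can be reconstructed; this illustrates the concept only and is not Dataverse's actual implementation.

```python
import os
import json

def capture_tree(root: str) -> list[dict]:
    """Record relative paths and sizes for every file under a dataset root."""
    manifest = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest.append({
                "relative_path": os.path.relpath(full, root),
                "size_bytes": os.path.getsize(full),
            })
    return manifest

# Placeholder dataset directory; replace with a real dataset path.
print(json.dumps(capture_tree("."), indent=2)[:500])
```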
Rolling Deck to Repository (R2R): Linking and Integrating Data for Oceanographic Research
NASA Astrophysics Data System (ADS)
Arko, R. A.; Chandler, C. L.; Clark, P. D.; Shepherd, A.; Moore, C.
2012-12-01
The Rolling Deck to Repository (R2R) program is developing infrastructure to ensure the underway sensor data from NSF-supported oceanographic research vessels are routinely and consistently documented, preserved in long-term archives, and disseminated to the science community. We have published the entire R2R Catalog as a Linked Data collection, making it easily accessible to encourage linking and integration with data at other repositories. We are developing the R2R Linked Data collection with specific goals in mind: 1.) We facilitate data access and reuse by providing the richest possible collection of resources to describe vessels, cruises, instruments, and datasets from the U.S. academic fleet, including data quality assessment results and clean trackline navigation. We are leveraging or adopting existing community-standard concepts and vocabularies, particularly concepts from the Biological and Chemical Oceanography Data Management Office (BCO-DMO) ontology and terms from the pan-European SeaDataNet vocabularies, and continually re-publish resources as new concepts and terms are mapped. 2.) We facilitate data citation through the entire data lifecycle from field acquisition to shoreside archiving to (ultimately) global syntheses and journal articles. We are implementing globally unique and persistent identifiers at the collection, dataset, and granule levels, and encoding these citable identifiers directly into the Linked Data resources. 3.) We facilitate linking and integration with other repositories that publish Linked Data collections for the U.S. academic fleet, such as BCO-DMO and the Index to Marine and Lacustrine Geological Samples (IMLGS). We are initially mapping datasets at the resource level, and plan to eventually implement rule-based mapping at the concept level. We work collaboratively with partner repositories to develop best practices for URI patterns and consensus on shared vocabularies. The R2R Linked Data collection is implemented as a lightweight "virtual RDF graph" generated on-the-fly from our SQL database using the D2RQ (http://d2rq.org) package. In addition to the default SPARQL endpoint for programmatic access, we are developing a Web-based interface from open-source software components that offers user-friendly browse and search.
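As a minimal sketch of querying a Linked Data collection like the one described, the snippet below issues a SPARQL SELECT through the SPARQLWrapper library; the endpoint URL and the vocabulary terms are placeholders, not R2R's actual endpoint or ontology.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Placeholder endpoint and predicate; substitute the repository's actual
# SPARQL endpoint and vocabulary terms.
endpoint = SPARQLWrapper("http://example.org/sparql")
endpoint.setQuery("""
    SELECT ?cruise ?label WHERE {
        ?cruise a <http://example.org/vocab#Cruise> ;
                <http://www.w3.org/2000/01/rdf-schema#label> ?label .
    } LIMIT 10
""")
endpoint.setReturnFormat(JSON)

results = endpoint.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["cruise"]["value"], binding["label"]["value"])
```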
JGrass-NewAge hydrological system: an open-source platform for the replicability of science.
NASA Astrophysics Data System (ADS)
Bancheri, Marialaura; Serafin, Francesco; Formetta, Giuseppe; Rigon, Riccardo; David, Olaf
2017-04-01
JGrass-NewAge is an open source semi-distributed hydrological modelling system. It is based on the object modelling framework (OMS version 3), on the JGrasstools and on the Geotools. OMS3 allows to create independent packages of software which can be connected at run-time in a working modelling solution. These components are available as library/dependency or as repository to fork in order to add further features. Different tools are adopted to make easier the integration, the interoperability and the use of each package. Most of the components are Gradle integrated, since it represents the state-of-art of the building systems, especially for Java projects. The continuous integration is a further layer between local source code (client-side) and remote repository (server-side) and ensures the building and the testing of the source code at each commit. Finally, the use of Zenodo makes the code hosted in GitHub unique, citable and traceable, with a defined DOI. Following the previous standards, each part of the hydrological cycle is implemented in JGrass-NewAge as a component that can be selected, adopted, and connected to obtain a user "customized" hydrological model. A variety of modelling solutions are possible, allowing a complete hydrological analysis. Moreover, thanks to the JGrasstools and the Geotools, the visualization of the data and of the results using a selected GIS is possible. After the geomorphological analysis of the watershed, the spatial interpolation of the meteorological inputs can be performed using both deterministic (IDW) and geostatistic (Kriging) algorithms. For the radiation balance, the shortwave and longwave radiation can be estimated, which are, in turn, inputs for the simulation of the evapotranspiration, according to Priestly-Taylor and Penman-Monteith formulas. Three degree-day models are implemented for the snow melting and SWE. The runoff production can be simulated using two different components, "Adige" and "Embedded Reservoirs". The travel time theory has recently been integrated for a coupled analysis of the solute transport. Eventually, each component can be connected to the different calibration tools such as LUCA and PSO. Further information about the actual implementation can be found at (https://github.com/geoframecomponents), while the OMS projects with the examples, data and results are available at (https://github.com/GEOframeOMSProjects).
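As a minimal sketch of the inverse distance weighting (IDW) interpolation named among the meteorological components above, independent of the OMS3/Java implementation actually used in JGrass-NewAge, the snippet below estimates a value at a target point from invented station data.

```python
import numpy as np

def idw(xy_known, values, xy_target, power=2.0):
    """Inverse distance weighted estimate at one target point."""
    d = np.linalg.norm(xy_known - xy_target, axis=1)
    if np.any(d == 0):                      # target coincides with a station
        return values[np.argmin(d)]
    w = 1.0 / d**power
    return np.sum(w * values) / np.sum(w)

# Invented station coordinates (x, y) and rainfall values.
stations = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
rain = np.array([5.0, 9.0, 3.0])
print(idw(stations, rain, np.array([4.0, 4.0])))
```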
Audit and Certification Process for Science Data Digital Repositories
NASA Astrophysics Data System (ADS)
Hughes, J. S.; Giaretta, D.; Ambacher, B.; Ashley, K.; Conrad, M.; Downs, R. R.; Garrett, J.; Guercio, M.; Lambert, S.; Longstreth, T.; Sawyer, D. M.; Sierman, B.; Tibbo, H.; Waltz, M.
2011-12-01
Science data digital repositories are entrusted to ensure that a science community's data are available and useful to users both today and in the future. Part of the challenge in meeting this responsibility is identifying the standards, policies and procedures required to accomplish effective data preservation. Subsequently a repository should be evaluated on whether or not they are effective in their data preservation efforts. This poster will outline the process by which digital repositories are being formally evaluated in terms of their ability to preserve the digitally encoded information with which they have been entrusted. The ISO standards on which this is based will be identified and the relationship of these standards to the Open Archive Information System (OAIS) reference model will be shown. Six test audits have been conducted with three repositories in Europe and three in the USA. Some of the major lessons learned from these test audits will be briefly described. An assessment of the possible impact of this type of audit and certification on the practice of preserving digital information will also be provided.
Judson, Richard S.; Martin, Matthew T.; Egeghy, Peter; Gangwal, Sumit; Reif, David M.; Kothiya, Parth; Wolf, Maritja; Cathey, Tommy; Transue, Thomas; Smith, Doris; Vail, James; Frame, Alicia; Mosher, Shad; Cohen Hubal, Elaine A.; Richard, Ann M.
2012-01-01
Computational toxicology combines data from high-throughput test methods, chemical structure analyses and other biological domains (e.g., genes, proteins, cells, tissues) with the goals of predicting and understanding the underlying mechanistic causes of chemical toxicity and for predicting toxicity of new chemicals and products. A key feature of such approaches is their reliance on knowledge extracted from large collections of data and data sets in computable formats. The U.S. Environmental Protection Agency (EPA) has developed a large data resource called ACToR (Aggregated Computational Toxicology Resource) to support these data-intensive efforts. ACToR comprises four main repositories: core ACToR (chemical identifiers and structures, and summary data on hazard, exposure, use, and other domains), ToxRefDB (Toxicity Reference Database, a compilation of detailed in vivo toxicity data from guideline studies), ExpoCastDB (detailed human exposure data from observational studies of selected chemicals), and ToxCastDB (data from high-throughput screening programs, including links to underlying biological information related to genes and pathways). The EPA DSSTox (Distributed Structure-Searchable Toxicity) program provides expert-reviewed chemical structures and associated information for these and other high-interest public inventories. Overall, the ACToR system contains information on about 400,000 chemicals from 1100 different sources. The entire system is built using open source tools and is freely available to download. This review describes the organization of the data repository and provides selected examples of use cases. PMID:22408426
Judson, Richard S; Martin, Matthew T; Egeghy, Peter; Gangwal, Sumit; Reif, David M; Kothiya, Parth; Wolf, Maritja; Cathey, Tommy; Transue, Thomas; Smith, Doris; Vail, James; Frame, Alicia; Mosher, Shad; Cohen Hubal, Elaine A; Richard, Ann M
2012-01-01
Computational toxicology combines data from high-throughput test methods, chemical structure analyses and other biological domains (e.g., genes, proteins, cells, tissues) with the goals of predicting and understanding the underlying mechanistic causes of chemical toxicity and for predicting toxicity of new chemicals and products. A key feature of such approaches is their reliance on knowledge extracted from large collections of data and data sets in computable formats. The U.S. Environmental Protection Agency (EPA) has developed a large data resource called ACToR (Aggregated Computational Toxicology Resource) to support these data-intensive efforts. ACToR comprises four main repositories: core ACToR (chemical identifiers and structures, and summary data on hazard, exposure, use, and other domains), ToxRefDB (Toxicity Reference Database, a compilation of detailed in vivo toxicity data from guideline studies), ExpoCastDB (detailed human exposure data from observational studies of selected chemicals), and ToxCastDB (data from high-throughput screening programs, including links to underlying biological information related to genes and pathways). The EPA DSSTox (Distributed Structure-Searchable Toxicity) program provides expert-reviewed chemical structures and associated information for these and other high-interest public inventories. Overall, the ACToR system contains information on about 400,000 chemicals from 1100 different sources. The entire system is built using open source tools and is freely available to download. This review describes the organization of the data repository and provides selected examples of use cases.
An Open Catalog for Supernova Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Guillochon, James; Parrent, Jerod; Kelley, Luke Zoltan
We present the Open Supernova Catalog, an online collection of observations and metadata for presently 36,000+ supernovae and related candidates. The catalog is freely available on the web (https://sne.space), with its main interface having been designed to be a user-friendly, rapidly searchable table accessible on desktop and mobile devices. In addition to the primary catalog table containing supernova metadata, an individual page is generated for each supernova, which displays its available metadata, light curves, and spectra spanning X-ray to radio frequencies. The data presented in the catalog is automatically rebuilt on a daily basis and is constructed by parsing several dozen sources, including the data presented in the supernova literature and from secondary sources such as other web-based catalogs. Individual supernova data is stored in the hierarchical, human- and machine-readable JSON format, with the entirety of each supernova’s data being contained within a single JSON file bearing its name. The setup we present here, which is based on open-source software maintained via git repositories hosted on github, enables anyone to download the entirety of the supernova data set to their home computer in minutes, and to make contributions of their own data back to the catalog via git. As the supernova data set continues to grow, especially in the upcoming era of all-sky synoptic telescopes, which will increase the total number of events by orders of magnitude, we hope that the catalog we have designed will be a valuable tool for the community to analyze both historical and contemporary supernovae.
An Open Catalog for Supernova Data
NASA Astrophysics Data System (ADS)
Guillochon, James; Parrent, Jerod; Kelley, Luke Zoltan; Margutti, Raffaella
2017-01-01
We present the Open Supernova Catalog, an online collection of observations and metadata for presently 36,000+ supernovae and related candidates. The catalog is freely available on the web (https://sne.space), with its main interface having been designed to be a user-friendly, rapidly searchable table accessible on desktop and mobile devices. In addition to the primary catalog table containing supernova metadata, an individual page is generated for each supernova, which displays its available metadata, light curves, and spectra spanning X-ray to radio frequencies. The data presented in the catalog is automatically rebuilt on a daily basis and is constructed by parsing several dozen sources, including the data presented in the supernova literature and from secondary sources such as other web-based catalogs. Individual supernova data is stored in the hierarchical, human- and machine-readable JSON format, with the entirety of each supernova’s data being contained within a single JSON file bearing its name. The setup we present here, which is based on open-source software maintained via git repositories hosted on github, enables anyone to download the entirety of the supernova data set to their home computer in minutes, and to make contributions of their own data back to the catalog via git. As the supernova data set continues to grow, especially in the upcoming era of all-sky synoptic telescopes, which will increase the total number of events by orders of magnitude, we hope that the catalog we have designed will be a valuable tool for the community to analyze both historical and contemporary supernovae.
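As a minimal sketch of reading one per-supernova JSON file of the kind described (a single file named after the event, with its data nested under the event name), the snippet below extracts photometry points; the field names used here ("photometry", "time", "magnitude", "band") reflect the catalog's general layout but should be checked against an actual downloaded file.

```python
import json

def load_photometry(path: str) -> list[tuple]:
    """Extract (time, magnitude, band) tuples from a per-event JSON file."""
    with open(path) as fh:
        payload = json.load(fh)
    event_name = next(iter(payload))    # data is nested under the event name
    points = []
    for obs in payload[event_name].get("photometry", []):
        if "magnitude" in obs:
            points.append((obs.get("time"), obs["magnitude"], obs.get("band")))
    return points

# Placeholder filename; individual event files can be fetched from the catalog.
for point in load_photometry("SN2011fe.json")[:5]:
    print(point)
```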
NASA Astrophysics Data System (ADS)
Harman, C. J.
2015-12-01
Even amongst the academic community, new theoretical tools can remain underutilized due to the investment of time and resources required to understand and implement them. This surely limits the frequency that new theory is rigorously tested against data by scientists outside the group that developed it, and limits the impact that new tools could have on the advancement of science. Reducing the barriers to adoption through online education and open-source code can bridge the gap between theory and data, forging new collaborations, and advancing science. A pilot venture aimed at increasing the adoption of a new theory of time-variable transit time distributions was begun in July 2015 as a collaboration between Johns Hopkins University and The Consortium of Universities for the Advancement of Hydrologic Science (CUAHSI). There were four main components to the venture: a public online seminar covering the theory, an open source code repository, a virtual short course designed to help participants apply the theory to their data, and an online forum to maintain discussion and build a community of users. 18 participants were selected for the non-public components based on their responses in an application, and were asked to fill out a course evaluation at the end of the short course, and again several months later. These evaluations, along with participation in the forum and on-going contact with the organizer suggest strengths and weaknesses in this combination of components to assist participants in adopting new tools.
The Future of ECHO: Evaluating Open Source Possibilities
NASA Astrophysics Data System (ADS)
Pilone, D.; Gilman, J.; Baynes, K.; Mitchell, A. E.
2012-12-01
NASA's Earth Observing System ClearingHOuse (ECHO) is a format agnostic metadata repository supporting over 3000 collections and 100M science granules. ECHO exposes FTP and RESTful Data Ingest APIs in addition to both SOAP and RESTful search and order capabilities. Built on top of ECHO is a human facing search and order web application named Reverb. ECHO processes hundreds of orders, tens of thousands of searches, and 1-2M ingest actions each week. As ECHO's holdings, metadata format support, and visibility have increased, the ECHO team has received requests by non-NASA entities for copies of ECHO that can be run locally against their data holdings. ESDIS and the ECHO Team have begun investigations into various deployment and Open Sourcing models that can balance the real constraints faced by the ECHO project with the benefits of providing ECHO capabilities to a broader set of users and providers. This talk will discuss several release and Open Source models being investigated by the ECHO team along with the impacts those models are expected to have on the project. We discuss: - Addressing complex deployment or setup issues for potential users - Models of vetting code contributions - Balancing external (public) user requests versus our primary partners - Preparing project code for public release, including navigating licensing issues related to leveraged libraries - Dealing with non-free project dependencies such as commercial databases - Dealing with sensitive aspects of project code such as database passwords, authentication approaches, security through obscurity, etc. - Ongoing support for the released code including increased testing demands, bug fixes, security fixes, and new features.
MaizeGDB: Global support for maize research through open access information [abstract
USDA-ARS?s Scientific Manuscript database
MaizeGDB is the open-access global repository for maize genetic and genomic information – from single genes that determine nutritional quality to whole genome-scale data for complex traits including yield and drought tolerance. The data and tools at MaizeGDB enable researchers from Ethiopia to Ghan...
A Review of Open Access Self-Archiving Mandate Policies
ERIC Educational Resources Information Center
Xia, Jingfeng; Gilchrist, Sarah B.; Smith, Nathaniel X. P.; Kingery, Justin A.; Radecki, Jennifer R.; Wilhelm, Marcia L.; Harrison, Keith C.; Ashby, Michael L.; Mahn, Alyson J.
2012-01-01
This article reviews the history of open access (OA) policies and examines the current status of mandate policy implementations. It finds that hundreds of policies have been proposed and adopted at various organizational levels and many of them have shown a positive effect on the rate of repository content accumulation. However, it also detects…
A Federated Reference Structure for Open Informational Ecosystems
ERIC Educational Resources Information Center
Heinen, Richard; Kerres, Michael; Scharnberg, Gianna; Blees, Ingo; Rittberger, Marc
2016-01-01
The paper describes the concept of a federated ecosystem for Open Educational Resources (OER) in the German education system. Here, a variety of OER repositories (ROER) (Muuß-Merholz & Schaumburg, 2014) and reference platforms have been established in the recent past. In order to develop this ecosystem, not only are metadata standards…
ERIC Educational Resources Information Center
Abeywardena, Ishan Sudeera; Tham, Choy Yoong; Raviraja, S.
2012-01-01
Open educational resources (OER) are a global phenomenon that is fast gaining credibility in many academic circles as a possible solution for bridging the knowledge divide. With increased funding and advocacy from governmental and nongovernmental organisations paired with generous philanthropy, many OER repositories, which host a vast array of…
Scientific Journal Publishing: Yearly Volume and Open Access Availability
ERIC Educational Resources Information Center
Bjork, Bo-Christer; Roos, Annikki; Lauri, Mari
2009-01-01
Introduction: We estimate the total yearly volume of peer-reviewed scientific journal articles published world-wide as well as the share of these articles available openly on the Web either directly or as copies in e-print repositories. Method: We rely on data from two commercial databases (ISI and Ulrich's Periodicals Directory) supplemented by…
ERIC Educational Resources Information Center
Park, Sanghoon; McLeod, Kenneth
2018-01-01
Open Educational Resources (OER) can offer educators the necessary flexibility for tailoring educational resources to better fit their educational goals. Although the number of OER repositories is growing fast, few studies have been conducted to empirically test the effectiveness of OER integration in the classroom. Furthermore, very little is…
ERIC Educational Resources Information Center
Corlett, Bradly
2014-01-01
Several recent issues and trends in online education have resulted in consolidation of efforts for Massive Open Online Courses (MOOCs), increased Open Educational Resources (OER) in the form of asynchronous course repositories, with noticeable increases in governance and policy amplification. These emerging enrollment trends in alternative online…
Cost Implications of an Interim Storage Facility in the Waste Management System
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jarrell, Joshua J.; Joseph, III, Robert Anthony; Howard, Rob L
2016-09-01
This report provides an evaluation of the cost implications of incorporating a consolidated interim storage facility (ISF) into the waste management system (WMS). Specifically, the impacts of the timing of opening an ISF relative to opening a repository were analyzed to understand the potential effects on total system costs.
PGP repository: a plant phenomics and genomics data publication infrastructure.
Arend, Daniel; Junker, Astrid; Scholz, Uwe; Schüler, Danuta; Wylie, Juliane; Lange, Matthias
2016-01-01
Plant genomics and phenomics represent the most promising tools for accelerating yield gains and overcoming emerging crop productivity bottlenecks. However, accessing this wealth of plant diversity requires the characterization of this material using state-of-the-art genomic, phenomic and molecular technologies and the release of subsequent research data via a long-term stable, open-access portal. Although several international consortia and public resource centres offer services for plant research data management, valuable digital assets remain unpublished and thus inaccessible to the scientific community. Recently, the Leibniz Institute of Plant Genetics and Crop Plant Research and the German Plant Phenotyping Network have jointly initiated the Plant Genomics and Phenomics Research Data Repository (PGP) as an infrastructure to comprehensively publish plant research data. This covers in particular cross-domain datasets that are not published in central repositories because of their volume or unsupported data scope, such as image collections from plant phenotyping and microscopy, unfinished genomes, genotyping data, visualizations of morphological plant models, data from mass spectrometry, as well as software and documents. The repository is hosted at the Leibniz Institute of Plant Genetics and Crop Plant Research using e!DAL as software infrastructure and a Hierarchical Storage Management System as the data archival backend. A newly developed data submission tool was made available for the consortium that features a high level of automation to lower the barriers of data publication. After an internal review process, data are published as citable digital object identifiers and a core set of technical metadata is registered at DataCite. The e!DAL-embedded Web frontend generates a landing page for each dataset and supports interactive exploration. PGP is registered as a research data repository at BioSharing.org, re3data.org and OpenAIRE as a valid EU Horizon 2020 open data archive. These features, together with the programmatic interface and the support of standard metadata formats, enable PGP to fulfil the FAIR data principles: findable, accessible, interoperable, reusable. Database URL: http://edal.ipk-gatersleben.de/repos/pgp/. © The Author(s) 2016. Published by Oxford University Press.
NASA Astrophysics Data System (ADS)
Stuckless, J. S.
2003-04-01
Natural analogues can contribute to understanding and predicting the performance of subsystems and processes affecting a mined geologic repository for high-level radioactive waste in several ways. Most importantly, analogues provide tests for various aspects of repository systems at dimensional scales and time spans that cannot be attained by experimental study. In addition, they provide a means for the general public to judge the predicted performance of a potential high-level nuclear waste repository in familiar terms, such that the average person can assess the anticipated long-term performance and other scientific conclusions. Hydrologists working on the Yucca Mountain Project (currently the U.S. Department of Energy's Office of Repository Development) have modeled the flow of water through the vadose zone at Yucca Mountain, Nevada, and particularly the interaction of vadose-zone water with mined openings. Analogues from both natural and anthropogenic examples confirm the prediction that most of the water moving through the vadose zone will move through the host rock and around tunnels. This can be seen both quantitatively, where direct comparison between seepage and net infiltration has been made, and qualitatively, through the excellent degree of preservation of archaeologic artifacts in underground openings. The latter include Paleolithic cave paintings in southwestern Europe, murals and artifacts in Egyptian tombs, painted subterranean Buddhist temples in India and China, and painted underground churches in Cappadocia, Turkey. Natural analogues also suggest that this diversion mechanism is more effective in porous media than in fractured media. Observations from natural analogues are also consistent with the modeled decrease in the percentage of infiltration that becomes seepage with a decrease in the amount of infiltration. Finally, analogues, such as tombs that have been partially filled by mud flows, suggest that the same capillary forces that keep water in the rock around underground openings will draw water towards buried waste packages if they are encased in backfill. Analogue work in support of the U.S. repository program continues in the U.S. Geological Survey, in cooperation with the U.S. Department of Energy.
Klann, Jeffrey G; McCoy, Allison B; Wright, Adam; Wattanasin, Nich; Sittig, Dean F; Murphy, Shawn N
2013-05-30
The Strategic Health IT Advanced Research Projects (SHARP) program seeks to conquer well-understood challenges in medical informatics through breakthrough research. Two SHARP centers have found alignment in their methodological needs: (1) members of the National Center for Cognitive Informatics and Decision-making (NCCD) have developed knowledge bases to support problem-oriented summarizations of patient data, and (2) the Substitutable Medical Apps, Reusable Technologies (SMART) center has developed a platform for reusable medical apps that can run on participating systems connected to various electronic health records (EHRs). Combining the work of these two centers will ensure wide dissemination of new methods for synthesized views of patient data. Informatics for Integrating Biology and the Bedside (i2b2) is an NIH-funded clinical research data repository platform in use at over 100 sites worldwide. By also working with a co-occurring initiative to SMART-enable i2b2, we can confidently write one app that can be used very broadly. Our goal was to facilitate development of intuitive, problem-oriented views of the patient record using NCCD knowledge bases that would run in any EHR. To do this, we developed a collaboration between the two SHARPs and an NIH center, i2b2. First, we implemented collaborative tools to connect researchers at three institutions. Next, we developed a patient summarization app using the SMART platform and a previously validated NCCD problem-medication linkage knowledge base derived from the National Drug File-Reference Terminology (NDF-RT). Finally, to SMART-enable i2b2, we implemented two new Web service "cells" that expose the SMART application programming interface (API), and we made changes to the Web interface of i2b2 to host a "carousel" of SMART apps. We deployed our SMART-based, NDF-RT-derived patient summarization app in this SMART-i2b2 container. It displays a problem-oriented view of medications and presents a line-graph display of laboratory results. This summarization app can be run in any EHR environment that either supports SMART or runs SMART-enabled i2b2. This i2b2 "clinical bridge" demonstrates a pathway for reusable app development that does not require EHR vendors to immediately adopt the SMART API. Apps can be developed in SMART and run by clinicians in the i2b2 repository, reusing clinical data extracted from EHRs. This may encourage the adoption of SMART by supporting SMART app development until EHRs adopt the platform. It also allows a new variety of clinical SMART apps, fueled by the broad aggregation of data types available in research repositories. The app (including its knowledge base) and SMART-i2b2 are open-source and freely available for download.
Mercury- Distributed Metadata Management, Data Discovery and Access System
NASA Astrophysics Data System (ADS)
Palanisamy, Giri; Wilson, Bruce E.; Devarakonda, Ranjeet; Green, James M.
2007-12-01
Mercury is a federated metadata harvesting, search and retrieval tool based on both open source and ORNL-developed software. It was originally developed for NASA, and the Mercury development consortium now includes funding from NASA, USGS, and DOE. Mercury supports various metadata standards including XML, Z39.50, FGDC, Dublin-Core, Darwin-Core, EML, and ISO-19115 (under development). Mercury provides a single portal to information contained in disparate data management systems. It collects metadata and key data from contributing project servers distributed around the world and builds a centralized index. The Mercury search interfaces then allow users to perform simple, fielded, spatial and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data. Mercury supports various projects including: ORNL DAAC, NBII, DADDI, LBA, NARSTO, CDIAC, OCEAN, I3N, IAI, ESIP and ARM. The new Mercury system is based on a Service Oriented Architecture and supports various services such as Thesaurus Service, Gazetteer Web Service and UDDI Directory Services. This system also provides various search services including RSS, Geo-RSS, OpenSearch, Web Services and Portlets. Other features include filtering and dynamic sorting of search results, book-markable search results, and the ability to save, retrieve, and modify search criteria.
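Because Mercury exposes its search through standard interfaces such as OpenSearch and RSS, clients can query it with ordinary HTTP requests. The sketch below illustrates the general pattern of an OpenSearch-style keyword query; the endpoint URL and parameter names are placeholders, not Mercury's documented interface.

```python
# Illustrative OpenSearch-style keyword query; the base URL and parameter
# names are placeholders rather than Mercury's actual service definition.
import requests
import xml.etree.ElementTree as ET

BASE = "https://example.org/mercury/opensearch"   # hypothetical endpoint
resp = requests.get(BASE, params={"q": "soil moisture", "start": 1, "count": 10},
                    timeout=30)
resp.raise_for_status()

# OpenSearch responses are typically Atom or RSS feeds; list the entry titles.
root = ET.fromstring(resp.content)
for entry in root.iter("{http://www.w3.org/2005/Atom}entry"):
    print(entry.findtext("{http://www.w3.org/2005/Atom}title"))
```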
Momota, Ryusuke; Ohtsuka, Aiji
2018-01-01
Anatomy is the science and art of understanding the structure of the body and its components in relation to the functions of the whole-body system. Medicine is based on a deep understanding of anatomy, but quite a few introductory-level learners are overwhelmed by the sheer amount of anatomical terminology that must be understood, so they regard anatomy as a dull and dense subject. To help them learn anatomical terms in a more contextual way, we started a new open-source project, the Network of Anatomical Texts (NAnaTex), which visualizes relationships of body components by integrating text-based anatomical information using Cytoscape, a network visualization software platform. Here, we present a network of bones and muscles produced from literature descriptions. As this network is primarily text-based and does not require any programming knowledge, it is easy to implement new functions or provide extra information by making changes to the original text files. To facilitate collaborations, we deposited the source code files for the network into the GitHub repository ( https://github.com/ryusukemomota/nanatex ) so that anybody can participate in the evolution of the network and use it for their own non-profit purposes. This project should help not only introductory-level learners but also professional medical practitioners, who could use it as a quick reference.
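Because NAnaTex builds its bone-muscle network from plain-text descriptions, the same idea can be prototyped with general-purpose graph tools. The sketch below is an illustration rather than code from the NAnaTex repository: it assembles a few hypothetical muscle-bone attachment statements into a graph with networkx and exports it in a format Cytoscape can import.

```python
# Illustrative sketch (not taken from the NAnaTex repository): build a small
# bone-muscle network from text-derived attachment relations and export it
# as GraphML, a format Cytoscape can import.
import networkx as nx

# Hypothetical text-derived relations: (muscle, bone it attaches to)
relations = [
    ("biceps brachii", "radius"),
    ("biceps brachii", "scapula"),
    ("brachialis", "ulna"),
    ("brachialis", "humerus"),
]

g = nx.Graph()
for muscle, bone in relations:
    g.add_node(muscle, kind="muscle")
    g.add_node(bone, kind="bone")
    g.add_edge(muscle, bone, relation="attachment")

nx.write_graphml(g, "anatomy_network.graphml")  # load this file in Cytoscape
print(dict(g.degree()))                         # quick sanity check
```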
Building Scientific Data's list of recommended data repositories
NASA Astrophysics Data System (ADS)
Hufton, A. L.; Khodiyar, V.; Hrynaszkiewicz, I.
2016-12-01
When Scientific Data launched in 2014 we provided our authors with a list of recommended data repositories to help them identify data hosting options that were likely to meet the journal's requirements. This list has grown in size and scope, and is now a central resource for authors across the Nature-titled journals. It has also been used in the development of data deposition policies and recommended repository lists across Springer Nature and at other publishers. Each new addition to the list is assessed according to a series of criteria that emphasize the stability of the resource, its commitment to principles of open science and its implementation of relevant community standards and reporting guidelines. A preference is expressed for repositories that issue digital object identifiers (DOIs) through the DataCite system and that share data under the Creative Commons CC0 waiver. Scientific Data currently lists fourteen repositories that focus on specific areas within the Earth and environmental sciences, as well as the broad-scope repositories Dryad and figshare. Readers can browse and filter datasets published at the journal by the host repository using ISA-explorer, a demo tool built by the ISA-tools team at Oxford University [1]. We believe that well-maintained lists like this one help publishers build a network of trust with community data repositories and provide an important complement to more comprehensive data repository indices and more formal certification efforts. In parallel, Scientific Data has also improved its policies to better support submissions from authors using institutional and project-specific repositories, without requiring each to apply for listing individually. Online resources: Journal homepage: http://www.nature.com/scientificdata; Data repository criteria: http://www.nature.com/sdata/policies/data-policies#repo-criteria; Recommended data repositories: http://www.nature.com/sdata/policies/repositories; Archived copies of the list: https://dx.doi.org/10.6084/m9.figshare.1434640.v6. Reference: [1] Gonzalez-Beltran, A. ISA-explorer: A demo tool for discovering and exploring Scientific Data's ISA-tab metadata. Scientific Data Updates, http://blogs.nature.com/scientificdata/2015/12/17/isa-explorer/ (2015).
BIRS – Bioterrorism Information Retrieval System
Tewari, Ashish Kumar; Rashi; Wadhwa, Gulshan; Sharma, Sanjeev Kumar; Jain, Chakresh Kumar
2013-01-01
Bioterrorism is the intentional use of pathogenic strains of microbes to spread terror in a population. There is a definite need to promote research for the development of vaccines, therapeutics and diagnostic methods as a part of preparedness for any bioterror attack in the future. BIRS is an open-access database of collective information on the organisms related to bioterrorism. The architecture of the database utilizes current open-source technology, viz. PHP ver. 5.3.19, MySQL and IIS server on the Windows platform, for database design. The database stores information on literature, generic information and unique pathways of about 10 microorganisms involved in bioterrorism. This may serve as a collective repository to accelerate the drug discovery and vaccine design process against such bioterrorist agents (microbes). The available data have been validated from various online resources and literature mining in order to provide the user with a comprehensive information system. Availability: The database is freely available at http://www.bioterrorism.biowaves.org PMID:23390356
Performance Assessments of Generic Nuclear Waste Repositories in Shale
NASA Astrophysics Data System (ADS)
Stein, E. R.; Sevougian, S. D.; Mariner, P. E.; Hammond, G. E.; Frederick, J.
2017-12-01
Simulations of deep geologic disposal of nuclear waste in a generic shale formation showcase the Geologic Disposal Safety Assessment (GDSA) Framework, a toolkit for repository performance assessment (PA) whose capabilities include domain discretization (Cubit), multiphysics simulations (PFLOTRAN), uncertainty and sensitivity analysis (Dakota), and visualization (ParaView). GDSA Framework is used to conduct PAs of two generic repositories in shale. The first considers the disposal of 22,000 metric tons of heavy metal of commercial spent nuclear fuel. The second considers disposal of defense-related spent nuclear fuel and high-level waste. Each PA accounts for the thermal load and radionuclide inventory of the applicable waste types, components of the engineered barrier system, and components of the natural barrier system including the host rock shale and underlying and overlying stratigraphic units. Model domains are half-symmetry, gridded with Cubit, and contain between 7 and 22 million grid cells. Grid refinement captures the detail of individual waste packages, emplacement drifts, access drifts, and shafts. Simulations are run in a high performance computing environment on as many as 2048 processes. Equations describing coupled heat and fluid flow and reactive transport are solved with PFLOTRAN, an open-source, massively parallel multiphase flow and reactive transport code. Additional simulated processes include waste package degradation, waste form dissolution, radioactive decay and ingrowth, sorption, solubility, advection, dispersion, and diffusion. Simulations are run to 10^6 y, and radionuclide concentrations are observed within aquifers at a point approximately 5 km downgradient of the repository. Dakota is used to sample likely ranges of input parameters, including waste form and waste package degradation rates and properties of engineered and natural materials, to quantify uncertainty in predicted concentrations and sensitivity to input parameters. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525. SAND2017-8305 A
Disposal of disused sealed radiation sources in Boreholes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vicente, R.
2007-07-01
This paper gives a description of the concept of a geological repository for disposal of disused sealed radiation sources (DSRS) under development at the Institute of Energy and Nuclear Research (IPEN), in Brazil. DSRS represent a significant fraction of the total activity of radioactive wastes to be managed. Most DSRS are collected and temporarily stored at IPEN. As of 2006, the total collected activity is 800 TBq in 7,508 industrial gauge or radiotherapy sources, 7.2 TBq in about 72,000 Americium-241 sources detached from lightning rods, and about 0.5 GBq in 20,857 sources from smoke detectors. The estimated inventory of sealed sources in the country is about 270,000 sources totalling 26 PBq. The proposed repository is designed to receive the total inventory of sealed sources. A description of the pre-disposal facilities at IPEN is also presented. (authors)
NASA Astrophysics Data System (ADS)
Arunachalam, S.
2010-10-01
Open access brings greater visibility and impact to the work of scientists as is evidenced in the examples discussed in this paper. Researchers are often reluctant and afraid to deposit their works in Institutional Repositories. However, as is shown here, once they do so, they do not regret it. Open access will shortly become the norm and will be accepted by the vast majority of scientists. Seen through the lens of the philosophy of Bertrand Russell, the moral, economic and philosophical imperatives for open access are indeed strong.
Software Writing Skills for Your Research - Lessons Learned from Workshops in the Geosciences
NASA Astrophysics Data System (ADS)
Hammitzsch, Martin
2016-04-01
Findings presented in scientific papers are based on data and software. Once in a while they come along with data - but not commonly with software. However, the software used to gain the findings plays a crucial role in the scientific work. Nevertheless, software is rarely seen as publishable. Thus researchers may not be able to reproduce the findings without the software, which is in conflict with the principle of reproducibility in science. For both the writing of publishable software and the reproducibility issue, the quality of software is of utmost importance. For many programming scientists the treatment of source code, e.g. code design, version control, documentation, and testing, is associated with additional work that is not covered in the primary research task. This includes the adoption of processes following the software development life cycle. However, the adoption of software engineering rules and best practices has to be recognized and accepted as part of the scientific performance. Most scientists have little incentive to improve code and do not publish code because software engineering habits are rarely practised by researchers or students. Software engineering skills are not passed on to followers the way paper-writing skills are. Thus it is often felt that the software or code produced is not publishable. The quality of software and its source code has a decisive influence on the quality of research results and their traceability. So establishing best practices from software engineering to serve scientific needs is crucial for the success of scientific software. Even though scientists use existing software and code, e.g. from open source software repositories, only few contribute their code back into the repositories. Writing and opening code for Open Science means that subsequent users are able to run the code, e.g. through the provision of sufficient documentation, sample data sets, tests and comments, which in turn can be proven by adequate and qualified reviews. This assumes that scientists learn to write and release code and software as they learn to write and publish papers. With this in mind, software could be valued and assessed as a contribution to science. But this requires the relevant skills to be passed to colleagues and followers. Therefore, the GFZ German Research Centre for Geosciences ran three workshops in 2015 to address the passing of software writing skills to young scientists, the next generation of researchers in the Earth, planetary and space sciences. Experiences in running these workshops and the lessons learned will be summarized in this presentation. The workshops received support and funding from Software Carpentry, a volunteer organization whose goal is to make scientists more productive, and their work more reliable, by teaching them basic computing skills, and from FOSTER (Facilitate Open Science Training for European Research), a two-year, EU-funded (FP7) project whose goal is to produce a European-wide training programme that will help to incorporate Open Access approaches into existing research methodologies and to integrate Open Science principles and practice into the current research workflow by targeting young researchers and other stakeholders.
DOE Office of Scientific and Technical Information (OSTI.GOV)
St. John, C.M.
1977-04-01
An underground repository containing heat-generating High-Level Waste or Spent Unreprocessed Fuel may be approximated as a finite number of heat sources distributed across the plane of the repository. The resulting temperature, displacement and stress changes may be calculated using analytical solutions, provided linear thermoelasticity is assumed. This report documents a computer program based on this approach and gives results that form the basis for a comparison between the effects of disposing of High-Level Waste and Spent Unreprocessed Fuel.
OpenID connect as a security service in Cloud-based diagnostic imaging systems
NASA Astrophysics Data System (ADS)
Ma, Weina; Sartipi, Kamran; Sharghi, Hassan; Koff, David; Bak, Peter
2015-03-01
The evolution of cloud computing is driving the next generation of diagnostic imaging (DI) systems. Cloud-based DI systems are able to deliver better services to patients without being constrained by their own physical facilities. However, privacy and security concerns have been consistently regarded as the major obstacle to the adoption of cloud computing in healthcare domains. Furthermore, traditional computing models and interfaces employed by DI systems are not ready for accessing diagnostic images through mobile devices. REST is an ideal technology for provisioning both mobile services and cloud computing. OpenID Connect, combining OpenID and OAuth together, is an emerging REST-based federated identity solution. It is one of the most promising open standards, with the potential to become the de facto standard for securing cloud computing and mobile applications, and has been regarded as the "Kerberos of the Cloud". We introduce OpenID Connect as an identity and authentication service in cloud-based DI systems and propose enhancements that allow for incorporating this technology within a distributed enterprise environment. The objective of this study is to offer solutions for secure radiology image sharing among DI-r (Diagnostic Imaging Repository) and heterogeneous PACS (Picture Archiving and Communication Systems) as well as mobile clients in the cloud ecosystem. By using OpenID Connect as an open-source identity and authentication service, deploying DI-r and PACS to private or community clouds should achieve a security level equivalent to that of the traditional computing model.
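OpenID Connect layers identity on top of the OAuth 2.0 authorization-code flow, so a DI-r or mobile client obtains tokens by a standard HTTPS exchange. The sketch below shows only that generic protocol step; the provider URL, client credentials and authorization code are placeholders, and nothing here is taken from the paper's implementation.

```python
# Generic OpenID Connect authorization-code token exchange (standard
# OAuth 2.0/OIDC step, not the paper's implementation). All endpoint URLs
# and credentials are placeholders.
import requests

TOKEN_ENDPOINT = "https://idp.example.org/oidc/token"   # hypothetical provider

resp = requests.post(TOKEN_ENDPOINT, data={
    "grant_type": "authorization_code",
    "code": "AUTH_CODE_FROM_REDIRECT",      # value returned by the authorize step
    "redirect_uri": "https://di-client.example.org/callback",
    "client_id": "dir-viewer",
    "client_secret": "CLIENT_SECRET",
})
resp.raise_for_status()
tokens = resp.json()

access_token = tokens["access_token"]   # presented when calling the DI-r/PACS API
id_token = tokens["id_token"]           # signed JWT identifying the authenticated user
```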
Be Creative, Determined, and Wise: Open Library Publishing and the Global South
ERIC Educational Resources Information Center
Baker, Matthew
2009-01-01
Libraries throughout the world are increasingly involved in the production of scholarly publications. Much of this has been thanks to the growth of open access (OA) publishing in all its forms, from peer-reviewed "gold" journals to "green" self-archiving, and electronic theses and dissertation (ETD) repositories. As a result, more and more of the…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huff, Kathryn D.
Component-level and system-level abstraction of detailed computational geologic repository models has resulted in four rapid computational models of hydrologic radionuclide transport at varying levels of detail. Those models are described, as is their implementation in Cyder, a software library of interchangeable radionuclide transport models appropriate for representing natural and engineered barrier components of generic geologic repository concepts. A proof-of-principle demonstration was also conducted in which these models were used to represent the natural and engineered barrier components of a repository concept in a reducing, homogeneous, generic geology. This base case demonstrates integration of the Cyder open source library with the Cyclus computational fuel cycle systems analysis platform to facilitate calculation of repository performance metrics with respect to fuel cycle choices. (authors)
SFO-Project: The New Generation of Sharable, Editable and Open-Access CFD Tutorials
NASA Astrophysics Data System (ADS)
Javaherchi, Teymour; Javaherchi, Ardeshir; Aliseda, Alberto
2016-11-01
One of the most common approaches to developing a Computational Fluid Dynamics (CFD) simulation for a new case study of interest is to search for the most similar, previously developed and validated CFD simulation among other works. A simple search results in a pool of written/visual tutorials. However, users must spend a significant amount of time and effort to find the most correct, compatible and valid tutorial in this pool and further modify it toward their simulation of interest. SFO is an open-source project with the core idea of saving the above-mentioned time and effort. This is done via documenting/sharing scientific and methodological approaches to developing CFD simulations for a wide spectrum of fundamental and industrial case studies in three different CFD solvers: STAR-CCM+, FLUENT and OpenFOAM (SFO). All of the steps and required files of these tutorials are accessible and editable under the common roof of GitHub (a web-based Git repository hosting service). In this presentation we will present the current library of 20+ developed CFD tutorials, discuss the idea and benefit of using them and their educational value, and explain how the next generation of this open-access, living resource of CFD tutorials can be built further, hand-in-hand, within our community.
The Community as a Source of Pragmatic Input for Learners of Italian: The Multimedia Repository LIRA
ERIC Educational Resources Information Center
Zanoni, Greta
2016-01-01
This paper focuses on community participation within the LIRA project--Lingua/Cultura Italiana in Rete per l'Apprendimento (Italian language and culture for online learning). LIRA is a multimedia repository of e-learning materials aiming at recovering, preserving and developing the linguistic, pragmatic and cultural competences of second and third…
Use of Digital Repositories by Chemistry Researchers: Results of a Survey
ERIC Educational Resources Information Center
Polydoratou, Panayiota
2007-01-01
Purpose: This paper aims to present findings from a survey that aimed to identify the issues around the use and linkage of source and output repositories and the chemistry researchers' expectations about their use. Design/methodology/approach: This survey was performed by means of an online questionnaire and structured interviews with academic and…
LittleQuickWarp: an ultrafast image warping tool.
Qu, Lei; Peng, Hanchuan
2015-02-01
Warping images into a standard coordinate space is critical for many image computing related tasks. However, for multi-dimensional and high-resolution images, an accurate warping operation itself is often very expensive in terms of computer memory and computational time. For high-throughput image analysis studies such as brain mapping projects, it is desirable to have high performance image warping tools that are compatible with common image analysis pipelines. In this article, we present LittleQuickWarp, a swift and memory efficient tool that boosts 3D image warping performance dramatically and at the same time has high warping quality similar to the widely used thin plate spline (TPS) warping. Compared to the TPS, LittleQuickWarp can improve the warping speed 2-5 times and reduce the memory consumption 6-20 times. We have implemented LittleQuickWarp as an Open Source plug-in program on top of the Vaa3D system (http://vaa3d.org). The source code and a brief tutorial can be found in the Vaa3D plugin source code repository. Copyright © 2014 Elsevier Inc. All rights reserved.
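For readers unfamiliar with the thin-plate spline (TPS) baseline that LittleQuickWarp is compared against, the landmark-driven warping idea can be sketched with SciPy's radial basis function interpolator (available from SciPy 1.7 onward). This is only an illustration of the reference method on made-up 2D landmarks, not the LittleQuickWarp algorithm.

```python
# Illustration of the thin-plate-spline (TPS) baseline the paper compares
# against, using scipy.interpolate.RBFInterpolator (SciPy >= 1.7).
# Landmark coordinates are made up for demonstration.
import numpy as np
from scipy.interpolate import RBFInterpolator

# Matching landmarks in a 2D slice: source positions -> target positions.
src = np.array([[10, 10], [10, 90], [90, 10], [90, 90], [50, 50]], dtype=float)
dst = np.array([[12, 11], [ 9, 92], [88, 13], [91, 88], [55, 48]], dtype=float)

# Fit one TPS interpolant mapping source coordinates to target coordinates.
tps = RBFInterpolator(src, dst, kernel="thin_plate_spline")

# Warp a coarse grid of pixel coordinates into the target space.
yy, xx = np.mgrid[0:100:10, 0:100:10]
coords = np.column_stack([xx.ravel(), yy.ravel()]).astype(float)
warped = tps(coords)
print(warped[:3])   # first few warped coordinates
```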
Building and Using Digital Repository Certifications across Science
NASA Astrophysics Data System (ADS)
McIntosh, L.
2017-12-01
When scientific recommendations are made based upon research, the quality and integrity of the data should be rigorous enough to verify the claims, and the data should reside in a trusted location. Key to ensuring the transparency and verifiability of research, reproducibility hinges not only on the availability of the documentation, analyses, and data, but also on the ongoing accessibility and viability of the files and documents, enhanced through a process of curation. The Research Data Alliance (RDA) is an international, community-driven, action-oriented, virtual organization committed to enabling the open sharing of data by building social and technical bridges. Within the RDA, multiple groups are working on consensus-building around the certification of digital repositories across scientific domains. For this section of the panel, we will discuss the work to date on repository certification from the RDA perspective.
Intervertebral reaction force prediction using an enhanced assembly of OpenSim models.
Senteler, Marco; Weisse, Bernhard; Rothenfluh, Dominique A; Snedeker, Jess G
2016-01-01
OpenSim offers a valuable approach to investigating otherwise difficult to assess yet important biomechanical parameters such as joint reaction forces. Although the range of available models in the public repository is continually increasing, there currently exists no OpenSim model for the computation of intervertebral joint reactions during flexion and lifting tasks. The current work combines and improves elements of existing models to develop an enhanced model of the upper body and lumbar spine. Models of the upper body with extremities, neck and head were combined with an improved version of a lumbar spine from the model repository. Translational motion was enabled for each lumbar vertebra with six controllable degrees of freedom. Motion segment stiffness was implemented at lumbar levels and mass properties were assigned throughout the model. Moreover, body coordinate frames of the spine were modified to allow straightforward variation of sagittal alignment and to simplify interpretation of results. Evaluation of model predictions for levels L1-L2, L3-L4 and L4-L5 in various postures of forward flexion and moderate lifting (8 kg) revealed agreement within 10% with experimental studies and model-based computational analyses. However, in an extended posture or during lifting of heavier loads (20 kg), computed joint reactions differed substantially from in vivo measures reported using instrumented implants. We conclude that agreement between the model and available experimental data was good in view of limitations of both the model and the validation datasets. The presented model is useful in that it permits computation of realistic lumbar spine joint reaction forces during flexion and moderate lifting tasks. The model and corresponding documentation are now available in the online OpenSim repository.
NASA Astrophysics Data System (ADS)
Butov, R. A.; Drobyshevsky, N. I.; Moiseenko, E. V.; Tokarev, U. N.
2017-11-01
The paper presents the verification of the FENIA finite element code on selected problems and an example of its application. The code is being developed for 3D modelling of thermal, mechanical and hydrodynamical (THM) problems related to the functioning of deep geological repositories. Verification of the code has been performed for two analytical problems. The first is a point heat source with exponentially decreasing heat output; the second is a linear heat source with similar behaviour. The analytical solutions have been obtained by the authors. These problems were chosen because they reflect the processes influencing the thermal state of a deep geological repository for radioactive waste. Verification was performed for several meshes with different resolution, and good agreement between analytical and numerical solutions was achieved. The application of the FENIA code is illustrated by 3D modelling of the thermal state of a prototypic deep geological repository for radioactive waste. The repository is designed for disposal of radioactive waste in rock at a depth of several hundred meters, with no intention of later retrieval. Vitrified radioactive waste is placed in containers, which are emplaced in vertical boreholes. The residual decay heat of the radioactive waste leads to heating of the containers, engineered safety barriers and host rock. Maximum temperatures and the corresponding times at which they are reached have been determined.
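For orientation, the closed form usually employed for this kind of verification case, the temperature rise around a point source whose heat output decays exponentially in an infinite homogeneous medium with constant properties, can be written as a Duhamel superposition of the instantaneous point-source Green's function. This is the standard textbook expression, stated here as an assumption rather than quoted from the paper:

```latex
\Delta T(r,t) = \int_0^t
  \frac{Q_0\, e^{-\lambda \tau}}
       {\rho c \,\bigl[\,4\pi \kappa\, (t-\tau)\,\bigr]^{3/2}}
  \exp\!\left(-\frac{r^{2}}{4\kappa\,(t-\tau)}\right)\mathrm{d}\tau
```

where Q_0 is the initial heat output of the source, λ its decay constant, ρc the volumetric heat capacity and κ the thermal diffusivity of the medium; the linear heat source case follows by integrating this kernel along the source axis.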
Expanding Access and Usage of NASA Near Real-Time Imagery and Data
NASA Astrophysics Data System (ADS)
Cechini, M.; Murphy, K. J.; Boller, R. A.; Schmaltz, J. E.; Thompson, C. K.; Huang, T.; McGann, J. M.; Ilavajhala, S.; Alarcon, C.; Roberts, J. T.
2013-12-01
In late 2009, the Land Atmosphere Near-real-time Capability for EOS (LANCE) was created to greatly expand the range of near real-time data products from a variety of Earth Observing System (EOS) instruments. Since that time, NASA's Earth Observing System Data and Information System (EOSDIS) developed the Global Imagery Browse Services (GIBS) to provide highly responsive, scalable, and expandable imagery services that distribute near real-time imagery in an intuitive and geo-referenced format. The GIBS imagery services provide access through standards-based protocols such as the Open Geospatial Consortium (OGC) Web Map Tile Service (WMTS) and standard mapping file formats such as the Keyhole Markup Language (KML). Leveraging these standard mechanisms opens NASA near real-time imagery to a broad landscape of mapping libraries supporting mobile applications. By easily integrating with mobile application development libraries, GIBS makes it possible for NASA imagery to become a reliable and valuable source for end-user applications. Recently, EOSDIS has taken steps to integrate near real-time metadata products into the EOS ClearingHOuse (ECHO) metadata repository. Registration of near real-time metadata allows for near real-time data discovery through ECHO clients. In keeping with the near real-time data processing requirements, the ECHO ingest model allows for low-latency metadata insertion and updates. By combining with the ECHO repository, the fast visual access provided by GIBS imagery can now be linked directly back to the source data file(s). Through the use of discovery standards such as OpenSearch, desktop and mobile applications can connect users to more than just an image. As data services, such as the OGC Web Coverage Service, become more prevalent within the EOSDIS system, applications may even be able to connect users from imagery to data values. In addition, the full-resolution GIBS imagery provides visual context for other GIS data and tools. The NASA near real-time imagery covers a broad set of Earth science disciplines. By leveraging the ECHO and GIBS services, these data can become a visual context within which other GIS activities are performed. The focus of this presentation is to discuss the GIBS imagery and ECHO metadata services facilitating near real-time discovery and usage. Existing synergies and future possibilities will also be discussed. The NASA Worldview demonstration client will be used to show an existing application combining the ECHO and GIBS services.
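GIBS tiles are served through the OGC WMTS standard, so any HTTP client can fetch them. The sketch below shows a WMTS key-value-pair GetTile request of the kind GIBS supports; the endpoint, layer identifier, TileMatrixSet and date are assumptions for illustration, and the current values should be taken from the GIBS documentation.

```python
# Sketch of an OGC WMTS key-value-pair GetTile request of the kind GIBS
# supports. Endpoint, layer and TileMatrixSet names are assumptions for
# illustration; consult the GIBS documentation for current values.
import requests

ENDPOINT = "https://gibs.earthdata.nasa.gov/wmts/epsg4326/best/wmts.cgi"  # assumed
params = {
    "SERVICE": "WMTS",
    "REQUEST": "GetTile",
    "VERSION": "1.0.0",
    "LAYER": "MODIS_Terra_CorrectedReflectance_TrueColor",  # assumed layer id
    "STYLE": "default",
    "TILEMATRIXSET": "250m",       # assumed tile matrix set
    "TILEMATRIX": "2",
    "TILEROW": "1",
    "TILECOL": "1",
    "FORMAT": "image/jpeg",
    "TIME": "2013-08-01",          # near real-time layers carry a time dimension
}

resp = requests.get(ENDPOINT, params=params, timeout=30)
resp.raise_for_status()
with open("gibs_tile.jpg", "wb") as f:
    f.write(resp.content)          # one JPEG map tile
```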
Roadmap for cardiovascular circulation model
Bradley, Christopher P.; Suresh, Vinod; Mithraratne, Kumar; Muller, Alexandre; Ho, Harvey; Ladd, David; Hellevik, Leif R.; Omholt, Stig W.; Chase, J. Geoffrey; Müller, Lucas O.; Watanabe, Sansuke M.; Blanco, Pablo J.; de Bono, Bernard; Hunter, Peter J.
2016-01-01
Computational models of many aspects of the mammalian cardiovascular circulation have been developed. Indeed, along with orthopaedics, this area of physiology is one that has attracted much interest from engineers, presumably because the equations governing blood flow in the vascular system are well understood and can be solved with well‐established numerical techniques. Unfortunately, there have been only a few attempts to create a comprehensive public domain resource for cardiovascular researchers. In this paper we propose a roadmap for developing an open source cardiovascular circulation model. The model should be registered to the musculo‐skeletal system. The computational infrastructure for the cardiovascular model should provide for near real‐time computation of blood flow and pressure in all parts of the body. The model should deal with vascular beds in all tissues, and the computational infrastructure for the model should provide links into CellML models of cell function and tissue function. In this work we review the literature associated with 1D blood flow modelling in the cardiovascular system, discuss model encoding standards, software and a model repository. We then describe the coordinate systems used to define the vascular geometry, derive the equations and discuss the implementation of these coupled equations in the open source computational software OpenCMISS. Finally, some preliminary results are presented and plans outlined for the next steps in the development of the model, the computational software and the graphical user interface for accessing the model. PMID:27506597
Roadmap for cardiovascular circulation model.
Safaei, Soroush; Bradley, Christopher P; Suresh, Vinod; Mithraratne, Kumar; Muller, Alexandre; Ho, Harvey; Ladd, David; Hellevik, Leif R; Omholt, Stig W; Chase, J Geoffrey; Müller, Lucas O; Watanabe, Sansuke M; Blanco, Pablo J; de Bono, Bernard; Hunter, Peter J
2016-12-01
Computational models of many aspects of the mammalian cardiovascular circulation have been developed. Indeed, along with orthopaedics, this area of physiology is one that has attracted much interest from engineers, presumably because the equations governing blood flow in the vascular system are well understood and can be solved with well-established numerical techniques. Unfortunately, there have been only a few attempts to create a comprehensive public domain resource for cardiovascular researchers. In this paper we propose a roadmap for developing an open source cardiovascular circulation model. The model should be registered to the musculo-skeletal system. The computational infrastructure for the cardiovascular model should provide for near real-time computation of blood flow and pressure in all parts of the body. The model should deal with vascular beds in all tissues, and the computational infrastructure for the model should provide links into CellML models of cell function and tissue function. In this work we review the literature associated with 1D blood flow modelling in the cardiovascular system, discuss model encoding standards, software and a model repository. We then describe the coordinate systems used to define the vascular geometry, derive the equations and discuss the implementation of these coupled equations in the open source computational software OpenCMISS. Finally, some preliminary results are presented and plans outlined for the next steps in the development of the model, the computational software and the graphical user interface for accessing the model. © 2016 The Authors. The Journal of Physiology © 2016 The Physiological Society.
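As background to the 1D blood flow modelling reviewed in the two entries above, the governing equations are usually stated in the area-averaged mass and momentum form below. This is the standard textbook formulation, included for orientation rather than quoted from the paper:

```latex
\frac{\partial A}{\partial t} + \frac{\partial Q}{\partial x} = 0,
\qquad
\frac{\partial Q}{\partial t}
  + \frac{\partial}{\partial x}\!\left(\alpha\,\frac{Q^{2}}{A}\right)
  + \frac{A}{\rho}\,\frac{\partial p}{\partial x} = f
```

where A(x,t) is the vessel cross-sectional area, Q(x,t) the volumetric flow rate, p the pressure, ρ the blood density, α a momentum-flux correction coefficient and f a viscous friction term (often modelled as f = -K_R Q/A for a resistance coefficient K_R); a constitutive tube law relating p to A closes the system.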
Extending Supernova Spectral Templates for Next Generation Space Telescope Observations
NASA Astrophysics Data System (ADS)
Roberts-Pierel, Justin; Rodney, Steven A.
2018-01-01
Widely used empirical supernova (SN) Spectral Energy Distributions (SEDs) have not historically extended meaningfully into the ultraviolet (UV) or the infrared (IR). However, both are critical for current and future aspects of SN research, including UV spectra as probes of poorly understood SN Ia physical properties, and expanding our view of the universe with high-redshift James Webb Space Telescope (JWST) IR observations. We therefore present a comprehensive set of SN SED templates that have been extended into the UV and IR, as well as an open-source software package written in Python that enables a user to generate their own extrapolated SEDs. We have taken a sampling of core-collapse (CC) and Type Ia SNe to obtain a time-dependent distribution of UV and IR colors (U-B, r'-[JHK]); the generated color curves are then used to extrapolate SEDs into the UV and IR. The SED extrapolation process is now easily duplicated using a user's own data and parameters via our open-source Python package, SNSEDextend. This work develops the tools necessary to explore the JWST's ability to discriminate between CC and Type Ia SNe, as well as provides a repository of SN SEDs that will be invaluable to future JWST and WFIRST SN studies.
MetaDB a Data Processing Workflow in Untargeted MS-Based Metabolomics Experiments.
Franceschi, Pietro; Mylonas, Roman; Shahaf, Nir; Scholz, Matthias; Arapitsas, Panagiotis; Masuero, Domenico; Weingart, Georg; Carlin, Silvia; Vrhovsek, Urska; Mattivi, Fulvio; Wehrens, Ron
2014-01-01
Due to their sensitivity and speed, mass-spectrometry-based analytical technologies are widely used in metabolomics to characterize biological phenomena. To address issues like metadata organization, quality assessment, data processing, data storage, and, finally, submission to public repositories, bioinformatic pipelines of a non-interactive nature are often employed, complementing the interactive software used for initial inspection and visualization of the data. These pipelines are often created as open-source software, allowing the complete and exhaustive documentation of each step and ensuring the reproducibility of the analysis of extensive and often expensive experiments. In this paper, we will review the major steps that constitute such a data processing pipeline, discussing them in the context of an open-source software for untargeted MS-based metabolomics experiments recently developed at our institute. The software has been developed by integrating our metaMS R package with a user-friendly web-based application written in Grails. metaMS takes care of data pre-processing and annotation, while the interface deals with the creation of the sample lists, the organization of the data storage, and the generation of survey plots for quality assessment. Experimental and biological metadata are stored in the ISA-Tab format, making the proposed pipeline fully integrated with the Metabolights framework.
An open source framework for tracking and state estimation ('Stone Soup')
NASA Astrophysics Data System (ADS)
Thomas, Paul A.; Barr, Jordi; Balaji, Bhashyam; White, Kruger
2017-05-01
The ability to detect and unambiguously follow all moving entities in a state-space is important in multiple domains, both in defence (e.g. air surveillance, maritime situational awareness, ground moving target indication) and the civil sphere (e.g. astronomy, biology, epidemiology, dispersion modelling). However, tracking and state estimation researchers and practitioners have difficulty recreating state-of-the-art algorithms in order to benchmark their own work. Furthermore, system developers need to assess which algorithms meet operational requirements objectively and exhaustively rather than intuitively or driven by personal favourites. We have therefore commenced the development of a collaborative initiative to create an open source framework for production, demonstration and evaluation of tracking and state estimation algorithms. The initiative will develop a (MIT-licensed) software platform for researchers and practitioners to test, verify and benchmark a variety of multi-sensor and multi-object state estimation algorithms. The initiative is supported by four defence laboratories, who will contribute to the development effort for the framework. The tracking and state estimation community will derive significant benefits from this work, including: access to repositories of verified and validated tracking and state estimation algorithms, a framework for the evaluation of multiple algorithms, standardisation of interfaces and access to challenging data sets.
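As a flavour of the simplest state-estimation building block such a framework must provide, the snippet below implements one predict-update cycle of a generic linear Kalman filter in NumPy. It illustrates the underlying mathematics only and is not Stone Soup's API.

```python
# Generic linear Kalman filter predict/update step in NumPy; an illustration
# of the underlying state-estimation mathematics, not Stone Soup's API.
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict+update cycle.

    x: state estimate, P: state covariance, z: new measurement,
    F: state transition, H: measurement matrix, Q/R: process/measurement noise.
    """
    # Predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update
    y = z - H @ x_pred                      # innovation
    S = H @ P_pred @ H.T + R                # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Example: constant-velocity target in 1D observed through noisy position.
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[0.5]])
x, P = np.array([0.0, 1.0]), np.eye(2)
x, P = kalman_step(x, P, np.array([1.2]), F, H, Q, R)
print(x)    # updated [position, velocity] estimate
```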
Collaborative Data Publication Utilizing the Open Data Repository's Data Publisher
NASA Technical Reports Server (NTRS)
Stone, N.; Lafuente, B.; Bristow, T.; Keller, R. M.; Downs, R. T.; Blake, D.; Fonda, M.; Dateo, C.; Pires, A.
2017-01-01
For small communities in multidisciplinary fields such as astrobiology, publishing and sharing data can be challenging. While large, homogenous fields often have repositories and existing data standards, small groups of independent researchers have few options for publishing data that can be utilized within their community. In conjunction with teams at NASA Ames and the University of Arizona, a number of pilot studies are being conducted to assess the needs of these research groups and to guide the software development so that it allows them to publish and share their data collaboratively.
JHelioviewer: Open-Source Software for Discovery and Image Access in the Petabyte Age
NASA Astrophysics Data System (ADS)
Mueller, D.; Dimitoglou, G.; Garcia Ortiz, J.; Langenberg, M.; Nuhn, M.; Dau, A.; Pagel, S.; Schmidt, L.; Hughitt, V. K.; Ireland, J.; Fleck, B.
2011-12-01
The unprecedented torrent of data returned by the Solar Dynamics Observatory is both a blessing and a barrier: a blessing for making available data with significantly higher spatial and temporal resolution, but a barrier for scientists to access, browse and analyze them. With such staggering data volume, the data is accessible only from a few repositories and users have to deal with data sets effectively immobile and practically difficult to download. From a scientist's perspective this poses three challenges: accessing, browsing and finding interesting data while avoiding the proverbial search for a needle in a haystack. To address these challenges, we have developed JHelioviewer, an open-source visualization software that lets users browse large data volumes both as still images and movies. We did so by deploying an efficient image encoding, storage, and dissemination solution using the JPEG 2000 standard. This solution enables users to access remote images at different resolution levels as a single data stream. Users can view, manipulate, pan, zoom, and overlay JPEG 2000 compressed data quickly, without severe network bandwidth penalties. Besides viewing data, the browser provides third-party metadata and event catalog integration to quickly locate data of interest, as well as an interface to the Virtual Solar Observatory to download science-quality data. As part of the ESA/NASA Helioviewer Project, JHelioviewer offers intuitive ways to browse large amounts of heterogeneous data remotely and provides an extensible and customizable open-source platform for the scientific community. In addition, the easy-to-use graphical user interface enables the general public and educators to access, enjoy and reuse data from space missions without barriers.
Sustaining Open Source Communities through Hackathons - An Example from the ASPECT Community
NASA Astrophysics Data System (ADS)
Heister, T.; Hwang, L.; Bangerth, W.; Kellogg, L. H.
2016-12-01
The ecosystem surrounding a successful scientific open source software package combines both social and technical aspects. Much thought has been given to the technology side of writing sustainable software for large infrastructure projects and software libraries, but less to building the human capacity to perpetuate scientific software used in computational modeling. One effective format for building capacity is regular multi-day hackathons. Scientific hackathons bring together a group of science domain users and scientific software contributors to make progress on a specific software package. Innovation comes through the chance to work with established and new collaborations. Especially in the domain sciences with small communities, hackathons give geographically distributed scientists an opportunity to connect face-to-face. They foster lively discussions amongst scientists with different expertise, promote new collaborations, and increase transparency in both the technical and scientific aspects of code development. ASPECT is an open source, parallel, extensible finite element code to simulate thermal convection that began development in 2011 under the Computational Infrastructure for Geodynamics. ASPECT hackathons for the past 3 years have grown the number of authors to >50, training new code maintainers in the process. Hackathons begin with leaders establishing project-specific conventions for development, demonstrating the workflow for code contributions, and reviewing relevant technical skills. Each hackathon expands the developer community. Over 20 scientists add >6,000 lines of code during the >1 week event. Participants grow comfortable contributing to the repository and over half continue to contribute afterwards. A high return rate of participants ensures continuity and stability of the group as well as mentoring for novice members. We hope to build other software communities on this model, but anticipate that each will bring its own unique challenges.
NASA Astrophysics Data System (ADS)
Downs, R. R.; Chen, R. S.
2011-12-01
Services that preserve and enable future access to scientific data are necessary to ensure that the data that are being collected today will be available for use by future generations of scientists. Many data centers, archives, and other digital repositories are working to improve their ability to serve as long-term stewards of scientific data. Trust in sustainable data management and preservation capabilities of digital repositories can influence decisions to use these services to deposit or obtain scientific data. Building on the Open Archival Information System (OAIS) Reference Model developed by the Consultative Committee for Space Data Systems (CCSDS) and adopted by the International Organization for Standardization as ISO 14721:2003, new standards are being developed to improve long-term data management processes and documentation. The Draft Information Standard ISO/DIS 16363, "Space data and information transfer systems - Audit and certification of trustworthy digital repositories" offers the potential to evaluate digital repositories objectively in terms of their trustworthiness as long-term stewards of digital resources. In conjunction with this, the CCSDS and ISO are developing another draft standard for the auditing and certification process, ISO/DIS 16919, "Space data and information transfer systems - Requirements for bodies providing audit and certification of candidate trustworthy digital repositories". Six test audits were conducted of scientific data centers and archives in Europe and the United States to test the use of these draft standards and identify potential improvements for the standards and for the participating digital repositories. We present a case study of the test audit conducted on the NASA Socioeconomic Data and Applications Center (SEDAC) and describe the preparation, the audit process, recommendations received, and next steps to obtain certification as a trustworthy digital repository, after approval of the ISO/DIS standards.
Cluster-lensing: A Python Package for Galaxy Clusters and Miscentering
NASA Astrophysics Data System (ADS)
Ford, Jes; VanderPlas, Jake
2016-12-01
We describe a new open source package for calculating properties of galaxy clusters, including Navarro, Frenk, and White halo profiles with and without the effects of cluster miscentering. This pure-Python package, cluster-lensing, provides well-documented and easy-to-use classes and functions for calculating cluster scaling relations, including mass-richness and mass-concentration relations from the literature, as well as the surface mass density Σ(R) and differential surface mass density ΔΣ(R) profiles, probed by weak lensing magnification and shear. Galaxy cluster miscentering is especially a concern for stacked weak lensing shear studies of galaxy clusters, where offsets between the assumed and the true underlying matter distribution can lead to a significant bias in the mass estimates if not accounted for. This software has been developed and released in a public GitHub repository, and is licensed under the permissive MIT license. The cluster-lensing package is archived on Zenodo. Full documentation, source code, and installation instructions are available at http://jesford.github.io/cluster-lensing/.
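For orientation, the Navarro-Frenk-White (NFW) profile that underlies the package's Σ(R) and ΔΣ(R) calculations has a simple closed form in 3D. The sketch below states that standard definition in NumPy; it is illustrative only and not the cluster-lensing package's own code, which additionally performs the line-of-sight projections and miscentering convolution.

```python
# Standard 3D NFW density profile (illustrative only; cluster-lensing itself
# computes the projected Sigma(R) and DeltaSigma(R) profiles built on it).
import numpy as np

def nfw_density(r, rho_crit, c, r200):
    """NFW density at radius r for a halo of concentration c and radius r200."""
    r_s = r200 / c                                            # scale radius
    delta_c = (200.0 / 3.0) * c**3 / (np.log(1 + c) - c / (1 + c))
    x = r / r_s
    return delta_c * rho_crit / (x * (1 + x) ** 2)

# Example: density at a few radii for a c = 5, r200 = 1.5 Mpc halo
# (rho_crit given in arbitrary mass/volume units for illustration).
r = np.array([0.1, 0.5, 1.0, 1.5])    # Mpc
print(nfw_density(r, rho_crit=1.0, c=5.0, r200=1.5))
```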
McKinney, Bill; Meyer, Peter A; Crosas, Mercè; Sliz, Piotr
2017-01-01
Access to experimental X-ray diffraction image data is important for validation and reproduction of macromolecular models and indispensable for the development of structural biology processing methods. In response to the evolving needs of the structural biology community, we recently established a diffraction data publication system, the Structural Biology Data Grid (SBDG, data.sbgrid.org), to preserve primary experimental datasets supporting scientific publications. All datasets published through the SBDG are freely available to the research community under a public domain dedication license, with metadata compliant with the DataCite Schema (schema.datacite.org). A proof-of-concept study demonstrated community interest and utility. Publication of large datasets is a challenge shared by several fields, and the SBDG has begun collaborating with the Institute for Quantitative Social Science at Harvard University to extend the Dataverse (dataverse.org) open-source data repository system to structural biology datasets. Several extensions are necessary to support the size and metadata requirements for structural biology datasets. In this paper, we describe one such extension: functionality supporting preservation of file system structure within Dataverse, which is essential for both in-place computation and supporting non-HTTP data transfers. © 2016 New York Academy of Sciences.
NASA Technical Reports Server (NTRS)
Shum, Dana; Bugbee, Kaylin
2017-01-01
This talk explains the ongoing metadata curation activities in the Common Metadata Repository. It explores tools that exist today which are useful for building quality metadata and also opens up the floor for discussions on other potentially useful tools.
Revision history aware repositories of computational models of biological systems.
Miller, Andrew K; Yu, Tommy; Britten, Randall; Cooling, Mike T; Lawson, James; Cowan, Dougal; Garny, Alan; Halstead, Matt D B; Hunter, Peter J; Nickerson, David P; Nunns, Geo; Wimalaratne, Sarala M; Nielsen, Poul M F
2011-01-14
Building repositories of computational models of biological systems ensures that published models are available for both education and further research, and can provide a source of smaller, previously verified models to integrate into a larger model. One problem with earlier repositories has been the limitations in facilities to record the revision history of models. Often, these facilities are limited to a linear series of versions which were deposited in the repository. This is problematic for several reasons. Firstly, there are many instances in the history of biological systems modelling where an 'ancestral' model is modified by different groups to create many different models. With a linear series of versions, if the changes made to one model are merged into another model, the merge appears as a single item in the history. This hides useful revision history information, and also makes further merges much more difficult, as there is no record of which changes have or have not already been merged. In addition, a long series of individual changes made outside of the repository are also all merged into a single revision when they are put back into the repository, making it difficult to separate out individual changes. Furthermore, many earlier repositories only retain the revision history of individual files, rather than of a group of files. This is an important limitation to overcome, because some types of models, such as CellML 1.1 models, can be developed as a collection of modules, each in a separate file. The need for revision history is widely recognised for computer software, and a lot of work has gone into developing version control systems and distributed version control systems (DVCSs) for tracking the revision history. However, to date, there has been no published research on how DVCSs can be applied to repositories of computational models of biological systems. We have extended the Physiome Model Repository software to be fully revision history aware, by building it on top of Mercurial, an existing DVCS. We have demonstrated the utility of this approach, when used in conjunction with the model composition facilities in CellML, to build and understand more complex models. We have also demonstrated the ability of the repository software to present version history to casual users over the web, and to highlight specific versions which are likely to be useful to users. Providing facilities for maintaining and using revision history information is an important part of building a useful repository of computational models, as this information is useful both for understanding the source of and justification for parts of a model, and to facilitate automated processes such as merges. The availability of fully revision history aware repositories, and associated tools, will therefore be of significant benefit to the community.
eXframe: reusable framework for storage, analysis and visualization of genomics experiments
2011-01-01
Background: Genome-wide experiments are routinely conducted to measure gene expression, DNA-protein interactions and epigenetic status. Structured metadata for these experiments is imperative for a complete understanding of experimental conditions, to enable consistent data processing and to allow retrieval, comparison, and integration of experimental results. Even though several repositories have been developed for genomics data, only a few provide annotation of samples and assays using controlled vocabularies. Moreover, many of them are tailored for a single type of technology or measurement and do not support the integration of multiple data types. Results: We have developed eXframe - a reusable web-based framework for genomics experiments that provides 1) the ability to publish structured data compliant with accepted standards, 2) support for multiple data types including microarrays and next generation sequencing, and 3) query, analysis and visualization integration tools (enabled by consistent processing of the raw data and annotation of samples) - and is available as open-source software. We present two case studies where this software is currently being used to build repositories of genomics experiments - one contains data from hematopoietic stem cells and another from Parkinson's disease patients. Conclusion: The web-based framework eXframe offers structured annotation of experiments as well as uniform processing and storage of molecular data from microarray and next generation sequencing platforms. The framework allows users to query and integrate information across species, technologies, measurement types and experimental conditions. Our framework is reusable and freely modifiable - other groups or institutions can deploy their own custom web-based repositories based on this software. It is interoperable with the most important data formats in this domain. We hope that other groups will not only use eXframe, but also contribute their own useful modifications. PMID:22103807
McDonald, Amalia R; Muraskin, Jordan; Dam, Nicholas T Van; Froehlich, Caroline; Puccio, Benjamin; Pellman, John; Bauer, Clemens C C; Akeyson, Alexis; Breland, Melissa M; Calhoun, Vince D; Carter, Steven; Chang, Tiffany P; Gessner, Chelsea; Gianonne, Alyssa; Giavasis, Steven; Glass, Jamie; Homann, Steven; King, Margaret; Kramer, Melissa; Landis, Drew; Lieval, Alexis; Lisinski, Jonathan; Mackay-Brandt, Anna; Miller, Brittny; Panek, Laura; Reed, Hayley; Santiago, Christine; Schoell, Eszter; Sinnig, Richard; Sital, Melissa; Taverna, Elise; Tobe, Russell; Trautman, Kristin; Varghese, Betty; Walden, Lauren; Wang, Runtang; Waters, Abigail B; Wood, Dylan C; Castellanos, F Xavier; Leventhal, Bennett; Colcombe, Stanley J; LaConte, Stephen; Milham, Michael P; Craddock, R Cameron
2017-02-01
This data descriptor describes a repository of openly shared data from an experiment to assess inter-individual differences in default mode network (DMN) activity. This repository includes cross-sectional functional magnetic resonance imaging (fMRI) data from the Multi Source Interference Task, to assess DMN deactivation, the Moral Dilemma Task, to assess DMN activation, a resting state fMRI scan, and a DMN neurofeedback paradigm, to assess DMN modulation, along with accompanying behavioral and cognitive measures. We report technical validation from n=125 participants of the final targeted sample of 180 participants. Each session includes acquisition of one whole-brain anatomical scan and whole-brain echo-planar imaging (EPI) scans, acquired during the aforementioned tasks and resting state. The data includes several self-report measures related to perseverative thinking, emotion regulation, and imaginative processes, along with a behavioral measure of rapid visual information processing. Technical validation of the data confirms that the tasks deactivate and activate the DMN as expected. Group level analysis of the neurofeedback data indicates that the participants are able to modulate their DMN with considerable inter-subject variability. Preliminary analysis of behavioral responses and specifically self-reported sleep indicates that as many as 73 participants may need to be excluded from an analysis depending on the hypothesis being tested. The present data are linked to the enhanced Nathan Kline Institute-Rockland Sample and build on the comprehensive neuroimaging and deep phenotyping available therein. As limited information is presently available about individual differences in the capacity to directly modulate the default mode network, these data provide a unique opportunity to examine DMN modulation ability in relation to numerous phenotypic characteristics. Copyright © 2016 Elsevier Inc. All rights reserved.
The Astrobiology Habitable Environments Database (AHED)
NASA Astrophysics Data System (ADS)
Lafuente, B.; Stone, N.; Downs, R. T.; Blake, D. F.; Bristow, T.; Fonda, M.; Pires, A.
2015-12-01
The Astrobiology Habitable Environments Database (AHED) is a central, high quality, long-term searchable repository for archiving and collaborative sharing of astrobiologically relevant data, including morphological, textural and contextual images, chemical, biochemical, isotopic, sequencing, and mineralogical information. The aim of AHED is to foster long-term innovative research by supporting integration and analysis of diverse datasets in order to: 1) help understand and interpret planetary geology; 2) identify and characterize habitable environments and pre-biotic/biotic processes; 3) interpret returned data from present and past missions; 4) provide a citable database of NASA-funded published and unpublished data (after an agreed-upon embargo period). AHED uses the online open-source software "The Open Data Repository's Data Publisher" (ODR - http://www.opendatarepository.org) [1], which provides a user-friendly interface that research teams or individual scientists can use to design, populate and manage their own database according to the characteristics of their data and the need to share data with collaborators or the broader scientific community. This platform can also be used as a laboratory notebook. The database will have the capability to import and export in a variety of standard formats. Advanced graphics will be implemented including 3D graphing, multi-axis graphs, error bars, and similar scientific data functions together with advanced online tools for data analysis (e.g., the statistical package R). A permissions system will be put in place so that as data are being actively collected and interpreted, they will remain proprietary. A citation system will allow research data to be used and appropriately referenced by other researchers after the data are made public. This project is supported by the Science-Enabling Research Activity (SERA) and NASA NNX11AP82A, Mars Science Laboratory Investigations. [1] Nate et al. (2015) AGU, submitted.
Plis, Sergey M; Sarwate, Anand D; Wood, Dylan; Dieringer, Christopher; Landis, Drew; Reed, Cory; Panta, Sandeep R; Turner, Jessica A; Shoemaker, Jody M; Carter, Kim W; Thompson, Paul; Hutchison, Kent; Calhoun, Vince D
2016-01-01
The field of neuroimaging has embraced the need for sharing and collaboration. Data sharing mandates from public funding agencies and major journal publishers have spurred the development of data repositories and neuroinformatics consortia. However, efficient and effective data sharing still faces several hurdles. For example, open data sharing is on the rise but is not suitable for sensitive data that are not easily shared, such as genetics. Current approaches can be cumbersome (such as negotiating multiple data sharing agreements). There are also significant data transfer, organization and computational challenges. Centralized repositories only partially address the issues. We propose a dynamic, decentralized platform for large scale analyses called the Collaborative Informatics and Neuroimaging Suite Toolkit for Anonymous Computation (COINSTAC). The COINSTAC solution can include data missing from central repositories, allows pooling of both open and "closed" repositories by developing privacy-preserving versions of widely-used algorithms, and incorporates the tools within an easy-to-use platform enabling distributed computation. We present an initial prototype system which we demonstrate on two multi-site data sets, without aggregating the data. In addition, by iterating across sites, the COINSTAC model enables meta-analytic solutions to converge to "pooled-data" solutions (i.e., as if the entire data were in hand). More advanced approaches such as feature generation, matrix factorization models, and preprocessing can be incorporated into such a model. In sum, COINSTAC enables access to the many currently unavailable data sets, a user friendly privacy enabled interface for decentralized analysis, and a powerful solution that complements existing data sharing solutions.
Plis, Sergey M.; Sarwate, Anand D.; Wood, Dylan; Dieringer, Christopher; Landis, Drew; Reed, Cory; Panta, Sandeep R.; Turner, Jessica A.; Shoemaker, Jody M.; Carter, Kim W.; Thompson, Paul; Hutchison, Kent; Calhoun, Vince D.
2016-01-01
The field of neuroimaging has embraced the need for sharing and collaboration. Data sharing mandates from public funding agencies and major journal publishers have spurred the development of data repositories and neuroinformatics consortia. However, efficient and effective data sharing still faces several hurdles. For example, open data sharing is on the rise but is not suitable for sensitive data that are not easily shared, such as genetics. Current approaches can be cumbersome (such as negotiating multiple data sharing agreements). There are also significant data transfer, organization and computational challenges. Centralized repositories only partially address the issues. We propose a dynamic, decentralized platform for large scale analyses called the Collaborative Informatics and Neuroimaging Suite Toolkit for Anonymous Computation (COINSTAC). The COINSTAC solution can include data missing from central repositories, allows pooling of both open and “closed” repositories by developing privacy-preserving versions of widely-used algorithms, and incorporates the tools within an easy-to-use platform enabling distributed computation. We present an initial prototype system which we demonstrate on two multi-site data sets, without aggregating the data. In addition, by iterating across sites, the COINSTAC model enables meta-analytic solutions to converge to “pooled-data” solutions (i.e., as if the entire data were in hand). More advanced approaches such as feature generation, matrix factorization models, and preprocessing can be incorporated into such a model. In sum, COINSTAC enables access to the many currently unavailable data sets, a user friendly privacy enabled interface for decentralized analysis, and a powerful solution that complements existing data sharing solutions. PMID:27594820
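The decentralized pattern described in this abstract can be illustrated with a toy example. The sketch below is not COINSTAC code; it simply shows, with NumPy, how sites that share only summary statistics (a sum and a count) can reproduce the pooled-data mean without ever exchanging subject-level data.

```python
# Toy sketch (not COINSTAC code): each "site" computes local summaries and
# shares only those, yet the aggregate matches the pooled-data result.
import numpy as np

rng = np.random.default_rng(0)
sites = [rng.normal(loc=mu, scale=1.0, size=n)          # private site-level data
         for mu, n in [(0.2, 120), (0.5, 80), (0.9, 200)]]

# Each site transmits only (sum, count) -- never raw subject-level values.
summaries = [(x.sum(), x.size) for x in sites]
total_sum = sum(s for s, _ in summaries)
total_n = sum(n for _, n in summaries)
decentralized_mean = total_sum / total_n

pooled_mean = np.concatenate(sites).mean()               # what a central repository would compute
assert np.isclose(decentralized_mean, pooled_mean)
print(decentralized_mean, pooled_mean)
```

The same pattern generalizes, with more iterations, to the regression and matrix-factorization style analyses the abstract mentions.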
SPECIATE--EPA'S DATABASE OF SPECIATED EMISSION PROFILES
SPECIATE is EPA's repository of Total Organic Compound and Particulate Matter speciated profiles for a wide variety of sources. The profiles in this system are provided for air quality dispersion modeling and as a library for source-receptor and source apportionment type models. ...
An open repository of earthquake-triggered ground-failure inventories
Schmitt, Robert G.; Tanyas, Hakan; Nowicki Jessee, M. Anna; Zhu, Jing; Biegel, Katherine M.; Allstadt, Kate E.; Jibson, Randall W.; Thompson, Eric M.; van Westen, Cees J.; Sato, Hiroshi P.; Wald, David J.; Godt, Jonathan W.; Gorum, Tolga; Xu, Chong; Rathje, Ellen M.; Knudsen, Keith L.
2017-12-20
Earthquake-triggered ground failure, such as landsliding and liquefaction, can contribute significantly to losses, but our current ability to accurately include them in earthquake-hazard analyses is limited. The development of robust and widely applicable models requires access to numerous inventories of ground failures triggered by earthquakes that span a broad range of terrains, shaking characteristics, and climates. We present an openly accessible, centralized earthquake-triggered ground-failure inventory repository in the form of a ScienceBase Community to provide open access to these data with the goal of accelerating research progress. The ScienceBase Community hosts digital inventories created by both U.S. Geological Survey (USGS) and non-USGS authors. We present the original digital inventory files (when available) as well as an integrated database with uniform attributes. We also summarize the mapping methodology and level of completeness as reported by the original author(s) for each inventory. This document describes the steps taken to collect, process, and compile the inventories and the process for adding additional ground-failure inventories to the ScienceBase Community in the future.
Academic Research Library as Broker in Addressing Interoperability Challenges for the Geosciences
NASA Astrophysics Data System (ADS)
Smith, P., II
2015-12-01
Data capture is an important process in the research lifecycle. Complete descriptive and representative information of the data or database is necessary during data collection whether in the field or in the research lab. The National Science Foundation's (NSF) Public Access Plan (2015) mandates the need for federally funded projects to make their research data more openly available. Developing, implementing, and integrating metadata workflows into the research process of the data lifecycle facilitates improved data access while also addressing interoperability challenges for the geosciences such as data description and representation. Lack of metadata or data curation can contribute to (1) semantic, (2) ontology, and (3) data integration issues within and across disciplinary domains and projects. Some researchers of EarthCube-funded projects have identified these issues as gaps. These gaps can contribute to interoperability, data access, discovery, and integration issues between domain-specific and general data repositories. Academic Research Libraries have expertise in providing long-term discovery and access through the use of metadata standards and provision of access to research data, datasets, and publications via institutional repositories. Metadata crosswalks, open archival information systems (OAIS), trusted repositories, data seal of approval, persistent URLs, linking data, objects, resources, and publications in institutional repositories and digital content management systems are common components in the library discipline. These components contribute to a library perspective on data access and discovery that can benefit the geosciences. The USGS Community for Data Integration (CDI) has developed the Science Support Framework (SSF) for data management and integration within its community of practice for contribution to improved understanding of the Earth's physical and biological systems. The USGS CDI SSF can be used as a reference model to map to EarthCube-funded projects with academic research libraries facilitating the data and information assets components of the USGS CDI SSF via institutional repositories and/or digital content management. This session will explore the USGS CDI SSF for cross-discipline collaboration considerations from a library perspective.
ERIC Educational Resources Information Center
Bower, Kirsty; Sheppard, Nick; Bayjoo, Jennifer; Pease, Adele
2017-01-01
This practical article presents findings of a small scale study undertaken at a large U.K. University. The purpose of the study was to encourage academic engagement with Open Access (OA) and the Higher Education Funding Council for England (HEFCE) mandate with the measurable impact being increased engagement with the Repository and dissemination…
ERIC Educational Resources Information Center
Ramirez, Marisa L.; Dalton, Joan T.; McMillan, Gail; Read, Max; Seamans, Nancy H.
2013-01-01
An increasing number of higher education institutions worldwide are requiring submission of electronic theses and dissertations (ETDs) by graduate students and are subsequently providing open access to these works in online repositories. Faculty advisors and graduate students are concerned that such unfettered access to their work could diminish…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Weers, Jonathan D; Taverna, Nicole; Anderson, Arlene
In the five years since its inception, the Department of Energy's (DOE) Geothermal Data Repository (GDR) has grown from the simple idea of storing public data in a centralized location to a valuable tool at the center of the DOE open data movement where it is providing a tangible benefit to the geothermal scientific community. Throughout this time, the GDR project team has been working closely with the community to refine the data submission process, improve the quality of submitted data, and embrace proper, modern data management strategies to maximize the value and utility of submitted data. This paper explores some of the motivations behind various improvements to the GDR over the last 5 years, changes in data submission trends, and the ways in which these improvements have helped to drive research, fuel innovation, and accelerate the adoption of geothermal technologies.
The Geothermal Data Repository: Five Years of Open Geothermal Data, Benefits to the Community
DOE Office of Scientific and Technical Information (OSTI.GOV)
Weers, Jonathan D; Taverna, Nicole; Anderson, Arlene
In the five years since its inception, the Department of Energy's (DOE) Geothermal Data Repository (GDR) has grown from the simple idea of storing public data in a centralized location to a valuable tool at the center of the DOE open data movement where it is providing a tangible benefit to the geothermal scientific community. Throughout this time, the GDR project team has been working closely with the community to refine the data submission process, improve the quality of submitted data, and embrace proper, modern data management strategies to maximize the value and utility of submitted data. This paper explores some of the motivations behind various improvements to the GDR over the last 5 years, changes in data submission trends, and the ways in which these improvements have helped to drive research, fuel innovation, and accelerate the adoption of geothermal technologies.
Science on Drupal: An evaluation of CMS Technologies
NASA Astrophysics Data System (ADS)
Vinay, S.; Gonzalez, A.; Pinto, A.; Pascuzzi, F.; Gerard, A.
2011-12-01
We conducted an extensive evaluation of various Content Management System (CMS) technologies for implementing different websites supporting interdisciplinary science data and information. We chose two products, Drupal and Bluenog/Hippo CMS, to meet our specific needs and requirements. Drupal is an open source product that is quick and easy to set up and use. It is a very mature, stable, and widely used product. It has rich functionality supported by a large and active user base and developer community. There are many plugins available that provide additional features for managing citations, map gallery, semantic search, digital repositories (Fedora), scientific workflows, collaborative authoring, social networking, and other functions. All of these work very well within the Drupal framework if minimal customization is needed. We have successfully implemented Drupal for multiple projects such as: 1) the Haiti Regeneration Initiative (http://haitiregeneration.org/); 2) the Consortium on Climate Risk in the Urban Northeast (http://beta.ccrun.org/); and 3) the Africa Soils Information Service (http://africasoils.net/). We are also developing two other websites, the Côte Sud Initiative (CSI) and Emerging Infectious Diseases, using Drupal. We are testing the Drupal multi-site install for managing different websites with one install to streamline the maintenance. In addition, paid support and consultancy for Drupal website development are available at affordable prices. All of these features make Drupal very attractive for implementing state-of-the-art scientific websites that do not have complex requirements. One of our major websites, the NASA Socioeconomic Data and Applications Center (SEDAC), has a very complex set of requirements. It has to easily re-purpose content across multiple web pages and sites with different presentations. It has to serve the content via REST or similar standard interfaces so that external client applications can access content in the CMS repository. This means the content repository and structure should be completely separated from the content presentation and site structure. In addition to the CMS repository, the front-end website has to be able to consume, integrate, and display diverse content flexibly from multiple back-end systems, including custom and legacy systems, such as Oracle, Geoserver, Flickr, Fedora, and other web services. We needed the ability to customize the workflow to author, edit, approve, and publish content based on different content types and project requirements. In addition, we required the ability to use the existing active directory for user management with support for roles and groups and permissions using an Access Control List (ACL) model. The ability to version and lock content was also important. We determined that most of these capabilities are difficult to implement with Drupal and would need significant customization. The Bluenog eCMS (enterprise CMS) product satisfied most of these requirements. Bluenog eCMS is based on an open source product called Hippo with customizations and support provided by the vendor Bluenog. Our newly redesigned and recently released SEDAC website, http://sedac.ciesin.columbia.edu, is implemented using Bluenog eCMS. Other products we evaluated include WebLogic portal, Magnolia, Liferay portal, and Alfresco.
The Open Microscopy Environment: open image informatics for the biological sciences
NASA Astrophysics Data System (ADS)
Blackburn, Colin; Allan, Chris; Besson, Sébastien; Burel, Jean-Marie; Carroll, Mark; Ferguson, Richard K.; Flynn, Helen; Gault, David; Gillen, Kenneth; Leigh, Roger; Leo, Simone; Li, Simon; Lindner, Dominik; Linkert, Melissa; Moore, Josh; Moore, William J.; Ramalingam, Balaji; Rozbicki, Emil; Rustici, Gabriella; Tarkowska, Aleksandra; Walczysko, Petr; Williams, Eleanor; Swedlow, Jason R.
2016-07-01
Despite significant advances in biological imaging and analysis, major informatics challenges remain unsolved: file formats are proprietary, storage and analysis facilities are lacking, as are standards for sharing image data and results. While the open FITS file format is ubiquitous in astronomy, astronomical imaging shares many challenges with biological imaging, including the need to share large image sets using secure, cross-platform APIs, and the need for scalable applications for processing and visualization. The Open Microscopy Environment (OME) is an open-source software framework developed to address these challenges. OME tools include: an open data model for multidimensional imaging (OME Data Model); an open file format (OME-TIFF) and library (Bio-Formats) enabling free access to images (5D+) written in more than 145 formats from many imaging domains, including FITS; and a data management server (OMERO). The Java-based OMERO client-server platform comprises an image metadata store, an image repository, visualization and analysis by remote access, allowing sharing and publishing of image data. OMERO provides a means to manage the data through a multi-platform API. OMERO's model-based architecture has enabled its extension into a range of imaging domains, including light and electron microscopy, high content screening, digital pathology and recently into applications using non-image data from clinical and genomic studies. This is made possible using the Bio-Formats library. The current release includes a single mechanism for accessing image data of all types, regardless of original file format, via Java, C/C++ and Python and a variety of applications and environments (e.g. ImageJ, Matlab and R).
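As a rough illustration of the kind of remote, format-agnostic access described above, the following sketch assumes the omero-py client library (BlitzGateway) and a reachable OMERO server; the host name and credentials are placeholders, and the snippet is not taken from OME documentation.

```python
# Sketch assuming the omero-py client (BlitzGateway); host, user and password
# are placeholders, and the listing below is illustrative only.
from omero.gateway import BlitzGateway

conn = BlitzGateway("username", "password", host="omero.example.org", port=4064)
if conn.connect():
    try:
        # Iterate over images visible to this user, regardless of the
        # original acquisition file format (read via Bio-Formats on import).
        for image in conn.getObjects("Image"):
            print(image.getId(), image.getName(), image.getSizeC(), "channels")
    finally:
        conn.close()
```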
Two-step web-mining approach to study geology/geophysics-related open-source software projects
NASA Astrophysics Data System (ADS)
Behrends, Knut; Conze, Ronald
2013-04-01
Geology/geophysics is a highly interdisciplinary science, overlapping with, for instance, physics, biology and chemistry. In today's software-intensive work environments, geoscientists often encounter new open-source software from scientific fields that are only remotely related to their own field of expertise. We show how web-mining techniques can help to carry out systematic discovery and evaluation of such software. In a first step, we downloaded ~500 abstracts (each consisting of ~1 kb UTF-8 text) from agu-fm12.abstractcentral.com. This web site hosts the abstracts of all publications presented at AGU Fall Meeting 2012, the world's largest annual geology/geophysics conference. All abstracts belonged to the category "Earth and Space Science Informatics", an interdisciplinary label cross-cutting many disciplines such as "deep biosphere", "atmospheric research", and "mineral physics". Each publication was represented by a highly structured record with ~20 short data attributes, the largest authorship-record being the unstructured "abstract" field. We processed texts of the abstracts with the statistics software "R" to calculate a corpus and a term-document matrix. Using the R package "tm", we applied text-mining techniques to filter data and develop hypotheses about software-development activities happening in various geology/geophysics fields. Analyzing the term-document matrix with basic techniques (e.g., word frequencies, co-occurrences, weighting) as well as more complex methods (clustering, classification), several key pieces of information were extracted. For example, text-mining can be used to identify scientists who are also developers of open-source scientific software, and the names of their programming projects and codes can also be identified. In a second step, based on the intermediate results found by processing the conference-abstracts, any new hypotheses can be tested in another web-mining subproject: by merging the dataset with open data from github.com and stackoverflow.com. These popular, developer-centric websites have powerful application programming interfaces, and follow an open-data policy. In this regard, these sites offer a web-accessible reservoir of information that can be tapped to study questions such as: which open source software projects are eminent in the various geoscience fields? What are the most popular programming languages? How are they trending? Are there any interesting temporal patterns in committer activities? How large are programming teams and how do they change over time? What free software packages exist in the vast realms of related fields? Does the software from these fields have capabilities that might still be useful to me as a researcher, or can help me perform my work better? Are there any open-source projects that might be commercially interesting? This evaluation strategy reveals programming projects that tend to be new. As many important legacy codes are not hosted on open-source code-repositories, the presented search method might overlook some older projects.
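The study builds its term-document matrix with R's "tm" package. As an analogous sketch in Python (using scikit-learn rather than the authors' toolchain), the snippet below constructs a term-document matrix from a few invented abstract-like strings and prints the most frequent terms, the kind of signal used to spot software-related vocabulary and project names.

```python
# Analogous sketch in Python (the study used R's "tm" package): build a
# term-document matrix from abstract texts and inspect simple term frequencies.
# The three snippets below are invented stand-ins for conference abstracts.
from sklearn.feature_extraction.text import CountVectorizer

abstracts = [
    "open-source python toolbox for seismic waveform processing on github",
    "text mining of conference abstracts reveals open-source software projects",
    "mineral physics database with open-source visualization tools",
]

vectorizer = CountVectorizer(stop_words="english")
tdm = vectorizer.fit_transform(abstracts)            # documents x terms (sparse matrix)

# Overall term frequencies, highest first.
freqs = tdm.sum(axis=0).A1
terms = vectorizer.get_feature_names_out()
for term, count in sorted(zip(terms, freqs), key=lambda t: -t[1])[:5]:
    print(term, count)
```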
NASA Astrophysics Data System (ADS)
De Vecchi, Daniele; Dell'Acqua, Fabio
2016-04-01
The EU FP7 MARSITE project aims at assessing the "state of the art" of seismic risk evaluation and management at European level, as a starting point to move a "step forward" towards new concepts of risk mitigation and management by long-term monitoring activities carried out both on land and at sea. Spaceborne Earth Observation (EO) is one of the means through which MARSITE is accomplishing this commitment, whose importance is growing as a consequence of the operational unfolding of the Copernicus initiative. Sentinel-2 data, with its open-data policy, represents an unprecedented opportunity to access global spaceborne multispectral data for various purposes including risk monitoring. In the framework of EU FP7 projects MARSITE, RASOR and SENSUM, our group has developed a suite of geospatial software tools to automatically extract risk-related features from EO data, especially on the exposure and vulnerability side of the "risk equation" [1]. These are, for example, the extent of a built-up area or the distribution of building density. These tools are available open-source as QGIS plug-ins [2] and their source code can be freely downloaded from GitHub [3]. A test case on the risk-prone megacity of Istanbul has been set up, and preliminary results will be presented in this paper. The output of the algorithms can be incorporated into a risk modeling process, whose output is very useful to stakeholders and decision makers who intend to assess and mitigate the risk level across the giant urban agglomerate. Keywords - Remote Sensing, Copernicus, Istanbul megacity, seismic risk, multi-risk, exposure, open-source References [1] Harb, M.M.; De Vecchi, D.; Dell'Acqua, F., "Physical Vulnerability Proxies from Remote Sensing: Reviewing, Implementing and Disseminating Selected Techniques," Geoscience and Remote Sensing Magazine, IEEE, vol. 3, no. 1, pp. 20-33, March 2015. doi: 10.1109/MGRS.2015.2398672 [2] SENSUM QGIS plugin, 2016, available online at: https://plugins.qgis.org/plugins/sensum_eo_tools/ [3] SENSUM QGIS code repository, 2016, available online at: https://github.com/SENSUM-project/sensum_rs_qgis
Xirasagar, Sandhya; Gustafson, Scott F; Huang, Cheng-Cheng; Pan, Qinyan; Fostel, Jennifer; Boyer, Paul; Merrick, B Alex; Tomer, Kenneth B; Chan, Denny D; Yost, Kenneth J; Choi, Danielle; Xiao, Nianqing; Stasiewicz, Stanley; Bushel, Pierre; Waters, Michael D
2006-04-01
The CEBS data repository is being developed to promote a systems biology approach to understand the biological effects of environmental stressors. CEBS will house data from multiple gene expression platforms (transcriptomics), protein expression and protein-protein interaction (proteomics), and changes in low molecular weight metabolite levels (metabolomics) aligned by their detailed toxicological context. The system will accommodate extensive complex querying in a user-friendly manner. CEBS will store toxicological contexts including the study design details, treatment protocols, animal characteristics and conventional toxicological endpoints such as histopathology findings and clinical chemistry measures. All of these data types can be integrated in a seamless fashion to enable data query and analysis in a biologically meaningful manner. An object model, the SysBio-OM (Xirasagar et al., 2004), has been designed to facilitate the integration of microarray gene expression, proteomics and metabolomics data in the CEBS database system. We now report SysTox-OM as an open source systems toxicology model designed to integrate toxicological context into gene expression experiments. The SysTox-OM model is comprehensive and leverages other open source efforts, namely, the Standard for Exchange of Nonclinical Data (http://www.cdisc.org/models/send/v2/index.html), which is a data standard for capturing toxicological information for animal studies, and the Clinical Data Interchange Standards Consortium (http://www.cdisc.org/models/sdtm/index.html), which serves as a standard for the exchange of clinical data. Such standardization increases the accuracy of data mining, interpretation and exchange. The open source SysTox-OM model, which can be implemented on various software platforms, is presented here. A Unified Modeling Language (UML) depiction of the entire SysTox-OM is available at http://cebs.niehs.nih.gov and the Rational Rose object model package is distributed under an open source license that permits unrestricted academic and commercial use and is available at http://cebs.niehs.nih.gov/cebsdownloads. Currently, the public toxicological data in CEBS can be queried via a web application based on the SysTox-OM at http://cebs.niehs.nih.gov. Contact: xirasagars@saic.com. Supplementary data are available at Bioinformatics online.
Data-intensive science gateway for rock physicists and volcanologists.
NASA Astrophysics Data System (ADS)
Filgueira, Rosa; Atkinson, Malcom; Bell, Andrew; Main, Ian; Boon, Steve; Meredith, Philp; Kilburn, Christopher
2014-05-01
Scientists have always shared data and mathematical models of the phenomena they study. Rock physics and volcanology, as well as other solid-Earth sciences, have increasingly used Internet communications and computational renditions of their models for this purpose over the last two decades. Here we consider how to organise rock physics and volcanology data to open up opportunities for sharing and comparing both data from experiments, observations and model runs, and analytic interpretations of these data. Our hypothesis is that if we facilitate productive information sharing across those communities by using a new science gateway, it will benefit the science. The proposed science gateway should make the first steps for making existing research practices easier and facilitate new research. It will achieve this by supporting three major functions: 1) sharing data from laboratories and observatories, experimental facilities and models; 2) sharing models of rock fracture and methods for analysing experimental data; and 3) supporting recurrent operational tasks, such as data collection and model application in real time. We report initial work in two projects (NERC EFFORT and NERC CREEP-2) and experience with an early web-accessible prototype called the EFFORT gateway, where we are implementing such information sharing services for those projects. 1. Sharing data: In the EFFORT gateway, we are working on several facilities for sharing data: *Upload data: We have designed and developed a new adaptive data transfer Java tool called FAST (Flexible Automated Streaming Transfer) to upload experimental data and metadata periodically from laboratories to our repository. *Visualisation: As data are deposited in the repository, a visualisation of the accumulated data is made available for display in the Web portal. *Metadata and catalogues: The gateway uses a repository to hold all the data and a catalogue to hold all the corresponding metadata. 2. Sharing models and methods: The EFFORT gateway uses a repository to hold all of the models and a catalogue to hold the corresponding metadata. It provides several Web facilities for uploading, accessing and testing models. *Upload and store models: Through the gateway, researchers can upload as many models to the repository as they want. *Description of models: The gateway solicits and creates metadata for every model uploaded to store in the catalogue. *Search for models: Researchers can search the catalogue for models by using prepackaged SQL queries. *Access to models: Once a researcher has selected the model(s) that is going to be used for analysing an experiment, it will be obtained from the gateway. *Services to test and run models: Once a researcher selects a model and the experimental data to which it should be applied, the gateway submits the corresponding computational job to a high-performance computing (HPC) resource hiding technical details. Once a job is submitted to the HPC cluster, the results are displayed in the gateway in real time, catalogued and stored in the data repository, allowing further researcher-instigated operations to retrieve, inspect and aggregate results. *Services to write models: We have designed the VarPy library, an open-source toolbox that provides a Python framework for analysing volcanology and rock physics data. It provides several functions, which allow users to define their own workflows to develop models, analyses and visualizations. 3. Recurrent Operations: We have started to introduce some recurrent operations: *Automated data upload: FAST provides a mechanism to automate the data upload. *Periodic activation of models: The EFFORT gateway allows researchers to run different models periodically against the experimental data that are being or have been uploaded.
Bleda, Marta; Tarraga, Joaquin; de Maria, Alejandro; Salavert, Francisco; Garcia-Alonso, Luz; Celma, Matilde; Martin, Ainoha; Dopazo, Joaquin; Medina, Ignacio
2012-07-01
Over the past years, advances in high-throughput technologies have produced an unprecedented growth in the number and size of repositories and databases storing relevant biological data. Today, there is more biological information than ever but, unfortunately, the current status of many of these repositories is far from optimal. Some of the most common problems are that the information is spread out in many small databases; frequently there are different standards among repositories and some databases are no longer supported or they contain too specific and unconnected information. In addition, data size is increasingly becoming an obstacle when accessing or storing biological data. All these issues make it very difficult to extract and integrate information from different sources, to analyze experiments, or to access and query this information in a programmatic way. CellBase provides a solution to the growing necessity of integration by easing access to biological data. CellBase implements a set of RESTful web services that query a centralized database containing the most relevant biological data sources. The database is hosted on our servers and is regularly updated. CellBase documentation can be found at http://docs.bioinfo.cipf.es/projects/cellbase.
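CellBase is accessed through RESTful web services. The sketch below is a generic example of programmatic access with the requests library; the base URL, path and gene symbol are hypothetical placeholders standing in for a CellBase-style endpoint, not its documented API.

```python
# Generic sketch of querying a RESTful biological-annotation service with
# `requests`. The base URL and path below are hypothetical placeholders
# standing in for a CellBase-style endpoint, not its documented API.
import requests

BASE = "https://annotation.example.org/rest/v1"        # placeholder host

def fetch_gene(symbol):
    """Retrieve JSON annotation for a gene symbol from the (hypothetical) service."""
    resp = requests.get(f"{BASE}/gene/{symbol}/info", timeout=30)
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    record = fetch_gene("BRCA2")
    print(record)   # programmatic access instead of manual database browsing
```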
Bresser, Laura; Köhler, Steffen; Schwaab, Christoph
2014-01-01
It is necessary to optimize workflows and communication between the institutions involved in patients' treatment to improve the quality and efficiency of German healthcare. To achieve this in the Metropolregion Rhein-Neckar, a personal, cross-institutional patient record (PEPA) is used. Given the immense sensitivity of health-related information stored in the PEPA, it is imperative to comply with the data protection regulations in Germany. One important aspect is the logging of access to personal health data and all other safety-related events. For gathering audit information, the IHE profile ATNA can be used, because it provides a flexible and standardized infrastructure. Solutions for gathering audit information based on ATNA already exist. In this article, one such solution (OpenATNA) is evaluated using the evaluation method defined by Peter Baumgartner. In addition, a user interface for a privacy officer is necessary to support the examination of the audit information. Therefore, we describe a method to develop an application in Liferay (an open-source enterprise portal project) that supports examination of the gathered audit information.
A BPMN solution for chaining OGC services to quality assure location-based crowdsourced data
NASA Astrophysics Data System (ADS)
Meek, Sam; Jackson, Mike; Leibovici, Didier G.
2016-02-01
The Open Geospatial Consortium (OGC) Web Processing Service (WPS) standard enables access to a centralized repository of processes and services from compliant clients. A crucial part of the standard includes the provision to chain disparate processes and services to form a reusable workflow. To date, this has been realized by methods such as embedding XML requests, using Business Process Execution Language (BPEL) engines and other external orchestration engines. Although these allow the user to define tasks and data artifacts as web services, they are often considered inflexible and complicated, frequently due to vendor-specific solutions and inaccessible documentation. This paper introduces a new method of flexible service chaining using the Business Process Model and Notation (BPMN) standard. A prototype system has been developed upon an existing open source BPMN suite to illustrate the advantages of the approach. The motivation for the software design is qualification of crowdsourced data for use in policy-making. The software is tested as part of a project that seeks to qualify, assure, and add value to crowdsourced data in a biological monitoring use case.
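WPS itself is exercised through standard HTTP requests, independently of how the chain is orchestrated. A minimal sketch of a key-value-pair GetCapabilities call is shown below; the server URL is a placeholder, while the service, request and version parameters follow WPS 1.0.0 conventions.

```python
# Minimal sketch of a standard OGC WPS key-value-pair request. The server URL
# is a placeholder; the service/request/version parameters follow WPS 1.0.0.
import requests

WPS_URL = "https://wps.example.org/wps"    # placeholder endpoint

params = {
    "service": "WPS",
    "request": "GetCapabilities",
    "version": "1.0.0",
}
resp = requests.get(WPS_URL, params=params, timeout=30)
resp.raise_for_status()
print(resp.text[:500])   # XML capabilities document listing the available processes
```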
Criteria for the evaluation and certification of long-term digital archives in the earth sciences
NASA Astrophysics Data System (ADS)
Klump, Jens
2010-05-01
Digital information has become an indispensable part of our cultural and scientific heritage. Scientific findings, historical documents and cultural achievements are to a rapidly increasing extent being presented in electronic form - in many cases exclusively so. However, besides the invaluable advantages offered by this form, it also carries a serious disadvantage: users need to invest a great deal of technical effort in accessing the information. Also, the underlying technology is still undergoing further development at an exceptionally fast pace. The rapid obsolescence of the technology required to read the information combined with the frequently imperceptible physical decay of the media themselves represents a serious threat to preservation of the information content. Many data sets in earth science research are from observations that cannot be repeated. This makes these digital assets particularly valuable. Therefore, these data should be kept and made available for re-use long after the end of the project from which they originated. Since research projects only run for a relatively short period of time, it is advisable to shift the burden of responsibility for long-term data curation from the individual researcher to a trusted data repository or archive. But what makes a trusted data repository? Each trusted digital repository has its own targets and specifications. The trustworthiness of digital repositories can be tested and assessed on the basis of a criteria catalogue. This is the main focus of the work of the nestor working group "Trusted repositories - Certification". It identifies criteria which permit the trustworthiness of a digital repository to be evaluated, both at the organisational and technical levels. The criteria are defined in close collaboration with a wide range of different memory organisations, producers of information, experts and other interested parties. This open approach ensures a high degree of universal validity, suitability for daily practical use and also broad-based acceptance of the results. The criteria catalogue is also intended to present the option of documenting trustworthiness by means of certification in a standardised national or international process. The criteria catalogue is based on the Reference Model for an Open Archival Information System (OAIS, ISO 14721:2003). With its broad approach, the nestor criteria catalogue for trusted digital repositories has to remain on a high level of abstraction. For application in the earth sciences, the evaluation criteria need to be transferred into the context of earth science data and their designated user community. This presentation offers a brief introduction to the problems surrounding the long-term preservation of digital objects. This introduction is followed by a proposed application of the criteria catalogue for trusted digital repositories to the context of earth science data and their long-term preservation.
Reengineering Workflow for Curation of DICOM Datasets.
Bennett, William; Smith, Kirk; Jarosz, Quasar; Nolan, Tracy; Bosch, Walter
2018-06-15
Reusable, publicly available data is a pillar of open science and rapid advancement of cancer imaging research. Sharing data from completed research studies not only saves research dollars required to collect data, but also helps ensure that studies are both replicable and reproducible. The Cancer Imaging Archive (TCIA) is a global shared repository for imaging data related to cancer. Ensuring the consistency, scientific utility, and anonymity of data stored in TCIA is of utmost importance. As the rate of submission to TCIA has been increasing, both in volume and complexity of DICOM objects stored, the process of curation of collections has become a bottleneck in acquisition of data. In order to increase the rate of curation of image sets, improve the quality of the curation, and better track the provenance of changes made to submitted DICOM image sets, a custom set of tools was developed, using novel methods for the analysis of DICOM data sets. These tools are written in the programming language perl, use the open-source database PostgreSQL, make use of the perl DICOM routines in the open-source package Posda, and incorporate DICOM diagnostic tools from other open-source packages, such as dicom3tools. These tools are referred to as the "Posda Tools." The Posda Tools are open source and available via git at https://github.com/UAMS-DBMI/PosdaTools. In this paper, we briefly describe the Posda Tools and discuss the novel methods employed by these tools to facilitate rapid analysis of DICOM data, including the following: (1) use a database schema which is more permissive than, and differently normalized from, traditional DICOM databases; (2) perform integrity checks automatically on a bulk basis; (3) apply revisions to DICOM datasets on a bulk basis, either through a web-based interface or via command line executable perl scripts; (4) all such edits are tracked in a revision tracker and may be rolled back; (5) a UI is provided to inspect the results of such edits, to verify that they are what was intended; (6) identification of DICOM Studies, Series, and SOP instances using "nicknames" which are persistent and have well-defined scope to make expression of reported DICOM errors easier to manage; and (7) rapidly identify potential duplicate DICOM datasets by pixel data; this can be used, e.g., to identify submission subjects which may relate to the same individual, without identifying the individual.
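The Posda Tools are implemented in Perl. As an illustrative analogue only (not the Posda implementation), the Python sketch below flags potential duplicate DICOM instances by hashing their pixel data with pydicom; the input directory is a placeholder.

```python
# Illustrative analogue (not the Perl-based Posda Tools): flag potential
# duplicate DICOM instances by hashing their pixel data with pydicom.
import hashlib
from collections import defaultdict
from pathlib import Path

import pydicom

def pixel_hash(path):
    """Return a SHA-256 digest of the raw PixelData element, if present."""
    ds = pydicom.dcmread(path)
    if "PixelData" not in ds:
        return None
    return hashlib.sha256(ds.PixelData).hexdigest()

groups = defaultdict(list)
for dcm in Path("incoming_submission").rglob("*.dcm"):   # placeholder directory
    digest = pixel_hash(dcm)
    if digest:
        groups[digest].append(dcm)

# Instances sharing identical pixel data are candidates for manual review.
for digest, files in groups.items():
    if len(files) > 1:
        print("Possible duplicates:", [str(f) for f in files])
```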
Examining Data Repository Guidelines for Qualitative Data Sharing.
Antes, Alison L; Walsh, Heidi A; Strait, Michelle; Hudson-Vitale, Cynthia R; DuBois, James M
2018-02-01
Qualitative data provide rich information on research questions in diverse fields. Recent calls for increased transparency and openness in research emphasize data sharing. However, qualitative data sharing has yet to become the norm internationally and is particularly uncommon in the United States. Guidance for archiving and secondary use of qualitative data is required for progress in this regard. In this study, we review the benefits and concerns associated with qualitative data sharing and then describe the results of a content analysis of guidelines from international repositories that archive qualitative data. A minority of repositories provide qualitative data sharing guidelines. Of the guidelines available, there is substantial variation in whether specific topics are addressed. Some topics, such as removing direct identifiers, are consistently addressed, while others, such as providing an anonymization log, are not. We discuss the implications of our study for education, best practices, and future research.
Ramke, Jacqueline; Kuper, Hannah; Limburg, Hans; Kinloch, Jennifer; Zhu, Wenhui; Lansingh, Van C; Congdon, Nathan; Foster, Allen; Gilbert, Clare E
2018-02-01
Sources of avoidable waste in ophthalmic epidemiology include duplication of effort, and survey reports remaining unpublished, gaining publication after a long delay, or being incomplete or of poor quality. The aim of this review was to assess these sources of avoidable waste by examining blindness prevalence surveys undertaken in low and middle income countries (LMICs) between 2000 and 2014. On December 1, 2016 we searched MEDLINE, EMBASE and Web of Science databases for cross-sectional blindness prevalence surveys undertaken in LMICs between 2000 and 2014. All surveys listed on the Rapid Assessment of Avoidable Blindness (RAAB) Repository website ("the Repository") were also considered. For each survey we assessed (1) availability of scientific publication, survey report, summary results tables and/or datasets; (2) time to publication from year of survey completion and journal attributes; (3) extent of blindness information reported; and (4) rigour when information was available from two sources (i.e. whether it matched). Of the 279 included surveys (from 68 countries) 186 (67%) used RAAB methodology; 146 (52%) were published in a scientific journal, 57 (20%) were published in a journal and on the Repository, and 76 (27%) were on the Repository only (8% had tables; 19% had no information available beyond registration). Datasets were available for 50 RAABs (18% of included surveys). Time to publication ranged from <1 to 11 years (mean, standard deviation 2.8 ± 1.8 years). The extent of blindness information reported within studies varied (e.g. presenting and best-corrected, unilateral and bilateral); those with both a published report and Repository tables were most complete. For surveys published and with RAAB tables available, discrepancies were found in reporting of participant numbers (14% of studies) and blindness prevalence (15%). Strategies are needed to improve the availability, consistency, and quality of information reported from blindness prevalence surveys, and hence reduce avoidable waste.
Establishment and operation of a biorepository for molecular epidemiologic studies in Costa Rica.
Cortés, Bernal; Schiffman, Mark; Herrero, Rolando; Hildesheim, Allan; Jiménez, Silvia; Shea, Katheryn; González, Paula; Porras, Carolina; Fallas, Greivin; Rodríguez, Ana Cecilia
2010-04-01
The Proyecto Epidemiológico Guanacaste (PEG) has conducted several large studies related to human papillomavirus (HPV) and cervical cancer in Guanacaste, Costa Rica in a long-standing collaboration with the U.S. National Cancer Institute. To improve molecular epidemiology efforts and save costs, we have gradually transferred technology to Costa Rica, culminating in state-of-the-art laboratories and a biorepository to support a phase III clinical trial investigating the efficacy of HPV 16/18 vaccine. Here, we describe the rationale and lessons learned in transferring molecular epidemiologic and biorepository technology to a developing country. At the outset of the PEG in the early 1990s, we shipped all specimens to repositories and laboratories in the United States, which created multiple problems. Since then, by intensive personal interactions between experts from the United States and Costa Rica, we have successfully transferred liquid-based cytology, HPV DNA testing and serology, chlamydia and gonorrhea testing, PCR-safe tissue processing, and viable cryopreservation. To accommodate the vaccine trial, a state-of-the-art repository opened in mid-2004. Approximately 15,000 to 50,000 samples are housed in the repository on any given day, and >500,000 specimens have been shipped, many using a custom-made dry shipper that permits exporting >20,000 specimens at a time. Quality control of shipments received by the NCI biorepository has revealed an error rate of <0.2%. Recently, the PEG repository has incorporated other activities; for example, large-scale aliquotting and long-term, cost-efficient storage of frozen specimens returned from the United States. Using Internet-based specimen tracking software has proven to be efficient even across borders. For long-standing collaborations, it makes sense to transfer the molecular epidemiology expertise toward the source of specimens. The successes of the PEG molecular epidemiology laboratories and biorepository prove that the physical and informatics infrastructures of a modern biorepository can be transferred to a resource-limited and weather-challenged region. Technology transfer is an important and feasible goal of international collaborations.
Overview of open resources to support automated structure verification and elucidation
Cheminformatics methods form an essential basis for providing analytical scientists with access to data, algorithms and workflows. There are an increasing number of free online databases (compound databases, spectral libraries, data repositories) and a rich collection of software...
Antifungal cyclic peptides from the marine sponge Microscleroderma herdmani
USDA-ARS?s Scientific Manuscript database
Screening natural product extracts from National Cancer Institute Open Repository for antifungal discovery afforded hits for bioassay-guided fractionation. Upon LC-MS analysis of column fractions with antifungal activities to generate information on chemical structure, two new cyclic hexapeptides, m...
COMODI: an ontology to characterise differences in versions of computational models in biology.
Scharm, Martin; Waltemath, Dagmar; Mendes, Pedro; Wolkenhauer, Olaf
2016-07-11
Open model repositories provide ready-to-reuse computational models of biological systems. Models within those repositories evolve over time, leading to different model versions. Taken together, the underlying changes reflect a model's provenance and thus can give valuable insights into the studied biology. Currently, however, changes cannot be semantically interpreted. To improve this situation, we developed an ontology of terms describing changes in models. The ontology can be used by scientists and within software to characterise model updates at the level of single changes. When studying or reusing a model, these annotations help with determining the relevance of a change in a given context. We manually studied changes in selected models from BioModels and the Physiome Model Repository. Using the BiVeS tool for difference detection, we then performed an automatic analysis of changes in all models published in these repositories. The resulting set of concepts led us to define candidate terms for the ontology. In a final step, we aggregated and classified these terms and built the first version of the ontology. We present COMODI, an ontology needed because COmputational MOdels DIffer. It empowers users and software to describe changes in a model on the semantic level. COMODI also enables software to implement user-specific filter options for the display of model changes. Finally, COMODI is a step towards predicting how a change in a model influences the simulation results. COMODI, coupled with our algorithm for difference detection, ensures the transparency of a model's evolution, and it enhances the traceability of updates and error corrections. COMODI is encoded in OWL. It is openly available at http://comodi.sems.uni-rostock.de/ .
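COMODI is distributed as an OWL document. A minimal sketch of inspecting such an ontology is given below, assuming a locally downloaded copy (the file name is a placeholder) and the rdflib library; it simply lists the declared classes and their labels.

```python
# Sketch assuming a local RDF/XML copy of the COMODI OWL file (the filename
# below is a placeholder) and the rdflib library; lists the declared classes.
from rdflib import Graph, RDF, RDFS, OWL

g = Graph()
g.parse("comodi.owl", format="xml")     # placeholder path to a downloaded copy

for cls in g.subjects(RDF.type, OWL.Class):
    # Print each class IRI together with its label, when one is given.
    label = g.value(cls, RDFS.label)
    print(cls, "-", label)
```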
Škorić, Lea; Vrkić, Dina; Petrak, Jelka
2016-02-01
To identify the share of open access (OA) papers in the total number of journal publications authored by the members of the University of Zagreb School of Medicine (UZSM) in 2014. Bibliographic data on 543 UZSM papers published in 2014 were collected using PubMed advanced search strategies and manual data collection methods. The items that had "free full text" icons were considered as gold OA papers. Their OA availability was checked using the provided link to full-text. The rest of the UZSM papers were analyzed for potential green OA through self-archiving in institutional repository. Papers published by Croatian journals were particularly analyzed. Full texts of approximately 65% of all UZSM papers were freely available. Most of them were published in gold OA journals (55% of all UZSM papers or 85% of all UZSM OA papers). In the UZSM repository, there were additional 52 freely available authors' manuscripts from subscription-based journals (10% of all UZSM papers or 15% of all UZSM OA papers). The overall proportion of OA in our study is higher than in similar studies, but only half of gold OA papers are accessible via PubMed directly. The results of our study indicate that increased quality of metadata and linking of the bibliographic records to full texts could assure better visibility. Moreover, only a quarter of papers from subscription-based journals that allow self-archiving are deposited in the UZSM repository. We believe that UZSM should consider mandating all faculty members to deposit their publications in UZSM OA repository to increase visibility and improve access to its scientific output.
Škorić, Lea; Vrkić, Dina; Petrak, Jelka
2016-01-01
Aims To identify the share of open access (OA) papers in the total number of journal publications authored by the members of the University of Zagreb School of Medicine (UZSM) in 2014. Methods Bibliographic data on 543 UZSM papers published in 2014 were collected using PubMed advanced search strategies and manual data collection methods. The items that had “free full text” icons were considered as gold OA papers. Their OA availability was checked using the provided link to full-text. The rest of the UZSM papers were analyzed for potential green OA through self-archiving in institutional repository. Papers published by Croatian journals were particularly analyzed. Results Full texts of approximately 65% of all UZSM papers were freely available. Most of them were published in gold OA journals (55% of all UZSM papers or 85% of all UZSM OA papers). In the UZSM repository, there were additional 52 freely available authors’ manuscripts from subscription-based journals (10% of all UZSM papers or 15% of all UZSM OA papers). Conclusion The overall proportion of OA in our study is higher than in similar studies, but only half of gold OA papers are accessible via PubMed directly. The results of our study indicate that increased quality of metadata and linking of the bibliographic records to full texts could assure better visibility. Moreover, only a quarter of papers from subscription-based journals that allow self-archiving are deposited in the UZSM repository. We believe that UZSM should consider mandating all faculty members to deposit their publications in UZSM OA repository to increase visibility and improve access to its scientific output. PMID:26935617
Wang, Yanli; Bryant, Stephen H.; Cheng, Tiejun; Wang, Jiyao; Gindulyte, Asta; Shoemaker, Benjamin A.; Thiessen, Paul A.; He, Siqian; Zhang, Jian
2017-01-01
PubChem's BioAssay database (https://pubchem.ncbi.nlm.nih.gov) has served as a public repository for small-molecule and RNAi screening data since 2004, providing open access to its data content to the community. PubChem accepts data submissions from researchers worldwide in academia, industry and government agencies. PubChem also collaborates with other chemical biology database stakeholders through data exchange. With over a decade of development effort, it has become an important information resource supporting drug discovery and chemical biology research. To facilitate data discovery, PubChem is integrated with all other databases at NCBI. In this work, we provide an update for the PubChem BioAssay database describing several recent developments, including added sources of research data, a redesigned BioAssay record page, a new BioAssay classification browser and new features in the Upload system facilitating data sharing. PMID:27899599
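Programmatic access to PubChem is commonly done through its PUG REST interface. The sketch below retrieves the description of one assay; the URL pattern and the example AID are given for illustration and should be checked against the current PUG REST documentation.

```python
# Sketch of programmatic access to PubChem BioAssay via the PUG REST interface;
# the URL pattern and the example AID are illustrative and should be verified
# against the current PUG REST documentation.
import requests

AID = 1000   # arbitrary example assay identifier
url = f"https://pubchem.ncbi.nlm.nih.gov/rest/pug/assay/aid/{AID}/description/JSON"

resp = requests.get(url, timeout=30)
resp.raise_for_status()
description = resp.json()
print(list(description.keys()))   # top-level structure of the assay description record
```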
Fenwick, Matthew; Sesanker, Colbert; Schiller, Martin R.; Ellis, Heidi JC; Hinman, M. Lee; Vyas, Jay; Gryk, Michael R.
2012-01-01
Scientists are continually faced with the need to express complex mathematical notions in code. The renaissance of functional languages such as LISP and Haskell is often credited to their ability to implement complex data operations and mathematical constructs in an expressive and natural idiom. The slow adoption of functional computing in the scientific community does not, however, reflect the congeniality of these fields. Unfortunately, the learning curve for adoption of functional programming techniques is steeper than that for languages more traditional in the scientific community, such as Python and Java, and this is partially due to the relative sparseness of available learning resources. To fill this gap, we demonstrate and provide applied, scientifically substantial examples of functional programming. We present a multi-language source-code repository for software integration and algorithm development, which focuses generally on the fields of machine learning, data processing, and bioinformatics. We encourage scientists who are interested in learning the basics of functional programming to adopt, reuse, and learn from these examples. The source code is available at: https://github.com/CONNJUR/CONNJUR-Sandbox (see also http://www.connjur.org). PMID:25328913
Fenwick, Matthew; Sesanker, Colbert; Schiller, Martin R; Ellis, Heidi Jc; Hinman, M Lee; Vyas, Jay; Gryk, Michael R
2012-01-01
Scientists are continually faced with the need to express complex mathematical notions in code. The renaissance of functional languages such as LISP and Haskell is often credited to their ability to implement complex data operations and mathematical constructs in an expressive and natural idiom. The slow adoption of functional computing in the scientific community does not, however, reflect the congeniality of these fields. Unfortunately, the learning curve for adoption of functional programming techniques is steeper than that for languages more traditional in the scientific community, such as Python and Java, and this is partially due to the relative sparseness of available learning resources. To fill this gap, we demonstrate and provide applied, scientifically substantial examples of functional programming. We present a multi-language source-code repository for software integration and algorithm development, which focuses generally on the fields of machine learning, data processing, and bioinformatics. We encourage scientists who are interested in learning the basics of functional programming to adopt, reuse, and learn from these examples. The source code is available at: https://github.com/CONNJUR/CONNJUR-Sandbox (see also http://www.connjur.org).
Environmental Information Management For Data Discovery and Access System
NASA Astrophysics Data System (ADS)
Giriprakash, P.
2011-01-01
Mercury is a federated metadata harvesting, search and retrieval tool based on both open source software and software developed at Oak Ridge National Laboratory. It was originally developed for NASA, and the Mercury development consortium now includes funding from NASA, USGS, and DOE. A major new version of Mercury was developed during 2007 and released in early 2008. This new version provides orders of magnitude improvements in search speed, support for additional metadata formats, integration with Google Maps for spatial queries, support for RSS delivery of search results, and ready customization to meet the needs of the multiple projects which use Mercury. For the end users, Mercury provides a single portal to very quickly search for data and information contained in disparate data management systems. It collects metadata and key data from contributing project servers distributed around the world and builds a centralized index. The Mercury search interfaces then allow the users to perform simple, fielded, spatial and temporal searches across these metadata sources. This centralized repository of metadata with distributed data sources provides extremely fast search results to the user, while allowing data providers to advertise the availability of their data and maintain complete control and ownership of that data.
Lazarou, Stavros; Vita, Vasiliki; Ekonomou, Lambros
2018-02-01
The data of this article represent a real electricity distribution network at the twenty-kilovolt (20 kV) medium voltage level of the Hellenic electricity distribution system [1]. This network was chosen as suitable for smart grid analysis: it demonstrates moderate penetration of renewable sources and, for part of the time, can exhibit reverse power flows, making it suitable for studies of load aggregation, storage and demand response. It represents a rural line of fifty-five kilometres (55 km) total length, typical for this type of line. It serves forty-five (45) medium-to-low-voltage transformers and twenty-four (24) connections to photovoltaic plants. The total installed load capacity is twelve megavolt-amperes (12 MVA), although the maximum observed load is lower. The data are ready for load-flow simulation in Matpower [2] at the maximum observed load with renewables at half production. The simulation results and the processed data used to create the source code are also provided in the database available at http://dx.doi.org/10.7910/DVN/1I6MKU.
TTLEM: Open access tool for building numerically accurate landscape evolution models in MATLAB
NASA Astrophysics Data System (ADS)
Campforts, Benjamin; Schwanghart, Wolfgang; Govers, Gerard
2017-04-01
Despite a growing interest in LEMs, accuracy assessment of the numerical methods they are based on has received little attention. Here, we present TTLEM, an open access landscape evolution package designed for developing and testing scenarios and hypotheses. TTLEM uses a higher order flux-limiting finite-volume method to simulate river incision and tectonic displacement. We show that this scheme significantly influences the evolution of simulated landscapes and the spatial and temporal variability of erosion rates. Moreover, it allows the simulation of lateral tectonic displacement on a fixed grid. Through the use of a simple GUI, the software produces visual output of evolving landscapes throughout model run time. In this contribution, we illustrate numerical landscape evolution through a set of movies spanning different spatial and temporal scales. We focus on the erosional domain and use both spatially constant and variable input values for uplift, lateral tectonic shortening, erodibility and precipitation. Moreover, we illustrate the relevance of a stochastic approach for realistic hillslope response modelling. TTLEM is a fully open source software package, written in MATLAB and based on the TopoToolbox platform (topotoolbox.wordpress.com). Installation instructions can be found on this website and in the dedicated GitHub repository.
pyhector: A Python interface for the simple climate model Hector
DOE Office of Scientific and Technical Information (OSTI.GOV)
Willner, Sven N.; Hartin, Corinne; Gieseke, Robert
2017-04-01
Pyhector is a Python interface for the simple climate model Hector (Hartin et al. 2015), which is developed in C++. Simple climate models like Hector can, for instance, be used in the analysis of scenarios within integrated assessment models like GCAM, in the emulation of complex climate models, and in uncertainty analyses. Hector is an open-source, object-oriented, simple global climate carbon-cycle model. Its carbon cycle consists of a one-pool atmosphere, three terrestrial pools which can be broken down into finer biomes or regions, and four carbon pools in the ocean component. The terrestrial carbon cycle includes primary production and respiration fluxes. The ocean carbon cycle circulates carbon via a simplified thermohaline circulation, calculating air-sea fluxes as well as the marine carbonate system (Hartin et al. 2016). The model input is time series of greenhouse gas emissions; as example scenarios for these the Pyhector package contains the Representative Concentration Pathways (RCPs). These were developed to cover the range of baseline and mitigation emissions scenarios and are widely used in climate change research and model intercomparison projects. Using DataFrames from the Python library Pandas (McKinney 2010) as a data structure for the scenarios simplifies generating and adapting scenarios. Other parameters of the Hector model can easily be modified when running the model. Pyhector can be installed using pip from the Python Package Index. Source code and issue tracker are available in Pyhector's GitHub repository. Documentation is provided through Readthedocs. Usage examples are also contained in the repository as a Jupyter Notebook (Pérez and Granger 2007; Kluyver et al. 2016). Courtesy of the Mybinder project, the example Notebook can also be executed and modified without installing Pyhector locally.
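A minimal sketch of the usage described above, assuming pyhector has been installed with pip and exposes the bundled RCP scenarios and a run() function as in its documentation (names should be verified against the current package docs):

```python
# Hedged sketch: run Hector via pyhector with one of the bundled RCP scenarios.
# Assumes `pyhector.run` and `pyhector.rcp26` exist as documented; the output is
# a pandas DataFrame indexed by year with one column per Hector output variable.
import pyhector
from pyhector import rcp26  # RCP2.6 emissions scenario shipped with the package

output = pyhector.run(rcp26)
print(output.head())        # inspect the first simulated years
```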
Kotapalli, Sudha Sravanti; Nallam, Sri Satya Anila; Nadella, Lavanya; Banerjee, Tanmay; Rode, Haridas B; Mainkar, Prathama S; Ummanni, Ramesh
2015-01-01
The purpose of this study was to provide a number of diverse and promising early-lead compounds to feed into the drug discovery pipeline for developing new antitubercular agents. The results from the phenotypic screening of the open-source compound library against Mycobacterium smegmatis and Mycobacterium bovis (BCG), with hit validation against M. tuberculosis (H37Rv), have identified novel potent hit compounds. To determine their drug-likeness, a systematic analysis of the physicochemical properties of the hit compounds was performed using cheminformatics tools. The hit molecules were analysed by clustering based on their chemical fingerprints and structural similarity to determine their chemical diversity. The hit compound library was also filtered for drug-likeness based on physicochemical descriptors following Lipinski filters. The robust filtration of hits, followed by secondary screening against BCG and H37Rv and cytotoxicity evaluation, identified 12 compounds with potential against H37Rv (MIC range 0.4 to 12.5 μM). Furthermore, in cytotoxicity assays these 12 compounds displayed low cytotoxicity against liver and lung cells, providing a high therapeutic index (> 50). To avoid any variations in activity due to the route of chemical synthesis, the hit compounds were resynthesized independently and confirmed for their potential against H37Rv. Taken together, the hits reported here provide numerous potential starting points for the generation of new leads, ultimately adding to the drug discovery pipeline against tuberculosis.
GRIP Collaboration Portal: Information Management for a Hurricane Field Campaign
NASA Astrophysics Data System (ADS)
Conover, H.; Kulkarni, A.; Garrett, M.; Smith, T.; Goodman, H. M.
2010-12-01
NASA’s Genesis and Rapid Intensification Processes (GRIP) experiment, carried out in August and September of 2010, was a complex operation, involving three aircraft and their crews based at different airports, a dozen instrument teams, mission scientists, weather forecasters, project coordinators and a variety of other participants. In addition, GRIP was coordinated with concurrent airborne missions: NOAA’s IFEX and the NSF-funded PREDICT. The GRIP Collaboration Portal was developed to facilitate communication within and between the different teams and to serve as an information repository for the field campaign, providing a single access point for project documents, plans, weather forecasts, flight reports and quicklook data. The portal was developed using the Drupal open source content management framework. This presentation will cover both technology and participation issues. Specific examples include: Drupal’s large and diverse open source developer community is an advantage in that we were able to reuse many modules rather than develop capabilities from scratch, but integrating multiple modules developed by many people adds to the overall complexity of the site. Many of the communication capabilities provided by the site, such as discussion forums and blogs, were not used; participants were diligent about posting necessary documents, but the favored communication method remained email. Drupal's developer-friendly nature allowed for quick development of the customized functionality needed to accommodate the rapidly changing requirements of the GRIP experiment. (Figure: DC-8 overflight of Hurricane Earl during the GRIP mission.)
Childhood Vesicoureteral Reflux Studies: Registries and Repositories Sources and Nosology
Chesney, Russell W.; Patters, Andrea B.
2012-01-01
Despite several recent studies, the advisability of antimicrobial prophylaxis and certain imaging studies for urinary tract infections (UTIs) remains controversial. The role of vesicoureteral reflux (VUR) on the severity and re-infection rates for UTIs is also difficult to assess. Registries and repositories of data and biomaterials from clinical studies in children with VUR are valuable. Disease registries are collections of secondary data related to patients with a specific diagnosis, condition or procedure. Registries differ from indices in that they contain more extensive data. A research repository is an entity that receives, stores, processes and/or disseminates specimens (or other materials) as needed. It encompasses the physical location as well as the full range of activities associated with its operation. It may also be referred to as a biorepository. This report provides information about some current registries and repositories that include data and samples from children with VUR. It also describes the heterogeneous nature of the subjects, as some registries and repositories include only data or samples from patients with primary reflux while others also include those from patients with syndromic or secondary reflux. PMID:23044377
Potential benefits of waste transmutation to the U.S. high-level waste repository
DOE Office of Scientific and Technical Information (OSTI.GOV)
Michaels, G.E.
1995-10-01
This paper reexamines the potential benefits of waste transmutation to the proposed U.S. geologic repository at the Yucca Mountain site, based on recent progress in the performance assessment for the Yucca Mountain base case of spent fuel emplacement. It is observed that actinides are assumed to have higher solubility than in previous studies and that Np and other actinides now dominate the projected aqueous releases from a Yucca Mountain repository. Actinides are also identified as the dominant source of decay heat in the repository, and the effects of decay heat in perturbing the hydrology, geochemistry, and thermal characteristics of Yucca Mountain are reviewed. It is concluded that the potential for thermally-driven, buoyant, gas-phase flow at Yucca Mountain introduces data and modeling requirements that will increase the costs of licensing the site and may cause the site to be unattractive for geologic disposal of wastes. A transmutation-enabled cold repository is proposed that might allow licensing of a repository to be based upon currently observable characteristics of the Yucca Mountain site.
SU-E-T-103: Development and Implementation of Web Based Quality Control Software
DOE Office of Scientific and Technical Information (OSTI.GOV)
Studinski, R; Taylor, R; Angers, C
Purpose: Historically, many radiation medicine programs have maintained their Quality Control (QC) test results in paper records or Microsoft Excel worksheets. Both approaches present significant logistical challenges and are not predisposed to data review and approval. It has been our group's aim to develop and implement web based software designed not just to record and store QC data in a centralized database, but to provide scheduling and data review tools to help manage a radiation therapy clinic's equipment quality control program. Methods: The software was written in the Python programming language using the Django web framework. In order to promote collaboration and validation from other centres, the code was made open source and is freely available to the public via an online source code repository. The code was written to provide a common user interface for data entry, formalize the review and approval process, and offer automated data trending and process control analysis of test results. Results: As of February 2014, our installation of QATrack+ has 180 tests defined in its database and has collected ∼22 000 test results, all of which have been reviewed and approved by a physicist via QATrack+'s review tools. These results include records for quality control of Elekta accelerators, CT simulators, our brachytherapy programme, TomoTherapy and Cyberknife units. Currently at least 5 other centres are known to be running QATrack+ clinically, forming the start of an international user community. Conclusion: QATrack+ has proven to be an effective tool for collecting radiation therapy QC data, allowing for rapid review and trending of data for a wide variety of treatment units. As free and open source software, all source code, documentation and a bug tracker are available to the public at https://bitbucket.org/tohccmedphys/qatrackplus/.
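As an illustration of the kind of centralized QC database described above (and not QATrack+'s actual schema), a minimal Django-style model for a reviewed QC test result might look like this:

```python
# Hypothetical sketch, not QATrack+'s real data model: a Django model showing how
# QC test results with a physicist review/approval state could be stored centrally.
from django.db import models

class QCTestResult(models.Model):
    STATUS_CHOICES = [
        ("unreviewed", "Unreviewed"),
        ("approved", "Approved"),
        ("rejected", "Rejected"),
    ]
    unit_name = models.CharField(max_length=100)      # e.g. a linac, CT simulator or Cyberknife
    test_name = models.CharField(max_length=100)      # the QC test performed
    value = models.FloatField()                       # measured result
    performed_on = models.DateTimeField()
    status = models.CharField(max_length=12, choices=STATUS_CHOICES, default="unreviewed")
    reviewed_by = models.CharField(max_length=100, blank=True)   # reviewing physicist
```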
2014-01-01
Background Providing scalable clinical decision support (CDS) across institutions that use different electronic health record (EHR) systems has been a challenge for medical informatics researchers. The lack of commonly shared EHR models and terminology bindings has been recognised as a major barrier to sharing CDS content among different organisations. The openEHR Guideline Definition Language (GDL) expresses CDS content based on openEHR archetypes and can support any clinical terminologies or natural languages. Our aim was to explore in an experimental setting the practicability of GDL and its underlying archetype formalism. A further aim was to report on the artefacts produced by this new technological approach in this particular experiment. We modelled and automatically executed compliance checking rules from clinical practice guidelines for acute stroke care. Methods We extracted rules from the European clinical practice guidelines as well as from treatment contraindications for acute stroke care and represented them using GDL. Then we executed the rules retrospectively on 49 mock patient cases to check the cases’ compliance with the guidelines, and manually validated the execution results. We used openEHR archetypes, GDL rules, the openEHR reference information model, reference terminologies and the Data Archetype Definition Language. We utilised the open-sourced GDL Editor for authoring GDL rules, the international archetype repository for reusing archetypes, the open-sourced Ocean Archetype Editor for authoring or modifying archetypes and the CDS Workbench for executing GDL rules on patient data. Results We successfully represented clinical rules about 14 out of 19 contraindications for thrombolysis and other aspects of acute stroke care with 80 GDL rules. These rules are based on 14 reused international archetypes (one of which was modified), 2 newly created archetypes and 51 terminology bindings (to three terminologies). Our manual compliance checks for 49 mock patients were a complete match versus the automated compliance results. Conclusions Shareable guideline knowledge for use in automated retrospective checking of guideline compliance may be achievable using GDL. Whether the same GDL rules can be used for at-the-point-of-care CDS remains unknown. PMID:24886468
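The retrospective checking idea can be illustrated with a much simpler analogue than GDL itself; the sketch below applies two invented rules to mock patient records in plain Python (rule names and thresholds are illustrative, not taken from the guideline):

```python
# Simplified Python analogue of retrospective guideline-compliance checking.
# Illustrative only: the study expresses such rules in GDL against openEHR
# archetypes; the rules and thresholds below are invented for the example.
patients = [
    {"id": 1, "onset_to_treatment_min": 150, "inr": 1.2},
    {"id": 2, "onset_to_treatment_min": 300, "inr": 1.9},
]

rules = {
    "treated_within_time_window": lambda p: p["onset_to_treatment_min"] <= 270,
    "inr_below_threshold": lambda p: p["inr"] < 1.7,
}

for p in patients:
    failed = [name for name, rule in rules.items() if not rule(p)]
    verdict = "compliant" if not failed else "non-compliant: " + ", ".join(failed)
    print(f"patient {p['id']}: {verdict}")
```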
TraitBank: An Open Digital Repository for Organism Traits
USDA-ARS?s Scientific Manuscript database
TraitBank currently serves over 11 million measurements and facts for more than 1.7 million taxa. These data are mobilized from major biodiversity information systems (e.g., International Union for Conservation of Nature, Ocean Biogeographic Information System, Paleobiology Database), literature sup...
Hrynaszkiewicz, Iain; Khodiyar, Varsha; Hufton, Andrew L; Sansone, Susanna-Assunta
2016-01-01
Sharing of experimental clinical research data usually happens between individuals or research groups rather than via public repositories, in part due to the need to protect research participant privacy. This approach to data sharing makes it difficult to connect journal articles with their underlying datasets and is often insufficient for ensuring access to data in the long term. Voluntary data sharing services such as the Yale Open Data Access (YODA) and Clinical Study Data Request (CSDR) projects have increased accessibility to clinical datasets for secondary uses while protecting patient privacy and the legitimacy of secondary analyses, but these resources are generally disconnected from journal articles, where researchers typically search for reliable information to inform future research. New scholarly journal and article types dedicated to increasing the accessibility of research data have emerged in recent years and, in general, journals are developing stronger links with data repositories. There is a need for increased collaboration between journals, data repositories, researchers, funders, and voluntary data sharing services to increase the visibility and reliability of clinical research. Using the journal Scientific Data as a case study, we propose and show examples of changes to the format and peer-review process for journal articles to more robustly link them to data that are only available on request. We also propose additional features for data repositories to better accommodate non-public clinical datasets, including Data Use Agreements (DUAs).
Geoengineering properties of potential repository units at Yucca Mountain, southern Nevada
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tillerson, J.R.; Nimick, F.B.
1984-12-01
The Nevada Nuclear Waste Storage Investigations (NNWSI) Project is currently evaluating volcanic tuffs at the Yucca Mountain site, located on and adjacent to the Nevada Test Site, for possible use as a host rock for a radioactive waste repository. The behavior of tuff as an engineering material must be understood to design, license, construct, and operate a repository. Geoengineering evaluations and measurements are being made to develop confidence in both the analysis techniques for thermal, mechanical, and hydrothermal effects and the supporting data base of rock properties. The analysis techniques and the data base are currently used for repository design, waste package design, and performance assessment analyses. This report documents the data base of geoengineering properties used in the analyses that aided the selection of the waste emplacement horizon and in analyses synopsized in the Environmental Assessment Report prepared for the Yucca Mountain site. The strategy used for the development of the data base relies primarily on data obtained in laboratory tests that are then confirmed in field tests. Average thermal and mechanical properties (and their anticipated variations) are presented. Based upon these data, analyses completed to date, and previous excavation experience in tuff, it is anticipated that existing mining technology can be used to develop stable underground openings and that repository operations can be carried out safely.
Accessing and integrating data and knowledge for biomedical research.
Burgun, A; Bodenreider, O
2008-01-01
To review the issues that have arisen with the advent of translational research in terms of integration of data and knowledge, and to survey current efforts to address these issues. Using examples from the biomedical literature, we identified new trends in biomedical research and their impact on bioinformatics. We analyzed the requirements for effective knowledge repositories and studied issues in the integration of biomedical knowledge. New diagnostic and therapeutic approaches based on gene expression patterns have brought about new issues in the statistical analysis of data, and new workflows are needed to support translational research. Interoperable data repositories based on standard annotations, infrastructures and services are needed to support the pooling and meta-analysis of data, as well as their comparison to earlier experiments. High-quality, integrated ontologies and knowledge bases serve as a source of prior knowledge used in combination with traditional data mining techniques and contribute to the development of more effective data analysis strategies. As biomedical research evolves from traditional clinical and biological investigations towards omics sciences and translational research, specific needs have emerged, including integrating data collected in research studies with patient clinical data, linking omics knowledge with medical knowledge, modeling the molecular basis of diseases, and developing tools that support in-depth analysis of research data. As such, translational research illustrates the need to bridge the gap between bioinformatics and medical informatics, and opens new avenues for biomedical informatics research.
SPECIATE Version 4.4 Database Development Documentation
SPECIATE is the U.S. Environmental Protection Agency’s (EPA) repository of volatile organic gas and particulate matter (PM) speciation profiles of air pollution sources. Some of the many uses of these source profiles include: (1) creating speciated emissions inventories for regi...
SPECIATE 4.2: speciation Database Development Documentation
SPECIATE is the U.S. Environmental Protection Agency's (EPA) repository of volatile organic gas and particulate matter (PM) speciation profiles of air pollution sources. Among the many uses of speciation data, these source profiles are used to: (1) create speciated emissions inve...
Lerner-Ellis, Jordan; Wang, Marina; White, Shana; Lebo, Matthew S
2015-07-01
The Canadian Open Genetics Repository is a collaborative effort for the collection, storage, sharing and robust analysis of variants reported by medical diagnostics laboratories across Canada. As clinical laboratories adopt modern genomics technologies, the need for this type of collaborative framework is increasingly important. A survey to assess existing protocols for variant classification and reporting was delivered to clinical genetics laboratories across Canada. Based on feedback from this survey, a variant assessment tool was made available to all laboratories. Each participating laboratory was provided with an instance of GeneInsight, software featuring versioning and approval processes for variant assessments and interpretations and allowing variant data to be shared between instances. Guidelines were established for sharing data among clinical laboratories and, in the final outreach phase, data will be made readily available to patient advocacy groups for general use. The survey demonstrated the need for improved standardisation and data sharing across the country. A variant assessment template was made available to the community to aid with standardisation. Instances of the GeneInsight tool were provided to clinical diagnostic laboratories across Canada for the purpose of uploading, transferring, accessing and sharing variant data. As an ongoing endeavour and a permanent resource, the Canadian Open Genetics Repository aims to serve as a focal point for the collaboration of Canadian laboratories with other countries in the development of tools that take full advantage of laboratory data in diagnosing, managing and treating genetic diseases. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
Dikow, Torsten; Agosti, Donat
2015-01-01
A cybercatalog of the Apioceridae (apiocerid flies) of the Afrotropical Region is provided. Each taxon entry includes links to open-access, online repositories such as ZooBank, BHL/BioStor/BLR, Plazi, GBIF, Morphbank, EoL, and a research web-site, giving access to taxonomic information, digitized literature, morphological descriptions, specimen occurrence data, and images. Cybercatalogs such as the one presented here will need to become the future of taxonomic catalogs, taking advantage of the growing number of online repositories and linked data while remaining easily updatable. Comments on the deposition of the holotype of Apiocera braunsi Melander, 1907 are made.
Data2Paper: A stakeholder-driven solution to data publication and citation challenges
NASA Astrophysics Data System (ADS)
Murphy, Fiona; Jefferies, Neil; Ingraham, Thomas; Murray, Hollydawn; Ranganathan, Anusha
2017-04-01
Data, and especially open data, are valuable to the community but can also be valuable to the researcher. Data papers are a clear and open way to publicize and contextualize your data in a way that is citable and aids both reproducibility and efficiency in scholarly endeavour. However, this is not yet a format that is well understood or proliferating amongst the mainstream research community. As part of the Jisc Data Spring initiative, a team of stakeholders (publishers, data repository managers, coders) has been developing a simple 'one-click' process in which data, metadata and methods details are transferred from a data repository (via a SWORD-based API and a cloud-based helper app built on the Fedora/Hydra platform) to a relevant publisher platform for publication as a data paper. Relying on automated processes, such as using ORCIDs to authenticate and pre-populate article templates and building on the DOI infrastructure to support provenance and citation, the app seeks to drive the deposit of data in repositories and to encourage the growth of data papers by simplifying the process through the removal of redundant metadata entry and by streamlining publisher submissions into a single consistent workflow. This poster will explain the underlying rationale and evidence gathering, development, partnerships, governance and other progress that this project has achieved so far. It will outline some key learning opportunities, challenges and drivers and explore the next steps.
NASA Astrophysics Data System (ADS)
Topping, David; Barley, Mark; Bane, Michael K.; Higham, Nicholas; Aumont, Bernard; Dingle, Nicholas; McFiggans, Gordon
2016-03-01
In this paper we describe the development and application of a new web-based facility, UManSysProp (http://umansysprop.seaes.manchester.ac.uk), for automating predictions of molecular and atmospheric aerosol properties. Current facilities include pure component vapour pressures, critical properties, and sub-cooled densities of organic molecules; activity coefficient predictions for mixed inorganic-organic liquid systems; hygroscopic growth factors and CCN (cloud condensation nuclei) activation potential of mixed inorganic-organic aerosol particles; and absorptive partitioning calculations with/without a treatment of non-ideality. The aim of this new facility is to provide a single point of reference for all properties relevant to atmospheric aerosol that have been checked for applicability to atmospheric compounds where possible. The group contribution approach allows users to upload molecular information in the form of SMILES (Simplified Molecular Input Line Entry System) strings and UManSysProp will automatically extract the relevant information for calculations. Built using open-source chemical informatics, and hosted at the University of Manchester, the facilities are provided via a browser and device-friendly web interface, or can be accessed using the user's own code via a JSON API (application program interface). We also provide the source code for all predictive techniques provided on the site, covered by the GNU GPL (General Public License) license to encourage development of a user community. We have released this via a Github repository (doi:10.5281/zenodo.45143). In this paper we demonstrate its use with specific examples that can be simulated using the web-browser interface.
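For readers interested in the JSON API route mentioned above, a hedged sketch of a client call is shown below; the endpoint path, payload keys and property selector are placeholders rather than the documented API, which should be taken from the site's own documentation:

```python
# Hedged sketch of calling a JSON API with SMILES strings, in the spirit of the
# UManSysProp facility. The URL path and payload keys are placeholders
# (assumptions), not the documented interface.
import requests

payload = {
    "compounds": ["CCO", "CC(=O)O"],   # SMILES for ethanol and acetic acid
    "property": "vapour_pressure",     # hypothetical property selector
    "temperature": 298.15,             # kelvin
}
response = requests.post(
    "http://umansysprop.seaes.manchester.ac.uk/api",  # placeholder endpoint
    json=payload,
    timeout=30,
)
response.raise_for_status()
print(response.json())
```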
SNOMED CT module-driven clinical archetype management.
Allones, J L; Taboada, M; Martinez, D; Lozano, R; Sobrido, M J
2013-06-01
To explore semantic search to improve management and user navigation in clinical archetype repositories. In order to support semantic searches across archetypes, an automated method based on SNOMED CT modularization is implemented to transform clinical archetypes into SNOMED CT extracts. Concurrently, query terms are converted into SNOMED CT concepts using the search engine Lucene. Retrieval is then carried out by matching query concepts with the corresponding SNOMED CT segments. A test collection of the 16 clinical archetypes, including over 250 terms, and a subset of 55 clinical terms from two medical dictionaries, MediLexicon and MedlinePlus, were used to test our method. The keyword-based service supported by the OpenEHR repository offered us a benchmark to evaluate the enhancement of performance. In total, our approach reached 97.4% precision and 69.1% recall, providing a substantial improvement of recall (more than 70%) compared to the benchmark. Exploiting medical domain knowledge from ontologies such as SNOMED CT may overcome some limitations of the keyword-based systems and thus improve the search experience of repository users. An automated approach based on ontology segmentation is an efficient and feasible way for supporting modeling, management and user navigation in clinical archetype repositories. Copyright © 2013 Elsevier Inc. All rights reserved.
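The reported retrieval figures follow the standard definitions of precision and recall; the short sketch below reproduces numbers of the same magnitude from invented counts (the counts are not taken from the paper):

```python
# Worked illustration of precision and recall as used in the evaluation above.
# The counts are invented so that the ratios land near the reported 97.4% / 69.1%.
def precision_recall(true_positives, n_retrieved, n_relevant):
    precision = true_positives / n_retrieved   # fraction of retrieved items that are relevant
    recall = true_positives / n_relevant       # fraction of relevant items that were retrieved
    return precision, recall

p, r = precision_recall(true_positives=38, n_retrieved=39, n_relevant=55)
print(f"precision = {p:.1%}, recall = {r:.1%}")
```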
A Computational Workflow for the Automated Generation of Models of Genetic Designs.
Misirli, Göksel; Nguyen, Tramy; McLaughlin, James Alastair; Vaidyanathan, Prashant; Jones, Timothy S; Densmore, Douglas; Myers, Chris; Wipat, Anil
2018-06-05
Computational models are essential to engineer predictable biological systems and to scale up this process for complex systems. Computational modeling often requires expert knowledge and data to build models. Clearly, manual creation of models is not scalable for large designs. Despite several automated model construction approaches, computational methodologies to bridge knowledge in design repositories and the process of creating computational models have still not been established. This paper describes a workflow for automatic generation of computational models of genetic circuits from data stored in design repositories using existing standards. This workflow leverages the software tool SBOLDesigner to build structural models that are then enriched by the Virtual Parts Repository API using Systems Biology Open Language (SBOL) data fetched from the SynBioHub design repository. The iBioSim software tool is then utilized to convert this SBOL description into a computational model encoded using the Systems Biology Markup Language (SBML). Finally, this SBML model can be simulated using a variety of methods. This workflow provides synthetic biologists with easy to use tools to create predictable biological systems, hiding away the complexity of building computational models. This approach can further be incorporated into other computational workflows for design automation.
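One step of this workflow, fetching SBOL data from a SynBioHub instance, can be sketched with the pySBOL2 library; the part URI below is a placeholder and the call names should be checked against the pySBOL2 documentation, since this is not the paper's own tooling:

```python
# Hedged sketch: pull an SBOL design from a SynBioHub repository with pySBOL2.
# The part URI is illustrative only; SBOLDesigner, the Virtual Parts Repository
# API and iBioSim from the described workflow are not reproduced here.
import sbol2

doc = sbol2.Document()
part_shop = sbol2.PartShop("https://synbiohub.org")
part_shop.pull("https://synbiohub.org/public/igem/BBa_B0034/1", doc)  # placeholder URI

for component in doc.componentDefinitions:
    print(component.identity, list(component.roles))
```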
Code of Federal Regulations, 2011 CFR
2011-01-01
... the sealing of shafts and ramps, except those openings that may be designed for ventilation or... repository, as defined by this section, at the Yucca Mountain site. Design means a description of the... includes the engineered barrier system. Design bases means that information that identifies the specific...
Code of Federal Regulations, 2012 CFR
2012-01-01
... the sealing of shafts and ramps, except those openings that may be designed for ventilation or... repository, as defined by this section, at the Yucca Mountain site. Design means a description of the... includes the engineered barrier system. Design bases means that information that identifies the specific...
17 CFR 37.1000 - Core Principle 10-Recordkeeping and reporting.
Code of Federal Regulations, 2014 CFR
2014-04-01
... records relating to swaps defined in section 1a(47)(A)(v) of the Act open to inspection and examination by the Securities and Exchange Commission. (b) Requirements. The Commission shall adopt data collection... requirements for derivatives clearing organizations and swap data repositories. ...
Code of Federal Regulations, 2013 CFR
2013-01-01
... the sealing of shafts and ramps, except those openings that may be designed for ventilation or... repository, as defined by this section, at the Yucca Mountain site. Design means a description of the... includes the engineered barrier system. Design bases means that information that identifies the specific...
Code of Federal Regulations, 2014 CFR
2014-01-01
... the sealing of shafts and ramps, except those openings that may be designed for ventilation or... repository, as defined by this section, at the Yucca Mountain site. Design means a description of the... includes the engineered barrier system. Design bases means that information that identifies the specific...
Code of Federal Regulations, 2010 CFR
2010-01-01
... the sealing of shafts and ramps, except those openings that may be designed for ventilation or... repository, as defined by this section, at the Yucca Mountain site. Design means a description of the... includes the engineered barrier system. Design bases means that information that identifies the specific...
A Walk through TRIDEC's intermediate Tsunami Early Warning System
NASA Astrophysics Data System (ADS)
Hammitzsch, M.; Reißland, S.; Lendholt, M.
2012-04-01
The management of natural crises is an important application field of the technology developed in the project Collaborative, Complex, and Critical Decision-Support in Evolving Crises (TRIDEC), co-funded by the European Commission in its Seventh Framework Programme. TRIDEC is based on the development of the German Indonesian Tsunami Early Warning System (GITEWS) and the Distant Early Warning System (DEWS) providing a service platform for both sensor integration and warning dissemination. In TRIDEC new developments in Information and Communication Technology (ICT) are used to extend the existing platform realising a component-based technology framework for building distributed tsunami warning systems for deployment, e.g. in the North-eastern Atlantic, the Mediterranean and Connected Seas (NEAM) region. The TRIDEC system will be implemented in three phases, each with a demonstrator. Successively, the demonstrators are addressing challenges, such as the design and implementation of a robust and scalable service infrastructure supporting the integration and utilisation of existing resources with accelerated generation of large volumes of data. These include sensor systems, geo-information repositories, simulation tools and data fusion tools. In addition to conventional sensors also unconventional sensors and sensor networks play an important role in TRIDEC. The system version presented is based on service-oriented architecture (SOA) concepts and on relevant standards of the Open Geospatial Consortium (OGC), the World Wide Web Consortium (W3C) and the Organization for the Advancement of Structured Information Standards (OASIS). In this way the system continuously gathers, processes and displays events and data coming from open sensor platforms to enable operators to quickly decide whether an early warning is necessary and to send personalized warning messages to the authorities and the population at large through a wide range of communication channels. The system integrates OGC Sensor Web Enablement (SWE) compliant sensor systems for the rapid detection of hazardous events, like earthquakes, sea level anomalies, ocean floor occurrences, and ground displacements. Using OGC Web Map Service (WMS) and Web Feature Service (WFS) spatial data are utilized to depict the situation picture. The integration of a simulation system to identify affected areas is considered using the OGC Web Processing Service (WPS). Warning messages are compiled and transmitted in the OASIS Common Alerting Protocol (CAP) together with addressing information defined via the OASIS Emergency Data Exchange Language - Distribution Element (EDXL-DE). The first system demonstrator has been designed and implemented to support plausible scenarios demonstrating the treatment of simulated tsunami threats with an essential subset of a National Tsunami Warning Centre (NTWC). The feasibility and the potentials of the implemented approach are demonstrated covering standard operations as well as tsunami detection and alerting functions. The demonstrator presented addresses information management and decision-support processes in a hypothetical natural crisis situation caused by a tsunami in the Eastern Mediterranean. Developments of the system are based to the largest extent on free and open source software (FOSS) components and industry standards. Emphasis has been and will be made on leveraging open source technologies that support mature system architecture models wherever appropriate. 
All open source software produced is foreseen to be published on a publicly available software repository thus allowing others to reuse results achieved and enabling further development and collaboration with a wide community including scientists, developers, users and stakeholders. This live demonstration is linked with the talk "TRIDEC Natural Crisis Management Demonstrator for Tsunamis" (EGU2012-7275) given in the session "Architecture of Future Tsunami Warning Systems" (NH5.7/ESSI1.7).
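As a small, hedged illustration of the OGC services mentioned above, the sketch below requests a map image from a WMS endpoint with the OWSLib Python library; the service URL, layer name and bounding box are placeholders, not TRIDEC endpoints:

```python
# Hedged sketch: fetch a map layer from an OGC Web Map Service using OWSLib,
# the kind of WMS request used to build a situation picture. URL, layer and
# bounding box are placeholders, not TRIDEC resources.
from owslib.wms import WebMapService

wms = WebMapService("https://example.org/geoserver/wms", version="1.1.1")
image = wms.getmap(
    layers=["sea_level_stations"],          # placeholder layer name
    srs="EPSG:4326",
    bbox=(20.0, 30.0, 36.0, 42.0),          # roughly the eastern Mediterranean
    size=(800, 600),
    format="image/png",
)
with open("situation_picture.png", "wb") as f:
    f.write(image.read())
```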
SPECIATE 4.4: The Bridge Between Emissions Characterization and Modeling
SPECIATE is the U.S. Environmental Protection Agency’s (EPA) repository of volatile organic gas and particulate matter (PM) speciation profiles of air pollution sources. Some of the many uses of these source profiles include: (1) creating speciated emissions inventories for...
The Development and Uses of EPA's SPECIATE Database
SPECIATE is the U.S. Environmental Protection Agency's (EPA) repository of volatile organic compounds (VOC) and particulate matter (PM) speciation profiles of air pollution sources. These source profiles can be used to (l) provide input to chemical mass balance (CMB) receptor mod...
NASA Astrophysics Data System (ADS)
Gao, M.; Huang, S. T.; Wang, P.; Zhao, Y. A.; Wang, H. B.
2016-11-01
The geological disposal of high-level radioactive waste (hereinafter "geological disposal") is a long-term, complex, and systematic scientific project. The data and information resources produced during research and development (R&D) provide significant support for the R&D of the geological disposal system and lay a foundation for the long-term stability and safety assessment of the repository site. However, the research and engineering data involved in siting geological disposal repositories are complicated (multi-source, multi-dimensional and changeable), and the requirements for data accuracy and comprehensive application have become much higher than before, so the data model design of a geo-information database for the disposal repository faces serious challenges. In this paper, the data resources of the pre-selected areas for the repository are comprehensively surveyed and systematically analyzed. Based on a thorough understanding of the application requirements, the work provides a solution for the key technical problems, including a reasonable classification system for multi-source data entities, complex logical relations and effective physical storage structures. The new solution goes beyond the data classification and conventional spatial data organization models applied in the traditional industry, and realizes data organization and integration at the level of data entities and spatial relationships that are independent, complete and significant for applications in HLW geological disposal. Reasonable, feasible and flexible conceptual, logical and physical data models have been established so as to ensure effective integration and to facilitate application development of multi-source data in pre-selected areas for geological disposal.
Validation of the openEHR archetype library by using OWL reasoning.
Menárguez-Tortosa, Marcos; Fernández-Breis, Jesualdo Tomás
2011-01-01
Electronic Health Record architectures based on the dual model architecture use archetypes for representing clinical knowledge. Therefore, ensuring their correctness and consistency is a fundamental research goal. In this work, we explore how an approach based on OWL technologies can be used for such purpose. This method has been applied to the openEHR archetype repository, which is the largest available one nowadays. The results of this validation are also reported in this study.
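One way to realize this kind of OWL-based validation in practice (not necessarily the authors' own tooling) is to run a description-logic reasoner over the archetype-derived ontology; a hedged sketch with the owlready2 library:

```python
# Hedged sketch: OWL consistency checking with owlready2 as one possible way to
# validate an archetype-derived ontology. The ontology file name is a placeholder;
# the paper's own validation pipeline may use different tooling.
from owlready2 import get_ontology, sync_reasoner, default_world

onto = get_ontology("file://archetype_model.owl").load()   # placeholder ontology
with onto:
    sync_reasoner()   # runs the bundled HermiT reasoner and classifies the ontology

print("inconsistent classes:", list(default_world.inconsistent_classes()))
```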
Thermal Analysis of a Nuclear Waste Repository in Argillite Host Rock
NASA Astrophysics Data System (ADS)
Hadgu, T.; Gomez, S. P.; Matteo, E. N.
2017-12-01
Disposal of high-level nuclear waste in a geological repository requires analysis of heat distribution as a result of decay heat. Such an analysis supports design of the repository layout, defining the repository footprint as well as providing information of importance to the overall design. The analysis is also used in the study of potential migration of radionuclides to the accessible environment. In this study, thermal analysis for high-level waste and spent nuclear fuel in a generic repository in argillite host rock is presented. The thermal analysis utilized both semi-analytical and numerical modeling of the near field of a repository. The semi-analytical method considers heat transport by conduction in the repository and surroundings, and yields temperature histories at selected radial distances from the waste package. A 3-D thermal-hydrologic numerical model was also developed to study fluid and heat distribution in the near field. The thermal analysis assumed a generic geological repository at 500 m depth. For the semi-analytical method, a backfilled closed repository was assumed with basic design and material properties. For the thermal-hydrologic numerical method, a repository layout with disposal in horizontal boreholes was assumed. The 3-D modeling domain covers a limited portion of the repository footprint to enable a detailed thermal analysis. A highly refined unstructured mesh was used with increased discretization near heat sources and at intersections of different materials. All simulations considered different parameter values for properties of components of the engineered barrier system (i.e. buffer, disturbed rock zone and the host rock), and different surface storage times. Results of the different modeling cases are presented and include temperature and fluid flow profiles in the near field at different simulation times. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525. SAND2017-8295 A.
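The semi-analytical conduction approach can be illustrated with a textbook line-source solution, T(r,t) − T0 = q′/(4πk) · E1(r²/(4αt)); the sketch below evaluates it with placeholder argillite properties and a constant line load, which is a simplification of the decaying heat source used in the study:

```python
# Generic conduction sketch (not the study's code): temperature rise at radial
# distance r from an infinite line heat source of constant strength q' in an
# infinite medium. Property values and the heat load are illustrative placeholders.
import numpy as np
from scipy.special import exp1   # exponential integral E1

k = 2.0            # thermal conductivity, W/(m K)        (assumed)
rho_c = 2.4e6      # volumetric heat capacity, J/(m^3 K)  (assumed)
alpha = k / rho_c  # thermal diffusivity, m^2/s
q_line = 150.0     # line heat load, W/m (constant here; real decay heat decreases)

seconds_per_year = 3.15576e7
times = np.array([1, 10, 50, 100, 500]) * seconds_per_year
for r in (1.0, 5.0, 10.0):                                 # radial distances, m
    dT = q_line / (4 * np.pi * k) * exp1(r**2 / (4 * alpha * times))
    print(f"r = {r:4.1f} m: dT [K] =", np.round(dT, 2))
```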
The Challenges of OER to Academic Practice
ERIC Educational Resources Information Center
Browne, Tom; Holding, Richard; Howell, Anna; Rodway-Dyer, Sue
2010-01-01
The degree to which Open Educational Resources (OER) reflect the values of its institutional provider depends on questions of economics and the level of support amongst its academics. For project managers establishing OER repositories, the latter question--how to cultivate, nurture and maintain academic engagement--is critical. Whilst…
High-Performance Computing in Neuroscience for Data-Driven Discovery, Integration, and Dissemination
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bouchard, Kristofer E.; Aimone, James B.; Chun, Miyoung
A lack of coherent plans to analyze, manage, and understand data threatens the various opportunities offered by new neuro-technologies. High-performance computing will allow exploratory analysis of massive datasets stored in standardized formats, hosted in open repositories, and integrated with simulations.
High-Performance Computing in Neuroscience for Data-Driven Discovery, Integration, and Dissemination
Bouchard, Kristofer E.; Aimone, James B.; Chun, Miyoung; ...
2016-11-01
A lack of coherent plans to analyze, manage, and understand data threatens the various opportunities offered by new neuro-technologies. High-performance computing will allow exploratory analysis of massive datasets stored in standardized formats, hosted in open repositories, and integrated with simulations.
Open Access to Mexican Academic Production
ERIC Educational Resources Information Center
Adame, Silvia I.; Llorens, Luis
2016-01-01
This paper presents a description of the metadata harvester software development. This system provides access to reliable, quality educational resources, shared by Mexican universities through their repositories, to anyone with Internet access. We present the conceptual and contextual framework, followed by the technical basis, the results and…
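Metadata harvesting of this kind is typically done over OAI-PMH; a hedged sketch with the Sickle Python client is shown below, where the repository endpoint is a placeholder and this is not the authors' harvester code:

```python
# Hedged sketch: harvest Dublin Core records from an institutional repository
# over OAI-PMH using the Sickle client. The endpoint URL is a placeholder.
from sickle import Sickle

sickle = Sickle("https://repository.example.mx/oai/request")   # placeholder endpoint
records = sickle.ListRecords(metadataPrefix="oai_dc", ignore_deleted=True)

for i, record in enumerate(records):
    title = record.metadata.get("title", ["(no title)"])[0]
    print(record.header.identifier, "-", title)
    if i >= 4:   # show only the first few records
        break
```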
The Contextualization of Archetypes: Clinical Template Governance.
Pedersen, Rune; Ulriksen, Gro-Hilde; Ellingsen, Gunnar
2015-01-01
This paper is a status report from a large-scale openEHR-based EPR project of the North Norway Regional Health Authority. It concerns the standardization of a regional ICT portfolio and the ongoing development of a new process-oriented EPR system, encouraged by the unfolding of a national repository for openEHR archetypes. The subject of interest, the contextualization of clinical templates, is governed across multiple national boundaries, which is complex due to the dependency on clinical resources. From this outset, we are interested in how local, regional, and national organizers maneuver to standardize while applying openEHR technology.
Pollock, David W.
1986-01-01
Many parts of the Great Basin have thick zones of unsaturated alluvium which might be suitable for disposing of high-level radioactive wastes. A mathematical model accounting for the coupled transport of energy, water (vapor and liquid), and dry air was used to analyze one-dimensional, vertical transport above and below an areally extensive repository. Numerical simulations were conducted for a hypothetical repository containing spent nuclear fuel and located 100 m below land surface. Initial steady state downward water fluxes of zero (hydrostatic) and 0.0003 m yr−1 were considered in an attempt to bracket the likely range in natural water flux. Predicted temperatures within the repository peaked after approximately 50 years and declined slowly thereafter in response to the decreasing intensity of the radioactive heat source. The alluvium near the repository experienced a cycle of drying and rewetting in both cases. The extent of the dry zone was strongly controlled by the mobility of liquid water near the repository under natural conditions. In the case of initial hydrostatic conditions, the dry zone extended approximately 10 m above and 15 m below the repository. For the case of a natural flux of 0.0003 m yr−1, the relative permeability of water near the repository was initially more than 30 times the value under hydrostatic conditions; consequently the dry zone extended only about 2 m above and 5 m below the repository. In both cases a significant perturbation in liquid saturation levels persisted for several hundred years. This analysis illustrates the extreme sensitivity of model predictions to initial conditions and parameters, such as relative permeability and moisture characteristic curves, that are often poorly known.
NASA Astrophysics Data System (ADS)
Patra, A. K.; Valentine, G. A.; Bursik, M. I.; Connor, C.; Connor, L.; Jones, M.; Simakov, N.; Aghakhani, H.; Jones-Ivey, R.; Kosar, T.; Zhang, B.
2015-12-01
Over the last 5 years we have created a community collaboratory, Vhub.org [Palma et al., J. App. Volc. 3:2, doi:10.1186/2191-5040-3-2], as a place to find volcanology-related resources, a venue for users to disseminate tools, teaching resources and data, and an online platform to support collaborative efforts. As the community (currently > 6000 active users from an estimated community of comparable size) embeds the tools in the collaboratory into educational and research workflows, it became imperative to: a) redesign tools into robust, open source, reusable software for online and offline usage and enhancement; b) share large datasets with remote collaborators and other users seamlessly and securely; c) support complex workflows for uncertainty analysis, validation and verification, and data assimilation with large data. The focus on tool development/redevelopment has been twofold: firstly, to use best practices in software engineering and new hardware like multi-core and graphics processing units; secondly, to enhance capabilities to support inverse modeling, uncertainty quantification using large ensembles and design of experiments, calibration, and validation. The software engineering practices we follow include open source development (facilitating community contributions), modularity and reusability. Our initial targets are four popular tools on Vhub - TITAN2D, TEPHRA2, PUFF and LAVA. Use of tools like these requires many observation-driven data sets, e.g. digital elevation models of topography, satellite imagery, and field observations on deposits. These data are often maintained in private repositories and shared by "sneaker-net". As a partial solution to this we tested mechanisms using iRODS software for online sharing of private data with public metadata and access limits. Finally, we adapted workflow engines (e.g. Pegasus) to support the complex data and computing workflows needed for uses such as uncertainty quantification for hazard analysis using physical models.
CEBS object model for systems biology data, SysBio-OM.
Xirasagar, Sandhya; Gustafson, Scott; Merrick, B Alex; Tomer, Kenneth B; Stasiewicz, Stanley; Chan, Denny D; Yost, Kenneth J; Yates, John R; Sumner, Susan; Xiao, Nianqing; Waters, Michael D
2004-09-01
To promote a systems biology approach to understanding the biological effects of environmental stressors, the Chemical Effects in Biological Systems (CEBS) knowledge base is being developed to house data from multiple complex data streams in a systems-friendly manner that will accommodate extensive querying from users. Unified data representation via a single object model will greatly aid in integrating data storage and management, and facilitate reuse of software to analyze and display data resulting from diverse differential expression or differential profile technologies. Data streams include, but are not limited to, gene expression analysis (transcriptomics), protein expression and protein-protein interaction analysis (proteomics) and changes in low molecular weight metabolite levels (metabolomics). To enable the integration of microarray gene expression, proteomics and metabolomics data in the CEBS system, we designed an object model, the Systems Biology Object Model (SysBio-OM). The model is comprehensive and leverages other open source efforts, namely the MicroArray Gene Expression Object Model (MAGE-OM) and the Proteomics Experiment Data Repository (PEDRo) object model. SysBio-OM is designed by extending MAGE-OM to represent protein expression data elements (including those from PEDRo), protein-protein interaction and metabolomics data. SysBio-OM promotes the standardization of data representation and data quality by facilitating the capture of the minimum annotation required for an experiment. Such standardization refines the accuracy of data mining and interpretation. The open source SysBio-OM model, which can be implemented on varied computing platforms, is presented here. A Unified Modeling Language (UML) depiction of the entire SysBio-OM is available at http://cebs.niehs.nih.gov/SysBioOM/. The Rational Rose object model package is distributed under an open source license that permits unrestricted academic and commercial use and is available at http://cebs.niehs.nih.gov/cebsdownloads. The database and interface are being built to implement the model and will be available for public use at http://cebs.niehs.nih.gov.
National Geothermal Data System (USA): an Exemplar of Open Access to Data
NASA Astrophysics Data System (ADS)
Allison, M. Lee; Richard, Stephen; Blackman, Harold; Anderson, Arlene; Patten, Kim
2014-05-01
The National Geothermal Data System's (NGDS - www.geothermaldata.org) formal launch in April 2014 will provide open access to millions of data records, sharing relevant geoscience data and, in the longer term, land use data to propel geothermal development and production. NGDS serves information from all of the U.S. Department of Energy's sponsored development and research projects and geologic data from all 50 states, using free and open source software. This interactive online system is opening new exploration opportunities and potentially shortening project development by making data easily discoverable, accessible, and interoperable. We continue to populate our prototype functional data system, with multiple data nodes and nationwide data online and available to the public. Data from state geological surveys and partners comprise more than 6 million records online, including 1.72 million well headers (oil and gas, water, geothermal), 670,000 well logs, and 497,000 borehole temperatures, and are growing rapidly. There are over 312 interoperable Web services and another 106 WMS (Web Map Services) registered in the system as of January 2014. Companion projects run by Southern Methodist University and the U.S. Geological Survey (USGS) are adding millions of additional data records. The DOE Geothermal Data Repository, currently hosted on OpenEI, is a system node and clearinghouse for data from hundreds of U.S. DOE-funded geothermal projects. NGDS is built on the US Geoscience Information Network (USGIN) data integration framework, which is a joint undertaking of the USGS and the Association of American State Geologists (AASG). NGDS complies with the White House Executive Order of May 2013, requiring all federal agencies to make their data holdings publicly accessible online in open source, interoperable formats with common core and extensible metadata. The National Geothermal Data System is being designed, built, deployed, and populated primarily with support from the US Department of Energy, Geothermal Technologies Office. Keeping this system operational after the original implementation will require four core elements: continued serving of data and applications by providers; maintenance of system operations; a governance structure; and an effective business model. Each of these presents a number of challenges currently under consideration.
Rolling Deck to Repository I: Designing a Database Infrastructure
NASA Astrophysics Data System (ADS)
Arko, R. A.; Miller, S. P.; Chandler, C. L.; Ferrini, V. L.; O'Hara, S. H.
2008-12-01
The NSF-supported academic research fleet collectively produces a large and diverse volume of scientific data, which are increasingly being shared across disciplines and contributed to regional and global syntheses. As both Internet connectivity and storage technology improve, it becomes practical for ships to routinely deliver data and documentation for a standard suite of underway instruments to a central shoreside repository. Routine delivery will facilitate data discovery and integration, quality assessment, cruise planning, compliance with funding agency and clearance requirements, and long-term data preservation. We are working collaboratively with ship operators and data managers to develop a prototype "data discovery system" for NSF-supported research vessels. Our goal is to establish infrastructure for a central shoreside repository, and to develop and test procedures for the routine delivery of standard data products and documentation to the repository. Related efforts are underway to identify tools and criteria for quality control of standard data products, and to develop standard interfaces and procedures for maintaining an underway event log. Development of a shoreside repository infrastructure will include: 1. Deployment and testing of a central catalog that holds cruise summaries and vessel profiles. A cruise summary will capture the essential details of a research expedition (operating institution, ports/dates, personnel, data inventory, etc.), as well as related documentation such as event logs and technical reports. A vessel profile will capture the essential details of a ship's installed instruments (manufacturer, model, serial number, reference location, etc.), with version control as the profile changes through time. The catalog's relational database schema will be based on the UNOLS Data Best Practices Committee's recommendations, and published as a formal XML specification. 2. Deployment and testing of a central repository that holds navigation and routine underway data. Based on discussion with ship operators and data managers at a workgroup meeting in September 2008, we anticipate that a subset of underway data could be delivered from ships to the central repository in near-realtime - enabling the integrated display of ship tracks at a public Web portal, for example - and a full data package could be delivered post-cruise by network transfer or disk shipment. Once ashore, data sets could be distributed to assembly centers such as the Shipboard Automated Meteorological and Oceanographic System (SAMOS) for routine processing, quality assessment, and synthesis efforts - as well as transmitted to national data centers such as NODC and NGDC for permanent archival. 3. Deployment and testing of a basic suite of Web services to make cruise summaries, vessel profiles, event logs, and navigation data easily available. A standard set of catalog records, maps, and navigation features will be published via the Open Archives Initiative (OAI) and Open Geospatial Consortium (OGC) protocols, which can then be harvested by partner data centers and/or embedded in client applications.
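A minimal sketch of the harvesting step mentioned above (partner data centers pulling catalog records over OAI-PMH) follows; the base URL is a placeholder rather than a real R2R endpoint, and only the standard OAI-PMH request parameters are assumed.

    # Harvest Dublin Core records from an OAI-PMH endpoint and print their titles.
    import requests
    import xml.etree.ElementTree as ET

    BASE_URL = "https://example.org/oai"  # hypothetical OAI-PMH endpoint
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}

    response = requests.get(BASE_URL, params=params, timeout=30)
    response.raise_for_status()
    root = ET.fromstring(response.content)

    # Dublin Core title elements of the harvested records.
    for title in root.iter("{http://purl.org/dc/elements/1.1/}title"):
        print(title.text)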
Childhood vesicoureteral reflux studies: registries and repositories sources and nosology.
Chesney, Russell W; Patters, Andrea B
2013-12-01
Despite several recent studies, the advisability of antimicrobial prophylaxis and certain imaging studies for urinary tract infections (UTIs) remains controversial. The role of vesicoureteral reflux (VUR) in the severity and re-infection rates of UTIs is also difficult to assess. Registries and repositories of data and biomaterials from clinical studies in children with VUR are valuable. Disease registries are collections of secondary data related to patients with a specific diagnosis, condition or procedure. Registries differ from indices in that they contain more extensive data. A research repository is an entity that receives, stores, processes and/or disseminates specimens (or other materials) as needed. It encompasses the physical location as well as the full range of activities associated with its operation. It may also be referred to as a biorepository. This report provides information about some current registries and repositories that include data and samples from children with VUR. It also describes the heterogeneous nature of the subjects, as some registries and repositories include only data or samples from patients with primary reflux while others also include those from patients with syndromic or secondary reflux. Copyright © 2012 Journal of Pediatric Urology Company. All rights reserved.
Brough, David B; Wheeler, Daniel; Kalidindi, Surya R
2017-03-01
There is a critical need for customized analytics that take into account the stochastic nature of the internal structure of materials at multiple length scales in order to extract relevant and transferable knowledge. Data-driven Process-Structure-Property (PSP) linkages provide a systemic, modular and hierarchical framework for community-driven curation of materials knowledge, and its transference to design and manufacturing experts. The Materials Knowledge Systems in Python project (PyMKS) is the first open source materials data science framework that can be used to create high value PSP linkages for hierarchical materials that can be leveraged by experts in materials science and engineering, manufacturing, machine learning and data science communities. This paper describes the main functions available from this repository, along with illustrations of how these can be accessed, utilized, and potentially further refined by the broader community of researchers.
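One low-level building block behind such PSP linkages is the two-point spatial correlation of a microstructure. The sketch below computes it for a synthetic two-phase field with plain NumPy FFTs; it illustrates the statistic only and is not the PyMKS API itself.

    # Two-point autocorrelation of a periodic two-phase microstructure via FFT.
    import numpy as np

    rng = np.random.default_rng(0)
    microstructure = (rng.random((101, 101)) > 0.6).astype(float)  # synthetic indicator field

    ft = np.fft.fftn(microstructure)
    autocorr = np.fft.ifftn(ft * np.conj(ft)).real / microstructure.size
    autocorr = np.fft.fftshift(autocorr)  # move the zero-offset vector to the array center

    # At zero offset the autocorrelation equals the phase volume fraction.
    print("volume fraction:", microstructure.mean())
    print("autocorrelation at zero offset:", autocorr[50, 50])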
PyKE3: data analysis tools for NASA's Kepler, K2, and TESS missions
NASA Astrophysics Data System (ADS)
Hedges, Christina L.; Cardoso, Jose Vinicius De Miranda; Barentsen, Geert; Gully-Santiago, Michael A.; Cody, Ann Marie; Barclay, Thomas; Still, Martin; BAY AREA ENVIRONMENTAL RESEARCH IN
2018-01-01
The PyKE package is a set of easy to use tools for working with Kepler/K2 data. This includes tools to correct light curves for cotrending basis vectors, turn the raw Target Pixel File data into motion corrected light curves, check for exoplanet false positives and run new PSF photometry. We are now releasing PyKE 3, which is compatible with Python 3, is pip installable and no longer depends on PyRAF. Tools are available both as Python routines and from the command line. New tutorials are available and under construction for users to learn about Kepler and K2 data and how best to use them for their science goals. PyKE is open source and welcomes contributions from the community. Routines and more information are available on the PyKE repository on GitHub.
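To make the cotrending step above concrete, the toy sketch below removes two synthetic systematic trends from a simulated light curve by linear least squares; the basis vectors and data are invented for illustration, and this is a sketch of the general idea rather than the PyKE API.

    # Toy cotrending: fit basis vectors to a raw light curve and subtract the fit.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 500
    time = np.linspace(0.0, 30.0, n)

    # Two invented "cotrending basis vectors" standing in for common systematics.
    cbv = np.column_stack([np.sin(time / 5.0), time / time.max()])
    systematics = cbv @ np.array([40.0, -25.0])
    flux = 1000.0 + systematics + rng.normal(0.0, 2.0, n)  # raw flux with noise

    coeffs, *_ = np.linalg.lstsq(cbv, flux - flux.mean(), rcond=None)
    corrected = flux - cbv @ coeffs

    print("raw scatter:", flux.std(), "corrected scatter:", corrected.std())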
The Particle-in-Cell and Kinetic Simulation Software Center
NASA Astrophysics Data System (ADS)
Mori, W. B.; Decyk, V. K.; Tableman, A.; Fonseca, R. A.; Tsung, F. S.; Hu, Q.; Winjum, B. J.; An, W.; Dalichaouch, T. N.; Davidson, A.; Hildebrand, L.; Joglekar, A.; May, J.; Miller, K.; Touati, M.; Xu, X. L.
2017-10-01
The UCLA Particle-in-Cell and Kinetic Simulation Software Center (PICKSC) aims to support an international community of PIC and plasma kinetic software developers, users, and educators; to increase the use of this software for accelerating the rate of scientific discovery; and to be a repository of knowledge and history for PIC. We discuss progress towards making available and documenting illustrative open-source software programs and distinct production programs; developing and comparing different PIC algorithms; coordinating the development of resources for the educational use of kinetic software; and the outcomes of our first sponsored OSIRIS users workshop. We also welcome input and discussion from anyone interested in using or developing kinetic software, in obtaining access to our codes, in collaborating, in sharing their own software, or in commenting on how PICKSC can better serve the DPP community. Supported by NSF under Grant ACI-1339893 and by the UCLA Institute for Digital Research and Education.
Brough, David B; Wheeler, Daniel; Kalidindi, Surya R.
2017-01-01
There is a critical need for customized analytics that take into account the stochastic nature of the internal structure of materials at multiple length scales in order to extract relevant and transferable knowledge. Data-driven Process-Structure-Property (PSP) linkages provide a systemic, modular and hierarchical framework for community-driven curation of materials knowledge, and its transference to design and manufacturing experts. The Materials Knowledge Systems in Python project (PyMKS) is the first open source materials data science framework that can be used to create high value PSP linkages for hierarchical materials that can be leveraged by experts in materials science and engineering, manufacturing, machine learning and data science communities. This paper describes the main functions available from this repository, along with illustrations of how these can be accessed, utilized, and potentially further refined by the broader community of researchers. PMID:28690971
a Cognitive Approach to Teaching a Graduate-Level Geobia Course
NASA Astrophysics Data System (ADS)
Bianchetti, Raechel A.
2016-06-01
Remote sensing image analysis training occurs both in the classroom and in the research lab. Education in the classroom for traditional pixel-based image analysis has been standardized across college curriculums. However, with the increasing interest in Geographic Object-Based Image Analysis (GEOBIA), there is a need to develop classroom instruction for this method of image analysis. While traditional remote sensing courses emphasize the expansion of skills and knowledge related to the use of computer-based analysis, GEOBIA courses should examine the cognitive factors underlying visual interpretation. This paper provides an initial analysis of the development, implementation, and outcomes of a GEOBIA course that considers not only the computational methods of GEOBIA, but also the cognitive factors of expertise that such software attempts to replicate. Finally, a reflection on the first instantiation of this course is presented, in addition to plans for development of an open-source repository for course materials.
BioMAJ: a flexible framework for databanks synchronization and processing.
Filangi, Olivier; Beausse, Yoann; Assi, Anthony; Legrand, Ludovic; Larré, Jean-Marc; Martin, Véronique; Collin, Olivier; Caron, Christophe; Leroy, Hugues; Allouche, David
2008-08-15
Large- and medium-scale computational molecular biology projects require accurate bioinformatics software and numerous heterogeneous biological databanks, which are distributed around the world. BioMAJ provides a flexible, robust, fully automated environment for managing such massive amounts of data. The Java application enables automation of the data update cycle process and supervision of the locally mirrored data repository. We have developed workflows that handle some of the most commonly used bioinformatics databases. A set of scripts is also available for post-synchronization data treatment consisting of indexation or format conversion (for NCBI BLAST, SRS, EMBOSS, GCG, etc.). BioMAJ can be easily extended by personal homemade processing scripts. Source history can be kept via HTML reports containing statements of locally managed databanks. http://biomaj.genouest.org. BioMAJ is free, open-source software. It is freely available under the CECILL version 2 license.
Kidwell, Mallory C.; Lazarević, Ljiljana B.; Baranski, Erica; Piechowski, Sarah; Falkenberg, Lina-Sophia; Sonnleitner, Carina; Fiedler, Susann; Nosek, Brian A.
2016-01-01
Beginning January 2014, Psychological Science gave authors the opportunity to signal open data and materials if they qualified for badges that accompanied published articles. Before badges, less than 3% of Psychological Science articles reported open data. After badges, 23% reported open data, with an accelerating trend; 39% reported open data in the first half of 2015, an increase of more than an order of magnitude from baseline. There was no change over time in the low rates of data sharing among comparison journals. Moreover, reporting openness does not guarantee openness. When badges were earned, reportedly available data were more likely to be actually available, correct, usable, and complete than when badges were not earned. Open materials also increased to a weaker degree, and there was more variability among comparison journals. Badges are simple, effective signals to promote open practices and improve preservation of data and materials by using independent repositories. PMID:27171007
Deck, John; Gaither, Michelle R; Ewing, Rodney; Bird, Christopher E; Davies, Neil; Meyer, Christopher; Riginos, Cynthia; Toonen, Robert J; Crandall, Eric D
2017-08-01
The Genomic Observatories Metadatabase (GeOMe, http://www.geome-db.org/) is an open access repository for geographic and ecological metadata associated with biosamples and genetic data. Whereas public databases have served as vital repositories for nucleotide sequences, they do not accession all the metadata required for ecological or evolutionary analyses. GeOMe fills this need, providing a user-friendly, web-based interface for both data contributors and data recipients. The interface allows data contributors to create a customized yet standard-compliant spreadsheet that captures the temporal and geospatial context of each biosample. These metadata are then validated and permanently linked to archived genetic data stored in the National Center for Biotechnology Information's (NCBI's) Sequence Read Archive (SRA) via unique persistent identifiers. By linking ecologically and evolutionarily relevant metadata with publicly archived sequence data in a structured manner, GeOMe sets a gold standard for data management in biodiversity science.
Deck, John; Gaither, Michelle R.; Ewing, Rodney; Bird, Christopher E.; Davies, Neil; Meyer, Christopher; Riginos, Cynthia; Toonen, Robert J.; Crandall, Eric D.
2017-01-01
The Genomic Observatories Metadatabase (GeOMe, http://www.geome-db.org/) is an open access repository for geographic and ecological metadata associated with biosamples and genetic data. Whereas public databases have served as vital repositories for nucleotide sequences, they do not accession all the metadata required for ecological or evolutionary analyses. GeOMe fills this need, providing a user-friendly, web-based interface for both data contributors and data recipients. The interface allows data contributors to create a customized yet standard-compliant spreadsheet that captures the temporal and geospatial context of each biosample. These metadata are then validated and permanently linked to archived genetic data stored in the National Center for Biotechnology Information’s (NCBI’s) Sequence Read Archive (SRA) via unique persistent identifiers. By linking ecologically and evolutionarily relevant metadata with publicly archived sequence data in a structured manner, GeOMe sets a gold standard for data management in biodiversity science. PMID:28771471
Whole earth modeling: developing and disseminating scientific software for computational geophysics.
NASA Astrophysics Data System (ADS)
Kellogg, L. H.
2016-12-01
Historically, a great deal of specialized scientific software for modeling and data analysis has been developed by individual researchers or small groups of scientists working on their own specific research problems. As the magnitude of available data and computer power has increased, so has the complexity of scientific problems addressed by computational methods, creating both a need to sustain existing scientific software and to expand its development to take advantage of new algorithms, new software approaches, and new computational hardware. To that end, communities like the Computational Infrastructure for Geodynamics (CIG) have been established to support the use of best practices in scientific computing for solid earth geophysics research and teaching. Working as a scientific community enables computational geophysicists to take advantage of technological developments, improve the accuracy and performance of software, build on prior software development, and collaborate more readily. The CIG community, and others, have adopted an open-source development model, in which code is developed and disseminated by the community in an open fashion, using version control and software repositories like Git. One emerging issue is how to adequately identify and credit the intellectual contributions involved in creating open source scientific software. The traditional method of disseminating scientific ideas, peer-reviewed publication, was not designed for reviewing or crediting scientific software, although emerging publication strategies such as software journals are attempting to address the need. We are piloting an integrated approach in which authors are identified and credited as scientific software is developed and run. Successful software citation requires integration with the scholarly publication and indexing mechanisms as well, to assign credit, ensure discoverability, and provide provenance for software.
OpenCMISS: a multi-physics & multi-scale computational infrastructure for the VPH/Physiome project.
Bradley, Chris; Bowery, Andy; Britten, Randall; Budelmann, Vincent; Camara, Oscar; Christie, Richard; Cookson, Andrew; Frangi, Alejandro F; Gamage, Thiranja Babarenda; Heidlauf, Thomas; Krittian, Sebastian; Ladd, David; Little, Caton; Mithraratne, Kumar; Nash, Martyn; Nickerson, David; Nielsen, Poul; Nordbø, Oyvind; Omholt, Stig; Pashaei, Ali; Paterson, David; Rajagopal, Vijayaraghavan; Reeve, Adam; Röhrle, Oliver; Safaei, Soroush; Sebastián, Rafael; Steghöfer, Martin; Wu, Tim; Yu, Ting; Zhang, Heye; Hunter, Peter
2011-10-01
The VPH/Physiome Project is developing the model encoding standards CellML (cellml.org) and FieldML (fieldml.org) as well as web-accessible model repositories based on these standards (models.physiome.org). Freely available open source computational modelling software is also being developed to solve the partial differential equations described by the models and to visualise results. The OpenCMISS code (opencmiss.org), described here, has been developed by the authors over the last six years to replace the CMISS code that has supported a number of organ system Physiome projects. OpenCMISS is designed to encompass multiple sets of physical equations and to link subcellular and tissue-level biophysical processes into organ-level processes. In the Heart Physiome project, for example, the large deformation mechanics of the myocardial wall need to be coupled to both ventricular flow and embedded coronary flow, and the reaction-diffusion equations that govern the propagation of electrical waves through myocardial tissue need to be coupled with equations that describe the ion channel currents that flow through the cardiac cell membranes. In this paper we discuss the design principles and distributed memory architecture behind the OpenCMISS code. We also discuss the design of the interfaces that link the sets of physical equations across common boundaries (such as fluid-structure coupling), or between spatial fields over the same domain (such as coupled electromechanics), and the concepts behind CellML and FieldML that are embodied in the OpenCMISS data structures. We show how all of these provide a flexible infrastructure for combining models developed across the VPH/Physiome community. Copyright © 2011 Elsevier Ltd. All rights reserved.
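For readers unfamiliar with the coupling described above, a generic monodomain form of the tissue-level reaction-diffusion problem coupled to a cellular ionic model is written below. This is the standard textbook formulation, not a statement of OpenCMISS's specific equations.

    \chi \left( C_m \frac{\partial V_m}{\partial t} + I_{\mathrm{ion}}(V_m, \mathbf{u}) \right) = \nabla \cdot \left( \sigma \nabla V_m \right),
    \qquad
    \frac{\partial \mathbf{u}}{\partial t} = \mathbf{f}(V_m, \mathbf{u}),

where V_m is the transmembrane potential, sigma the tissue conductivity tensor, C_m the membrane capacitance per unit area, chi the surface-to-volume ratio, and u the vector of cell-model state variables (gating variables and ion concentrations) that a CellML-encoded ion channel model would supply.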
SPECIATE 4.3: Addendum to SPECIATE 4.2--Speciation database development documentation
SPECIATE is the U.S. Environmental Protection Agency's (EPA) repository of volatile organic gas and particulate matter (PM) speciation profiles of air pollution sources. Among the many uses of speciation data, these source profiles are used to: (1) create speciated emissions inve...
Khanna, Rajesh; Karikalan, N; Mishra, Anil Kumar; Agarwal, Anchal; Bhattacharya, Madhulekha; Das, Jayanta K
2013-01-02
Quality and essential health information is considered one of the most cost-effective interventions to improve health for a developing country. Healthcare portals have revolutionized access to health information and knowledge using the Internet and related technologies, but their usage is far from satisfactory in India. This article describes a health portal developed in India aimed at providing one-stop access to efficiently search, organize and share maternal child health information relevant from a public health perspective in the country. The portal 'Repository on Maternal Child Health' was developed using an open source content management system, and standardized processes were followed for collection, selection, categorization and presentation of resource materials. Its usage is evaluated using key performance indicators obtained from Google Analytics, and quality assessed using a standardized checklist of knowledge management. The results are discussed in relation to improving quality and access to health information. The portal was launched in July 2010 and provides free access to the full text of 900 resource materials categorized under specific topics and themes. During the subsequent 18 months, 52,798 visits were registered from 174 countries across the world, and more than three-fourths of the visits were from India alone. Nearly 44,000 unique visitors visited the website and spent an average time of 4 minutes 26 seconds. The overall bounce rate was 27.6%. An increase in the number of unique visitors was found to be significantly associated with an increase in the average time on site (p-value 0.01), an increase in web traffic through search engines (p-value 0.00), and a decrease in the bounce rate (p-value 0.03). There was a high degree of agreement between the two experts regarding quality assessment carried out under the three domains of knowledge access, knowledge creation and knowledge transfer (Kappa statistic 0.72). Efficient management of health information is imperative for informed decision making, and digital repositories have nowadays become the preferred source of information management. The growing popularity of the portal indicates the potential of such initiatives in improving access to quality and essential health information in India. There is a need to develop similar mechanisms for other health domains and interlink them to facilitate access to a variety of health information from a single platform.
2013-01-01
Background Quality and essential health information is considered one of the most cost-effective interventions to improve health for a developing country. Healthcare portals have revolutionized access to health information and knowledge using the Internet and related technologies, but their usage is far from satisfactory in India. This article describes a health portal developed in India aimed at providing one-stop access to efficiently search, organize and share maternal child health information relevant from a public health perspective in the country. Methods The portal ‘Repository on Maternal Child Health’ was developed using an open source content management system, and standardized processes were followed for collection, selection, categorization and presentation of resource materials. Its usage is evaluated using key performance indicators obtained from Google Analytics, and quality assessed using a standardized checklist of knowledge management. The results are discussed in relation to improving quality and access to health information. Results The portal was launched in July 2010 and provides free access to the full text of 900 resource materials categorized under specific topics and themes. During the subsequent 18 months, 52,798 visits were registered from 174 countries across the world, and more than three-fourths of the visits were from India alone. Nearly 44,000 unique visitors visited the website and spent an average time of 4 minutes 26 seconds. The overall bounce rate was 27.6%. An increase in the number of unique visitors was found to be significantly associated with an increase in the average time on site (p-value 0.01), an increase in web traffic through search engines (p-value 0.00), and a decrease in the bounce rate (p-value 0.03). There was a high degree of agreement between the two experts regarding quality assessment carried out under the three domains of knowledge access, knowledge creation and knowledge transfer (Kappa statistic 0.72). Conclusions Efficient management of health information is imperative for informed decision making, and digital repositories have nowadays become the preferred source of information management. The growing popularity of the portal indicates the potential of such initiatives in improving access to quality and essential health information in India. There is a need to develop similar mechanisms for other health domains and interlink them to facilitate access to a variety of health information from a single platform. PMID:23281735
Automated extraction and semantic analysis of mutation impacts from the biomedical literature
2012-01-01
Background Mutations as sources of evolution have long been the focus of attention in the biomedical literature. Accessing the mutational information and their impacts on protein properties facilitates research in various domains, such as enzymology and pharmacology. However, manually curating the rich and fast growing repository of biomedical literature is expensive and time-consuming. As a solution, text mining approaches have increasingly been deployed in the biomedical domain. While the detection of single-point mutations is well covered by existing systems, challenges still exist in grounding impacts to their respective mutations and recognizing the affected protein properties, in particular kinetic and stability properties together with physical quantities. Results We present an ontology model for mutation impacts, together with a comprehensive text mining system for extracting and analysing mutation impact information from full-text articles. Organisms, as sources of proteins, are extracted to help disambiguation of genes and proteins. Our system then detects mutation series to correctly ground detected impacts using novel heuristics. It also extracts the affected protein properties, in particular kinetic and stability properties, as well as the magnitude of the effects and validates these relations against the domain ontology. The output of our system can be provided in various formats, in particular by populating an OWL-DL ontology, which can then be queried to provide structured information. The performance of the system is evaluated on our manually annotated corpora. In the impact detection task, our system achieves a precision of 70.4%-71.1%, a recall of 71.3%-71.5%, and grounds the detected impacts with an accuracy of 76.5%-77%. The developed system, including resources, evaluation data and end-user and developer documentation is freely available under an open source license at http://www.semanticsoftware.info/open-mutation-miner. Conclusion We present Open Mutation Miner (OMM), the first comprehensive, fully open-source approach to automatically extract impacts and related relevant information from the biomedical literature. We assessed the performance of our work on manually annotated corpora and the results show the reliability of our approach. The representation of the extracted information into a structured format facilitates knowledge management and aids in database curation and correction. Furthermore, access to the analysis results is provided through multiple interfaces, including web services for automated data integration and desktop-based solutions for end user interactions. PMID:22759648
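As an illustration of querying such a populated ontology, the sketch below runs a SPARQL query with rdflib. The file name and the property IRIs are hypothetical placeholders and do not reproduce the actual Open Mutation Miner vocabulary.

    # Query a populated OWL ontology of mutation impacts with SPARQL via rdflib.
    from rdflib import Graph

    g = Graph()
    g.parse("mutation_impacts.owl", format="xml")  # hypothetical pipeline output

    QUERY = """
    PREFIX ex: <http://example.org/mutation-impact#>
    SELECT ?mutation ?property ?direction
    WHERE {
        ?impact ex:causedByMutation ?mutation ;
                ex:affectsProperty ?property ;
                ex:hasDirection ?direction .
    }
    """
    for row in g.query(QUERY):
        print(row.mutation, row.property, row.direction)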
Karayanidis, Frini; Keuken, Max C; Wong, Aaron; Rennie, Jaime L; de Hollander, Gilles; Cooper, Patrick S; Ross Fulham, W; Lenroot, Rhoshel; Parsons, Mark; Phillips, Natalie; Michie, Patricia T; Forstmann, Birte U
2016-01-01
Our understanding of the complex interplay between structural and functional organisation of brain networks is being advanced by the development of novel multi-modal analysis approaches. The Age-ility Project (Phase 1) data repository offers open access to structural MRI, diffusion MRI, and resting-state fMRI scans, as well as resting-state EEG recorded from the same community participants (n=131, 15-35 y, 66 male). Raw imaging and electrophysiological data as well as essential demographics are made available via the NITRC website. All data have been reviewed for artifacts using a rigorous quality control protocol, and detailed case notes are provided. Copyright © 2015. Published by Elsevier Inc.
NASA Astrophysics Data System (ADS)
Himmelberger, Jeffery J.; Baughman, Mike; Ogneva-Himmelberger, Yelena A.
1995-11-01
Whether the proposed Yucca Mountain nuclear waste repository system will adversely impact tourism in southern Nevada is an open question of particular importance to visitor-oriented rural counties bisected by planned waste transportation corridors (highway or rail). As part of one such county's repository impact assessment program, tourism implications of Three Mile Island (TMI) and other major hazard events have been revisited to inform ongoing county-wide socioeconomic assessments and contingency planning efforts. This paper summarizes key implications of such research as applied to Lincoln County, Nevada. Implications for other rural counties are discussed in light of the research findings.
mSciences: An Affinity Space for Science Teachers
ERIC Educational Resources Information Center
Mota, Jorge; Morais, Carla; Moreira, Luciano; Paiva, João C.
2017-01-01
The project "Multimedia in science teaching: five years of research and teaching in Portugal" was successful in featuring the national research on multimedia in science education and in providing the community with a simple reference tool--a repository of open access scientific texts. The current work aims to describe the theoretical…
The VLAB OER Experience: Modeling Potential-Adopter Student Acceptance
ERIC Educational Resources Information Center
Raman, Raghu; Achuthan, Krishnashree; Nedungadi, Prema; Diwakar, Shyam; Bose, Ranjan
2014-01-01
Virtual Labs (VLAB) is a multi-institutional Open Educational Resources (OER) initiative, exclusively focused on lab experiments for engineering education. This project envisages building a large OER repository, containing over 1650 virtual experiments mapped to the engineering curriculum. The introduction of VLAB is a paradigm shift in an…
Institutional Repositories: Thinking beyond the Box
ERIC Educational Resources Information Center
Albanese, Andrew Richard
2009-01-01
In February 2008, the faculty of Arts and Sciences at Harvard University made history, unanimously passing a revolutionary open access mandate that, for the first time, would require faculty to give the university copies of their research, along with a nonexclusive license to distribute them electronically. In the press, Harvard University…
Finding geospatial pattern of unstructured data by clustering routes
NASA Astrophysics Data System (ADS)
Boustani, M.; Mattmann, C. A.; Ramirez, P.; Burke, W.
2016-12-01
Today the majority of data generated has a geospatial context to it, either in attribute form as a latitude or longitude or a place name, or cross-referenceable using other means such as an external gazetteer or location service. Our research is interested in exploiting geospatial location and context in unstructured data such as that found on the web in HTML pages, images, videos, documents, and other areas, and in structured information repositories found on intranets, in scientific environments, and otherwise. We are working together on the DARPA MEMEX project to exploit open source software tools such as the Lucene Geo Gazetteer, Apache Tika, Apache Lucene, and Apache OpenNLP, to automatically extract, and make meaning out of, geospatial information. In particular, we are interested in unstructured descriptors, e.g., a phone number or a named entity, and the ability to automatically learn geospatial paths related to these descriptors. For example, a particular phone number may represent an entity that travels on a monthly basis, according to patterns that are sometimes easily identifiable and sometimes more difficult to track. We will present a set of automatic techniques to extract descriptors, and then to geospatially infer their paths across unstructured data.
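A much-simplified sketch of the first extraction step follows: text is pulled from an unstructured document with the Python bindings for Apache Tika, and candidate phone-number descriptors are found with a regular expression. The input file name is a placeholder, and the real MEMEX pipeline layers named-entity recognition and gazetteer lookups on top of this.

    # Extract text with Apache Tika and find candidate phone-number descriptors.
    import re
    from tika import parser  # pip install tika; starts a local Tika server (requires Java)

    parsed = parser.from_file("page.html")  # hypothetical crawled document
    text = parsed.get("content") or ""

    phone_pattern = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")
    for number in sorted(set(phone_pattern.findall(text))):
        print("candidate phone number:", number)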
DOE Office of Scientific and Technical Information (OSTI.GOV)
Finnell, Joshua Eugene
US President Barack Obama issued Executive Order 13642 - Making Open and Machine Readable the New Default for Government Information - on May 9, 2013, mandating, wherever legally permissible and possible, that US Government information be made open to the public.[1] This edict accelerated the construction of, and framework for, data repositories and data citation principles and practices, such as data.gov. As a corollary, researchers across the country's national laboratories found themselves creating data management plans, applying data set metadata standards, and ensuring the long-term access of data for federally funded scientific research.
re3data.org - a global registry of research data repositories
NASA Astrophysics Data System (ADS)
Pampel, Heinz; Vierkant, Paul; Elger, Kirsten; Bertelmann, Roland; Witt, Michael; Schirmbacher, Peter; Rücknagel, Jessika; Kindling, Maxi; Scholze, Frank; Ulrich, Robert
2016-04-01
re3data.org - the registry of research data repositories - lists over 1,400 research data repositories from all over the world, making it the largest and most comprehensive online catalog of research data repositories on the web. The registry is a valuable tool for researchers, funding organizations, publishers and libraries. re3data.org provides detailed information about research data repositories, and its distinctive icons help researchers to easily identify relevant repositories for accessing and depositing data sets [1]. Funding agencies, like the European Commission [2], and research institutions, like the University of Bielefeld [3], already recommend the use of re3data.org in their guidelines and policies. Several publishers and journals like Copernicus Publications, PeerJ, and Nature's Scientific Data recommend re3data.org in their editorial policies as a tool for the easy identification of appropriate data repositories to store research data. Project partners in re3data.org are the Library and Information Services department (LIS) of the GFZ German Research Centre for Geosciences, the Computer and Media Service at the Humboldt-Universität zu Berlin, the Purdue University Libraries and the KIT Library at the Karlsruhe Institute of Technology (KIT). After its merger with the U.S. registry DataBib in 2014, re3data.org continues as a service of DataCite from 2016 on. DataCite is the international organization for the registration of Digital Object Identifiers (DOI) for research data and aims to improve their citation. The poster describes the current status and the future plans of re3data.org. [1] Pampel H, et al. (2013) Making Research Data Repositories Visible: The re3data.org Registry. PLoS ONE 8(11): e78080. doi:10.1371/journal.pone.0078080. [2] European Commission (2015): Guidelines on Open Access to Scientific Publications and Research Data in Horizon 2020. Available: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf Accessed 11 January 2016. [3] Bielefeld University (2013): Resolution on Research Data Management. Available: http://data.uni-bielefeld.de/en/resolution Accessed 11 January 2016.
A Climate Statistics Tool and Data Repository
NASA Astrophysics Data System (ADS)
Wang, J.; Kotamarthi, V. R.; Kuiper, J. A.; Orr, A.
2017-12-01
Researchers at Argonne National Laboratory and collaborating organizations have generated regional-scale, dynamically downscaled climate model output using the Weather Research and Forecasting (WRF) model version 3.3.1 at a 12 km horizontal spatial resolution over much of North America. The WRF model is driven by boundary conditions obtained from three independent global-scale climate models and two different future greenhouse gas emission scenarios, named representative concentration pathways (RCPs). The repository of results has a temporal resolution of three hours for all the simulations, includes more than 50 variables, is stored in Network Common Data Form (NetCDF) files, and the data volume is nearly 600 TB. A condensed 800 GB set of NetCDF files was made for selected variables most useful for climate-related planning, including daily precipitation, relative humidity, solar radiation, maximum temperature, minimum temperature, and wind. The WRF model simulations are conducted for three 10-year time periods (1995-2004, 2045-2054, and 2085-2094) and two future scenarios (RCP4.5 and RCP8.5). An open-source tool was coded using Python 2.7.8 and ESRI ArcGIS 10.3.1 programming libraries to parse the NetCDF files, compute summary statistics, and output results as GIS layers. Eight sets of summary statistics were generated as examples for the contiguous U.S. states and much of Alaska, including number of days over 90°F, number of days with a heat index over 90°F, heat waves, monthly and annual precipitation, drought, extreme precipitation, multi-model averages, and model bias. This paper will provide an overview of the project to generate the main and condensed data repositories, describe the Python tool and how to use it, present the GIS results of the computed examples, and discuss some of the ways they can be used for planning. The condensed climate data, Python tool, computed GIS results, and documentation of the work are shared on the Internet.
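A sketch of one of the summary statistics described above (days per year with maximum temperature over 90°F) is given below using xarray. The file name and the variable name "tmax" (assumed to be daily maxima in Kelvin) are illustrative assumptions rather than the actual WRF output conventions.

    # Count days per year over 90 deg F from a daily-maximum-temperature NetCDF file.
    import xarray as xr

    ds = xr.open_dataset("wrf_daily_rcp85_2045-2054.nc")  # hypothetical condensed file

    tmax_f = (ds["tmax"] - 273.15) * 9.0 / 5.0 + 32.0  # Kelvin -> Fahrenheit
    hot_days = (tmax_f > 90.0).groupby("time.year").sum(dim="time")

    # Mean annual count at each grid cell, ready to export as a GIS layer.
    hot_days.mean(dim="year").to_netcdf("days_over_90F.nc")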
Sodickson, Aaron; Warden, Graham I; Farkas, Cameron E; Ikuta, Ichiro; Prevedello, Luciano M; Andriole, Katherine P; Khorasani, Ramin
2012-08-01
To develop and validate an informatics toolkit that extracts anatomy-specific computed tomography (CT) radiation exposure metrics (volume CT dose index and dose-length product) from existing digital image archives through optical character recognition of CT dose report screen captures (dose screens) combined with Digital Imaging and Communications in Medicine attributes. This institutional review board-approved, HIPAA-compliant study was performed in a large urban health care delivery network. Data were drawn from a random sample of CT encounters that occurred between 2000 and 2010; images from these encounters were contained within the enterprise image archive, which encompassed images obtained at an adult academic tertiary referral hospital and its affiliated sites, including a cancer center, a community hospital, and outpatient imaging centers, as well as images imported from other facilities. Software was validated by using 150 randomly selected encounters for each major CT scanner manufacturer, with outcome measures of dose screen retrieval rate (proportion of correctly located dose screens) and anatomic assignment precision (proportion of extracted exposure data with correctly assigned anatomic region, such as head, chest, or abdomen and pelvis). The 95% binomial confidence intervals (CIs) were calculated for discrete proportions, and CIs were derived from the standard error of the mean for continuous variables. After validation, the informatics toolkit was used to populate an exposure repository from a cohort of 54 549 CT encounters, of which 29 948 had available dose screens. Validation yielded a dose screen retrieval rate of 99% (597 of 605 CT encounters; 95% CI: 98%, 100%) and an anatomic assignment precision of 94% (summed DLP fraction correct in 563 of 600 CT encounters; 95% CI: 92%, 96%). Patient safety applications of the resulting data repository include benchmarking between institutions, CT protocol quality control and optimization, and cumulative patient- and anatomy-specific radiation exposure monitoring. Large-scale anatomy-specific radiation exposure data repositories can be created with high fidelity from existing digital image archives by using open-source informatics tools.
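The sketch below illustrates the basic extraction idea (OCR of a dose-screen capture plus a pattern match for DLP values) with pydicom and pytesseract. It is deliberately minimal: the validated toolkit described above additionally handles vendor-specific layouts and anatomic assignment, and the file name and regular expression here are illustrative assumptions.

    # OCR a CT dose-screen capture and pull out dose-length product (DLP) values.
    import re
    import pydicom
    import pytesseract
    from PIL import Image

    ds = pydicom.dcmread("dose_screen.dcm")  # hypothetical secondary-capture dose screen
    pixels = ds.pixel_array
    image = Image.fromarray((255.0 * pixels / pixels.max()).astype("uint8"))

    text = pytesseract.image_to_string(image)
    for value in re.findall(r"DLP\D*([\d.]+)", text):
        print("DLP (mGy*cm):", value)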
Data Publication: Addressing the Issues of Provenance, Attribution, Citation, and Accessibility
NASA Astrophysics Data System (ADS)
Raymond, L. M.; Chandler, C. L.; Lowry, R. K.; Urban, E. R.; Moncoiffe, G.; Pissierssens, P.; Norton, C.
2010-12-01
Motivated by publisher and funding agency mandates, and a desire to properly attribute data sets to originating investigators, the Marine Biological Laboratory/Woods Hole Oceanographic Institution (MBLWHOI) Library and a team of data managers and scientists are collaborating with representatives from the Scientific Committee on Oceanic Research (SCOR) and the International Oceanographic Data and Information Exchange (IODE) of the Intergovernmental Oceanographic Commission. The work is inspired by a June 2008 SCOR/IODE Workshop on Data Publishing. The goal is to identify best practices for tracking data provenance and clearly attributing credit to data collectors/providers for data published in journal articles. To improve the efficacy of data directly associated with a scientific article, those data must be discoverable, citeable and freely available on the Internet. Resources, standards, and workflows must be defined to support publisher and funding agency mandates. For the data to be discoverable, appropriate metadata, defined using community-accepted metadata standards, must be associated with the data source. Data will be made citeable by the assignment of a persistent identifier as well as provenance and attribution metadata. The availability of the data will be assured by submission to a data repository that has stability and permanence. In April 2010, project participants were challenged to develop and execute pilot projects related to two use cases in which: (1) data held by data centers are packaged and served in formats that can be cited, and (2) data related to traditional journal articles are assigned persistent identifiers referred to in the articles and stored in institutional repositories, such as DSpace. The MBLWHOI Library team chose to focus on data that support published articles, particularly the data used to create the figures and tables. Several published papers were identified and used to test the MBLWHOI Library model based on open archive technology. We will report on the successful implementation of the e-repository model for publication of data associated with the pilot projects and summarize the strategies for meeting the cultural and technical challenges.
Gómez, Alberto; Nieto-Díaz, Manuel; Del Águila, Ángela; Arias, Enrique
2018-05-01
Transparency in science is increasingly a hot topic. Scientists are required to show not only results but also evidence of how they have achieved these results. In experimental studies of spinal cord injury, there are a number of standardized tests, such as the Basso-Beattie-Bresnahan locomotor rating scale for rats and Basso Mouse Scale for mice, which researchers use to study the pathophysiology of spinal cord injury and to evaluate the effects of experimental therapies. Although the standardized data from the Basso-Beattie-Bresnahan locomotor rating scale and the Basso Mouse Scale are particularly suited for storage and sharing in databases, systems of data acquisition and repositories are still lacking. To the best of our knowledge, both tests are usually conducted manually, with the data being recorded on a paper form, which may be documented with video recordings, before the data is transferred to a spreadsheet for analysis. The data thus obtained is used to compute global scores, which is the information that usually appears in publications, with a wealth of information being omitted. This information may be relevant to understand locomotion deficits or recovery, or even important aspects of the treatment effects. Therefore, this paper presents a mobile application to record and share Basso Mouse Scale tests, meeting the following criteria: i) user-friendly; ii) few hardware requirements (only a smartphone or tablet with a camera running under Android Operating System); and iii) based on open source software such as SQLite, XML, Java, Android Studio and Android SDK. The BAMOS app can be downloaded and installed from the Google Market repository and the app code is available at the GitHub repository. The BAMOS app demonstrates that mobile technology constitutes an opportunity to develop tools for aiding spinal cord injury scientists in recording and sharing experimental data. Copyright © 2018 Elsevier Ltd. All rights reserved.
A Digital Repository and Execution Platform for Interactive Scholarly Publications in Neuroscience.
Hodge, Victoria; Jessop, Mark; Fletcher, Martyn; Weeks, Michael; Turner, Aaron; Jackson, Tom; Ingram, Colin; Smith, Leslie; Austin, Jim
2016-01-01
The CARMEN Virtual Laboratory (VL) is a cloud-based platform which allows neuroscientists to store, share, develop, execute, reproduce and publicise their work. This paper describes new functionality in the CARMEN VL: an interactive publications repository. This new facility allows users to link data and software to publications. This enables other users to examine data and software associated with the publication and execute the associated software within the VL using the same data as the authors used in the publication. The cloud-based architecture and SaaS (Software as a Service) framework allows vast data sets to be uploaded and analysed using software services. Thus, this new interactive publications facility allows others to build on research results through reuse. This aligns with recent developments by funding agencies, institutions, and publishers with a move to open access research. Open access provides reproducibility and verification of research resources and results. Publications and their associated data and software will be assured of long-term preservation and curation in the repository. Further, analysing research data and the evaluations described in publications frequently requires a number of execution stages many of which are iterative. The VL provides a scientific workflow environment to combine software services into a processing tree. These workflows can also be associated with publications and executed by users. The VL also provides a secure environment where users can decide the access rights for each resource to ensure copyright and privacy restrictions are met.
EPA’s SPECIATE 4.4 Database: Bridging Data Sources and Data Users
SPECIATE is the U.S. Environmental Protection Agency's (EPA)repository of volatile organic gas and particulate matter (PM) speciation profiles for air pollution sources. EPA released SPECIATE 4.4 in early 2014 and, in total, the SPECIATE 4.4 database includes 5,728 PM, VOC, total...
DeepInfer: open-source deep learning deployment toolkit for image-guided therapy
NASA Astrophysics Data System (ADS)
Mehrtash, Alireza; Pesteie, Mehran; Hetherington, Jorden; Behringer, Peter A.; Kapur, Tina; Wells, William M.; Rohling, Robert; Fedorov, Andriy; Abolmaesumi, Purang
2017-03-01
Deep learning models have outperformed some of the previous state-of-the-art approaches in medical image analysis. Instead of using hand-engineered features, deep models attempt to automatically extract hierarchical representations at multiple levels of abstraction from the data. Therefore, deep models are usually considered to be more flexible and robust solutions for image analysis problems compared to conventional computer vision models. They have demonstrated significant improvements in computer-aided diagnosis and automatic medical image analysis applied to such tasks as image segmentation, classification and registration. However, deploying deep learning models often has a steep learning curve and requires detailed knowledge of various software packages. Thus, many deep models have not been integrated into the clinical research workflows causing a gap between the state-of-the-art machine learning in medical applications and evaluation in clinical research procedures. In this paper, we propose "DeepInfer" - an open-source toolkit for developing and deploying deep learning models within the 3D Slicer medical image analysis platform. Utilizing a repository of task-specific models, DeepInfer allows clinical researchers and biomedical engineers to deploy a trained model selected from the public registry, and apply it to new data without the need for software development or configuration. As two practical use cases, we demonstrate the application of DeepInfer in prostate segmentation for targeted MRI-guided biopsy and identification of the target plane in 3D ultrasound for spinal injections.
DeepInfer: Open-Source Deep Learning Deployment Toolkit for Image-Guided Therapy.
Mehrtash, Alireza; Pesteie, Mehran; Hetherington, Jorden; Behringer, Peter A; Kapur, Tina; Wells, William M; Rohling, Robert; Fedorov, Andriy; Abolmaesumi, Purang
2017-02-11
Deep learning models have outperformed some of the previous state-of-the-art approaches in medical image analysis. Instead of using hand-engineered features, deep models attempt to automatically extract hierarchical representations at multiple levels of abstraction from the data. Therefore, deep models are usually considered to be more flexible and robust solutions for image analysis problems compared to conventional computer vision models. They have demonstrated significant improvements in computer-aided diagnosis and automatic medical image analysis applied to such tasks as image segmentation, classification and registration. However, deploying deep learning models often has a steep learning curve and requires detailed knowledge of various software packages. Thus, many deep models have not been integrated into the clinical research workflows causing a gap between the state-of-the-art machine learning in medical applications and evaluation in clinical research procedures. In this paper, we propose "DeepInfer" - an open-source toolkit for developing and deploying deep learning models within the 3D Slicer medical image analysis platform. Utilizing a repository of task-specific models, DeepInfer allows clinical researchers and biomedical engineers to deploy a trained model selected from the public registry, and apply it to new data without the need for software development or configuration. As two practical use cases, we demonstrate the application of DeepInfer in prostate segmentation for targeted MRI-guided biopsy and identification of the target plane in 3D ultrasound for spinal injections.
DeepInfer: Open-Source Deep Learning Deployment Toolkit for Image-Guided Therapy
Mehrtash, Alireza; Pesteie, Mehran; Hetherington, Jorden; Behringer, Peter A.; Kapur, Tina; Wells, William M.; Rohling, Robert; Fedorov, Andriy; Abolmaesumi, Purang
2017-01-01
Deep learning models have outperformed some of the previous state-of-the-art approaches in medical image analysis. Instead of using hand-engineered features, deep models attempt to automatically extract hierarchical representations at multiple levels of abstraction from the data. Therefore, deep models are usually considered to be more flexible and robust solutions for image analysis problems compared to conventional computer vision models. They have demonstrated significant improvements in computer-aided diagnosis and automatic medical image analysis applied to such tasks as image segmentation, classification and registration. However, deploying deep learning models often has a steep learning curve and requires detailed knowledge of various software packages. Thus, many deep models have not been integrated into the clinical research workflows causing a gap between the state-of-the-art machine learning in medical applications and evaluation in clinical research procedures. In this paper, we propose “DeepInfer” – an open-source toolkit for developing and deploying deep learning models within the 3D Slicer medical image analysis platform. Utilizing a repository of task-specific models, DeepInfer allows clinical researchers and biomedical engineers to deploy a trained model selected from the public registry, and apply it to new data without the need for software development or configuration. As two practical use cases, we demonstrate the application of DeepInfer in prostate segmentation for targeted MRI-guided biopsy and identification of the target plane in 3D ultrasound for spinal injections. PMID:28615794
RGG: A general GUI Framework for R scripts
Visne, Ilhami; Dilaveroglu, Erkan; Vierlinger, Klemens; Lauss, Martin; Yildiz, Ahmet; Weinhaeusel, Andreas; Noehammer, Christa; Leisch, Friedrich; Kriegner, Albert
2009-01-01
Background R is the leading open source statistics software with a vast number of biostatistical and bioinformatical analysis packages. To exploit the advantages of R, extensive scripting/programming skills are required. Results We have developed a software tool called R GUI Generator (RGG) which enables the easy generation of Graphical User Interfaces (GUIs) for the programming language R by adding a few Extensible Markup Language (XML) tags. RGG consists of an XML-based GUI definition language and a Java-based GUI engine. GUIs are generated at runtime from defined GUI tags that are embedded into the R script. User-GUI input is returned to the R code and replaces the XML-tags. RGG files can be developed using any text editor. The current version of RGG is available as stand-alone software (RGGRunner) and as a plug-in for JGR. Conclusion RGG is a general GUI framework for R that has the potential to introduce R statistics (R packages, built-in functions and scripts) to users with limited programming skills and helps to bridge the gap between R developers and GUI-dependent users. RGG aims to abstract the GUI development from individual GUI toolkits by using an XML-based GUI definition language. Thus RGG can be easily integrated in any software. The RGG project further includes the development of a web-based repository for RGG-GUIs. RGG is an open source project licensed under the Lesser General Public License (LGPL) and can be downloaded freely at PMID:19254356
Gorgolewski, Krzysztof J; Varoquaux, Gael; Rivera, Gabriel; Schwartz, Yannick; Sochat, Vanessa V; Ghosh, Satrajit S; Maumet, Camille; Nichols, Thomas E; Poline, Jean-Baptiste; Yarkoni, Tal; Margulies, Daniel S; Poldrack, Russell A
2016-01-01
NeuroVault.org is dedicated to storing outputs of analyses in the form of statistical maps, parcellations and atlases, a unique strategy that contrasts with most neuroimaging repositories that store raw acquisition data or stereotaxic coordinates. Such maps are indispensable for performing meta-analyses, validating novel methodology, and deciding on precise outlines for regions of interest (ROIs). NeuroVault is open to maps derived from both healthy and clinical populations, as well as from various imaging modalities (sMRI, fMRI, EEG, MEG, PET, etc.). The repository uses modern web technologies such as interactive web-based visualization, cognitive decoding, and comparison with other maps to provide researchers with efficient, intuitive tools to improve the understanding of their results. Each dataset and map is assigned a permanent Universal Resource Locator (URL), and all of the data is accessible through a REST Application Programming Interface (API). Additionally, the repository supports the NIDM-Results standard and has the ability to parse outputs from popular FSL and SPM software packages to automatically extract relevant metadata. This ease of use, modern web-integration, and pioneering functionality holds promise to improve the workflow for making inferences about and sharing whole-brain statistical maps. Copyright © 2015 Elsevier Inc. All rights reserved.
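As an illustration of the REST access mentioned above, the sketch below lists collection names with the requests library; the endpoint path and JSON field names follow common REST conventions and are assumptions rather than a verified description of the NeuroVault API.

    # List the collections returned by the first page of a REST endpoint.
    import requests

    resp = requests.get("https://neurovault.org/api/collections/", timeout=30)  # assumed path
    resp.raise_for_status()
    payload = resp.json()

    for collection in payload.get("results", []):
        print(collection.get("name"))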
What Four Million Mappings Can Tell You about Two Hundred Ontologies
NASA Astrophysics Data System (ADS)
Ghazvinian, Amir; Noy, Natalya F.; Jonquet, Clement; Shah, Nigam; Musen, Mark A.
The field of biomedicine has embraced the Semantic Web probably more than any other field. As a result, there is a large number of biomedical ontologies covering overlapping areas of the field. We have developed BioPortal—an open community-based repository of biomedical ontologies. We analyzed ontologies and terminologies in BioPortal and the Unified Medical Language System (UMLS), creating more than 4 million mappings between concepts in these ontologies and terminologies based on the lexical similarity of concept names and synonyms. We then analyzed the mappings and what they tell us about the ontologies themselves, the structure of the ontology repository, and the ways in which the mappings can help in the process of ontology design and evaluation. For example, we can use the mappings to guide users who are new to a field to the most pertinent ontologies in that field, to identify areas of the domain that are not covered sufficiently by the ontologies in the repository, and to identify which ontologies will serve well as background knowledge in domain-specific tools. While we used a specific (but large) ontology repository for the study, we believe that the lessons we learned about the value of a large-scale set of mappings to ontology users and developers are general and apply in many other domains.
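A toy version of the lexical mapping idea is sketched below: concepts from two ontologies are mapped when a normalized preferred name or synonym matches exactly. The sample identifiers and terms are invented; the actual BioPortal/UMLS workflow operates over millions of terms.

    # Map concepts across two ontologies by exact match of normalized names/synonyms.
    def normalize(term):
        return " ".join(term.lower().replace("-", " ").split())

    ontology_a = {"A:0001": ["Myocardial infarction", "Heart attack"]}
    ontology_b = {"B:0420": ["heart attack"], "B:0999": ["Stroke"]}

    mappings = []
    for id_a, names_a in ontology_a.items():
        norms_a = {normalize(n) for n in names_a}
        for id_b, names_b in ontology_b.items():
            if norms_a & {normalize(n) for n in names_b}:
                mappings.append((id_a, id_b))

    print(mappings)  # [('A:0001', 'B:0420')]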
Kroon-Batenburg, Loes M. J.; Helliwell, John R.; McMahon, Brian; ...
2017-01-01
A topical review is presented of the rapidly developing interest in and storage options for the preservation and reuse of raw data within the scientific domain of the IUCr and its Commissions, each of which operates within a great diversity of instrumentation. A résumé is included of the case for raw diffraction data deposition. An overall context is set by highlighting the initiatives of science policy makers towards an 'Open Science' model within which crystallographers will increasingly work in the future; this will bring new funding opportunities but also new codes of procedure within open science frameworks. Skills education and training for crystallographers will need to be expanded. Overall, there are now the means and the organization for the preservation of raw crystallographic diffraction data via different types of archive, such as at universities, discipline-specific repositories (Integrated Resource for Reproducibility in Macromolecular Crystallography, Structural Biology Data Grid), general public data repositories (Zenodo, ResearchGate) and centralized neutron and X-ray facilities. Formulation of improved metadata descriptors for the raw data types of each of the IUCr Commissions is in progress; some detailed examples are provided. Lastly, a number of specific case studies are presented, including an example research thread that provides complete open access to raw data.
Kroon-Batenburg, Loes M. J.
2017-01-01
A topical review is presented of the rapidly developing interest in and storage options for the preservation and reuse of raw data within the scientific domain of the IUCr and its Commissions, each of which operates within a great diversity of instrumentation. A résumé is included of the case for raw diffraction data deposition. An overall context is set by highlighting the initiatives of science policy makers towards an ‘Open Science’ model within which crystallographers will increasingly work in the future; this will bring new funding opportunities but also new codes of procedure within open science frameworks. Skills education and training for crystallographers will need to be expanded. Overall, there are now the means and the organization for the preservation of raw crystallographic diffraction data via different types of archive, such as at universities, discipline-specific repositories (Integrated Resource for Reproducibility in Macromolecular Crystallography, Structural Biology Data Grid), general public data repositories (Zenodo, ResearchGate) and centralized neutron and X-ray facilities. Formulation of improved metadata descriptors for the raw data types of each of the IUCr Commissions is in progress; some detailed examples are provided. A number of specific case studies are presented, including an example research thread that provides complete open access to raw data. PMID:28250944
Characterize Framework for Igneous Activity at Yucca Mountain, Nevada
DOE Office of Scientific and Technical Information (OSTI.GOV)
F. Perry; B. Youngs
2000-11-06
The purpose of this Analysis/Model Report (AMR) is twofold. (1) The first is to present a conceptual framework of igneous activity in the Yucca Mountain region (YMR) consistent with the volcanic and tectonic history of this region and the assessment of this history by experts who participated in the Probabilistic Volcanic Hazard Analysis (PVHA) (CRWMS M&O 1996). Conceptual models presented in the PVHA are summarized and extended in areas in which new information has been presented. Alternative conceptual models are discussed as well as their impact on probability models. The relationship between volcanic source zones defined in the PVHA and structural features of the YMR is described based on discussions in the PVHA and studies presented since the PVHA. (2) The second purpose of the AMR is to present probability calculations based on PVHA outputs. Probability distributions are presented for the length and orientation of volcanic dikes within the repository footprint and for the number of eruptive centers located within the repository footprint (conditional on the dike intersecting the repository). The probability of intersection of a basaltic dike within the repository footprint was calculated in the AMR ''Characterize Framework for Igneous Activity at Yucca Mountain, Nevada'' (CRWMS M&O 2000g) based on the repository footprint known as the Enhanced Design Alternative [EDA II, Design B (CRWMS M&O 1999a; Wilkins and Heath 1999)]. Then, the ''Site Recommendation Design Baseline'' (CRWMS M&O 2000a) initiated a change in the repository design, which is described in the ''Site Recommendation Subsurface Layout'' (CRWMS M&O 2000b). Consequently, the probability of intersection of a basaltic dike within the repository footprint has also been calculated for the current repository footprint, which is called the 70,000 Metric Tons of Uranium (MTU) No-Backfill Layout (CRWMS M&O 2000b). The calculations for both footprints are presented in this AMR. In addition, the probability of an eruptive center(s) forming within the repository footprint is calculated and presented in this AMR for both repository footprint designs. This latter type of calculation was not included in the PVHA.
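To convey the flavor of such a conditional calculation, the sketch below Monte Carlo samples hypothetical dike centers, lengths and azimuths and counts intersections with a rectangular footprint. None of the geometry, rates or distributions are PVHA values; everything here is a placeholder for illustration only.

```python
# Sketch: conditional dike-intersection fraction for a rectangular footprint.
# All numbers are hypothetical placeholders, not PVHA inputs or outputs.
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Hypothetical rectangular footprint (km), centered at the origin.
half_width, half_height = 1.0, 2.0

# Hypothetical dike centers scattered uniformly over a 20 x 20 km zone,
# with lognormally distributed half-lengths (km) and uniform azimuths.
cx = rng.uniform(-10, 10, N)
cy = rng.uniform(-10, 10, N)
half_len = rng.lognormal(mean=0.5, sigma=0.5, size=N)
azimuth = rng.uniform(0, np.pi, N)

# Crude intersection test: does the dike center or either endpoint
# fall inside the footprint?
ex, ey = half_len * np.sin(azimuth), half_len * np.cos(azimuth)

def inside(x, y):
    return (np.abs(x) <= half_width) & (np.abs(y) <= half_height)

hits = inside(cx + ex, cy + ey) | inside(cx - ex, cy - ey) | inside(cx, cy)
print("intersection fraction over sampled dikes:", hits.mean())
```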
Chen, Ming; Henry, Nathan; Almsaeed, Abdullah; Zhou, Xiao; Wegrzyn, Jill; Ficklin, Stephen
2017-01-01
Tripal is an open source software package for developing biological databases with a focus on genetic and genomic data. It consists of a set of core modules that deliver essential functions for loading and displaying data records and associated attributes including organisms, sequence features and genetic markers. Beyond the core modules, community members are encouraged to contribute extension modules to build on the Tripal core and to customize Tripal for individual community needs. To expand the utility of the Tripal software system, particularly for RNASeq data, we developed two new extension modules. Tripal Elasticsearch enables fast, scalable searching of the entire content of a Tripal site as well as the construction of customized advanced searches of specific data types. We demonstrate the use of this module for searching assembled transcripts by functional annotation. A second module, Tripal Analysis Expression, houses and displays records from gene expression assays such as RNA sequencing. This includes biological source materials (biomaterials), gene expression values and protocols used to generate the data. In the case of an RNASeq experiment, this would reflect the individual organisms and tissues used to produce sequencing libraries, the normalized gene expression values derived from the RNASeq data analysis and a description of the software or code used to generate the expression values. The module will load data from common flat file formats including standard NCBI Biosample XML. Data loading, display options and other configurations can be controlled by authorized users in the Drupal administrative backend. Both modules are open source, include usage documentation, and can be found in the Tripal organization’s GitHub repository. Database URL: Tripal Elasticsearch module: https://github.com/tripal/tripal_elasticsearch Tripal Analysis Expression module: https://github.com/tripal/tripal_analysis_expression PMID:29220446
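Since the Analysis Expression module ingests flat files such as NCBI BioSample XML, a loader conceptually reduces to walking the XML tree and collecting biomaterial attributes. The sketch below assumes the common BioSample element layout; the tag and attribute names should be checked against the actual export being loaded, and this is not the module's own loader code.

```python
# Sketch: extracting biomaterial attributes from an NCBI BioSample XML record,
# the kind of flat file described above. Element names are assumptions.
import xml.etree.ElementTree as ET

SAMPLE = """<BioSampleSet>
  <BioSample accession="SAMN00000001">
    <Description><Title>Leaf tissue, drought treatment</Title></Description>
    <Attributes>
      <Attribute attribute_name="tissue">leaf</Attribute>
      <Attribute attribute_name="treatment">drought</Attribute>
    </Attributes>
  </BioSample>
</BioSampleSet>"""

root = ET.fromstring(SAMPLE)
for sample in root.findall("BioSample"):
    acc = sample.get("accession")
    title = sample.findtext("Description/Title")
    # Collect attribute_name -> value pairs for the biomaterial record.
    attrs = {a.get("attribute_name"): a.text
             for a in sample.findall("Attributes/Attribute")}
    print(acc, title, attrs)
```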
NASA Astrophysics Data System (ADS)
Gordon, S.; Dattore, E.; Williams, S.
2014-12-01
Even when a data center makes its datasets accessible, they can still be hard to discover if the user is unaware of the laboratory or organization the data center supports. NCAR's Earth Observing Laboratory (EOL) is no exception. In response to this problem, and as an inquiry into the feasibility of inter-connecting all of NCAR's repositories at a discovery layer, ESRI's Geoportal was researched. It was determined that Geoportal would be a good platform around which to build a proof-of-concept model of inter-repository discovery. This collaborative project between the University of Illinois and NCAR is coordinated through the Data Curation Education in Research Centers program. This program is funded by the Institute of Museum and Library Services.
Geoportal is open source software. It serves as an aggregation point for metadata catalogs of earth science datasets, with a focus on geospatial information. EOL's metadata, however, resides in static THREDDS catalogs, and Geoportal can only create records from a THREDDS Data Server. The first step was therefore to make EOL metadata more accessible by adopting the ISO 19115-2 standard. It was also decided to create DIF records so EOL datasets could be ingested into NASA's Global Change Master Directory (GCMD).
To offer records for harvest, it was decided to develop an OAI-PMH server. To make a compliant server, the OAI_DC standard was also implemented. A server was written in Perl to serve a set of static records. We created a sample set of records in ISO 19115-2, FGDC, DIF, and OAI_DC. We utilized GCMD shared vocabularies to enhance discoverability and precision. The proof of concept was tested and verified by having another NCAR laboratory's Geoportal harvest our sample set.
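From the harvesting side, the protocol exchange is a simple HTTP request carrying standard OAI-PMH verbs. The sketch below shows a minimal Dublin Core harvest against a placeholder endpoint; it is not the Perl server described above, and the endpoint URL is an assumption.

```python
# Sketch: harvesting Dublin Core records from an OAI-PMH endpoint.
# The endpoint URL is hypothetical; verbs and oai_dc prefix are standard OAI-PMH.
import requests
import xml.etree.ElementTree as ET

ENDPOINT = "https://data.example.edu/oai"   # hypothetical OAI-PMH server
NS = {"oai": "http://www.openarchives.org/OAI/2.0/",
      "dc": "http://purl.org/dc/elements/1.1/"}

resp = requests.get(ENDPOINT,
                    params={"verb": "ListRecords", "metadataPrefix": "oai_dc"},
                    timeout=30)
resp.raise_for_status()

root = ET.fromstring(resp.content)
for record in root.findall(".//oai:record", NS):
    identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
    title = record.findtext(".//dc:title", namespaces=NS)
    print(identifier, "-", title)
```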
To prepare for production, templates for each standard were developed and mapped to the database. These templates will support the automated creation of records. Once the OAI-PMH server is re-written in a Grails framework, a dynamic representation of EOL's metadata will be available for harvest.
EOL will need to develop an implementation of a Geoportal and point GCMD to the OAI-PMH server. We will also seek out partnerships with other earth science and related discipline repositories that can communicate by OAI-PMH or Geoportal so that the scientific community will benefit from more discoverable data.
Accessing and Integrating Data and Knowledge for Biomedical Research
Burgun, A.; Bodenreider, O.
2008-01-01
Summary Objectives To review the issues that have arisen with the advent of translational research in terms of integration of data and knowledge, and survey current efforts to address these issues. Methods Using examples from the biomedical literature, we identified new trends in biomedical research and their impact on bioinformatics. We analyzed the requirements for effective knowledge repositories and studied issues in the integration of biomedical knowledge. Results New diagnostic and therapeutic approaches based on gene expression patterns have brought about new issues in the statistical analysis of data, and new workflows are needed to support translational research. Interoperable data repositories based on standard annotations, infrastructures and services are needed to support the pooling and meta-analysis of data, as well as their comparison to earlier experiments. High-quality, integrated ontologies and knowledge bases serve as a source of prior knowledge used in combination with traditional data mining techniques and contribute to the development of more effective data analysis strategies. Conclusion As biomedical research evolves from traditional clinical and biological investigations towards omics sciences and translational research, specific needs have emerged, including integrating data collected in research studies with patient clinical data, linking omics knowledge with medical knowledge, modeling the molecular basis of diseases, and developing tools that support in-depth analysis of research data. As such, translational research illustrates the need to bridge the gap between bioinformatics and medical informatics, and opens new avenues for biomedical informatics research. PMID:18660883
Papež, Václav; Mouček, Roman
2017-01-01
The purpose of this study is to investigate the feasibility of applying openEHR (an archetype-based approach for electronic health records representation) to modeling data stored in EEGBase, a portal for experimental electroencephalography/event-related potential (EEG/ERP) data management. The study evaluates re-usage of existing openEHR archetypes and proposes a set of new archetypes together with the openEHR templates covering the domain. The main goals of the study are to (i) link existing EEGBase data/metadata and openEHR archetype structures and (ii) propose a new openEHR archetype set describing the EEG/ERP domain since this set of archetypes currently does not exist in public repositories. The main methodology is based on the determination of the concepts obtained from EEGBase experimental data and metadata that are expressible structurally by the openEHR reference model and semantically by openEHR archetypes. In addition, templates as the third openEHR resource allow us to define constraints over archetypes. Clinical Knowledge Manager (CKM), a public openEHR archetype repository, was searched for the archetypes matching the determined concepts. According to the search results, the archetypes already existing in CKM were applied and the archetypes not existing in CKM were newly developed. openEHR archetypes support linkage to external terminologies. To increase semantic interoperability of the new archetypes, binding with the existing odML electrophysiological terminology was assured. Further, other current solutions besides EEGBase were also considered during the development phase to increase structural interoperability. Finally, a set of templates using the selected archetypes was created to meet EEGBase requirements. A set of eleven archetypes that encompassed the domain of experimental EEG/ERP measurements was identified. Of these, six were reused without changes, one was extended, and four were newly created. All archetypes were arranged in the templates reflecting the EEGBase metadata structure. A mechanism of odML terminology referencing was proposed to assure semantic interoperability of the archetypes. The openEHR approach was found to be useful not only for clinical purposes but also for experimental data modeling.
A Framework to Integrate Public, Dynamic Metrics into an OER Platform
ERIC Educational Resources Information Center
Cohen, Jaclyn Zetta; Omollo, Kathleen Ludewig; Malicke, Dave
2014-01-01
The usage metrics for open educational resources (OER) are often either hidden behind an authentication system or shared intermittently in static, aggregated format at the repository level. This paper discusses the first year of University of Michigan's project to share its OER usage data dynamically, publicly, to synthesize it across different…
PubMed Central Canada: Beyond an Open Access Repository?
ERIC Educational Resources Information Center
Nariani, Rajiv
2013-01-01
PubMed Central Canada (PMC Canada) represents a partnership between the Canadian Institutes of Health Research (CIHR), the National Research Council's Canada Institute for Scientific and Technical Information (NRC-CISTI), and the National Library of Medicine of the US. The present study was done to gauge faculty awareness about the CIHR Policy on…
ERIC Educational Resources Information Center
Zervas, Panagiotis; Sampson, Demetrios G.
2014-01-01
Mobile assisted language learning (MALL) and open access repositories for language learning resources are both topics that have attracted the interest of researchers and practitioners in technology enhanced learning (TeL). Yet, there is limited experimental evidence about possible factors that can influence and potentially enhance reuse of MALL…
Wordbank: An Open Repository for Developmental Vocabulary Data
ERIC Educational Resources Information Center
Frank, Michael C.; Braginsky, Mika; Yurovsky, Daniel; Marchman, Virginia A.
2017-01-01
The MacArthur-Bates Communicative Development Inventories (CDIs) are a widely used family of parent-report instruments for easy and inexpensive data-gathering about early language acquisition. CDI data have been used to explore a variety of theoretically important topics, but, with few exceptions, researchers have had to rely on data collected in…
NASA Astrophysics Data System (ADS)
Pohlmann, K. F.; Zhu, J.; Ye, M.; Carroll, R. W.; Chapman, J. B.; Russell, C. E.; Shafer, D. S.
2006-12-01
Yucca Mountain (YM), Nevada has been recommended as a deep geological repository for the disposal of spent fuel and high-level radioactive waste. If YM is licensed as a repository by the Nuclear Regulatory Commission, it will be important to identify the potential for radionuclides to migrate from underground nuclear testing areas located on the Nevada Test Site (NTS) to the hydraulically downgradient repository area to ensure that monitoring does not incorrectly attribute repository failure to radionuclides originating from other sources. In this study, we use the Death Valley Regional Flow System (DVRFS) model developed by the U.S. Geological Survey to investigate potential groundwater migration pathways and associated travel times from the NTS to the proposed YM repository area. Using results from the calibrated DVRFS model and the particle tracking post-processing package MODPATH we modeled three-dimensional groundwater advective pathways in the NTS and YM region. Our study focuses on evaluating the potential for groundwater pathways between the NTS and YM withdrawal area and whether travel times for advective flow along these pathways coincide with the prospective monitoring time frame at the proposed repository. We include uncertainty in effective porosity as this is a critical variable in the determination of time for radionuclides to travel from the NTS region to the YM withdrawal area. Uncertainty in porosity is quantified through evaluation of existing site data and expert judgment and is incorporated in the model through Monte Carlo simulation. Since porosity information is limited for this region, the uncertainty is quite large and this is reflected in the results as a large range in simulated groundwater travel times.
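The influence of porosity uncertainty on advective travel time can be illustrated with a back-of-envelope Monte Carlo calculation. The sketch below is not the DVRFS/MODPATH workflow; the path length, specific discharge and porosity distribution are hypothetical placeholders chosen only to show how a wide porosity range maps into a wide range of travel times.

```python
# Sketch: propagating effective-porosity uncertainty into advective travel time.
# All parameter values are hypothetical, not site data.
import numpy as np

rng = np.random.default_rng(42)
n = 50_000

path_length_m = 40_000.0          # hypothetical flow-path length
darcy_flux_m_per_yr = 0.5         # hypothetical specific discharge

# Effective porosity sampled from a wide lognormal distribution to reflect
# sparse site data (median ~0.05, roughly an order-of-magnitude spread).
porosity = np.clip(rng.lognormal(mean=np.log(0.05), sigma=0.8, size=n),
                   1e-4, 0.35)

# Average linear velocity = Darcy flux / effective porosity.
velocity = darcy_flux_m_per_yr / porosity
travel_time_yr = path_length_m / velocity

print("5th/50th/95th percentile travel time (yr):",
      np.percentile(travel_time_yr, [5, 50, 95]).round(0))
```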
DOE Office of Scientific and Technical Information (OSTI.GOV)
F. Perry; R. Youngs
The purpose of this scientific analysis report is threefold: (1) Present a conceptual framework of igneous activity in the Yucca Mountain region (YMR) consistent with the volcanic and tectonic history of this region and the assessment of this history by experts who participated in the probabilistic volcanic hazard analysis (PVHA) (CRWMS M&O 1996 [DIRS 100116]). Conceptual models presented in the PVHA are summarized and applied in areas in which new information has been presented. Alternative conceptual models are discussed, as well as their impact on probability models. The relationship between volcanic source zones defined in the PVHA and structural features of the YMR is described based on discussions in the PVHA and studies presented since the PVHA. (2) Present revised probability calculations based on PVHA outputs for a repository footprint proposed in 2003 (BSC 2003 [DIRS 162289]), rather than the footprint used at the time of the PVHA. This analysis report also calculates the probability of an eruptive center(s) forming within the repository footprint using information developed in the PVHA. Probability distributions are presented for the length and orientation of volcanic dikes located within the repository footprint and for the number of eruptive centers (conditional on a dike intersecting the repository) located within the repository footprint. (3) Document sensitivity studies that analyze how the presence of potentially buried basaltic volcanoes may affect the computed frequency of intersection of the repository footprint by a basaltic dike. These sensitivity studies are prompted by aeromagnetic data collected in 1999, indicating the possible presence of previously unrecognized buried volcanoes in the YMR (Blakely et al. 2000 [DIRS 151881]; O'Leary et al. 2002 [DIRS 158468]). The results of the sensitivity studies are for informational purposes only and are not to be used for purposes of assessing repository performance.
DNASU plasmid and PSI:Biology-Materials repositories: resources to accelerate biological research
Seiler, Catherine Y.; Park, Jin G.; Sharma, Amit; Hunter, Preston; Surapaneni, Padmini; Sedillo, Casey; Field, James; Algar, Rhys; Price, Andrea; Steel, Jason; Throop, Andrea; Fiacco, Michael; LaBaer, Joshua
2014-01-01
The mission of the DNASU Plasmid Repository is to accelerate research by providing high-quality, annotated plasmid samples and online plasmid resources to the research community through the curated DNASU database, website and repository (http://dnasu.asu.edu or http://dnasu.org). The collection includes plasmids from grant-funded, high-throughput cloning projects performed in our laboratory, plasmids from external researchers, and large collections from consortia such as the ORFeome Collaboration and the NIGMS-funded Protein Structure Initiative: Biology (PSI:Biology). Through DNASU, researchers can search for and access detailed information about each plasmid such as the full length gene insert sequence, vector information, associated publications, and links to external resources that provide additional protein annotations and experimental protocols. Plasmids can be requested directly through the DNASU website. DNASU and the PSI:Biology-Materials Repositories were previously described in the 2010 NAR Database Issue (Cormier, C.Y., Mohr, S.E., Zuo, D., Hu, Y., Rolfs, A., Kramer, J., Taycher, E., Kelley, F., Fiacco, M., Turnbull, G. et al. (2010) Protein Structure Initiative Material Repository: an open shared public resource of structural genomics plasmids for the biological community. Nucleic Acids Res., 38, D743–D749.). In this update we will describe the plasmid collection and highlight the new features in the website redesign, including new browse/search options, plasmid annotations and a dynamic vector mapping feature that was developed in collaboration with LabGenius. Overall, these plasmid resources continue to enable research with the goal of elucidating the role of proteins in both normal biological processes and disease. PMID:24225319
Shales and other argillaceous strata in the United States
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gonzales, S.; Johnson, K.S.
This report presents detailed geologic and hydrologic data that describe shales and other argillaceous rocks; data are from the open literature. These data are intended to be used in the future to aid in assessment of various strata and their potential for repository siting. No observations, conclusions, or recommendations are made by the authors of this report relative to the suitability of various argillaceous rocks for waste disposal. There are, however, other published reports that contain technical data and evaluative statements regarding the suitability of various argillaceous rocks for repository siting. Where appropriate, the authors of this report have referenced this previously published literature and have summarized the technical data. 838 refs., 121 figs., 6 tabs.
NASA Astrophysics Data System (ADS)
Vopálka, D.; Lukin, D.; Vokál, A.
2006-01-01
Three new modules modelling the processes that occur in a deep geological repository have been prepared in the GoldSim computer code environment (using its Transport Module). These modules help in understanding the role of selected parameters in the near-field region of the final repository and in preparing one's own complex model of repository behaviour. The source term module includes radioactive decay and ingrowth in the canister, first-order degradation of the fuel matrix, solubility limitation of the concentration of the studied nuclides, and diffusive migration through the surrounding bentonite layer controlled by the output boundary condition formulated with respect to the rate of water flow in the rock. The corrosion module describes corrosion of canisters made of carbon steel and transport of corrosion products in the near-field region. This module computes balance equations between dissolving species and species transported by diffusion and/or advection from the surface of a solid material. The diffusion module, which also includes a non-linear form of the interaction isotherm, can be used for the evaluation of small-scale diffusion experiments.
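A drastically simplified, time-stepped version of such a source-term balance is sketched below: first-order matrix degradation releases a nuclide into canister water, the dissolved concentration is capped by a solubility limit, the inventory decays, and a constant water exchange carries mass out. All parameter values are hypothetical and the scheme is far cruder than the GoldSim modules it is meant to illustrate.

```python
# Sketch: a toy source-term balance (not the GoldSim implementation).
# Hypothetical nuclide and canister parameters throughout.
import numpy as np

dt = 1.0                  # yr
t_end = 10_000.0
half_life = 2.1e5         # yr, hypothetical nuclide
lam = np.log(2) / half_life
k_degr = 1e-4             # 1/yr, first-order matrix degradation rate
solubility = 1e-6         # mol/L, solubility limit
water_volume = 500.0      # L of water inside the canister
q_out = 1.0               # L/yr, water exchange with the buffer

matrix_inv = 10.0         # mol bound in the fuel matrix
dissolved = 0.0           # mol in canister water (dissolved + precipitate lumped)
released = 0.0            # cumulative mol leaving the canister

for _ in np.arange(0.0, t_end, dt):
    freed = k_degr * matrix_inv * dt
    matrix_inv -= freed + lam * matrix_inv * dt      # degradation + decay
    dissolved += freed - lam * dissolved * dt        # ingrowth into water + decay
    conc = min(dissolved / water_volume, solubility) # solubility cap on release
    out = conc * q_out * dt
    dissolved -= out
    released += out

print(f"released after {t_end:.0f} yr: {released:.3e} mol")
```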
OpenStereo: Open Source, Cross-Platform Software for Structural Geology Analysis
NASA Astrophysics Data System (ADS)
Grohmann, C. H.; Campanha, G. A.
2010-12-01
Free and open source software (FOSS) is increasingly seen as a synonym of innovation and progress. Freedom to run, copy, distribute, study, change and improve the software (through access to the source code) assures a high level of positive feedback between users and developers, which results in stable, secure and constantly updated systems. Several software packages for structural geology analysis are available to the user, either under commercial licenses or as free downloads from the Internet. Some provide basic stereographic projection tools such as plotting poles, great circles, density contouring, eigenvector analysis, data rotation etc., while others perform more specific tasks, such as paleostress or geotechnical/rock stability analysis. This variety also means a wide range of input data formats, Graphical User Interface (GUI) designs and graphic export formats. The majority of packages are built for MS-Windows, and even though there are packages for the UNIX-based MacOS, there are no native packages for *nix (UNIX, Linux, BSD etc.) operating systems (OS), forcing users to run these programs with emulators or virtual machines. Those limitations led us to develop OpenStereo, an open source, cross-platform software package for stereographic projections and structural geology. The software is written in Python, a high-level, cross-platform programming language, and the GUI is designed with wxPython, which provides a consistent look regardless of the OS. Numeric operations (like matrix and linear algebra) are performed with the Numpy module, and all graphic capabilities are provided by the Matplotlib library, including on-screen plotting and graphic export to common desktop formats (emf, eps, ps, pdf, png, svg). Data input is done with simple ASCII text files, with values of dip direction and dip/plunge separated by spaces, tabs or commas. The user can open multiple files at the same time (or the same file more than once), and overlay different elements of each dataset (poles, great circles etc.). The GUI shows the opened files in a tree structure, similar to the “layers” of many illustration programs, where the vertical order of the files in the tree reflects the drawing order of the selected elements. At this stage, the software performs plotting of poles to planes, lineations, great circles, density contours and rose diagrams. A set of statistics is calculated for each file, and its eigenvalues and eigenvectors are used to suggest whether the data are clustered about a mean value or distributed along a girdle. Modified Flinn, triangular and histogram plots are also available. The next step of development will focus on tools such as merging and rotation of datasets, the possibility to save 'projects', and paleostress analysis. In its current state, OpenStereo requires Python, wxPython, Numpy and Matplotlib installed on the system. We recommend installing PythonXY or the Enthought Python Distribution on MS-Windows and MacOS machines, since all dependencies are provided. Most Linux distributions provide an easy way to install all dependencies through software repositories. OpenStereo is released under the GNU General Public License. Programmers willing to contribute are encouraged to contact the authors directly. FAPESP Grant #09/17675-5
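The eigenvector test mentioned above (cluster versus girdle) can be illustrated independently of the GUI. The sketch below is not the OpenStereo source code: it converts dip direction/dip pairs to pole vectors under a stated convention, builds the orientation matrix and inspects its normalized eigenvalues; the decision threshold at the end is only a simple heuristic.

```python
# Sketch: eigenvector analysis of planar data given as dip direction / dip.
# Conventions and threshold are illustrative, not the OpenStereo implementation.
import numpy as np

def pole_to_vector(dipdir_deg, dip_deg):
    """Unit vector of the pole to a plane (z negative = downward)."""
    # Pole trend is opposite the dip direction; pole plunge is 90 - dip.
    trend = np.radians((dipdir_deg + 180.0) % 360.0)
    plunge = np.radians(90.0 - dip_deg)
    return np.array([np.cos(plunge) * np.sin(trend),
                     np.cos(plunge) * np.cos(trend),
                     -np.sin(plunge)])

# Toy dataset: dip direction / dip in degrees.
planes = [(120, 35), (118, 40), (125, 33), (122, 38), (119, 36)]
vectors = np.array([pole_to_vector(dd, d) for dd, d in planes])

# Orientation matrix and its normalized eigenvalues.
T = vectors.T @ vectors / len(vectors)
eigvals = np.sort(np.linalg.eigvalsh(T))[::-1]
print("normalized eigenvalues:", eigvals.round(3))

# Heuristic: a dominant first eigenvalue suggests a cluster about a mean pole;
# two comparable large eigenvalues suggest a girdle distribution.
if eigvals[0] > 2 * eigvals[1]:
    print("data appear clustered about a mean pole")
else:
    print("data appear spread along a girdle")
```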
Modern software approaches applied to a Hydrological model: the GEOtop Open-Source Software Project
NASA Astrophysics Data System (ADS)
Cozzini, Stefano; Endrizzi, Stefano; Cordano, Emanuele; Bertoldi, Giacomo; Dall'Amico, Matteo
2017-04-01
The GEOtop hydrological scientific package is an integrated hydrological model that simulates the heat and water budgets at and below the soil surface. It describes the three-dimensional water flow in the soil and the energy exchange with the atmosphere, considering the radiative and turbulent fluxes. Furthermore, it reproduces the highly non-linear interactions between the water and energy balance during soil freezing and thawing, and simulates the temporal evolution of snow cover, soil temperature and moisture. The core components of the package were presented in the 2.0 version (Endrizzi et al., 2014), which was released as a free and open-source software project. However, despite the high scientific quality of the project, a modern software engineering approach was still missing. This weakness hindered its scientific potential and its use both as a standalone package and, more importantly, in an integrated way with other hydrological software tools. In this contribution we present our recent software re-engineering efforts to create a robust and stable scientific software package open to the hydrological community, easily usable by researchers and experts, and interoperable with other packages. The activity takes as a starting point the 2.0 version, scientifically tested and published. This version, together with several test cases based on recently published or available GEOtop applications (Cordano and Rigon, 2013, WRR; Kollet et al., 2016, WRR), provides the baseline code and a number of referenced results as benchmarks. Comparison and scientific validation can then be performed for each software re-engineering activity performed on the package. To keep track of every change, the package is published in its own GitHub repository (geotopmodel.github.io/geotop/) under the GPL v3.0 license. A Continuous Integration mechanism based on Travis-CI has been enabled on the GitHub repository for the master and main development branches. The use of the CMake configuration tool and of the test suite (easily manageable by means of ctest tools) greatly reduces the burden of installation and allows us to enhance portability across different compilers and operating system platforms. The package is also complemented by several software tools which provide web-based visualization of results based on R packages, in particular "shiny" (Chang et al., 2016), "geotopbricks" and "geotopOptim2" (Cordano et al., 2016), which allow rapid and efficient scientific validation of new examples and tests. The software re-engineering activities are still under development. However, our first results are promising enough to eventually reach a robust and stable software project that manages a complex state-of-the-art hydrological model like GEOtop in a flexible way and integrates it into wider workflows.
SMART-on-FHIR implemented over i2b2
Mandel, Joshua C; Klann, Jeffery G; Wattanasin, Nich; Mendis, Michael; Chute, Christopher G; Mandl, Kenneth D; Murphy, Shawn N
2017-01-01
We have developed an interface to serve patient data from Informatics for Integrating Biology and the Bedside (i2b2) repositories in the Fast Healthcare Interoperability Resources (FHIR) format, referred to as a SMART-on-FHIR cell. The cell serves FHIR resources on a per-patient basis, and supports the “substitutable” modular third-party applications (SMART) OAuth2 specification for authorization of client applications. It is implemented as an i2b2 server plug-in, consisting of 6 modules: authentication, REST, i2b2-to-FHIR converter, resource enrichment, query engine, and cache. The source code is freely available as open source. We tested the cell by accessing resources from a test i2b2 installation, demonstrating that a SMART app can be launched from the cell that accesses patient data stored in i2b2. We successfully retrieved demographics, medications, labs, and diagnoses for test patients. The SMART-on-FHIR cell will enable i2b2 sites to provide simplified but secure data access in FHIR format, and will spur innovation and interoperability. Further, it transforms i2b2 into an apps platform. PMID:27274012
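Once a SMART client holds an OAuth2 access token, retrieving resources is plain FHIR REST traffic. The sketch below assumes a placeholder base URL and token and uses standard FHIR search parameters; it illustrates the client side only and is not the cell's internal implementation.

```python
# Sketch: pulling FHIR resources for one patient with an OAuth2 bearer token.
# Base URL, token and patient id are placeholders; search params are standard FHIR.
import requests

FHIR_BASE = "https://i2b2.example.org/fhir"   # hypothetical SMART-on-FHIR cell
TOKEN = "REPLACE_WITH_ACCESS_TOKEN"           # obtained via the SMART OAuth2 flow

def get_bundle(resource, **params):
    """Run a FHIR search and return the decoded Bundle as a dict."""
    resp = requests.get(f"{FHIR_BASE}/{resource}",
                        params=params,
                        headers={"Authorization": f"Bearer {TOKEN}",
                                 "Accept": "application/fhir+json"},
                        timeout=30)
    resp.raise_for_status()
    return resp.json()

patient = get_bundle("Patient", _id="12345")
observations = get_bundle("Observation", patient="12345", _count=50)
print(len(observations.get("entry", [])), "observations returned")
```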
Ensemble Eclipse: A Process for Prefab Development Environment for the Ensemble Project
NASA Technical Reports Server (NTRS)
Wallick, Michael N.; Mittman, David S.; Shams, Khawaja, S.; Bachmann, Andrew G.; Ludowise, Melissa
2013-01-01
This software simplifies the process of setting up an Eclipse IDE programming environment for members of the cross-NASA center project, Ensemble. It achieves this by assembling all the necessary add-ons and custom tools/preferences. This software is unique in that it allows developers in the Ensemble Project (approximately 20 to 40 at any time) across multiple NASA centers to set up a development environment almost instantly and work on Ensemble software. The software comes with the source code repositories and other vital information and settings already included. The Eclipse IDE is an open-source development framework. The NASA (Ensemble-specific) version of the software includes Ensemble-specific plug-ins as well as settings for the Ensemble project. This software saves developers the time and hassle of setting up a programming environment, making sure that everything is set up in the correct manner for Ensemble development. Existing software (i.e., standard Eclipse) requires an intensive setup process that is both time-consuming and error prone. This software is built once by a single user and tested, allowing other developers to simply download and use it.
Associating clinical archetypes through UMLS Metathesaurus term clusters.
Lezcano, Leonardo; Sánchez-Alonso, Salvador; Sicilia, Miguel-Angel
2012-06-01
Clinical archetypes are modular definitions of clinical data, expressed using standard or open constraint-based data models such as CEN EN13606 and openEHR. There is increasing archetype specification activity, which raises the need for techniques to associate archetypes in order to support better management and user navigation in archetype repositories. This paper reports on a computational technique to generate tentative archetype associations by mapping them through term clusters obtained from the UMLS Metathesaurus. The terms are used to build a bipartite graph model, and graph connectivity measures can be used for deriving associations.
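The bipartite idea can be shown with a few dictionaries: archetypes on one side, UMLS term clusters on the other, and shared clusters as a crude connectivity weight between archetypes. The archetype and cluster identifiers below are illustrative; this is a minimal sketch of the technique, not the paper's algorithm.

```python
# Sketch: tentative archetype associations from shared term clusters
# (bipartite model with a naive shared-cluster weight). Toy identifiers.
from itertools import combinations
from collections import defaultdict

# archetype -> set of UMLS Metathesaurus term clusters its terms map to
archetype_clusters = {
    "openEHR-EHR-OBSERVATION.blood_pressure.v1": {"C_BP", "C_HEART"},
    "openEHR-EHR-OBSERVATION.pulse.v1":          {"C_HEART", "C_RATE"},
    "openEHR-EHR-EVALUATION.diagnosis.v1":       {"C_DISEASE"},
}

# Invert to cluster -> archetypes (the second vertex set of the bipartite graph).
cluster_archetypes = defaultdict(set)
for arch, clusters in archetype_clusters.items():
    for c in clusters:
        cluster_archetypes[c].add(arch)

# Two archetypes are associated when they share at least one cluster;
# the number of shared clusters serves as a crude connectivity weight.
weights = defaultdict(int)
for archs in cluster_archetypes.values():
    for a, b in combinations(sorted(archs), 2):
        weights[(a, b)] += 1

for (a, b), w in weights.items():
    print(f"{a}  <->  {b}   shared clusters: {w}")
```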
Interoperable Archetypes With a Three Folded Terminology Governance.
Pederson, Rune; Ellingsen, Gunnar
2015-01-01
The use of openEHR archetypes increases the interoperability of clinical terminology, and in doing so improves the availability of clinical terminology for both primary and secondary purposes. Where clinical terminology is employed in the EPR system, research reports conflicting results on the use of structuring and standardization as measures of success. In order to elucidate this concept, this paper focuses on the effort to establish a national repository for openEHR-based archetypes in Norway, where clinical terminology could be included with benefits for interoperability on three levels.
Numerical Modeling of Thermal-Hydrology in the Near Field of a Generic High-Level Waste Repository
NASA Astrophysics Data System (ADS)
Matteo, E. N.; Hadgu, T.; Park, H.
2016-12-01
Disposal in a deep geologic repository is one of the preferred options for long-term isolation of high-level nuclear waste. Coupled thermal-hydrologic processes induced by decay heat from the radioactive waste may impact fluid flow and the associated migration of radionuclides. This study looked at the effects of those processes in simulations of thermal-hydrology for the emplacement of U.S. Department of Energy managed high-level waste and spent nuclear fuel. Most of the high-level waste sources have lower thermal output, which would reduce the impact of thermal propagation. In order to quantify the thermal limits, this study concentrated on the higher thermal output sources and on spent nuclear fuel. The study assumed a generic nuclear waste repository at 500 m depth. For the modeling, a representative domain corresponding to a portion of the repository layout was selected in order to conduct a detailed thermal analysis. A highly refined unstructured mesh was utilized, with refinements near heat sources and at intersections of different materials. Simulations looked at different values for properties of components of the engineered barrier system (i.e., buffer, disturbed rock zone and the host rock). The simulations also looked at the effects of different durations of surface aging of the waste to reduce thermal perturbations. The PFLOTRAN code (Hammond et al., 2014) was used for the simulations. Modeling results for the different options are reported and include temperature and fluid flow profiles in the near field at different simulation times. References: G. E. Hammond, P.C. Lichtner and R.T. Mills, "Evaluating the Performance of Parallel Subsurface Simulators: An Illustrative Example with PFLOTRAN", Water Resources Research, 50, doi:10.1002/2012WR013483 (2014). Sandia National Laboratories is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND2016-7510 A
NASA Astrophysics Data System (ADS)
Arko, R. A.; Stocks, K.; Chandler, C. L.; Smith, S. R.; Miller, S. P.; Maffei, A. R.; Glaves, H. M.; Carbotte, S. M.
2013-12-01
The U.S. National Science Foundation supports a fleet of academic research vessels operating throughout the world's oceans. In addition to supporting the mission-specific goals of each expedition, these vessels routinely deploy a suite of underway environmental sensors, operating like mobile observatories. Recognizing that the data from these instruments have value beyond each cruise, NSF funded R2R in 2009 to ensure that these data are routinely captured, cataloged and described, and submitted to the appropriate national repository for long-term public access. In 2013, R2R joined the Ocean Data Interoperability Platform (ODIP; http://odip.org/). The goal of ODIP is to remove barriers to the effective sharing of data across scientific domains and international boundaries, by providing a forum to harmonize diverse regional systems. To advance this goal, ODIP organizes international workshops to foster the development of common standards and develop prototypes to evaluate and test potential standards and interoperability solutions. ODIP includes major organizations engaged in ocean data stewardship in the EU, US, and Australia, supported by the International Oceanographic Data and Information Exchange (IODE). Within the broad scope of ODIP, R2R focuses on contributions in 4 key areas: ● Implement a 'Linked Open Data' approach to disseminate data and documentation, using existing World Wide Web Consortium (W3C) specifications and machine-readable formats. Exposing content as Linked Open Data will provide a simple mechanism for ODIP collaborators to browse and compare data sets among repositories. ● Map key vocabularies used by R2R to their European and Australian counterparts. The existing heterogeneity among terms inhibits data discoverability, as a user searching on the term with which s/he is familiar may not find all data of interest. Mapping key terms across the different ODIP partners, relying on the backbone thesaurus provided by the NERC Vocabulary Server (http://vocab.nerc.ac.uk/), is a first step towards wider data discoverability. ● Upgrade existing R2R ISO metadata records to be compatible with the new SeaDataNet II Cruise Summary Report (CSR) profile, and publish the records in a standards-compliant Web portal, built on the GeoNetwork open-source package. ● Develop the future workforce. R2R is enlisting and exposing a group of five students to new informatics technologies and international collaboration. Students are undertaking a coordinated series of projects in 2013 and 2014 at each of the R2R partner institutions, combined with travel to selected meetings where they will engage in the ODIP Workshop process; present results; and exchange ideas with working scientists and software developers in Europe and Australia. Students work closely with staff at the R2R partner institutions, in projects that build on the R2R-ODIP technical work components described above.
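As a small illustration of the Linked Open Data item above, a cruise-level dataset description can be serialized with rdflib using Dublin Core and DCAT terms. The URIs and property choices below are assumptions for the sketch, not the R2R production vocabulary or data.

```python
# Sketch: publishing a cruise dataset description as Linked Open Data (Turtle).
# URIs, titles and property choices are illustrative placeholders.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

DCAT = Namespace("http://www.w3.org/ns/dcat#")

g = Graph()
cruise = URIRef("https://data.example.org/cruise/RR1234")   # hypothetical cruise URI

g.add((cruise, RDF.type, DCAT.Dataset))
g.add((cruise, DCTERMS.title, Literal("R/V Example cruise RR1234 underway data")))
g.add((cruise, DCTERMS.spatial, Literal("Eastern Pacific")))
g.add((cruise, DCAT.distribution,
       URIRef("https://data.example.org/cruise/RR1234/underway.nc")))

print(g.serialize(format="turtle"))
```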
Samwald, Matthias; Lim, Ernest; Masiar, Peter; Marenco, Luis; Chen, Huajun; Morse, Thomas; Mutalik, Pradeep; Shepherd, Gordon; Miller, Perry; Cheung, Kei-Hoi
2009-01-01
The amount of biomedical data available in Semantic Web formats has been rapidly growing in recent years. While these formats are machine-friendly, user-friendly web interfaces allowing easy querying of these data are typically lacking. We present "Entrez Neuron", a pilot neuron-centric interface that allows for keyword-based queries against a coherent repository of OWL ontologies. These ontologies describe neuronal structures, physiology, mathematical models and microscopy images. The returned query results are organized hierarchically according to brain architecture. Where possible, the application makes use of entities from the Open Biomedical Ontologies (OBO) and the 'HCLS knowledgebase' developed by the W3C Interest Group for Health Care and Life Science. It makes use of the emerging RDFa standard to embed ontology fragments and semantic annotations within its HTML-based user interface. The application and underlying ontologies demonstrate how Semantic Web technologies can be used for information integration within a curated information repository and between curated information repositories. It also demonstrates how information integration can be accomplished on the client side, through simple copying and pasting of portions of documents that contain RDFa markup.
Couderc, Jean-Philippe
2011-01-01
We present an initiative supported by the National Heart Lung, and Blood Institute and the Food and Drug Administration for the development of a repository containing continuous electrocardiographic information to be shared with the worldwide scientific community. We believe that sharing data reinforces open scientific inquiry. It encourages diversity of analysis and opinion while promoting new research and facilitating the education of new researchers. In this paper, we present the resources available in this initiative for the scientific community. We describe the set of ECG signals currently hosted and we briefly discuss the associated clinical information (medical history. Disease and study-specific endpoints) and software tools we propose. Currently, the repository contains more than 250GB of data from eight clinical studies including healthy individuals and cardiac patients. This data is available for the development, implementation and validation of technologies related to body-surface ECGs. To conclude, the Telemetric and Holter ECG Warehouse (THEW) is an initiative developed to benefit the scientific community and to advance the field of quantitative electrocardiography and cardiac safety. PMID:21097349
Doing Your Science While You're in Orbit
NASA Astrophysics Data System (ADS)
Green, Mark L.; Miller, Stephen D.; Vazhkudai, Sudharshan S.; Trater, James R.
2010-11-01
Large-scale neutron facilities such as the Spallation Neutron Source (SNS) located at Oak Ridge National Laboratory need easy-to-use access to Department of Energy Leadership Computing Facilities and experiment repository data. The Orbiter thick- and thin-client and its supporting Service Oriented Architecture (SOA) based services (available at https://orbiter.sns.gov) consist of standards-based components that are reusable and extensible for accessing high performance computing, data and computational grid infrastructure, and cluster-based resources easily from a user configurable interface. The primary Orbiter system goals consist of (1) developing infrastructure for the creation and automation of virtual instrumentation experiment optimization, (2) developing user interfaces for thin- and thick-client access, (3) provide a prototype incorporating major instrument simulation packages, and (4) facilitate neutron science community access and collaboration. The secure Orbiter SOA authentication and authorization is achieved through the developed Virtual File System (VFS) services, which use Role-Based Access Control (RBAC) for data repository file access, thin-and thick-client functionality and application access, and computational job workflow management. The VFS Relational Database Management System (RDMS) consists of approximately 45 database tables describing 498 user accounts with 495 groups over 432,000 directories with 904,077 repository files. Over 59 million NeXus file metadata records are associated to the 12,800 unique NeXus file field/class names generated from the 52,824 repository NeXus files. Services that enable (a) summary dashboards of data repository status with Quality of Service (QoS) metrics, (b) data repository NeXus file field/class name full text search capabilities within a Google like interface, (c) fully functional RBAC browser for the read-only data repository and shared areas, (d) user/group defined and shared metadata for data repository files, (e) user, group, repository, and web 2.0 based global positioning with additional service capabilities are currently available. The SNS based Orbiter SOA integration progress with the Distributed Data Analysis for Neutron Scattering Experiments (DANSE) software development project is summarized with an emphasis on DANSE Central Services and the Virtual Neutron Facility (VNF). Additionally, the DANSE utilization of the Orbiter SOA authentication, authorization, and data transfer services best practice implementations are presented.
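At its core, the role-based access control behind such a virtual file system maps users to roles and roles to permissions before any repository path is served. The sketch below is a generic illustration with made-up roles, users and paths, not the Orbiter VFS schema; a real check would also consult per-path access lists.

```python
# Sketch: a minimal role-based access check for repository files.
# Roles, users and paths are hypothetical; this is not the Orbiter data model.
ROLE_PERMISSIONS = {
    "instrument_scientist": {"read", "write"},
    "experiment_member":    {"read"},
    "public":               set(),
}

USER_ROLES = {"alice": {"experiment_member"}, "bob": {"public"}}

def can_access(user, action, path):
    """True if any of the user's roles grants the requested action.
    (A fuller check would also apply per-directory ACLs to `path`.)"""
    granted = set().union(*(ROLE_PERMISSIONS.get(r, set())
                            for r in USER_ROLES.get(user, set())))
    return action in granted

print(can_access("alice", "read", "/SNS/ARCS/IPTS-1234/run_0001.nxs"))   # True
print(can_access("bob", "read", "/SNS/ARCS/IPTS-1234/run_0001.nxs"))     # False
```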
Sharing environmental models: An Approach using GitHub repositories and Web Processing Services
NASA Astrophysics Data System (ADS)
Stasch, Christoph; Nuest, Daniel; Pross, Benjamin
2016-04-01
The GLUES (Global Assessment of Land Use Dynamics, Greenhouse Gas Emissions and Ecosystem Services) project established a spatial data infrastructure for scientific geospatial data and metadata (http://geoportal-glues.ufz.de), where different regional collaborative projects researching the impacts of climate and socio-economic changes on sustainable land management can share their underlying base scenarios and datasets. One goal of the project is to ease the sharing of computational models between institutions and to make them easily executable in Web-based infrastructures. In this work, we present such an approach for sharing computational models relying on GitHub repositories (http://github.com) and Web Processing Services. At first, model providers upload their model implementations to GitHub repositories in order to share them with others. The GitHub platform allows users to submit changes to the model code. The changes can be discussed and reviewed before merging them. However, while GitHub allows sharing and collaborating of model source code, it does not actually allow running these models, which requires efforts to transfer the implementation to a model execution framework. We thus have extended an existing implementation of the OGC Web Processing Service standard (http://www.opengeospatial.org/standards/wps), the 52°North Web Processing Service (http://52north.org/wps) platform to retrieve all model implementations from a git (http://git-scm.com) repository and add them to the collection of published geoprocesses. The current implementation is restricted to models implemented as R scripts using WPS4R annotations (Hinz et al.) and to Java algorithms using the 52°North WPS Java API. The models hence become executable through a standardized Web API by multiple clients such as desktop or browser GIS and modelling frameworks. If the model code is changed on the GitHub platform, the changes are retrieved by the service and the processes will be updated accordingly. The admin tool of the 52°North WPS was extended to support automated retrieval and deployment of computational models from GitHub repositories. Once the R code is available in the GitHub repo, the contained process can be easily deployed and executed by simply defining the GitHub repository URL in the WPS admin tool. We illustrate the usage of the approach by sharing and running a model for land use system archetypes developed by the Helmholtz Centre for Environmental Research (UFZ, see Vaclavik et al.). The original R code was extended and published in the 52°North WPS using both, public and non-public datasets (Nüst et al., see also https://github.com/52North/glues-wps). Hosting the analysis in a Git repository now allows WPS administrators, client developers, and modelers to easily work together on new versions or completely new web processes using the powerful GitHub collaboration platform. References: Hinz, M. et. al. (2013): Spatial Statistics on the Geospatial Web. In: The 16th AGILE International Conference on Geographic Information Science, Short Papers. http://www.agile-online.org/Conference_Paper/CDs/agile_2013/Short_Papers/SP_S3.1_Hinz.pdf Nüst, D. et. al.: (2015): Open and reproducible global land use classification. In: EGU General Assembly Conference Abstracts . Vol. 17. European Geophysical Union, 2015, p. 9125, http://meetingorganizer.copernicus. org/EGU2015/EGU2015- 9125.pdf Vaclavik, T., et. al. (2013): Mapping global land system archetypes. Global Environmental Change 23(6): 1637-1647. 
Online available: October 9, 2013, DOI: 10.1016/j.gloenvcha.2013.09.004
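Once a GitHub-hosted model has been deployed as a process, clients reach it through standard OGC WPS requests. The sketch below issues key-value GetCapabilities and Execute calls against a placeholder endpoint with a hypothetical process identifier and inputs; it illustrates the client view only, not the 52°North deployment mechanism.

```python
# Sketch: calling a WPS-published process over HTTP key-value requests.
# Endpoint, process identifier and inputs are hypothetical placeholders.
import requests

WPS = "https://wps.example.org/wps/WebProcessingService"   # hypothetical endpoint

# Discover the processes currently published (including those pulled from GitHub).
caps = requests.get(WPS, params={"service": "WPS", "version": "1.0.0",
                                 "request": "GetCapabilities"}, timeout=60)
caps.raise_for_status()
print(caps.text[:200])

# Execute a hypothetical land-use-archetype process with literal inputs.
execute = requests.get(WPS, params={
    "service": "WPS", "version": "1.0.0", "request": "Execute",
    "identifier": "org.example.glues.archetypes",
    "datainputs": "year=2005;resolution=0.5",
}, timeout=600)
execute.raise_for_status()
print(execute.text[:200])
```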
Knowledge repositories for multiple uses
NASA Technical Reports Server (NTRS)
Williamson, Keith; Riddle, Patricia
1991-01-01
In the life cycle of a complex physical device or part, for example, the docking bay door of the Space Station, there are many uses for knowledge about the device or part. The same piece of knowledge might serve several uses. Given the quantity and complexity of the knowledge that must be stored, it is critical to maintain the knowledge in one repository, in one form. At the same time, because of quantity and complexity of knowledge that must be used in life cycle applications such as cost estimation, re-design, and diagnosis, it is critical to automate such knowledge uses. For each specific use, a knowledge base must be available and must be in a from that promotes the efficient performance of that knowledge base. However, without a single source knowledge repository, the cost of maintaining consistent knowledge between multiple knowledge bases increases dramatically; as facts and descriptions change, they must be updated in each individual knowledge base. A use-neutral representation of a hydraulic system for the F-111 aircraft was developed. The ability to derive portions of four different knowledge bases is demonstrated from this use-neutral representation: one knowledge base is for re-design of the device using a model-based reasoning problem solver; two knowledge bases, at different levels of abstraction, are for diagnosis using a model-based reasoning solver; and one knowledge base is for diagnosis using an associational reasoning problem solver. It was shown how updates issued against the single source use-neutral knowledge repository can be propagated to the underlying knowledge bases.
Using Feedback from Data Consumers to Capture Quality Information on Environmental Research Data
NASA Astrophysics Data System (ADS)
Devaraju, A.; Klump, J. F.
2015-12-01
Data quality information is essential to facilitate reuse of Earth science data. Recorded quality information must be sufficient for other researchers to select suitable data sets for their analysis and confirm the results and conclusions. In the research data ecosystem, several entities are responsible for data quality. Data producers (researchers and agencies) play a major role in this aspect as they often include validation checks or data cleaning as part of their work. It is possible that the quality information is not supplied with published data sets; if it is available, the descriptions might be incomplete, ambiguous or address specific quality aspects. Data repositories have built infrastructures to share data, but not all of them assess data quality. They normally provide guidelines of documenting quality information. Some suggests that scholarly and data journals should take a role in ensuring data quality by involving reviewers to assess data sets used in articles, and incorporating data quality criteria in the author guidelines. However, this mechanism primarily addresses data sets submitted to journals. We believe that data consumers will complement existing entities to assess and document the quality of published data sets. This has been adopted in crowd-source platforms such as Zooniverse, OpenStreetMap, Wikipedia, Mechanical Turk and Tomnod. This paper presents a framework designed based on open source tools to capture and share data users' feedback on the application and assessment of research data. The framework comprises a browser plug-in, a web service and a data model such that feedback can be easily reported, retrieved and searched. The feedback records are also made available as Linked Data to promote integration with other sources on the Web. Vocabularies from Dublin Core and PROV-O are used to clarify the source and attribution of feedback. The application of the framework is illustrated with the CSIRO's Data Access Portal.
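One possible shape for such a feedback record as Linked Data, reusing Dublin Core and PROV-O as the framework proposes, is sketched below with rdflib. The identifiers, properties and dataset URI are illustrative assumptions, not the framework's actual data model.

```python
# Sketch: a consumer feedback record serialized as RDF with DC and PROV-O terms.
# All URIs and property choices are illustrative placeholders.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, PROV, RDF, XSD

EX = Namespace("https://feedback.example.org/")

g = Graph()
fb = EX["feedback/42"]
dataset = URIRef("https://data.example.org/collections/example-dataset")  # hypothetical

g.add((fb, RDF.type, PROV.Entity))
g.add((fb, DCTERMS.subject, dataset))
g.add((fb, DCTERMS.description,
       Literal("Cloud mask looks unreliable over coastal pixels; usable inland.")))
g.add((fb, DCTERMS.creator, EX["user/jane"]))
g.add((fb, PROV.wasAttributedTo, EX["user/jane"]))
g.add((fb, DCTERMS.created, Literal("2015-08-01", datatype=XSD.date)))

print(g.serialize(format="turtle"))
```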
The siting program of geological repository for spent fuel/high-level waste in Czech Republic
DOE Office of Scientific and Technical Information (OSTI.GOV)
Novotny, P.
1993-12-31
The management of high-level waste in the Czech Republic has a very short history, because before 1989 spent nuclear fuel was re-exported back to the USSR. The project "Geological research of an HLW repository in the Czech Republic" was initiated during 1990 by the Ministry of the Environment of the Czech Republic, which delegated it to the Czech Geological Survey (CGU) Prague. The first CGU project, late in 1990, proposed a multibarrier concept with a geological repository located at a depth of about 500 m. Screening and studies of potential sites for the repository started in 1991. The first stage represented regional siting across the Czech Republic for prospective rock types and massifs. In cooperation with GEOPHYSICS Co., the Geophysical Institute of the Czech Academy of Sciences and Charles University Prague, 27 prospective regions were selected using IAEA criteria. This work in the Czech Republic was possible thanks to the detailed geological studies done in the past and to the numerous archive data concentrated in the central geological archive GEOFOND. Selection of prospective sites also respected nature conservation regions and regions protecting water and mineral water resources. CGU opened contact with countries with similar geological situations and started cooperation with SKB (Swedish Nuclear Fuel and Waste Management Co.). The project of geological research for the next 10 years is a result of these activities.
Mandatory Open Access Publishing for Electronic Theses and Dissertations: Ethics and Enthusiasm
ERIC Educational Resources Information Center
Hawkins, Ann R.; Kimball, Miles A.; Ives, Maura
2013-01-01
This article argues against policies that require students to submit theses and dissertations to electronic institutional repositories. The article counters a variety of arguments often used to justify this practice. In addition, the article reports on the results of an examination of electronic thesis and dissertation policies at more than 150…
ERIC Educational Resources Information Center
Ramírez, Marisa L.; McMillan, Gail; Dalton, Joan T.; Hanlon, Ann; Smith, Heather S.; Kern, Chelsea
2014-01-01
In academia, there is a growing acceptance of sharing the final electronic version of graduate work, such as a thesis or dissertation, in an online university repository. Though previous studies have shown that journal editors are willing to consider manuscripts derived from electronic theses and dissertations (ETDs), faculty advisors and graduate…
Publish (Your Data) or (Let the Data) Perish! Why Not Publish Your Data Too?
ERIC Educational Resources Information Center
Wicherts, Jelte M.; Bakker, Marjan
2012-01-01
The authors argue that upon publication of a paper, the data should be made available through online archives or repositories. Reasons for not sharing data are discussed and contrasted with advantages of sharing, which include abiding by the scientific principle of openness, keeping the data for posterity, increasing one's impact, facilitation of…
Ciênsação: Gaining a Feeling for Sciences
ERIC Educational Resources Information Center
de Oliveira, Marcos Henrique Abreu; Fischer, Robert
2017-01-01
Ciênsação, an open online repository for hands-on experiments, has been developed to convince teachers in Latin America that science is best experienced first hand. Permitting students to experiment autonomously in small groups can be a challenging endeavour for educators in these countries. We analyse the reasons that cause hesitation of teachers…
Focus on Academic and Research Libraries: Librarians Speak Out to Journal Publishers
ERIC Educational Resources Information Center
Kaser, Dick
2009-01-01
What is the economic situation in libraries these days? What are academic and research libraries doing with regard to making the resources in their collections more discoverable? Are they involved in institutional repository (IR) projects? And how do IRs and the availability of open access journals affect library purchasing decisions? Those were…
Natural analog studies: Licensing perspective
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bradbury, J.W.
1995-09-01
This report describes the licensing perspective on the term "natural analog studies" as used in CFR Part 60. It describes misunderstandings related to the term's definition that have become evident during discussions at U.S. Nuclear Regulatory Commission meetings, and tries to clarify the appropriate applications of natural analog studies to aspects of repository site characterization.
Web-Based Learning Materials for Higher Education: The MERLOT Repository
ERIC Educational Resources Information Center
Orhun, Emrah
2004-01-01
MERLOT (Multimedia Educational Resource for Learning and Online Teaching) is a web-based open resource designed primarily for faculty and students in higher education. The resources in MERLOT include over 8,000 learning materials and support materials from a wide variety of disciplines that can be integrated within the context of a larger course.…
Electronic Repositories of Marked Student Work and Their Contributions to Formative Evaluation
ERIC Educational Resources Information Center
Heinrich, Eva
2004-01-01
The educational literature shows that formative assessment is highly conducive to learning. The tasks given to students in formative assessment generally require open-ended responses that can be given, for example, in essay-type format and that are assessed by a human marker. An essential component is the formative feedback provided by the marker…
USDA-ARS?s Scientific Manuscript database
Two new pentacyclic ingamine- type alkaloids, namely 22(S)-hydroxyingamine A (2) and dihydroingenamine D (3), together with the known ingamine A (1) have been isolated from marine sponge Petrosid Ng5 Sp5 (Family: Petrosiidae) obtained from the open repository of National Cancer Institute, USA. The s...
Method and system of integrating information from multiple sources
Alford, Francine A [Livermore, CA; Brinkerhoff, David L [Antioch, CA
2006-08-15
A system and method of integrating information from multiple sources in a document-centric application system. A plurality of application systems are connected through an object request broker to a central repository. The information may then be posted on a web page. An example implementation of the method and system is an online procurement system.
Unifying Access to National Hydrologic Data Repositories via Web Services
NASA Astrophysics Data System (ADS)
Valentine, D. W.; Jennings, B.; Zaslavsky, I.; Maidment, D. R.
2006-12-01
The CUAHSI hydrologic information system (HIS) is designed to be a live, multiscale web portal system for accessing, querying, visualizing, and publishing distributed hydrologic observation data and models for any location or region in the United States. The HIS design follows the principles of open service-oriented architecture, i.e. system components are represented as web services with well-defined standard service APIs. WaterOneFlow web services are the main component of the design. The currently available services have been completely re-written compared to the previous version, and provide programmatic access to USGS NWIS (stream flow, groundwater and water quality repositories), DAYMET daily observations, NASA MODIS, and Unidata NAM streams, with several additional web service wrappers being added (EPA STORET, NCDC and others). Different repositories of hydrologic data use different vocabularies and support different types of query access. Resolving semantic and structural heterogeneities across different hydrologic observation archives and distilling a generic set of service signatures is one of the main scalability challenges in this project, and a requirement in our web service design. To accomplish the uniformity of the web services API, data repositories are modeled following the CUAHSI Observation Data Model. The web service responses are document-based, and use an XML schema to express the semantics in a standard format. Access to station metadata is provided via the web service methods GetSites, GetSiteInfo and GetVariableInfo. These methods form the foundation of the CUAHSI HIS discovery interface and may execute over locally stored metadata or request the information from remote repositories directly. Observation values are retrieved via a generic GetValues method which is executed against national data repositories. The service is implemented in ASP.NET, and other providers are implementing WaterOneFlow services in Java. A reference implementation of the WaterOneFlow web services is available. More information about the ongoing development of CUAHSI HIS is available from http://www.cuahsi.org/his/.
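As a rough illustration of how a client might call the WaterOneFlow methods named above, the sketch below uses the zeep SOAP client. The WSDL URL, site and variable codes, and the exact parameter order are assumptions for illustration; they do not reproduce the documented CUAHSI HIS interface.

```python
# Sketch only: calling a WaterOneFlow-style SOAP service with zeep.
from zeep import Client

# Hypothetical WSDL location; real WaterOneFlow endpoints are published per data provider.
WSDL = "https://example.org/waterml/cuahsi_1_1.asmx?WSDL"
client = Client(WSDL)

# Station discovery and a one-month time series pull; codes and argument
# order are illustrative assumptions.
site_info = client.service.GetSiteInfo("NWIS:09380000")   # hypothetical site code
values = client.service.GetValues("NWIS:09380000",        # site
                                  "NWIS:00060",            # variable (e.g. discharge)
                                  "2006-01-01",            # start date
                                  "2006-01-31")            # end date
print(str(values)[:500])
```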
NASA Astrophysics Data System (ADS)
Versteeg, R.; Heath, G.; Richardson, A.; Paul, D.; Wangerud, K.
2003-12-01
At a cyanide heap-leach open-pit mine, 15 million cubic yards of acid-generating sulfides were dumped at the head of a steep-walled mountain valley, with 30 inches/year of precipitation generating 60 gallons/minute of ARD leachate. Remediation has reshaped the dump to a 70-acre, 3.5:1-sloped geometry, installed drainage benches and runoff diversions, and capped the repository and lined diversions with a polyethylene geomembrane and cover system. Monitoring was needed to evaluate (a) long-term geomembrane integrity, (b) diversion liner integrity and long-term effectiveness, (c) ARD geochemistry, kinetics and pore-gas dynamics within the repository mass, and (d) groundwater interactions. Observation wells were paired with a 600-electrode resistivity survey system. Using near-surface and down-hole electrodes and automated data collection and post-processing, periodic two- and three-dimensional resistivity images are developed to reflect current and changed conditions in moisture, temperature, geochemical components, and flow direction. Examination of total resistivity values and time variances between images allows direct observation of liner and cap integrity, with precise identification and location of leaks; likewise, if runoff migrates from degraded diversion ditches into the repository zone, there is an accompanying and noticeable change in resistivity values. Used in combination with monitoring wells containing borehole resistivity electrodes (calibrated with direct sampling of dump water/moisture, temperature and pore-gas composition), the resistivity arrays allow at-depth imaging of geochemical conditions within the repository mass. The information provides early indications of progress or deficiencies in de-watering and ARD mitigation, which are the intent of the remedy. If emerging technologies present opportunities for secondary treatment, deep resistivity images may assist in developing application methods and evaluating the effectiveness of any reagents introduced into the repository mass to further effect changes in oxidation/reduction reactions.
NASA Astrophysics Data System (ADS)
Vines, Aleksander; Hamre, Torill; Lygre, Kjetil
2014-05-01
The GreenSeas project (Development of global plankton data base and model system for eco-climate early warning) aims to advance the knowledge and predictive capacities of how marine ecosystems will respond to global change. A main task has been to set up a data delivery and monitoring core service following the open and free data access policy implemented in the Global Monitoring for the Environment and Security (GMES) programme. The aim is to ensure open and free access to historical plankton data, new data (EO products and in situ measurements), model data (including estimates of simulation error) and biological, environmental and climatic indicators to a range of stakeholders, such as scientists, policy makers and environmental managers. To this end, we have developed a geo-spatial database of both historical and new in situ physical, biological and chemical parameters for the Southern Ocean, Atlantic, Nordic Seas and the Arctic, and organized related satellite-derived quantities and model forecasts in a joint geo-spatial repository. For easy access to these data, we have implemented a web-based GIS (Geographical Information System) where observed, derived and forecast parameters can be searched, displayed, compared and exported. Model forecasts can also be uploaded dynamically to the system, to allow modelers to quickly compare their results with available in situ and satellite observations. The web-based GIS is built on free and open source technologies: Thredds Data Server, ncWMS, GeoServer, OpenLayers, PostGIS, Liferay, Apache Tomcat, PRTree, NetCDF-Java, json-simple, Geotoolkit, Highcharts, GeoExt, MapFish, FileSaver, jQuery, jstree and qUnit. We also wanted to use open standards for communication between the different services; we use WMS, WFS, netCDF, GML, OPeNDAP, JSON, and SLD. The main advantage of using FOSS was that we did not have to reinvent the wheel, but could reuse existing code and functionality for free. Of course, most of the software did not have to be open source for this, but in some cases we made minor modifications so that the different technologies would work together, and we could extract the parts of the code that we needed for a specific task. One example was using parts of the ncWMS and Thredds code to help our main application both read netCDF files and present them in the browser. This presentation focuses on both the difficulties we encountered and the advantages we gained from developing this tool with FOSS.
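As an illustration of the open-standards access the portal describes, the sketch below requests a map image from a WMS endpoint using the OWSLib client. The service URL and layer name are placeholders, not the GreenSeas project's actual endpoints.

```python
# Sketch only: fetching a map layer over WMS with OWSLib.
from owslib.wms import WebMapService

wms = WebMapService("https://example.org/geoserver/wms", version="1.1.1")  # hypothetical URL

# List the layers the server advertises, then fetch one as a PNG.
print(list(wms.contents))
img = wms.getmap(layers=["greenseas:chlorophyll"],   # placeholder layer name
                 srs="EPSG:4326",
                 bbox=(-30.0, 50.0, 20.0, 80.0),     # lon/lat bounding box
                 size=(600, 400),
                 format="image/png",
                 transparent=True)
with open("chlorophyll.png", "wb") as f:
    f.write(img.read())
```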
Bio-repository of post-clinical test samples at the national cancer center hospital (NCCH) in Tokyo.
Furuta, Koh; Yokozawa, Karin; Takada, Takako; Kato, Hoichi
2009-08-01
We established the Bio-repository at the National Cancer Center Hospital in October 2002. The main purpose of this article is to show the importance and usefulness of a bio-repository of post-clinical test samples, not only for translational cancer research but also for routine clinical oncology, by describing the experience of setting up such a facility. Our basic concept is that a post-clinical test sample is not left-over waste, but rather frozen evidence of a patient's pathological condition at a particular point in time. We can decode most, if not all, of the laboratory data from a post-clinical test sample. As a result, the bio-repository is able to provide not only the samples, but potentially all related laboratory data upon request. The areas of sample coverage are the following: sera after routine blood tests; sera after cross-match tests for transfusion; serum or plasma submitted by the physician at a clinically important time point for the patient; and samples collected by the individual investigator. The formats of stored samples are plasma or serum, dried blood spot (DBS) and buffy coat. So far, 150 218 plasmas or sera, 35 253 DBS and 536 buffy coats have been registered in our bio-repository system. We arranged to provide samples to various concerned parties under strict legal and ethical agreements. Although the number of utilized samples was initially limited, inquiries for sample utilization from both research and clinical sources are now increasing steadily. Further efforts to increase the benefits of the repository are intended.
Rideout, Jai Ram; He, Yan; Navas-Molina, Jose A; Walters, William A; Ursell, Luke K; Gibbons, Sean M; Chase, John; McDonald, Daniel; Gonzalez, Antonio; Robbins-Pianka, Adam; Clemente, Jose C; Gilbert, Jack A; Huse, Susan M; Zhou, Hong-Wei; Knight, Rob; Caporaso, J Gregory
2014-01-01
We present a performance-optimized algorithm, subsampled open-reference OTU picking, for assigning marker gene (e.g., 16S rRNA) sequences generated on next-generation sequencing platforms to operational taxonomic units (OTUs) for microbial community analysis. This algorithm provides benefits over de novo OTU picking (clustering can be performed largely in parallel, reducing runtime) and closed-reference OTU picking (all reads are clustered, not only those that match a reference database sequence with high similarity). Because more of our algorithm can be run in parallel relative to "classic" open-reference OTU picking, it makes open-reference OTU picking tractable on massive amplicon sequence data sets (though on smaller data sets, "classic" open-reference OTU clustering is often faster). We illustrate that here by applying it to the first 15,000 samples sequenced for the Earth Microbiome Project (1.3 billion V4 16S rRNA amplicons). To the best of our knowledge, this is the largest OTU picking run ever performed, and we estimate that our new algorithm runs in less than 1/5 the time that would be required by "classic" open-reference OTU picking. We show that subsampled open-reference OTU picking yields results that are highly correlated with those generated by "classic" open-reference OTU picking through comparisons on three well-studied datasets. An implementation of this algorithm is provided in the popular QIIME software package, which uses uclust for read clustering. All analyses were performed using QIIME's uclust wrappers, though we provide details (aided by the open-source code in our GitHub repository) that will allow implementation of subsampled open-reference OTU picking independently of QIIME (e.g., in a compiled programming language, where runtimes should be further reduced). Our analyses should generalize to other implementations of these OTU picking algorithms. Finally, we present a comparison of parameter settings in QIIME's OTU picking workflows and make recommendations on settings for these free parameters to optimize runtime without reducing the quality of the results. These optimized parameters can vastly decrease the runtime of uclust-based OTU picking in QIIME.
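The control flow of subsampled open-reference OTU picking can be sketched in a few lines of Python. The toy similarity function and greedy clustering below stand in for the uclust calls QIIME actually makes; thresholds, function names and data structures are illustrative assumptions, not the QIIME implementation.

```python
# Schematic sketch of the subsampled open-reference OTU picking control flow.
import random

def similarity(a, b):
    """Toy per-position identity; the real pipeline uses uclust alignments."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b), 1)

def closed_reference_pick(seqs, reference, threshold=0.97):
    """Assign each sequence to the first reference centroid above threshold."""
    otus, failures = {}, []
    for s in seqs:
        hit = next((r for r in reference if similarity(s, r) >= threshold), None)
        if hit is None:
            failures.append(s)
        else:
            otus.setdefault(hit, []).append(s)
    return otus, failures

def de_novo_pick(seqs, threshold=0.97):
    """Greedy de novo clustering; the first member of each cluster is its centroid."""
    clusters = {}
    for s in seqs:
        hit = next((c for c in clusters if similarity(s, c) >= threshold), None)
        clusters.setdefault(hit if hit is not None else s, []).append(s)
    return clusters

def subsampled_open_reference(seqs, reference, subsample_fraction=0.001):
    # 1. Closed-reference picking against the full reference collection.
    ref_otus, failures = closed_reference_pick(seqs, reference)
    # 2. De novo cluster a small random subsample of the failures; the
    #    resulting centroids become a new, data-derived reference set.
    n = max(1, int(len(failures) * subsample_fraction)) if failures else 0
    new_ref = list(de_novo_pick(random.sample(failures, n)).keys()) if n else []
    # 3. Closed-reference picking of all failures against the new reference
    #    (this is the step that parallelizes well).
    new_ref_otus, still_failing = closed_reference_pick(failures, new_ref)
    # 4. Final de novo clustering so every remaining read lands in an OTU.
    de_novo_otus = de_novo_pick(still_failing)
    return ref_otus, new_ref_otus, de_novo_otus

# Tiny toy run
reads = ["ACGTACGT", "ACGTACGA", "TTTTCCCC", "TTTTCCCG", "GGGGAAAA"]
print(subsampled_open_reference(reads, reference=["ACGTACGT"]))
```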
TCIA: An information resource to enable open science.
Prior, Fred W; Clark, Ken; Commean, Paul; Freymann, John; Jaffe, Carl; Kirby, Justin; Moore, Stephen; Smith, Kirk; Tarbox, Lawrence; Vendt, Bruce; Marquez, Guillermo
2013-01-01
Reusable, publicly available data is a pillar of open science. The Cancer Imaging Archive (TCIA) is an open image archive service supporting cancer research. TCIA collects, de-identifies, curates and manages rich collections of oncology image data. Image data sets have been contributed by 28 institutions and additional image collections are underway. Since June of 2011, more than 2,000 users have registered to search and access data from this freely available resource. TCIA encourages and supports cancer-related open science communities by hosting and managing the image archive, providing project wiki space and searchable metadata repositories. The success of TCIA is measured by the number of active research projects it enables (>40) and the number of scientific publications and presentations that are produced using data from TCIA collections (39).
NASA Astrophysics Data System (ADS)
McAllister, M.; Gochis, D.; Dugger, A. L.; Karsten, L. R.; McCreight, J. L.; Pan, L.; Rafieeinasab, A.; Read, L. K.; Sampson, K. M.; Yu, W.
2017-12-01
The community WRF-Hydro modeling system is publicly available and provides researchers and operational forecasters a flexible and extensible capability for multi-scale, multi-physics hydrologic modeling that can be run independently of, or fully interactively with, the WRF atmospheric model. The core WRF-Hydro physics model contains very high-resolution representations of terrestrial hydrologic processes such as land-atmosphere exchanges of energy and moisture, snowpack evolution, infiltration, terrain routing, channel routing, basic reservoir representation and hydrologic data assimilation. Complementing the core physics components of WRF-Hydro is an ecosystem of pre- and post-processing tools that facilitate the preparation of terrain and meteorological input data, an open-source hydrologic model evaluation toolset (Rwrfhydro), hydrologic data assimilation capabilities with DART, and advanced model visualization capabilities. The National Center for Atmospheric Research (NCAR), through collaborative support from the National Science Foundation and other funding partners, provides community support for the entire WRF-Hydro system through a variety of mechanisms. This presentation summarizes the enhanced user support capabilities that are being developed for the community WRF-Hydro modeling system. These products and services include a new website, open-source code repositories, documentation and user guides, test cases, online training materials, live hands-on training sessions, an email list-serv, and individual user support via email through a new help desk ticketing system. The WRF-Hydro modeling system and supporting tools, which now include re-gridding scripts and model calibration, have recently been updated to Version 4 and are merging toward the capabilities of the National Water Model.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Himmelberger, J.J.; Ogneva-Himmelberger, Y.A.; Baughman, M.
Whether the proposed Yucca Mountain nuclear waste repository system will adversely impact tourism in southern Nevada is an open question of particular importance to visitor-oriented rural counties bisected by planned waste transportation corridors (highway or rail). As part of one such county's repository impact assessment program, the tourism implications of Three Mile Island (TMI) and other major hazard events have been revisited to inform ongoing county-wide socioeconomic assessments and contingency planning efforts. This paper summarizes key implications of such research as applied to Lincoln County, Nevada. Implications for other rural counties are discussed in light of the research findings. 29 refs., 3 figs., 1 tab.
Implementing DSpace at NASA Langley Research Center
NASA Technical Reports Server (NTRS)
Lowe, Greta
2007-01-01
This presentation looks at the implementation of the DSpace institutional repository system at the NASA Langley Technical Library. NASA Langley Technical Library implemented DSpace software as a replacement for the Langley Technical Report Server (LTRS). DSpace was also used to develop the Langley Technical Library Digital Repository (LTLDR). LTLDR contains archival copies of core technical reports in the aeronautics area dating back to the NACA era, as well as other specialized collections relevant to the NASA Langley community. Extensive metadata crosswalks were created to facilitate moving data from various systems and formats into DSpace. The Dublin Core metadata screens were also customized. The OpenURL standard and Ex Libris MetaLib are being used in this environment to assist our customers with either discovering full-text content or initiating a request for the item.
DOE Office of Scientific and Technical Information (OSTI.GOV)
de Raad, Markus; de Rond, Tristan; Rübel, Oliver
Mass spectrometry imaging (MSI) has primarily been applied to localizing biomolecules within biological matrices. Although well suited to this task, the application of MSI for comparing thousands of spatially defined, spotted samples has been limited. One reason for this is a lack of suitable and accessible data processing tools for the analysis of large arrayed MSI sample sets. This paper presents the OpenMSI Arrayed Analysis Toolkit (OMAAT), a software package that addresses the challenges of analyzing spatially defined samples in MSI data sets. OMAAT is written in Python and is integrated with OpenMSI (http://openmsi.nersc.gov), a platform for storing, sharing, and analyzing MSI data. By using a web-based Python notebook (Jupyter), OMAAT is accessible to anyone without programming experience, yet allows experienced users to leverage all features. OMAAT was evaluated by analyzing an MSI data set of a high-throughput glycoside hydrolase activity screen comprising 384 samples arrayed onto a NIMS surface at a 450 μm spacing, decreasing analysis time >100-fold while maintaining robust spot-finding. The utility of OMAAT was demonstrated for screening metabolic activities of different-sized soil particles, including hydrolysis of sugars, revealing a pattern of size-dependent activities. These results establish OMAAT as an effective toolkit for analyzing spatially defined samples in MSI. OMAAT runs on all major operating systems, and the source code can be obtained from the following GitHub repository: https://github.com/biorack/omaat.
Rapid Deployment of a RESTful Service for Oceanographic Research Cruises
NASA Astrophysics Data System (ADS)
Fu, Linyun; Arko, Robert; Leadbetter, Adam
2014-05-01
The Ocean Data Interoperability Platform (ODIP) seeks to increase data sharing across scientific domains and international boundaries by providing a forum to harmonize diverse regional data systems. ODIP participants from the US include the Rolling Deck to Repository (R2R) program, whose mission is to capture, catalog, and describe the underway/environmental sensor data from US oceanographic research vessels and submit the data to public long-term archives. R2R publishes information online as Linked Open Data, making it widely available using Semantic Web standards. Each vessel, sensor, cruise, dataset, person, organization, funding award, log, report, etc., has a Uniform Resource Identifier (URI). Complex queries that federate results from other data providers are supported, using the SPARQL query language. To facilitate interoperability, R2R uses controlled vocabularies developed collaboratively by the science community (e.g. SeaDataNet device categories) and published online by the NERC Vocabulary Server (NVS). In response to user feedback, we are developing a standard programming interface (API) and Web portal for R2R's Linked Open Data. The API provides a set of simple REST-type URLs that are translated on-the-fly into SPARQL queries, and supports common output formats (e.g. JSON). We will demonstrate an implementation based on the Epimorphics Linked Data API (ELDA) open-source Java package. Our experience shows that constructing a simple portal with limited schema elements in this way can significantly reduce development time and maintenance complexity.
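To illustrate the kind of query the R2R Linked Open Data supports, the sketch below issues a SPARQL SELECT through the SPARQLWrapper library. The endpoint URL and the class and property names in the query are placeholders, not R2R's documented schema.

```python
# Sketch only: a SELECT query against a Linked Open Data SPARQL endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://example.org/r2r/sparql")  # hypothetical endpoint
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT ?cruise ?label WHERE {
        ?cruise a <http://example.org/voc/Cruise> ;   # placeholder class URI
                dcterms:title ?label .
    } LIMIT 10
""")

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["cruise"]["value"], row["label"]["value"])
```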
RecutClub.com: An Open Source, Whole Slide Image-based Pathology Education System
Christensen, Paul A.; Lee, Nathan E.; Thrall, Michael J.; Powell, Suzanne Z.; Chevez-Barrios, Patricia; Long, S. Wesley
2017-01-01
Background: Our institution's pathology unknown conferences provide educational cases for our residents. However, the cases have not been previously available digitally, have not been collated for postconference review, and were not accessible to a wider audience. Our objective was to create an inexpensive whole slide image (WSI) education suite to address these limitations and improve the education of pathology trainees. Materials and Methods: We surveyed residents regarding their preference between four unique WSI systems. We then scanned weekly unknown conference cases and study set cases and uploaded them to our custom built WSI viewer located at RecutClub.com. We measured site utilization and conference participation. Results: Residents preferred our OpenLayers WSI implementation to Ventana Virtuoso, Google Maps API, and OpenSlide. Over 16 months, we uploaded 1366 cases from 77 conferences and ten study sets, occupying 793.5 GB of cloud storage. Based on resident evaluations, the interface was easy to use and demonstrated minimal latency. Residents are able to review cases from home and from their mobile devices. Worldwide, 955 unique IP addresses from 52 countries have viewed cases in our site. Conclusions: We implemented a low-cost, publicly available repository of WSI slides for resident education. Our trainees are very satisfied with the freedom to preview either the glass slides or WSI and review the WSI postconference. Both local users and worldwide users actively and repeatedly view cases in our study set. PMID:28382224
Reconsolidated Salt as a Geotechnical Barrier
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hansen, Francis D.; Gadbury, Casey
Salt as a geologic medium has several attributes favorable to long-term isolation of waste placed in mined openings. Salt formations are largely impermeable, and induced fractures heal as stress returns to equilibrium. Permanent isolation also depends upon the ability to construct geotechnical barriers that achieve nearly the same high-performance characteristics attributed to the native salt formation. Salt repository seal concepts often include elements of reconstituted granular salt. As a specific case in point, the Waste Isolation Pilot Plant recently received regulatory approval to change the disposal panel closure design from an engineered barrier constructed of salt-based concrete to one that employs simple run-of-mine salt and temporary bulkheads for isolation from ventilation. The Waste Isolation Pilot Plant is a radioactive waste disposal repository for defense-related transuranic elements mined from the Permian evaporite salt beds in southeast New Mexico. Its approved shaft seal design incorporates barrier components comprising salt-based concrete, bentonite, and substantial depths of crushed salt compacted to enhance reconsolidation. This paper focuses on crushed salt behavior when applied in drift closures to isolate disposal rooms during operations. Scientific aspects of salt reconsolidation have been studied extensively. The technical basis for geotechnical barrier performance has been strengthened by recent experimental findings and analogue comparisons. The panel closure change was accompanied by recognition that granular salt will return to a physical state similar to the halite surrounding it. Use of run-of-mine salt ensures physical and chemical compatibility with the repository environment and simplifies ongoing disposal operations. Our current knowledge and the expected outcomes of research can be assimilated with lessons learned to put forward designs and operational concepts for the next generation of salt repositories. Mined salt repositories have the potential to permanently isolate vast inventories of radioactive and hazardous wastes.
NASA Astrophysics Data System (ADS)
Prodanovic, M.; Esteva, M.; Ketcham, R. A.
2017-12-01
Nanometer- to centimeter-scale imaging, such as (focused ion beam) scanning electron microscopy, magnetic resonance imaging and X-ray (micro)tomography, has since the 1990s introduced 2D and 3D datasets of rock microstructure that allow investigation of nonlinear flow and mechanical phenomena at length scales that are otherwise inaccessible to laboratory measurements. The numerical approaches that use such images produce various upscaled parameters required by subsurface flow and deformation simulators. All of this has revolutionized our knowledge of grain-scale phenomena. However, a lack of data-sharing infrastructure among research groups makes it difficult to integrate different length scales. We have developed a sustainable, open and easy-to-use repository called the Digital Rocks Portal (https://www.digitalrocksportal.org) that (1) organizes images and related experimental measurements of different porous materials, and (2) improves access to them for a wider community of engineering and geosciences researchers not necessarily trained in computer science or data analysis. The Digital Rocks Portal (NSF EarthCube Grant 1541008) is the first repository for imaged porous microstructure data. It is implemented within the reliable, 24/7-maintained High Performance Computing infrastructure supported by the Texas Advanced Computing Center (University of Texas at Austin). Long-term storage is provided through the University of Texas System Research Cyber-infrastructure initiative. We show how the data can be documented, referenced in publications via digital object identifiers, visualized, searched for and linked to other repositories. We show the recently implemented integration of remote parallel visualization and bulk upload for large datasets, as well as a preliminary flow simulation workflow with the pore structures currently stored in the repository. We discuss the issues of collecting correct metadata, data discoverability and repository sustainability.
Geoscience Digital Data Resource and Repository Service
NASA Astrophysics Data System (ADS)
Mayernik, M. S.; Schuster, D.; Hou, C. Y.
2017-12-01
The open availability and wide accessibility of digital data sets is becoming the norm for geoscience research. The National Science Foundation (NSF) instituted a data management planning requirement in 2011, and many scientific publishers, including the American Geophysical Union and the American Meteorological Society, have recently implemented data archiving and citation policies. Many disciplinary data facilities exist around the community to provide a high level of technical support and expertise for archiving data of particular kinds, or for particular projects. However, a significant number of geoscience research projects do not have the same level of data facility support due to a combination of several factors, including the research project's size, funding limitations, or a topic scope that has no clear facility match. These projects typically manage data on an ad hoc basis, with limited long-term management and preservation procedures. The NSF is supporting a workshop, to be held in the summer of 2018, to develop requirements and expectations for a Geoscience Digital Data Resource and Repository Service (GeoDaRRS). The vision for the prospective GeoDaRRS is to complement existing NSF-funded data facilities by providing: 1) data management planning support resources for the general community, and 2) repository services for researchers who have data that do not fit in any existing repository. Functionally, the GeoDaRRS would support NSF-funded researchers in meeting data archiving requirements set by the NSF and publishers for the geosciences, thereby ensuring the availability of digital data for use and reuse in scientific research going forward. This presentation will engage the AGU community in discussion about the need for a new digital data repository service, specifically to inform the forthcoming GeoDaRRS workshop.
Molecular hydrogen: An abundant energy source for bacterial activity in nuclear waste repositories
NASA Astrophysics Data System (ADS)
Libert, M.; Bildstein, O.; Esnault, L.; Jullien, M.; Sellier, R.
A thorough understanding of the energy sources used by microbial systems in the deep terrestrial subsurface is essential, since the extreme conditions for life in deep biospheres may serve as a model for possible life in a nuclear waste repository. In this respect, H2 is known as one of the most energetic substrates for deep terrestrial subsurface environments. This hydrogen is produced by abiotic and biotic processes, but its concentration in natural systems is usually maintained at very low levels due to hydrogen-consuming bacteria. A significant amount of H2 gas will be produced within deep nuclear waste repositories, essentially from the corrosion of metallic components. This will consequently improve the conditions for microbial activity in this specific environment. This paper discusses different case studies with experimental results to illustrate the fact that microorganisms are able to use hydrogen for redox processes (reduction of O2, NO3-, Fe(III)) in several waste disposal conditions. Consequences of microbial activity include alteration of groundwater chemistry and shifts in geochemical equilibria, gas production or consumption, biocorrosion, and potential modifications of confinement properties. In order to quantify the impact of hydrogen-consuming bacteria, the next step will be to determine the kinetic rates of the reactions in realistic conditions.
NASA Astrophysics Data System (ADS)
Kaláb, Zdeněk; Šílený, Jan; Lednická, Markéta
2017-07-01
This paper deals with the seismic stability of the survey areas of potential sites for the deep geological repository of spent nuclear fuel in the Czech Republic. The basic source of data for historical earthquakes up to 1990 was the seismic website [1-]. In the historical period, the most intense earthquake described occurred on September 15, 1590 in the Niederroesterreich region (Austria); its reported intensity is Io = 8-9. The source of the contemporary seismic data for the period from 1991 to the end of 2014 was the website [11]. Based on the databases and literature review, it may be stated that since 1900 no earthquake exceeding magnitude 5.1 has originated in the territory of the Czech Republic. In order to evaluate seismicity and to assess the impact of seismic effects at the depths of a hypothetical deep geological repository for the next time period, the neo-deterministic method was selected as an extension of the probabilistic method. Each of the seven survey areas was assessed by neo-deterministic evaluation of the seismic wave-field excited by selected individual events, determining the maximum loading. Results of the seismological database studies and of the neo-deterministic analysis of the Čihadlo locality are presented.
Danevska, Lenche; Spiroski, Mirko; Donev, Doncho; Pop-Jordanova, Nada; Polenakovic, Momir
2016-11-01
The Internet has enabled an easy method to search through the vast majority of publications and has improved the impact of scholarly journals. However, it can also pose threats to the quality of published articles. New publishers and journals have emerged: so-called open-access potential, possible, or probable predatory publishers and journals, and so-called hijacked journals. Our aim was to increase awareness and to show scholars, especially young researchers, how to recognize these journals and how to avoid submitting their papers to them. Review and critical analysis of the relevant published literature, Internet sources, and the personal experience, thoughts, and observations of the authors. The blog of Jeffrey Beall, University of Colorado, was consulted extensively. Jeffrey Beall is a Denver academic librarian who regularly maintains two lists: the first of potential, possible, or probable predatory publishers and the second of potential, possible, or probable predatory standalone journals. Aspects of this topic presented by other authors have been discussed as well. Academics should know how to differentiate between trustworthy, reliable journals and predatory ones, considering publication ethics, the peer-review process, international academic standards, indexing and abstracting, preservation in digital repositories, metrics, sustainability, etc.
NASA Astrophysics Data System (ADS)
Fraser, Ryan; Gross, Lutz; Wyborn, Lesley; Evans, Ben; Klump, Jens
2015-04-01
Recent investments in HPC, cloud and petascale data stores have dramatically increased the scale and resolution at which earth science challenges can now be tackled. These new infrastructures are highly parallelised, and to fully utilise them and access the large volumes of earth science data now available, a new approach to software stack engineering needs to be developed. The size, complexity and cost of the new infrastructures mean any software deployed has to be reliable, trusted and reusable. Increasingly, software is available via open source repositories, but these usually only enable code to be discovered and downloaded. It is hard for a scientist to judge the suitability and quality of individual codes: rarely is there information on how and where codes can be run, what the critical dependencies are, and in particular, on the version requirements and licensing of the underlying software stack. A trusted software framework is proposed to enable reliable software to be discovered, accessed and then deployed on multiple hardware environments. More specifically, this framework will enable those who generate the software, and those who fund its development, to gain credit for the effort, IP, time and dollars spent, and will facilitate quantification of the impact of individual codes. For scientific users, the framework delivers reviewed and benchmarked scientific software with mechanisms to reproduce results. The trusted framework will have five separate but connected components: Register, Review, Reference, Run, and Repeat. 1) The Register component will facilitate discovery of relevant software from multiple open source code repositories. Registration of a code should include information about licensing and the hardware environments it can run on, define appropriate validation (testing) procedures, and list the critical dependencies. 2) The Review component targets verification of the software, typically against a set of benchmark cases. This will be achieved by linking the code in the software framework to peer review forums such as Mozilla Science or appropriate journals (e.g. Geoscientific Model Development) to help users know which codes to trust. 3) Referencing will be accomplished by linking the software framework to groups such as Figshare or ImpactStory that help disseminate and measure the impact of scientific research, including program code. 4) The Run component will draw on information supplied in the registration process, the benchmark cases described in the review, and other relevant information to instantiate the scientific code on the selected environment. 5) The Repeat component will tap into existing provenance workflow engines that automatically capture information relating to a particular run of the software, including identification of all input and output artefacts, and all elements and transactions within that workflow. The proposed trusted software framework will enable users to rapidly discover and access reliable code, reduce the time to deploy it, and greatly facilitate sharing, reuse and reinstallation of code. Properly designed, it could scale out to massively parallel systems and be accessed nationally and internationally for multiple use cases, including supercomputer centres, cloud facilities, and local computers.
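A minimal sketch of the kind of registration record the proposed Register component might capture for a single code is shown below. The field names and values are illustrative assumptions, not a schema defined by the framework.

```python
# Sketch only: an illustrative registration record for one scientific code.
import json

registration = {
    "name": "example-geodynamics-solver",            # hypothetical code name
    "repository": "https://example.org/git/solver",  # where the source lives
    "license": "Apache-2.0",
    "hardware_environments": ["laptop", "cluster", "cloud"],
    "critical_dependencies": {"python": ">=3.8", "numpy": ">=1.20", "mpi4py": ">=3.0"},
    "validation": {
        "benchmarks": ["benchmark-case-1", "benchmark-case-2"],
        "command": "pytest tests/",                  # how the validation suite is run
    },
}

print(json.dumps(registration, indent=2))
```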
A vision and strategy for the virtual physiological human in 2010 and beyond.
Hunter, Peter; Coveney, Peter V; de Bono, Bernard; Diaz, Vanessa; Fenner, John; Frangi, Alejandro F; Harris, Peter; Hose, Rod; Kohl, Peter; Lawford, Pat; McCormack, Keith; Mendes, Miriam; Omholt, Stig; Quarteroni, Alfio; Skår, John; Tegner, Jesper; Randall Thomas, S; Tollis, Ioannis; Tsamardinos, Ioannis; van Beek, Johannes H G M; Viceconti, Marco
2010-06-13
European funding under framework 7 (FP7) for the virtual physiological human (VPH) project has been in place now for nearly 2 years. The VPH network of excellence (NoE) is helping in the development of common standards, open-source software, freely accessible data and model repositories, and various training and dissemination activities for the project. It is also helping to coordinate the many clinically targeted projects that have been funded under the FP7 calls. An initial vision for the VPH was defined by framework 6 strategy for a European physiome (STEP) project in 2006. It is now time to assess the accomplishments of the last 2 years and update the STEP vision for the VPH. We consider the biomedical science, healthcare and information and communications technology challenges facing the project and we propose the VPH Institute as a means of sustaining the vision of VPH beyond the time frame of the NoE.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Whiteaker, Jeffrey R.; Halusa, Goran; Hoofnagle, Andrew N.
2016-02-12
The Clinical Proteomic Tumor Analysis Consortium (CPTAC) of the National Cancer Institute (NCI) has launched an Assay Portal (http://assays.cancer.gov) to serve as an open-source repository of well-characterized targeted proteomic assays. The portal is designed to curate and disseminate highly characterized, targeted mass spectrometry (MS)-based assays by providing detailed assay performance characterization data, standard operating procedures, and access to reagents. Assay content is accessed via the portal through queries to find assays targeting proteins associated with specific cellular pathways, protein complexes, or specific chromosomal regions. The positions of the peptide analytes for which assays are available are mapped relative to other features of interest in the protein, such as sequence domains, isoforms, single nucleotide polymorphisms, and post-translational modifications. The overarching goals are to enable robust quantification of all human proteins and to standardize the quantification of targeted MS-based assays to ultimately enable harmonization of results over time and across laboratories.
DBATE: database of alternative transcripts expression.
Bianchi, Valerio; Colantoni, Alessio; Calderone, Alberto; Ausiello, Gabriele; Ferrè, Fabrizio; Helmer-Citterich, Manuela
2013-01-01
The use of high-throughput RNA sequencing technology (RNA-seq) allows whole-transcriptome analysis, providing an unbiased and unabridged view of alternative transcript expression. Coupling splicing-variant-specific expression with its functional inference is still an open and difficult issue, for which we created the DataBase of Alternative Transcripts Expression (DBATE), a web-based repository storing expression values and functional annotations of alternative splicing variants. We processed 13 large RNA-seq panels from healthy human tissues and disease conditions, reporting expression levels and functional annotations gathered and integrated from different sources for each splicing variant, using a variant-specific annotation transfer pipeline. Complex queries that cross-reference different functional annotations permit the retrieval of desired subsets of splicing-variant expression values, which can be visualized in several ways, from simple to more informative. DBATE is intended as a novel tool to help appreciate how, and possibly why, transcriptome expression is shaped. DATABASE URL: http://bioinformatica.uniroma2.it/DBATE/.
A python framework for environmental model uncertainty analysis
White, Jeremy; Fienen, Michael N.; Doherty, John E.
2016-01-01
We have developed pyEMU, a Python framework for Environmental Modeling Uncertainty analyses: an open-source tool that is non-intrusive, easy to use, computationally efficient, and scalable to highly parameterized inverse problems. The framework implements several types of linear (first-order, second-moment (FOSM)) and non-linear uncertainty analyses. The FOSM-based analyses can also be completed prior to parameter estimation to help inform important modeling decisions, such as parameterization and objective function formulation. Complete workflows for several types of FOSM-based and non-linear analyses are documented in example notebooks implemented using Jupyter that are available in the online pyEMU repository. Example workflows include basic parameter and forecast analyses, data worth analyses, and error-variance analyses, as well as usage of parameter ensemble generation and management capabilities. These workflows document the necessary steps and provide insights into the results, with the goal of educating users not only in how to apply pyEMU, but also in the underlying theory of applied uncertainty quantification.
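A minimal sketch of a FOSM workflow of the sort the pyEMU notebooks walk through is shown below. The file names are placeholders, and the use of the Schur-complement helper reflects recent pyEMU releases; the repository notebooks remain the authoritative examples.

```python
# Sketch only: linear (FOSM) parameter and forecast uncertainty with pyEMU.
import pyemu

# Build the analysis from a PEST control file and Jacobian matrix
# (placeholder file names; assumes forecasts are flagged in the control file).
sc = pyemu.Schur(jco="model.jcb", pst="model.pst")

# Prior vs. posterior uncertainty summaries for parameters and forecasts.
par_summary = sc.get_parameter_summary()
fore_summary = sc.get_forecast_summary()
print(par_summary.head())
print(fore_summary)
```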
A vision and strategy for the virtual physiological human in 2010 and beyond
Hunter, Peter; Coveney, Peter V.; de Bono, Bernard; Diaz, Vanessa; Fenner, John; Frangi, Alejandro F.; Harris, Peter; Hose, Rod; Kohl, Peter; Lawford, Pat; McCormack, Keith; Mendes, Miriam; Omholt, Stig; Quarteroni, Alfio; Skår, John; Tegner, Jesper; Randall Thomas, S.; Tollis, Ioannis; Tsamardinos, Ioannis; van Beek, Johannes H. G. M.; Viceconti, Marco
2010-01-01
European funding under framework 7 (FP7) for the virtual physiological human (VPH) project has been in place now for nearly 2 years. The VPH network of excellence (NoE) is helping in the development of common standards, open-source software, freely accessible data and model repositories, and various training and dissemination activities for the project. It is also helping to coordinate the many clinically targeted projects that have been funded under the FP7 calls. An initial vision for the VPH was defined by framework 6 strategy for a European physiome (STEP) project in 2006. It is now time to assess the accomplishments of the last 2 years and update the STEP vision for the VPH. We consider the biomedical science, healthcare and information and communications technology challenges facing the project and we propose the VPH Institute as a means of sustaining the vision of VPH beyond the time frame of the NoE. PMID:20439264
The Image Data Resource: A Bioimage Data Integration and Publication Platform.
Williams, Eleanor; Moore, Josh; Li, Simon W; Rustici, Gabriella; Tarkowska, Aleksandra; Chessel, Anatole; Leo, Simone; Antal, Bálint; Ferguson, Richard K; Sarkans, Ugis; Brazma, Alvis; Salas, Rafael E Carazo; Swedlow, Jason R
2017-08-01
Access to primary research data is vital for the advancement of science. To extend the data types supported by community repositories, we built a prototype Image Data Resource (IDR) that collects and integrates imaging data acquired across many different imaging modalities. IDR links data from several imaging modalities, including high-content screening, super-resolution and time-lapse microscopy, digital pathology, public genetic or chemical databases, and cell and tissue phenotypes expressed using controlled ontologies. Using this integration, IDR facilitates the analysis of gene networks and reveals functional interactions that are inaccessible to individual studies. To enable re-analysis, we also established a computational resource based on Jupyter notebooks that allows remote access to the entire IDR. IDR is also an open source platform that others can use to publish their own image data. Thus IDR provides both a novel on-line resource and a software infrastructure that promotes and extends publication and re-analysis of scientific image data.
Whiteaker, Jeffrey R; Halusa, Goran N; Hoofnagle, Andrew N; Sharma, Vagisha; MacLean, Brendan; Yan, Ping; Wrobel, John A; Kennedy, Jacob; Mani, D R; Zimmerman, Lisa J; Meyer, Matthew R; Mesri, Mehdi; Boja, Emily; Carr, Steven A; Chan, Daniel W; Chen, Xian; Chen, Jing; Davies, Sherri R; Ellis, Matthew J C; Fenyö, David; Hiltke, Tara; Ketchum, Karen A; Kinsinger, Chris; Kuhn, Eric; Liebler, Daniel C; Liu, Tao; Loss, Michael; MacCoss, Michael J; Qian, Wei-Jun; Rivers, Robert; Rodland, Karin D; Ruggles, Kelly V; Scott, Mitchell G; Smith, Richard D; Thomas, Stefani; Townsend, R Reid; Whiteley, Gordon; Wu, Chaochao; Zhang, Hui; Zhang, Zhen; Rodriguez, Henry; Paulovich, Amanda G
2016-01-01
The Clinical Proteomic Tumor Analysis Consortium (CPTAC) of the National Cancer Institute (NCI) has launched an Assay Portal (http://assays.cancer.gov) to serve as an open-source repository of well-characterized targeted proteomic assays. The portal is designed to curate and disseminate highly characterized, targeted mass spectrometry (MS)-based assays by providing detailed assay performance characterization data, standard operating procedures, and access to reagents. Assay content is accessed via the portal through queries to find assays targeting proteins associated with specific cellular pathways, protein complexes, or specific chromosomal regions. The positions of the peptide analytes for which assays are available are mapped relative to other features of interest in the protein, such as sequence domains, isoforms, single nucleotide polymorphisms, and posttranslational modifications. The overarching goals are to enable robust quantification of all human proteins and to standardize the quantification of targeted MS-based assays to ultimately enable harmonization of results over time and across laboratories.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Soldevilla, M.; Salmons, S.; Espinosa, B.
The new application BDDR (Reactor Database) has been developed at CEA in order to manage technological and operating data on nuclear reactors. This application is a knowledge management tool which meets several internal needs: -) to facilitate scenario studies for any set of reactors, e.g. non-proliferation assessments; -) to make core physics studies easier, whatever the reactor design (PWR - Pressurized Water Reactor, BWR - Boiling Water Reactor, MAGNOX - Magnesium Oxide reactor, CANDU - CANada Deuterium Uranium, FBR - Fast Breeder Reactor, etc.); -) to preserve the technological data of all reactors (past and present, power generating or experimental, naval propulsion, ...) in a unique repository. The application database contains location data and operating history data, as well as a tree-like structure holding numerous technological data. These data address all kinds of reactor features and components. A few neutronics data are also included (neutron fluxes). The BDDR application is based on open-source technologies and a thin client/server architecture. The software architecture has been made flexible enough to allow for any change. (authors)
An open real-time tele-stethoscopy system.
Foche-Perez, Ignacio; Ramirez-Payba, Rodolfo; Hirigoyen-Emparanza, German; Balducci-Gonzalez, Fernando; Simo-Reigadas, Francisco-Javier; Seoane-Pascual, Joaquin; Corral-Peñafiel, Jaime; Martinez-Fernandez, Andres
2012-08-23
Acute respiratory infections are the leading cause of childhood mortality. The lack of physicians in rural areas of developing countries makes their correct diagnosis and treatment difficult. The staff of rural health facilities (health-care technicians) may not be qualified to distinguish respiratory diseases by auscultation. For this reason, the goal of this project is the development of a tele-stethoscopy system that allows a physician to receive real-time cardio-respiratory sounds from a remote auscultation, as well as video images showing where the technician is placing the stethoscope on the patient's body. A real-time wireless stethoscopy system was designed. The initial requirements were: 1) the system must send audio and video synchronously over IP networks, without requiring an Internet connection; 2) it must preserve the quality of cardiorespiratory sounds, allowing the binaural pieces and the chestpiece of standard stethoscopes to be adapted; and 3) cardiorespiratory sounds should be recordable at both ends of the communication. In order to verify the diagnostic capacity of the system, a clinical validation with eight specialists has been designed. In a preliminary test, twelve patients were auscultated by all the physicians using the tele-stethoscopy system, versus a local auscultation using a traditional stethoscope. The system must allow the physician to listen to cardiac sounds (systolic and diastolic murmurs, gallop sounds, arrhythmias) and respiratory sounds (rhonchi, rales and crepitations, wheeze, diminished and bronchial breath sounds, pleural friction rub). The design, development and initial validation of the real-time wireless tele-stethoscopy system are described in detail. The system was conceived from scratch as open-source and low-cost, and designed in such a way that many universities and small local companies in developing countries may manufacture it. Only free open-source software has been used, in order to minimize manufacturing costs and to seek alliances to support its improvement and adaptation. The microcontroller firmware code, the computer software code and the PCB schematics are available for free download in a Subversion repository hosted on SourceForge. It has been shown that real-time tele-stethoscopy, together with a videoconference system that allows a remote specialist to oversee the auscultation, may be a very helpful tool in rural areas of developing countries.
NASA Astrophysics Data System (ADS)
Bigagli, Lorenzo; Sondervan, Jeroen
2014-05-01
The Policy RECommendations for Open Access to Research Data in Europe (RECODE) project, started in February 2013 with a duration of two years, aims to identify a series of targeted and over-arching policy recommendations for Open Access to European research data, based on existing good practice and addressing such hindering factors as stakeholder fragmentation, technical and infrastructural issues, ethical and legal issues, and financial and institutional policies. In this work we focus on the technical and infrastructural aspects, where by "infrastructure" we mean the technological assets (hardware and software), the human resources, and all the policies, processes, procedures and training for managing and supporting its continuous operation and evolution. The context targeted by RECODE includes heterogeneous networks, initiatives, projects and communities that are fragmented by discipline, geography, stakeholder category (publishers, academics, repositories, etc.) and other boundaries. Many of these organizations are already addressing key technical and infrastructural barriers to Open Access to research data. Such barriers may include: lack of automatic mechanisms for policy enforcement, lack of metadata and data models supporting open access, obsolescence of infrastructures, scarce awareness of new technological solutions, and lack of training and/or expertise in IT and semantics. However, these organizations often work in isolation, or with limited contact with one another. RECODE has addressed these challenges, and the possible solutions to mitigate them, by engaging all the identified stakeholders in a number of ways, including an online questionnaire, case study interviews, a literature review, and a workshop. The conclusions have been validated by the RECODE Advisory Board and will contribute to shaping the RECODE policy guidelines for Open Access to Research Data. We report on the identified technological and infrastructural issues, classified according to the barriers of heterogeneity, sustainability, volume, quality, and security.
PathVisio 3: an extendable pathway analysis toolbox.
Kutmon, Martina; van Iersel, Martijn P; Bohler, Anwesha; Kelder, Thomas; Nunes, Nuno; Pico, Alexander R; Evelo, Chris T
2015-02-01
PathVisio is a commonly used pathway editor, visualization and analysis software package. Biological pathways have been used by biologists for many years to describe the detailed steps in biological processes. Those powerful, visual representations help researchers to better understand, share and discuss knowledge. Since the first publication of PathVisio in 2008, the original paper has been cited more than 170 times and PathVisio has been used in many different biological studies. As an online editor, PathVisio is also integrated in the community-curated pathway database WikiPathways. Here we present the third version of PathVisio with the newest additions and improvements to the application. The core features of PathVisio are pathway drawing, advanced data visualization and pathway statistics. Additionally, PathVisio 3 introduces a new, powerful extension system that allows other developers to contribute additional functionality in the form of plugins without changing the core application. PathVisio can be downloaded from http://www.pathvisio.org and in 2014 PathVisio 3 was downloaded over 5,500 times. There are already more than 15 plugins available in the central plugin repository. PathVisio is a freely available, open-source tool published under the Apache 2.0 license (http://www.apache.org/licenses/LICENSE-2.0). It is implemented in Java and thus runs on all major operating systems. The code repository is available at http://svn.bigcat.unimaas.nl/pathvisio. The support mailing list for users is available on https://groups.google.com/forum/#!forum/wikipathways-discuss and for developers on https://groups.google.com/forum/#!forum/wikipathways-devel.
Sequencing Data Discovery and Integration for Earth System Science with MetaSeek
NASA Astrophysics Data System (ADS)
Hoarfrost, A.; Brown, N.; Arnosti, C.
2017-12-01
Microbial communities play a central role in biogeochemical cycles. Sequencing data resources from environmental sources have grown exponentially in recent years and represent a singular opportunity to investigate microbial interactions with Earth system processes. Carrying out such meta-analyses depends on our ability to discover and curate sequencing data into large-scale integrated datasets. However, such integration efforts are currently challenging and time-consuming, with sequencing data scattered across multiple repositories and metadata that are not easily or comprehensively searchable. MetaSeek is a sequencing data discovery tool that integrates sequencing metadata from all the major data repositories, allowing the user to search and filter datasets in a lightweight application with an intuitive, easy-to-use web-based interface. Users can save and share curated datasets, and other users can browse these data integrations or use them as a jumping-off point for their own curation. Missing or erroneous metadata are inferred automatically where possible; where this is not possible, users are prompted to contribute to the improvement of the sequencing metadata pool by correcting and amending metadata errors. Once an integrated dataset has been curated, users can follow simple instructions to download their raw data and quickly begin their investigations. In addition to the online interface, the MetaSeek database can be queried through an open API, further enabling programmatic use and facilitating integration of MetaSeek with other data curation tools. This tool lowers the barriers to curation and integration of environmental sequencing data, clearing the path forward to illuminating the ecosystem-scale interactions between biological and abiotic processes.
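The open API mentioned above lends itself to scripted metadata searches. The sketch below is a minimal Python illustration of how such a query might look; the base URL, endpoint path, and filter field names are hypothetical placeholders, not values taken from MetaSeek's actual API documentation.

```python
# Minimal sketch of querying a sequencing-metadata REST API such as MetaSeek's.
# NOTE: the base URL, endpoint, and parameter names below are hypothetical
# placeholders; consult the real MetaSeek API documentation for actual values.
import requests  # third-party HTTP client (pip install requests)

BASE_URL = "https://example.org/metaseek/api"  # hypothetical endpoint


def search_datasets(filters: dict, limit: int = 100) -> list:
    """Return a list of dataset metadata records matching the given filters."""
    params = dict(filters, limit=limit)
    response = requests.get(f"{BASE_URL}/datasets", params=params, timeout=30)
    response.raise_for_status()   # fail loudly on HTTP errors
    return response.json()        # assume the API returns a JSON list of records


if __name__ == "__main__":
    # Example: find marine whole-genome shotgun datasets (hypothetical fields).
    records = search_datasets({
        "environment": "marine",
        "library_strategy": "WGS",
    })
    for rec in records[:5]:
        print(rec.get("dataset_id"), rec.get("download_url"))
```

A curated result set like this could then be fed into a download script, which is the kind of "simple instructions to download their raw data" workflow the abstract describes.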
Researcher-library collaborations: Data repositories as a service for researchers.
Gordon, Andrew S; Millman, David S; Steiger, Lisa; Adolph, Karen E; Gilmore, Rick O
New interest has arisen in organizing, preserving, and sharing the raw materials (the data and metadata) that undergird the published products of research. Library and information scientists have valuable expertise to bring to bear in the effort to create larger, more diverse, and more widely used data repositories. However, for libraries to be maximally successful in providing the research data management and preservation services required of a successful data repository, librarians must work closely with researchers and learn about their data management workflows. Databrary is a data repository that is closely linked to the needs of a specific scholarly community: researchers who use video as a main source of data to study child development and learning. The project's success to date is a result of its focus on community outreach, services for scholarly communication, engagement with institutional partners, data curation services guided by closely involved information professionals, and a strong technical infrastructure. Databrary plans to improve the curation tools that allow researchers to deposit their own data, enhance the user-facing feature set, increase integration with library systems, and implement strategies for long-term sustainability.
The Biological Reference Repository (BioR): a rapid and flexible system for genomics annotation.
Kocher, Jean-Pierre A; Quest, Daniel J; Duffy, Patrick; Meiners, Michael A; Moore, Raymond M; Rider, David; Hossain, Asif; Hart, Steven N; Dinu, Valentin
2014-07-01
The Biological Reference Repository (BioR) is a toolkit for annotating genomic variants. BioR stores public and user-specific annotation sources in indexed, JSON-encoded flat files (catalogs). The BioR toolkit provides the functionality to combine and retrieve annotations from these catalogs via a command-line interface. Several catalogs built from commonly used annotation sources, along with instructions for creating user-specific catalogs, are provided. Commands from the toolkit can be combined with other UNIX commands for advanced annotation processing, and we also provide instructions for the development of custom annotation pipelines. The package is implemented in Java and makes use of external tools written in Java and Perl. The toolkit can be executed on Mac OS X 10.5 and above or any Linux distribution. The BioR application, quick-start and user-guide documents, and many biological examples are available at http://bioinformaticstools.mayo.edu. © The Author 2014. Published by Oxford University Press.
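In normal use, BioR catalogs are read through the toolkit's own command-line tools, but the underlying idea of a JSON-encoded flat file is easy to picture. The Python sketch below is an illustration of that idea only: it assumes a simplified toy catalog with one JSON document per line, whereas real BioR catalogs are compressed and indexed as described in the BioR documentation.

```python
# Illustration of the "JSON-encoded flat file" catalog idea described above.
# This is NOT the real BioR catalog format: actual catalogs are compressed and
# indexed, and are normally accessed via the BioR command-line tools.
import json


def filter_catalog(path: str, key: str, value: str):
    """Yield catalog entries (one JSON object per line) where entry[key] == value."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            entry = json.loads(line)
            if entry.get(key) == value:
                yield entry


if __name__ == "__main__":
    # Example: pull every entry carrying a (hypothetical) gene symbol field.
    for entry in filter_catalog("toy_catalog.json", key="gene", value="BRCA2"):
        print(json.dumps(entry, indent=2))
```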
Code of Federal Regulations, 2010 CFR
2010-10-01
... light sources used in motor vehicle headlighting systems. This part also serves as a repository for... standardized sealed beam units used in motor vehicle headlighting systems. § 564.2 Purposes. The purposes of... manufacturing specifications of standardized sealed beam headlamp units used on motor vehicles so that all...
ERIC Educational Resources Information Center
Piedra, Nelson; Chicaiza, Janneth Alexandra; López, Jorge; Tovar, Edmundo
2014-01-01
The Linked Data initiative is considered one of the most effective alternatives for creating global shared information spaces; it has become an interesting approach for discovering and enriching open educational resources data, as well as for achieving semantic interoperability and re-use between multiple OER repositories. The notion of Linked Data…
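As a rough sketch of what exposing OER metadata as Linked Data can look like, the Python snippet below uses the rdflib library to describe a single fictitious educational resource with Dublin Core terms and serialize it as Turtle. The resource URI, the example namespace, and the choice of properties are illustrative assumptions, not prescriptions from the article above.

```python
# Sketch: describing one fictitious open educational resource as Linked Data.
# Requires the third-party rdflib package (pip install rdflib).
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

# Hypothetical namespace for an OER repository's resources.
OER = Namespace("http://example.org/oer/")

g = Graph()
g.bind("dcterms", DCTERMS)
g.bind("oer", OER)

resource = OER["course-101"]  # fictitious resource URI
g.add((resource, RDF.type, OER.EducationalResource))
g.add((resource, DCTERMS.title, Literal("Introduction to Linked Data")))
g.add((resource, DCTERMS.creator, Literal("Example University")))
g.add((resource, DCTERMS.license,
       URIRef("https://creativecommons.org/licenses/by/4.0/")))

# Serializing as Turtle yields a description that other repositories can reuse.
print(g.serialize(format="turtle"))
```

Publishing descriptions like this under shared vocabularies is what allows separate OER repositories to interlink and query each other's holdings.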