Automating RPM Creation from a Source Code Repository
2012-02-01
Abstract excerpt (RPM spec fragments): spec sections %pre, %prep and %setup, a %build step running ./autogen.sh ; ./configure --with-db=/apps/db --with-libpq=/apps/postgres ; make, and install steps such as rm -rf $RPM_BUILD_ROOT, umask 0077, and mkdir -p $RPM_BUILD_ROOT/usr/local/bin … from a source code repository.
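As a rough illustration of the automation this record describes (not the original report's script), the sketch below checks out a source tree and drives rpmbuild against a spec file with the %prep/%build/%install sections quoted above. The repository URL, package name and spec file name are hypothetical placeholders.

```python
# Sketch: automate an RPM build from a source code repository.
# Repository URL, package name and spec file name are hypothetical examples.
import os
import subprocess
import tempfile

REPO = "https://example.org/scm/mypkg.git"   # hypothetical repository
SPEC = "mypkg.spec"                          # spec with %prep/%build/%install sections

def build_rpm(repo_url, spec_name):
    workdir = tempfile.mkdtemp(prefix="rpmbuild-")
    src = os.path.join(workdir, "src")
    # Check out the source code repository.
    subprocess.run(["git", "clone", repo_url, src], check=True)
    # Create a source tarball that the spec's %setup macro can unpack.
    subprocess.run(["git", "archive", "--format=tar.gz",
                    "--output", os.path.join(workdir, "mypkg.tar.gz"), "HEAD"],
                   cwd=src, check=True)
    # Drive rpmbuild with a private topdir so the build stays self-contained.
    subprocess.run(["rpmbuild", "-ba", os.path.join(src, spec_name),
                    "--define", f"_topdir {workdir}",
                    "--define", f"_sourcedir {workdir}"],
                   check=True)

if __name__ == "__main__":
    build_rpm(REPO, SPEC)
```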
NASA Technical Reports Server (NTRS)
Teubert, Christopher; Sankararaman, Shankar; Cullo, Aiden
2017-01-01
Readme for the Random Variable Toolbox … usable manner. GitHub is a Web-based Git version control repository hosting service. It is mostly used for computer code. It offers all of the distributed version control and source code management (SCM) functionality of Git as well as adding its own features. It provides access control and several collaboration features such as bug tracking, feature requests, task management, and wikis for every project. GitHub offers plans for both private and free repositories on the same account, which are commonly used to host open-source software projects. As of April 2017, GitHub reports having almost 20 million users and 57 million repositories, making it the largest host of source code in the world. GitHub has a mascot called Octocat, a cat with five tentacles and a human-like face.
NASA Astrophysics Data System (ADS)
Butov, R. A.; Drobyshevsky, N. I.; Moiseenko, E. V.; Tokarev, U. N.
2017-11-01
The verification of the FENIA finite element code on some problems and an example of its application are presented in the paper. The code is being developed for 3D modelling of thermal, mechanical and hydrodynamical (THM) problems related to the functioning of deep geological repositories. Verification of the code for two analytical problems has been performed. The first is a point heat source with exponentially decreasing heat output; the second is a linear heat source with similar behaviour. Analytical solutions have been obtained by the authors. The problems have been chosen because they reflect the processes influencing the thermal state of a deep geological repository of radioactive waste. Verification was performed for several meshes with different resolution. Good convergence between analytical and numerical solutions was achieved. The application of the FENIA code is illustrated by 3D modelling of the thermal state of a prototypic deep geological repository of radioactive waste. The repository is designed for disposal of radioactive waste in rock at a depth of several hundred meters with no intention of later retrieval. Vitrified radioactive waste is placed in containers, which are placed in vertical boreholes. The residual decay heat of the radioactive waste leads to heating of the containers, engineered safety barriers and host rock. Maximum temperatures and the corresponding times at which they are reached have been determined.
Zhang, Melvyn W B; Ho, Roger C M
2017-01-01
Dementia is known to be an illness which brings forth marked disability amongst elderly individuals. At times, patients living with dementia also experience non-cognitive symptoms, including hallucinations, delusional beliefs, emotional lability, sexualized behaviours and aggression. According to the National Institute of Clinical Excellence (NICE) guidelines, non-pharmacological techniques are typically the first-line option prior to the consideration of adjuvant pharmacological options. Reminiscence and music therapy are thus viable options. Lazar et al. [3] previously performed a systematic review of the use of technology to deliver reminiscence-based therapy to individuals living with dementia and highlighted that technology does have benefits in the delivery of reminiscence therapy. However, to date, there has been a paucity of M-health innovations in this area. In addition, most of the current innovations are not personalized for each person living with dementia. Prior research has highlighted the utility of open source repositories in bioinformatics research. The authors hope to explain how they managed to tap into and make use of an open source repository in the development of a personalized M-health reminiscence therapy innovation for patients living with dementia. The availability of open source code repositories has changed the way healthcare professionals and developers develop smartphone applications today. Conventionally, a long iterative process is needed in the development of a native application, mainly because of the need for native programming and coding, especially so if the application needs to have interactive features or features that can be personalized. Such repositories enable the rapid and cost-effective development of applications. Moreover, developers are also able to further innovate, as less time is spent in the iterative process.
The Astrophysics Source Code Library: Where Do We Go from Here?
NASA Astrophysics Data System (ADS)
Allen, A.; Berriman, B.; DuPrie, K.; Hanisch, R. J.; Mink, J.; Nemiroff, R. J.; Shamir, L.; Shortridge, K.; Taylor, M. B.; Teuben, P.; Wallen, J.
2014-05-01
The Astrophysics Source Code Library, started in 1999, has in the past three years grown from a repository for 40 codes to a registry of over 700 codes that are now indexed by ADS. What comes next? We examine the future of the ASCL, the challenges facing it, the rationale behind its practices, and the need to balance what we might do with what we have the resources to accomplish.
SeisCode: A seismological software repository for discovery and collaboration
NASA Astrophysics Data System (ADS)
Trabant, C.; Reyes, C. G.; Clark, A.; Karstens, R.
2012-12-01
SeisCode is a community repository for software used in seismological and related fields. The repository is intended to increase discoverability of such software and to provide a long-term home for software projects. Other places exist where seismological software may be found, but none meet the requirements necessary for an always current, easy to search, well documented, and citable resource for projects. Organizations such as IRIS, ORFEUS, and the USGS have websites with lists of available or contributed seismological software. Since the authors themselves often do not maintain these lists, the documentation often consists of a sentence or paragraph, and the available software may be outdated. Repositories such as GoogleCode and SourceForge, which are directly maintained by the authors, provide version control and issue tracking but do not provide a unified way of locating geophysical software scattered in and among countless unrelated projects. Additionally, projects are hosted at language-specific sites such as Mathworks and PyPI, in FTP directories, and in websites strewn across the Web. Search engines are only partially effective discovery tools, as the desired software is often hidden deep within the results. SeisCode provides software authors a place to present their software, codes, scripts, tutorials, and examples to the seismological community. Authors can choose their own level of involvement. At one end of the spectrum, the author might simply create a web page that points to an existing site. At the other extreme, an author may choose to leverage the many tools provided by SeisCode, such as a source code management tool with integrated issue tracking, forums, news feeds, downloads, wikis, and more. For software development projects with multiple authors, SeisCode can also be used as a central site for collaboration. SeisCode provides the community with an easy way to discover software, while providing authors a way to build a community around their software packages. IRIS invites the seismological community to browse and to submit projects to https://seiscode.iris.washington.edu/
Springate, David A; Kontopantelis, Evangelos; Ashcroft, Darren M; Olier, Ivan; Parisi, Rosa; Chamapiwa, Edmore; Reeves, David
2014-01-01
Lists of clinical codes are the foundation for research undertaken using electronic medical records (EMRs). If clinical code lists are not available, reviewers are unable to determine the validity of research, full study replication is impossible, researchers are unable to make effective comparisons between studies, and the construction of new code lists is subject to much duplication of effort. Despite this, the publication of clinical codes is rarely if ever a requirement for obtaining grants, validating protocols, or publishing research. In a representative sample of 450 EMR primary research articles indexed on PubMed, we found that only 19 (5.1%) were accompanied by a full set of published clinical codes and 32 (8.6%) stated that code lists were available on request. To help address these problems, we have built an online repository where researchers using EMRs can upload and download lists of clinical codes. The repository will enable clinical researchers to better validate EMR studies, build on previous code lists and compare disease definitions across studies. It will also assist health informaticians in replicating database studies, tracking changes in disease definitions or clinical coding practice through time and sharing clinical code information across platforms and data sources as research objects.
The MIMIC Code Repository: enabling reproducibility in critical care research.
Johnson, Alistair Ew; Stone, David J; Celi, Leo A; Pollard, Tom J
2018-01-01
Lack of reproducibility in medical studies is a barrier to the generation of a robust knowledge base to support clinical decision-making. In this paper we outline the Medical Information Mart for Intensive Care (MIMIC) Code Repository, a centralized code base for generating reproducible studies on an openly available critical care dataset. Code is provided to load the data into a relational structure, create extractions of the data, and reproduce entire analysis plans including research studies. Concepts extracted include severity of illness scores, comorbid status, administrative definitions of sepsis, physiologic criteria for sepsis, organ failure scores, treatment administration, and more. Executable documents are used for tutorials and reproduce published studies end-to-end, providing a template for future researchers to replicate. The repository's issue tracker enables community discussion about the data and concepts, allowing users to collaboratively improve the resource. The centralized repository provides a platform for users of the data to interact directly with the data generators, facilitating greater understanding of the data. It also provides a location for the community to collaborate on necessary concepts for research progress and share them with a larger audience. Consistent application of the same code for underlying concepts is a key step in ensuring that research studies on the MIMIC database are comparable and reproducible. By providing open source code alongside the freely accessible MIMIC-III database, we enable end-to-end reproducible analysis of electronic health records. © The Author 2017. Published by Oxford University Press on behalf of the American Medical Informatics Association.
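As a minimal illustration of the kind of reproducible concept extraction the MIMIC Code Repository supports, the sketch below computes ICU length of stay from a local PostgreSQL build of MIMIC-III. The connection string is a placeholder, and the schema and column names, while following the usual MIMIC-III layout, should be checked against the repository's build scripts.

```python
# Sketch: derive a simple concept (ICU length of stay) from a local
# PostgreSQL build of MIMIC-III. Connection string, schema name and column
# names are assumptions -- verify against the MIMIC Code Repository scripts.
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical local database credentials.
engine = create_engine("postgresql://mimicuser:password@localhost:5432/mimic")

query = """
SELECT subject_id, hadm_id, icustay_id,
       EXTRACT(EPOCH FROM (outtime - intime)) / 3600.0 AS icu_los_hours
FROM mimiciii.icustays
"""
los = pd.read_sql_query(query, engine)
print(los["icu_los_hours"].describe())
```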
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ditmars, J.D.; Walbridge, E.W.; Rote, D.M.
1983-10-01
Repository performance assessment is analysis that identifies events and processes that might affect a repository system for isolation of radioactive waste, examines their effects on barriers to waste migration, and estimates the probabilities of their occurrence and their consequences. In 1983 Battelle Memorial Institute's Office of Nuclear Waste Isolation (ONWI) prepared two plans - one for performance assessment for a waste repository in salt and one for verification and validation of performance assessment technology. At the request of the US Department of Energy's Salt Repository Project Office (SRPO), Argonne National Laboratory reviewed those plans and prepared this report to advise SRPO of specific areas where ONWI's plans for performance assessment might be improved. This report presents a framework for repository performance assessment that clearly identifies the relationships among the disposal problems, the processes underlying the problems, the tools for assessment (computer codes), and the data. In particular, the relationships among important processes and 26 model codes available to ONWI are indicated. A common suggestion for computer code verification and validation is the need for specific and unambiguous documentation of the results of performance assessment activities. A major portion of this report consists of status summaries of 27 model codes indicated as potentially useful by ONWI. The code summaries focus on three main areas: (1) the code's purpose, capabilities, and limitations; (2) status of the elements of documentation and review essential for code verification and validation; and (3) proposed application of the code for performance assessment of salt repository systems. 15 references, 6 figures, 4 tables.
A Repository of Codes of Ethics and Technical Standards in Health Informatics
Zaïane, Osmar R.
2014-01-01
We present a searchable repository of codes of ethics and standards in health informatics. It is built using state-of-the-art search algorithms and technologies. The repository will be potentially beneficial for public health practitioners, researchers, and software developers in finding and comparing ethics topics of interest. Public health clinics, clinicians, and researchers can use the repository platform as a one-stop reference for various ethics codes and standards. In addition, the repository interface is built for easy navigation, fast search, and side-by-side comparative reading of documents. Our selection criteria for codes and standards are two-fold; firstly, to maintain intellectual property rights, we index only codes and standards freely available on the internet. Secondly, major international, regional, and national health informatics bodies across the globe are surveyed with the aim of understanding the landscape in this domain. We also look at prevalent technical standards in health informatics from major bodies such as the International Standards Organization (ISO) and the U. S. Food and Drug Administration (FDA). Our repository contains codes of ethics from the International Medical Informatics Association (IMIA), the iHealth Coalition (iHC), the American Health Information Management Association (AHIMA), the Australasian College of Health Informatics (ACHI), the British Computer Society (BCS), and the UK Council for Health Informatics Professions (UKCHIP), with room for adding more in the future. Our major contribution is enhancing the findability of codes and standards related to health informatics ethics by compilation and unified access through the health informatics ethics repository. PMID:25422725
Migration of the Gaudi and LHCb software repositories from CVS to Subversion
NASA Astrophysics Data System (ADS)
Clemencic, M.; Degaudenzi, H.; LHCb Collaboration
2011-12-01
A common code repository is of primary importance in a distributed development environment such as large HEP experiments. CVS (Concurrent Versions System) has been used in the past years at CERN for the hosting of shared software repositories, among which were the repositories for the Gaudi Framework and the LHCb software projects. Many developers around the world produced alternative systems to share code and revisions among several developers, mainly to overcome the limitations in CVS, and CERN has recently started a new service for code hosting based on the version control system Subversion. The differences between CVS and Subversion and the way the code was organized in Gaudi and LHCb CVS repositories required careful study and planning of the migration. Special care was used to define the organization of the new Subversion repository. To avoid as much as possible disruption in the development cycle, the migration has been gradual with the help of tools developed explicitly to hide the differences between the two systems. The principles guiding the migration steps, the organization of the Subversion repository and the tools developed will be presented, as well as the problems encountered both from the librarian and the user points of view.
Dugas, Martin; Meidt, Alexandra; Neuhaus, Philipp; Storck, Michael; Varghese, Julian
2016-06-01
The volume and complexity of patient data - especially in personalised medicine - is steadily increasing, both regarding clinical data and genomic profiles: Typically more than 1,000 items (e.g., laboratory values, vital signs, diagnostic tests etc.) are collected per patient in clinical trials. In oncology hundreds of mutations can potentially be detected for each patient by genomic profiling. Therefore data integration from multiple sources constitutes a key challenge for medical research and healthcare. Semantic annotation of data elements can facilitate to identify matching data elements in different sources and thereby supports data integration. Millions of different annotations are required due to the semantic richness of patient data. These annotations should be uniform, i.e., two matching data elements shall contain the same annotations. However, large terminologies like SNOMED CT or UMLS don't provide uniform coding. It is proposed to develop semantic annotations of medical data elements based on a large-scale public metadata repository. To achieve uniform codes, semantic annotations shall be re-used if a matching data element is available in the metadata repository. A web-based tool called ODMedit ( https://odmeditor.uni-muenster.de/ ) was developed to create data models with uniform semantic annotations. It contains ~800,000 terms with semantic annotations which were derived from ~5,800 models from the portal of medical data models (MDM). The tool was successfully applied to manually annotate 22 forms with 292 data items from CDISC and to update 1,495 data models of the MDM portal. Uniform manual semantic annotation of data models is feasible in principle, but requires a large-scale collaborative effort due to the semantic richness of patient data. A web-based tool for these annotations is available, which is linked to a public metadata repository.
Software Attribution for Geoscience Applications in the Computational Infrastructure for Geodynamics
NASA Astrophysics Data System (ADS)
Hwang, L.; Dumit, J.; Fish, A.; Soito, L.; Kellogg, L. H.; Smith, M.
2015-12-01
Scientific software is largely developed by individual scientists and represents a significant intellectual contribution to the field. As the scientific culture and funding agencies move towards an expectation that software be open-source, there is a corresponding need for mechanisms to cite software, both to provide credit and recognition to developers, and to aid in discoverability of software and scientific reproducibility. We assess the geodynamic modeling community's current citation practices by examining more than 300 predominantly self-reported publications utilizing scientific software in the past 5 years that is available through the Computational Infrastructure for Geodynamics (CIG). Preliminary results indicate that authors cite and attribute software by citing (in rank order) peer-reviewed scientific publications, a user's manual, and/or a paper describing the software code. Attributions may be found directly in the text, in acknowledgements, in figure captions, or in footnotes. What is considered citable varies widely. Citations predominantly lack software version numbers or persistent identifiers to find the software package. Versioning may be implied through reference to a versioned user manual. Authors sometimes report code features used and whether they have modified the code. As an open-source community, CIG requests that researchers contribute their modifications to the repository. However, such modifications may not be contributed back to a repository code branch, decreasing the chances of discoverability and reproducibility. Survey results through CIG's Software Attribution for Geoscience Applications (SAGA) project suggest that lack of knowledge, tools, and workflows to cite codes are barriers to effectively implementing the emerging citation norms. Generated on-demand attributions on software landing pages and a prototype extensible plug-in to automatically generate attributions in codes are the first steps towards reproducibility.
Busby, Ben; Lesko, Matthew; Federer, Lisa
2016-01-01
In genomics, bioinformatics and other areas of data science, gaps exist between extant public datasets and the open-source software tools built by the community to analyze similar data types. The purpose of biological data science hackathons is to assemble groups of genomics or bioinformatics professionals and software developers to rapidly prototype software to address these gaps. The only two rules for the NCBI-assisted hackathons run so far are that 1) data either must be housed in public data repositories or be deposited to such repositories shortly after the hackathon's conclusion, and 2) all software comprising the final pipeline must be open-source or open-use. Proposed topics, as well as suggested tools and approaches, are distributed to participants at the beginning of each hackathon and refined during the event. Software, scripts, and pipelines are developed and published on GitHub, a web service providing publicly available, free-usage tiers for collaborative software development. The code resulting from each hackathon is published at https://github.com/NCBI-Hackathons/ with separate directories or repositories for each team.
Finite-Length Line Source Superposition Model (FLLSSM)
NASA Astrophysics Data System (ADS)
1980-03-01
A linearized thermal conduction model was developed to economically determine media temperatures in geologic repositories for nuclear wastes. Individual canisters containing either high level waste or spent fuel assemblies were represented as finite-length line sources in a continuous medium. The combined effects of multiple canisters in a representative storage pattern were established at selected points of interest by superposition of the temperature rises calculated for each canister. The methodology is outlined, and the computer code FLLSSM, which performs the required numerical integrations and superposition operations, is described.
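As a rough illustration of the superposition approach described above (not the FLLSSM code itself), the sketch below sums the analytical temperature rise of several constant-strength finite-length line sources in an infinite medium; the exponentially decaying sources handled by FLLSSM would additionally require a convolution in time. All material properties and canister layout values are illustrative placeholders.

```python
# Superposition of finite-length line heat sources in an infinite medium.
# Each source of constant strength q (W/m) extending from z1 to z2 contributes
# dT = q/(4*pi*k) * integral of erfc(R/(2*sqrt(alpha*t)))/R dz'.
import numpy as np
from scipy.integrate import quad
from scipy.special import erfc

k = 2.5        # rock thermal conductivity, W/(m K)   (assumed)
alpha = 1.2e-6 # thermal diffusivity, m^2/s           (assumed)

def line_source_rise(x, y, z, src, t):
    """Temperature rise at (x, y, z) from one finite line source at time t."""
    x0, y0, z1, z2, q = src
    r2 = (x - x0) ** 2 + (y - y0) ** 2
    def integrand(zp):
        R = np.sqrt(r2 + (z - zp) ** 2)
        return erfc(R / (2.0 * np.sqrt(alpha * t))) / R
    val, _ = quad(integrand, z1, z2)
    return q / (4.0 * np.pi * k) * val

# Illustrative storage pattern: canisters on a 10 m grid, 3 m long, 500 W/m.
sources = [(ix * 10.0, iy * 10.0, -503.0, -500.0, 500.0)
           for ix in range(3) for iy in range(3)]

t = 10.0 * 365.25 * 86400.0  # 10 years in seconds
point = (5.0, 5.0, -500.0)   # point of interest between canisters
dT = sum(line_source_rise(*point, src, t) for src in sources)
print(f"Temperature rise after 10 years: {dT:.1f} K")
```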
Studying the laws of software evolution in a long-lived FLOSS project.
Gonzalez-Barahona, Jesus M; Robles, Gregorio; Herraiz, Israel; Ortega, Felipe
2014-07-01
Some free, open-source software projects have been around for quite a long time, the longest living ones dating from the early 1980s. For some of them, detailed information about their evolution is available in source code management systems tracking all their code changes for periods of more than 15 years. This paper examines in detail the evolution of one such project, glibc, with the main aim of understanding how it evolved and how it matched Lehman's laws of software evolution. As a result, we have developed a methodology for studying the evolution of such long-lived projects based on the information in their source code management repository, described in detail several aspects of the history of glibc, including some activity and size metrics, and found how some of the laws of software evolution may not hold in this case. © 2013 The Authors. Journal of Software: Evolution and Process published by John Wiley & Sons Ltd.
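A minimal sketch of the kind of repository mining such a study relies on: counting commits per year from a local clone's history via git log. The clone path is a placeholder, and the metrics used in the paper are of course richer (size, committers, and so on).

```python
# Count commits per year in a local clone of a long-lived project (e.g. glibc).
# The path to the clone is a placeholder; adjust before running.
import subprocess
from collections import Counter

CLONE = "/path/to/glibc"   # hypothetical local clone

log = subprocess.run(
    ["git", "-C", CLONE, "log", "--pretty=format:%ad", "--date=format:%Y"],
    capture_output=True, text=True, check=True).stdout

commits_per_year = Counter(log.splitlines())
for year in sorted(commits_per_year):
    print(year, commits_per_year[year])
```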
Implementation of an OAIS Repository Using Free, Open Source Software
NASA Astrophysics Data System (ADS)
Flathers, E.; Gessler, P. E.; Seamon, E.
2015-12-01
The Northwest Knowledge Network (NKN) is a regional data repository located at the University of Idaho that focuses on the collection, curation, and distribution of research data. To support our home institution and others in the region, we offer services to researchers at all stages of the data lifecycle—from grant application and data management planning to data distribution and archive. In this role, we recognize the need to work closely with other data management efforts at partner institutions and agencies, as well as with larger aggregation efforts such as our state geospatial data clearinghouses, data.gov, DataONE, and others. In the past, one of our challenges with monolithic, prepackaged data management solutions is that customization can be difficult to implement and maintain, especially as new versions of the software are released that are incompatible with our local codebase. Our solution is to break the monolith up into its constituent parts, which offers us several advantages. First, any customizations that we make are likely to fall into areas that can be accessed through Application Program Interfaces (API) that are likely to remain stable over time, so our code stays compatible. Second, as components become obsolete or insufficient to meet new demands that arise, we can replace the individual components with minimal effect on the rest of the infrastructure, causing less disruption to operations. Other advantages include increased system reliability, staggered rollout of new features, enhanced compatibility with legacy systems, reduced dependence on a single software company as a point of failure, and the separation of development into manageable tasks. In this presentation, we describe our application of the Service Oriented Architecture (SOA) design paradigm to assemble a data repository that conforms to the Open Archival Information System (OAIS) Reference Model primarily using a collection of free and open-source software. We detail the design of the repository, based upon open standards to support interoperability with other institutions' systems and with future versions of our own software components. We also describe the implementation process, including our use of GitHub as a collaboration tool and code repository.
NASA Astrophysics Data System (ADS)
Vopálka, D.; Lukin, D.; Vokál, A.
2006-01-01
Three new modules modelling the processes that occur in a deep geological repository have been prepared in the GoldSim computer code environment (using its Transport Module). These modules help to understand the role of selected parameters in the near-field region of the final repository and to prepare one's own complex model of the repository behaviour. The source term module includes radioactive decay and ingrowth in the canister, first-order degradation of the fuel matrix, solubility limitation of the concentration of the studied nuclides, and diffusive migration through the surrounding bentonite layer controlled by an output boundary condition formulated with respect to the rate of water flow in the rock. The corrosion module describes corrosion of canisters made of carbon steel and transport of corrosion products in the near-field region. This module computes balance equations between dissolving species and species transported by diffusion and/or advection from the surface of a solid material. The diffusion module, which also includes a non-linear form of the interaction isotherm, can be used for the evaluation of small-scale diffusion experiments.
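To make the source-term ingredients concrete, here is a highly simplified single-nuclide sketch (not the GoldSim implementation): first-order fuel-matrix degradation releases a nuclide that decays, and the dissolved concentration in the canister water is capped at a solubility limit. Diffusive release through bentonite is omitted, and all parameter values are placeholders.

```python
# Simplified single-nuclide source term: first-order matrix degradation,
# radioactive decay, and a solubility cap on the dissolved concentration.
# All parameter values are illustrative placeholders.
import numpy as np

lam   = np.log(2) / 2.1e5   # decay constant, 1/yr (hypothetical nuclide)
k_deg = 1e-7                # first-order fuel matrix degradation rate, 1/yr
c_sol = 1e-6                # solubility limit, mol/m^3
V     = 1.0                 # water volume in the canister, m^3

N_matrix = 1.0e3            # inventory still bound in the fuel matrix, mol
N_free   = 0.0              # inventory released to canister water, mol

dt, t_end = 10.0, 1.0e6     # time step and horizon, yr
for step in range(int(t_end / dt)):
    released = k_deg * N_matrix * dt          # congruent release from matrix
    N_matrix += -released - lam * N_matrix * dt
    N_free   +=  released - lam * N_free * dt
    # Solubility limitation: anything above c_sol*V stays as precipitate.
    c = min(N_free / V, c_sol)
    if step % 10000 == 0:
        print(f"t = {step * dt:9.0f} yr  dissolved conc = {c:.3e} mol/m^3")
```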
Construction of a nasopharyngeal carcinoma 2D/MS repository with Open Source XML database--Xindice.
Li, Feng; Li, Maoyu; Xiao, Zhiqiang; Zhang, Pengfei; Li, Jianling; Chen, Zhuchu
2006-01-11
Many proteomics initiatives require integration of all information with uniform criteria, from collection of samples and data display to publication of experimental results. The integration and exchange of these data of different formats and structure poses a great challenge. XML technology presents a promising approach to handling this task due to its simplicity and flexibility. Nasopharyngeal carcinoma (NPC) is one of the most common cancers in southern China and Southeast Asia, with marked geographic and racial differences in incidence. Although there are some cancer proteome databases now, there is still no NPC proteome database. The raw NPC proteome experiment data were captured into one XML document with the Human Proteome Markup Language (HUP-ML) editor and imported into the native XML database Xindice. The 2D/MS repository of the NPC proteome was constructed with Apache, PHP and Xindice to provide access to the database via the Internet. On our website, two query methods, keyword query and click query, are provided to access the entries of the NPC proteome database. Our 2D/MS repository can be used to share the raw NPC proteomics data that are generated from gel-based proteomics experiments. The database, as well as the PHP source code for constructing users' own proteome repositories, can be accessed at http://www.xyproteomics.org/.
Influence analysis of Github repositories.
Hu, Yan; Zhang, Jun; Bai, Xiaomei; Yu, Shuo; Yang, Zhuo
2016-01-01
With the support of cloud computing techniques, social coding platforms have changed the style of software development. Github is now the most popular social coding platform and project hosting service. Software developers of various levels keep entering Github, and use Github to save their public and private software projects. The large numbers of software developers and software repositories on Github are posing new challenges to the world of software engineering. This paper tries to tackle one of the important problems: analyzing the importance and influence of Github repositories. We propose a HITS-based influence analysis on graphs that represent the star relationship between Github users and repositories. A weighted version of HITS is applied to the overall star graph, and generates a different set of top influential repositories than the results from the standard version of the HITS algorithm. We also conduct the influence analysis on per-month star graphs, and study the monthly influence ranking of top repositories.
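A minimal sketch of HITS on a bipartite user-repository star graph, with users as hubs and repositories as authorities; the unit edge weights used here give standard HITS, and other weights (for example by star recency) would give a weighted variant. This illustrates the general approach, not the paper's exact weighting scheme.

```python
# HITS-style influence on a bipartite star graph: users (hubs) star
# repositories (authorities). Unit weights reproduce standard HITS.
import numpy as np

users = ["alice", "bob", "carol"]
repos = ["repoA", "repoB", "repoC"]
# stars[i, j] = weight of user i starring repo j (illustrative data)
stars = np.array([[1.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0]])

hub = np.ones(len(users))
auth = np.ones(len(repos))
for _ in range(50):                       # power iteration until convergence
    auth = stars.T @ hub
    auth /= np.linalg.norm(auth)
    hub = stars @ auth
    hub /= np.linalg.norm(hub)

ranking = sorted(zip(repos, auth), key=lambda x: -x[1])
print("repository influence (authority scores):", ranking)
```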
Busby, Ben; Lesko, Matthew; Federer, Lisa
2016-01-01
In genomics, bioinformatics and other areas of data science, gaps exist between extant public datasets and the open-source software tools built by the community to analyze similar data types. The purpose of biological data science hackathons is to assemble groups of genomics or bioinformatics professionals and software developers to rapidly prototype software to address these gaps. The only two rules for the NCBI-assisted hackathons run so far are that 1) data either must be housed in public data repositories or be deposited to such repositories shortly after the hackathon’s conclusion, and 2) all software comprising the final pipeline must be open-source or open-use. Proposed topics, as well as suggested tools and approaches, are distributed to participants at the beginning of each hackathon and refined during the event. Software, scripts, and pipelines are developed and published on GitHub, a web service providing publicly available, free-usage tiers for collaborative software development. The code resulting from each hackathon is published at https://github.com/NCBI-Hackathons/ with separate directories or repositories for each team. PMID:27134733
Adapting a Clinical Data Repository to ICD-10-CM through the use of a Terminology Repository
Cimino, James J.; Remennick, Lyubov
2014-01-01
Clinical data repositories frequently contain patient diagnoses coded with the International Classification of Diseases, Ninth Revision (ICD-9-CM). These repositories now need to accommodate data coded with the Tenth Revision (ICD-10-CM). Database users wish to retrieve relevant data regardless of the system by which they are coded. We demonstrate how a terminology repository (the Research Entities Dictionary or RED) serves as an ontology relating terms of both ICD versions to each other to support seamless version-independent retrieval from the Biomedical Translational Research Information System (BTRIS) at the National Institutes of Health. We make use of the Center for Medicare and Medicaid Services’ General Equivalence Mappings (GEMs) to reduce the modeling effort required to determine whether ICD-10-CM terms should be added to the RED as new concepts or as synonyms of existing concepts. A divide-and-conquer approach is used to develop integration heuristics that offer a satisfactory interim solution and facilitate additional refinement of the integration as time and resources allow. PMID:25954344
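A minimal sketch of using the CMS General Equivalence Mappings to expand an ICD-9-CM code into candidate ICD-10-CM codes for version-independent retrieval. The file name and the assumption that each GEM row is a whitespace-separated source code, target code and flag string should be verified against the CMS documentation for the year in use; this is an illustration of the mapping idea, not the RED/BTRIS integration itself.

```python
# Expand an ICD-9-CM diagnosis code to candidate ICD-10-CM codes using the
# CMS General Equivalence Mappings (GEMs). File name and row layout are
# assumptions -- verify against the GEM documentation before relying on them.
from collections import defaultdict

def load_gem(path):
    mapping = defaultdict(set)
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            if len(fields) >= 2:
                source, target = fields[0], fields[1]
                mapping[source].add(target)
    return mapping

icd9_to_icd10 = load_gem("2014_I9gem.txt")   # hypothetical GEM file name
codes = {"25000"} | icd9_to_icd10["25000"]   # ICD-9 250.00 plus ICD-10 candidates
print("retrieve records coded with any of:", sorted(codes))
```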
Reproducible Research in the Geosciences at Scale: Achievable Goal or Elusive Dream?
NASA Astrophysics Data System (ADS)
Wyborn, L. A.; Evans, B. J. K.
2016-12-01
Reproducibility is a fundamental tenet of the scientific method: it implies that any researcher, or a third party working independently, can duplicate any experiment or investigation and produce the same results. Historically, computationally based research involved an individual using their own data and processing it in their own private area, often using software they wrote or inherited from close collaborators. Today, a researcher is likely to be part of a large team that will use a subset of data from an external repository and then process the data on a public or private cloud or on a large centralised supercomputer, using a mixture of their own code, third party software and libraries, or global community codes. In 'Big Geoscience' research it is common for data inputs to be extracts from externally managed dynamic data collections, where new data is being regularly appended, or existing data is revised when errors are detected and/or as processing methods are improved. New workflows increasingly use services to access data dynamically to create subsets on-the-fly from distributed sources, each of which can have a complex history. At major computational facilities, underlying systems, libraries, software and services are being constantly tuned and optimised, or new or replacement infrastructure is being installed. Likewise, code used from a community repository is continually being refined, re-packaged and ported to the target platform. To achieve reproducibility, today's researcher increasingly needs to track their workflow, including querying information on the current or historical state of facilities used. Versioning methods are standard practice for software repositories or packages, but it is not common for either data repositories or data services to provide information about their state, or for systems to provide query-able access to changes in the underlying software. While a researcher can achieve transparency and describe steps in their workflow so that others can repeat them and replicate the processes undertaken, they cannot achieve exact reproducibility or even transparency of the results generated. In Big Geoscience, full reproducibility will be an elusive dream until data repositories and compute facilities can provide provenance information in a standards compliant, machine query-able way.
NASA Astrophysics Data System (ADS)
Maeda, Takuto; Takemura, Shunsuke; Furumura, Takashi
2017-07-01
We have developed an open-source software package, Open-source Seismic Wave Propagation Code (OpenSWPC), for parallel numerical simulations of seismic wave propagation in 3D and 2D (P-SV and SH) viscoelastic media based on the finite difference method at local-to-regional scales. This code is equipped with a frequency-independent attenuation model based on the generalized Zener body and an efficient perfectly matched layer for the absorbing boundary condition. A hybrid-style programming model using OpenMP and the Message Passing Interface (MPI) is adopted for efficient parallel computation. OpenSWPC has wide applicability for seismological studies and great portability, allowing excellent performance on platforms ranging from PC clusters to supercomputers. Without modifying the code, users can conduct seismic wave propagation simulations using their own velocity structure models and the necessary source representations by specifying them in an input parameter file. The code has various modes for different types of velocity structure model input and different source representations such as single force, moment tensor and plane-wave incidence, which can easily be selected via the input parameters. Widely used binary data formats, the Network Common Data Form (NetCDF) and the Seismic Analysis Code (SAC), are adopted for the input of the heterogeneous structure model and the outputs of the simulation results, so users can easily handle the input/output datasets. All codes are written in Fortran 2003 and are available with detailed documentation in a public repository.
DOE Office of Scientific and Technical Information (OSTI.GOV)
West, J.M.; Coombs, P.; Gardner, S.J.
1995-12-31
The Maqarin site, Jordan is being studied as a natural analogue of a cementitious radioactive waste repository. The microbiology has been studied and diverse microbial populations capable of tolerating alkaline pH were detected at all sampling localities. Dissolved organic carbon was identified as the potentially most important reductant, with sulfate identified as the main oxidant; both supply energy for microbial life. Calculations on upper limits of microbial numbers were made with a microbiology code (MGSE) using existing information, but the results are overestimates when compared with field observations. This indicates that the model is very conservative and that more information on, for example, carbon sources is required.
Scharm, Martin; Wolkenhauer, Olaf; Waltemath, Dagmar
2016-02-15
Repositories support the reuse of models and ensure transparency about results in publications linked to those models. With thousands of models available in repositories, such as the BioModels database or the Physiome Model Repository, a framework to track the differences between models and their versions is essential to compare and combine models. Difference detection not only allows users to study the history of models but also helps in the detection of errors and inconsistencies. Existing repositories lack algorithms to track a model's development over time. Focusing on SBML and CellML, we present an algorithm to accurately detect and describe differences between coexisting versions of a model with respect to (i) the models' encoding, (ii) the structure of biological networks and (iii) mathematical expressions. This algorithm is implemented in a comprehensive and open source library called BiVeS. BiVeS helps to identify and characterize changes in computational models and thereby contributes to the documentation of a model's history. Our work facilitates the reuse and extension of existing models and supports collaborative modelling. Finally, it contributes to better reproducibility of modelling results and to the challenge of model provenance. The workflow described in this article is implemented in BiVeS. BiVeS is freely available as source code and binary from sems.uni-rostock.de. The web interface BudHat demonstrates the capabilities of BiVeS at budhat.sems.uni-rostock.de. © The Author 2015. Published by Oxford University Press.
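BiVeS itself is a Java library; as a rough illustration of the kind of difference detection it performs, the sketch below compares two versions of an XML-encoded model (for example SBML) and reports elements, matched by their id attribute, whose attributes changed. This is a toy comparison under those assumptions, not the BiVeS algorithm, and the file names are placeholders.

```python
# Toy XML-level diff between two versions of an SBML/CellML-like model:
# report elements (matched by their 'id' attribute) that were added, removed,
# or whose attributes changed. BiVeS performs a far more detailed,
# format-aware analysis; this only illustrates the idea.
import xml.etree.ElementTree as ET

def index_by_id(path):
    tree = ET.parse(path)
    return {el.get("id"): dict(el.attrib)
            for el in tree.iter() if el.get("id") is not None}

def diff_models(old_path, new_path):
    old, new = index_by_id(old_path), index_by_id(new_path)
    for eid in sorted(set(old) | set(new)):
        if eid not in old:
            print(f"+ added element id={eid}")
        elif eid not in new:
            print(f"- removed element id={eid}")
        elif old[eid] != new[eid]:
            print(f"~ changed element id={eid}: {old[eid]} -> {new[eid]}")

diff_models("model_v1.xml", "model_v2.xml")   # placeholder file names
```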
Convection and thermal radiation analytical models applicable to a nuclear waste repository room
DOE Office of Scientific and Technical Information (OSTI.GOV)
Davis, B.W.
1979-01-17
Time-dependent temperature distributions in a deep geologic nuclear waste repository have a direct impact on the physical integrity of the emplaced canisters and on the design of retrievability options. This report (1) identifies the thermodynamic properties and physical parameters of three convection regimes - forced, natural, and mixed; (2) defines the convection correlations applicable to calculating heat flow in a ventilated (forced-air) and in a nonventilated nuclear waste repository room; and (3) delineates a computer code that (a) computes and compares the floor-to-ceiling heat flow by convection and radiation, and (b) determines the nonlinear equivalent conductivity table for a repository room. (The tables permit the use of the ADINAT code to model surface-to-surface radiation and the TRUMP code to employ two different emissivity properties when modeling radiation exchange between the surface of two different materials.) The analysis shows that thermal radiation dominates heat flow modes in a nuclear waste repository room.
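As a back-of-the-envelope illustration of the floor-to-ceiling comparison described above (not the report's code), the sketch below evaluates radiative exchange between two parallel gray surfaces and the equivalent conductivity that would carry the same flux across the room height, the kind of quantity tabulated for the ADINAT/TRUMP models. Emissivities, temperatures and room height are placeholder values.

```python
# Radiative floor-to-ceiling heat flux between two parallel gray surfaces and
# the equivalent conductivity carrying the same flux across the room height.
# Emissivities, temperatures and room height are illustrative placeholders.
SIGMA = 5.670e-8          # Stefan-Boltzmann constant, W/(m^2 K^4)

def parallel_plate_radiation(T_hot, T_cold, eps_hot, eps_cold):
    """Net radiative flux (W/m^2) between two infinite parallel gray plates."""
    return SIGMA * (T_hot**4 - T_cold**4) / (1 / eps_hot + 1 / eps_cold - 1)

T_floor, T_ceiling = 360.0, 320.0       # K, hypothetical repository room
eps_floor, eps_ceiling = 0.9, 0.8       # hypothetical emissivities
height = 5.0                            # m, floor-to-ceiling distance

q_rad = parallel_plate_radiation(T_floor, T_ceiling, eps_floor, eps_ceiling)
k_eq = q_rad * height / (T_floor - T_ceiling)   # equivalent conductivity, W/(m K)
print(f"radiative flux = {q_rad:.1f} W/m^2, equivalent conductivity = {k_eq:.2f} W/(m K)")
```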
NASA Astrophysics Data System (ADS)
Kwon, N.; Gentle, J.; Pierce, S. A.
2015-12-01
Software code developed for research is often used for a relatively short period of time before it is abandoned, lost, or becomes outdated. This unintentional abandonment of code is a valid problem in the 21st century scientific process, hindering widespread reusability and increasing the effort needed to develop research software. Potentially important assets, these legacy codes may be resurrected and documented digitally for long-term reuse, often with modest effort. Furthermore, the revived code may be openly accessible in a public repository for researchers to reuse or improve. For this study, the research team has begun to revive the codebase for Groundwater Decision Support System (GWDSS), originally developed for participatory decision making to aid urban planning and groundwater management, though it may serve multiple use cases beyond those originally envisioned. GWDSS was designed as a java-based wrapper with loosely federated commercial and open source components. If successfully revitalized, GWDSS will be useful for both practical applications as a teaching tool and case study for groundwater management, as well as informing theoretical research. Using the knowledge-sharing approaches documented by the NSF-funded Ontosoft project, digital documentation of GWDSS is underway, from conception to development, deployment, characterization, integration, composition, and dissemination through open source communities and geosciences modeling frameworks. Information assets, documentation, and examples are shared using open platforms for data sharing and assigned digital object identifiers. Two instances of GWDSS version 3.0 are being created: 1) a virtual machine instance for the original case study to serve as a live demonstration of the decision support tool, assuring the original version is usable, and 2) an open version of the codebase, executable installation files, and developer guide available via an open repository, assuring the source for the application is accessible with version control and potential for new branch developments. Finally, metadata about the software has been completed within the OntoSoft portal to provide descriptive curation, make GWDSS searchable, and complete documentation of the scientific software lifecycle.
A large-scale solar dynamics observatory image dataset for computer vision applications.
Kucuk, Ahmet; Banda, Juan M; Angryk, Rafal A
2017-01-01
The National Aeronautics and Space Administration (NASA) Solar Dynamics Observatory (SDO) mission has given us unprecedented insight into the Sun's activity. By capturing approximately 70,000 images a day, this mission has created one of the richest and biggest repositories of solar image data available to mankind. With such massive amounts of information, researchers have been able to produce great advances in detecting solar events. In this resource, we compile SDO solar data into a single repository in order to provide the computer vision community with a standardized and curated large-scale dataset of several hundred thousand solar events found on high resolution solar images. This publicly available resource, along with the generation source code, will accelerate computer vision research on NASA's solar image data by reducing the amount of time spent performing data acquisition and curation from the multiple sources we have compiled. By improving the quality of the data with thorough curation, we anticipate wider adoption and interest from both the computer vision and solar physics communities.
Continuous integration and quality control for scientific software
NASA Astrophysics Data System (ADS)
Neidhardt, A.; Ettl, M.; Brisken, W.; Dassing, R.
2013-08-01
Modern software has to be stable, portable, fast and reliable. This is becoming more and more important for scientific software as well. But this requires a sophisticated way to inspect, check and evaluate the quality of source code with a suitable, automated infrastructure. A centralized server with a software repository and a version control system is one essential part, to manage the code base and to control the different development versions. While each project can be compiled separately, the whole code base can also be compiled with one central “Makefile”. This is used to create automated, nightly builds. Additionally all sources are inspected automatically with static code analysis and inspection tools, which check for well-known error situations, memory and resource leaks, performance issues, or style issues. In combination with an automatic documentation generator it is possible to create the developer documentation directly from the code and the inline comments. All reports and generated information are presented as HTML pages on a Web server. Because this environment increased the stability and quality of the software of the Geodetic Observatory Wettzell tremendously, it is now also available for scientific communities. One regular customer is already the developer group of the DiFX software correlator project.
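A minimal sketch of the nightly-build idea described above: update the working copy, build with the central Makefile, run a static analysis tool and collect the output into a small HTML report. The repository path, build command and analyzer choice (cppcheck) are assumptions, not the Wettzell toolchain itself.

```python
# Minimal nightly-build driver: update the working copy, build with the
# central Makefile, run a static analyzer and write a small HTML report.
# Repository path, build command and analyzer (cppcheck) are assumptions.
import datetime
import html
import subprocess

WORKDIR = "/srv/ci/codebase"          # hypothetical checkout of the repository

def run(cmd):
    p = subprocess.run(cmd, cwd=WORKDIR, capture_output=True, text=True)
    return p.returncode, p.stdout + p.stderr

steps = {
    "update":  ["git", "pull", "--ff-only"],
    "build":   ["make", "-k"],
    "analyze": ["cppcheck", "--enable=all", "."],
}

rows = []
for name, cmd in steps.items():
    code, output = run(cmd)
    status = "OK" if code == 0 else f"FAILED ({code})"
    rows.append(f"<h2>{name}: {status}</h2><pre>{html.escape(output)}</pre>")

with open("nightly_report.html", "w") as fh:
    fh.write(f"<h1>Nightly build {datetime.date.today()}</h1>" + "".join(rows))
```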
Sharing environmental models: An Approach using GitHub repositories and Web Processing Services
NASA Astrophysics Data System (ADS)
Stasch, Christoph; Nuest, Daniel; Pross, Benjamin
2016-04-01
The GLUES (Global Assessment of Land Use Dynamics, Greenhouse Gas Emissions and Ecosystem Services) project established a spatial data infrastructure for scientific geospatial data and metadata (http://geoportal-glues.ufz.de), where different regional collaborative projects researching the impacts of climate and socio-economic changes on sustainable land management can share their underlying base scenarios and datasets. One goal of the project is to ease the sharing of computational models between institutions and to make them easily executable in Web-based infrastructures. In this work, we present such an approach for sharing computational models relying on GitHub repositories (http://github.com) and Web Processing Services. At first, model providers upload their model implementations to GitHub repositories in order to share them with others. The GitHub platform allows users to submit changes to the model code. The changes can be discussed and reviewed before merging them. However, while GitHub allows sharing of and collaboration on model source code, it does not actually allow running these models, which requires effort to transfer the implementation to a model execution framework. We have thus extended an existing implementation of the OGC Web Processing Service standard (http://www.opengeospatial.org/standards/wps), the 52°North Web Processing Service (http://52north.org/wps) platform, to retrieve all model implementations from a git (http://git-scm.com) repository and add them to the collection of published geoprocesses. The current implementation is restricted to models implemented as R scripts using WPS4R annotations (Hinz et al.) and to Java algorithms using the 52°North WPS Java API. The models hence become executable through a standardized Web API by multiple clients such as desktop or browser GIS and modelling frameworks. If the model code is changed on the GitHub platform, the changes are retrieved by the service and the processes are updated accordingly. The admin tool of the 52°North WPS was extended to support automated retrieval and deployment of computational models from GitHub repositories. Once the R code is available in the GitHub repository, the contained process can be easily deployed and executed by simply defining the GitHub repository URL in the WPS admin tool. We illustrate the usage of the approach by sharing and running a model for land use system archetypes developed by the Helmholtz Centre for Environmental Research (UFZ, see Vaclavik et al.). The original R code was extended and published in the 52°North WPS using both public and non-public datasets (Nüst et al., see also https://github.com/52North/glues-wps). Hosting the analysis in a Git repository now allows WPS administrators, client developers, and modelers to easily work together on new versions or completely new web processes using the powerful GitHub collaboration platform. References: Hinz, M. et al. (2013): Spatial Statistics on the Geospatial Web. In: The 16th AGILE International Conference on Geographic Information Science, Short Papers. http://www.agile-online.org/Conference_Paper/CDs/agile_2013/Short_Papers/SP_S3.1_Hinz.pdf Nüst, D. et al. (2015): Open and reproducible global land use classification. In: EGU General Assembly Conference Abstracts. Vol. 17. European Geophysical Union, 2015, p. 9125, http://meetingorganizer.copernicus.org/EGU2015/EGU2015-9125.pdf Vaclavik, T. et al. (2013): Mapping global land system archetypes. Global Environmental Change 23(6): 1637-1647. Available online October 9, 2013, DOI: 10.1016/j.gloenvcha.2013.09.004
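Once a model from a GitHub repository has been deployed as a WPS process, clients can run it through the standard interface. The sketch below issues a WPS 1.0.0 Execute request using key-value-pair encoding via the requests library; the service URL, process identifier and input name are hypothetical placeholders, not the actual GLUES deployment.

```python
# Execute a (hypothetical) WPS process that wraps a model from a GitHub repo,
# using the OGC WPS 1.0.0 key-value-pair encoding. Service URL, process
# identifier and input/output names are placeholders.
import requests

WPS_URL = "http://example.org/wps/WebProcessingService"   # hypothetical endpoint

params = {
    "service": "WPS",
    "version": "1.0.0",
    "request": "Execute",
    "identifier": "org.example.landuse.archetypes",        # hypothetical process id
    "datainputs": "region=global",                          # hypothetical input
}
response = requests.get(WPS_URL, params=params, timeout=300)
response.raise_for_status()
print(response.text[:500])   # beginning of the Execute response document (XML)
```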
Three-dimensional thermal analysis of a high-level waste repository
DOE Office of Scientific and Technical Information (OSTI.GOV)
Altenbach, T.J.
1979-04-01
The analysis used the TRUMP computer code to evaluate the thermal fields for six repository scenarios that studied the effects of room ventilation, room backfill, and repository thermal diffusivity. The results for selected nodes are presented as plots showing the effect of temperature as a function of time. 15 figures, 6 tables.
Clinical results of HIS, RIS, PACS integration using data integration CASE tools
NASA Astrophysics Data System (ADS)
Taira, Ricky K.; Chan, Hing-Ming; Breant, Claudine M.; Huang, Lu J.; Valentino, Daniel J.
1995-05-01
Current infrastructure research in PACS is dominated by the development of communication networks (local area networks, teleradiology, ATM networks, etc.), multimedia display workstations, and hierarchical image storage architectures. However, limited work has been performed on developing flexible, expansible, and intelligent information processing architectures for the vast decentralized image and text data repositories prevalent in healthcare environments. Patient information is often distributed among multiple data management systems. Current large-scale efforts to integrate medical information and knowledge sources have been costly with limited retrieval functionality. Software integration strategies to unify distributed data and knowledge sources are still lacking commercially. Systems heterogeneity (i.e., differences in hardware platforms, communication protocols, database management software, nomenclature, etc.) is at the heart of the problem and is unlikely to be standardized in the near future. In this paper, we demonstrate the use of newly available CASE (computer-aided software engineering) tools to rapidly integrate HIS, RIS, and PACS information systems. The advantages of these tools include fast development time (low-level code is generated from graphical specifications), and easy system maintenance (excellent documentation, easy to perform changes, and centralized code repository in an object-oriented database). The CASE tools are used to develop and manage the 'middleware' in our client-mediator-server architecture for systems integration. Our architecture is scalable and can accommodate heterogeneous database and communication protocols.
NASA Astrophysics Data System (ADS)
Massmann, J.; Nagel, T.; Bilke, L.; Böttcher, N.; Heusermann, S.; Fischer, T.; Kumar, V.; Schäfers, A.; Shao, H.; Vogel, P.; Wang, W.; Watanabe, N.; Ziefle, G.; Kolditz, O.
2016-12-01
As part of the German site selection process for a high-level nuclear waste repository, different repository concepts in the geological candidate formations rock salt, clay stone and crystalline rock are being discussed. An open assessment of these concepts using numerical simulations requires physical models capturing the individual particularities of each rock type and associated geotechnical barrier concept to a comparable level of sophistication. In a joint work group of the Helmholtz Centre for Environmental Research (UFZ) and the German Federal Institute for Geosciences and Natural Resources (BGR), scientists of the UFZ are developing and implementing multiphysical process models while BGR scientists apply them to large scale analyses. The advances in simulation methods for waste repositories are incorporated into the open-source code OpenGeoSys. Here, recent application-driven progress in this context is highlighted. A robust implementation of visco-plasticity with temperature-dependent properties into a framework for the thermo-mechanical analysis of rock salt will be shown. The model enables the simulation of heat transport along with its consequences on the elastic response as well as on primary and secondary creep or the occurrence of dilatancy in the repository near field. Transverse isotropy, non-isothermal hydraulic processes and their coupling to mechanical stresses are taken into account for the analysis of repositories in clay stone. These processes are also considered in the near field analyses of engineered barrier systems, including the swelling/shrinkage of the bentonite material. The temperature-dependent saturation evolution around the heat-emitting waste container is described by different multiphase flow formulations. For all mentioned applications, we illustrate the workflow from model development and implementation, over verification and validation, to repository-scale application simulations using methods of high performance computing.
An infrastructure for ontology-based information systems in biomedicine: RICORDO case study.
Wimalaratne, Sarala M; Grenon, Pierre; Hoehndorf, Robert; Gkoutos, Georgios V; de Bono, Bernard
2012-02-01
The article presents an infrastructure for supporting the semantic interoperability of biomedical resources based on the management (storing and inference-based querying) of their ontology-based annotations. This infrastructure consists of: (i) a repository to store and query ontology-based annotations; (ii) a knowledge base server with an inference engine to support the storage of and reasoning over ontologies used in the annotation of resources; (iii) a set of applications and services allowing interaction with the integrated repository and knowledge base. The infrastructure is being prototyped and developed and evaluated by the RICORDO project in support of the knowledge management of biomedical resources, including physiology and pharmacology models and associated clinical data. The RICORDO toolkit and its source code are freely available from http://ricordo.eu/relevant-resources. sarala@ebi.ac.uk.
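As an illustration of the inference-based querying such an infrastructure supports (not the RICORDO API itself), the sketch below asks a SPARQL endpoint for resources annotated with a given ontology term or any of its subclasses. The endpoint URL, annotation predicate and example term IRI are assumptions.

```python
# Query a (hypothetical) SPARQL endpoint for resources annotated with an
# ontology term or any subclass of it -- the kind of inference-aware lookup
# an annotation repository plus knowledge base makes possible.
# Endpoint URL and predicate/term IRIs are placeholders.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("http://example.org/sparql")      # hypothetical endpoint
endpoint.setQuery("""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?resource ?term WHERE {
  ?resource <http://example.org/annotatedWith> ?term .                # hypothetical predicate
  ?term rdfs:subClassOf* <http://purl.obolibrary.org/obo/FMA_7088> .  # e.g. an anatomy term
}
""")
endpoint.setReturnFormat(JSON)
for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["resource"]["value"], "annotated with", row["term"]["value"])
```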
NA-42 TI Shared Software Component Library FY2011 Final Report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Knudson, Christa K.; Rutz, Frederick C.; Dorow, Kevin E.
The NA-42 TI program initiated an effort in FY2010 to standardize its software development efforts with the long-term goal of migrating toward a software management approach that will allow for the sharing and reuse of code developed within the TI program, improve integration, ensure a level of software documentation, and reduce development costs. The Pacific Northwest National Laboratory (PNNL) has been tasked with two activities that support this mission. PNNL has been tasked with the identification, selection, and implementation of a Shared Software Component Library. The intent of the library is to provide a common repository that is accessible by all authorized NA-42 software development teams. The repository facilitates software reuse through a searchable and easy-to-use web-based interface. As software is submitted to the repository, the component registration process captures meta-data and provides version control for compiled libraries, documentation, and source code. This meta-data is then available for retrieval and review as part of library search results. In FY2010, PNNL and staff from the Remote Sensing Laboratory (RSL) teamed up to develop a software application with the goal of replacing the aging Aerial Measuring System (AMS). The application under development includes an Advanced Visualization and Integration of Data (AVID) framework and associated AMS modules. Throughout development, PNNL and RSL have utilized a common AMS code repository for collaborative code development. The AMS repository is hosted by PNNL, is restricted to the project development team, is accessed from two different geographic locations and continues to be used. The knowledge gained from the collaboration and hosting of this repository, in conjunction with PNNL software development and systems engineering capabilities, was used in the selection of a package to be used in the implementation of the software component library on behalf of NA-42 TI. The second task managed by PNNL is the development and continued maintenance of the NA-42 TI Software Development Questionnaire. This questionnaire is intended to help software development teams working under NA-42 TI in documenting their development activities. When sufficiently completed, the questionnaire illustrates that the software development activities recorded incorporate significant aspects of the software engineering lifecycle. The questionnaire template is updated as comments are received from NA-42 and/or its development teams, and revised versions are distributed to those using the questionnaire. PNNL also maintains a list of questionnaire recipients. The blank questionnaire template, the AVID and AMS software being developed, and the completed AVID/AMS-specific questionnaire are being used as the initial content to be established in the TI Component Library. This report summarizes the approach taken to identify requirements, search for and evaluate technologies, and the approach taken for installation of the software needed to host the component library. Additionally, it defines the process by which users request access for the contribution and retrieval of library content.
ImgLib2--generic image processing in Java.
Pietzsch, Tobias; Preibisch, Stephan; Tomancák, Pavel; Saalfeld, Stephan
2012-11-15
ImgLib2 is an open-source Java library for n-dimensional data representation and manipulation with focus on image processing. It aims at minimizing code duplication by cleanly separating pixel-algebra, data access and data representation in memory. Algorithms can be implemented for classes of pixel types and generic access patterns by which they become independent of the specific dimensionality, pixel type and data representation. ImgLib2 illustrates that an elegant high-level programming interface can be achieved without sacrificing performance. It provides efficient implementations of common data types, storage layouts and algorithms. It is the data model underlying ImageJ2, the KNIME Image Processing toolbox and an increasing number of Fiji-Plugins. ImgLib2 is licensed under BSD. Documentation and source code are available at http://imglib2.net and in a public repository at https://github.com/imagej/imglib. Supplementary data are available at Bioinformatics Online. saalfeld@mpi-cbg.de
Factor Information Retrieval System version 2.0 (FIRE) (for microcomputers). Software
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
FIRE Version 2.0 contains EPA's recommended criteria and toxic air pollutant emission estimation factors. FIRE consists of: (1) an EPA internal repository system that contains emission factor data identified and collected, and (2) an external distribution system that contains only EPA's recommended factors. The emission factors, compiled from a review of the literature, are identified by pollutant name, CAS number, process and emission source descriptions, SIC code, SCC, and control status. The factors are rated for quality using AP-42 rating criteria.
Fenwick, Matthew; Sesanker, Colbert; Schiller, Martin R.; Ellis, Heidi JC; Hinman, M. Lee; Vyas, Jay; Gryk, Michael R.
2012-01-01
Scientists are continually faced with the need to express complex mathematical notions in code. The renaissance of functional languages such as LISP and Haskell is often credited to their ability to implement complex data operations and mathematical constructs in an expressive and natural idiom. The slow adoption of functional computing in the scientific community does not, however, reflect the congeniality of these fields. Unfortunately, the learning curve for adoption of functional programming techniques is steeper than that for more traditional languages in the scientific community, such as Python and Java, and this is partially due to the relative sparseness of available learning resources. To fill this gap, we demonstrate and provide applied, scientifically substantial examples of functional programming. We present a multi-language source-code repository for software integration and algorithm development, which focuses primarily on the fields of machine learning, data processing, and bioinformatics. We encourage scientists who are interested in learning the basics of functional programming to adopt, reuse, and learn from these examples. The source code is available at: https://github.com/CONNJUR/CONNJUR-Sandbox (see also http://www.connjur.org). PMID:25328913
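To give a flavour of the functional idiom discussed above, a small Python sketch is shown below; it is illustrative only and not code taken from the CONNJUR-Sandbox repository. Small pure functions are composed into a data-cleaning pipeline.

    from functools import reduce

    # Illustrative sketch of a functional idiom in Python (not taken from the
    # CONNJUR-Sandbox repository): small pure functions composed into a pipeline.

    def compose(*funcs):
        """Compose functions right-to-left: compose(f, g)(x) == f(g(x))."""
        return reduce(lambda f, g: lambda x: f(g(x)), funcs)

    def drop_missing(values):
        return [v for v in values if v is not None]

    def normalize(values):
        total = sum(values)
        return [v / total for v in values]

    pipeline = compose(normalize, drop_missing)
    print(pipeline([3.0, None, 1.0, 4.0]))  # [0.375, 0.125, 0.5]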
Nature Research journals reproducibility policies and initiatives in the Earth sciences
NASA Astrophysics Data System (ADS)
VanDecar, J. C.
2016-12-01
The Nature Research journals strongly support the long-term endeavour by funders, institutions, researchers and publishers toward increasing the reliability and reproducibility of published research. In the Earth, space and environmental sciences this mainly takes the form of ensuring that underlying data and methods in each manuscript are made as transparent and accessible as possible. Supporting data must be made available to editors and peer reviewers at the time of submission for the purposes of evaluating each manuscript. But the preferred way to share data sets is via public repositories. When appropriate community repositories are available, we strongly encourage authors to deposit their data prior to publication. We also now require that a statement be included in each manuscript, under the heading "Data availability", indicating whether and how the data can be accessed, including any restrictions to access. To allow authors to describe their experimental design and methods in as much detail as necessary, the Nature Research journals have effectively abolished space restrictions on online methods sections. To further increase transparency, we also encourage authors to provide tables of the data behind graphs and figures as Source Data. This builds on our established data-deposition policy for specific experiments and large data sets. The Source Data is made available directly from the figure legend, for easy access. We also require that details of geological samples and palaeontological specimens include clear provenance information to ensure full transparency of the research methods. Palaeontological and type specimens must be deposited in a recognised museum or collection to permit free access by other researchers in perpetuity. Finally, authors must make available upon request, to editors and reviewers, any previously unreported custom computer code used to generate results that are reported in the paper and central to its main claims. For all studies using custom code that is deemed central to the conclusions, a statement must be included, under the heading "Code availability", indicating whether and how the code can be accessed, including any restrictions to access.
Unwin, Ian; Jansen-van der Vliet, Martine; Westenbrink, Susanne; Presser, Karl; Infanger, Esther; Porubska, Janka; Roe, Mark; Finglas, Paul
2016-02-15
The EuroFIR Document and Data Repositories are being developed as accessible collections of source documents, including grey literature, and the food composition data reported in them. These Repositories will contain source information available to food composition database compilers when selecting their nutritional data. The Document Repository was implemented as searchable bibliographic records in the Europe PubMed Central database, which links to the documents online. The Data Repository will contain original data from source documents in the Document Repository. Testing confirmed the FoodCASE food database management system as a suitable tool for the input, documentation and quality assessment of Data Repository information. Data management requirements for the input and documentation of reported analytical results were established, including record identification and method documentation specifications. Document access and data preparation using the Repositories will provide information resources for compilers, eliminating duplicated work and supporting unambiguous referencing of data contributing to their compiled data. Copyright © 2014 Elsevier Ltd. All rights reserved.
Numerical Modeling of Thermal-Hydrology in the Near Field of a Generic High-Level Waste Repository
NASA Astrophysics Data System (ADS)
Matteo, E. N.; Hadgu, T.; Park, H.
2016-12-01
Disposal in a deep geologic repository is one of the preferred options for long-term isolation of high-level nuclear waste. Coupled thermal-hydrologic processes induced by decay heat from the radioactive waste may impact fluid flow and the associated migration of radionuclides. This study looked at the effects of those processes in simulations of thermal-hydrology for the emplacement of U.S. Department of Energy managed high-level waste and spent nuclear fuel. Most of the high-level waste sources have lower thermal output, which would reduce the impact of thermal propagation. In order to quantify the thermal limits, this study concentrated on the higher thermal output sources and on spent nuclear fuel. The study assumed a generic nuclear waste repository at 500 m depth. For the modeling, a representative domain was selected representing a portion of the repository layout in order to conduct a detailed thermal analysis. A highly refined unstructured mesh was utilized with refinements near heat sources and at intersections of different materials. Simulations looked at different values for properties of components of the engineered barrier system (i.e. buffer, disturbed rock zone and the host rock). The simulations also looked at the effects of different durations of surface aging of the waste to reduce thermal perturbations. The PFLOTRAN code (Hammond et al., 2014) was used for the simulations. Modeling results for the different options are reported and include temperature and fluid flow profiles in the near field at different simulation times. References: G. E. Hammond, P.C. Lichtner and R.T. Mills, "Evaluating the Performance of Parallel Subsurface Simulators: An Illustrative Example with PFLOTRAN", Water Resources Research, 50, doi:10.1002/2012WR013483 (2014). Sandia National Laboratories is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND2016-7510 A
mdFoam+: Advanced molecular dynamics in OpenFOAM
NASA Astrophysics Data System (ADS)
Longshaw, S. M.; Borg, M. K.; Ramisetti, S. B.; Zhang, J.; Lockerby, D. A.; Emerson, D. R.; Reese, J. M.
2018-03-01
This paper introduces mdFoam+, which is an MPI parallelised molecular dynamics (MD) solver implemented entirely within the OpenFOAM software framework. It is open-source and released under the same GNU General Public License (GPL) as OpenFOAM. The source code is released as a publicly open software repository that includes detailed documentation and tutorial cases. Since mdFoam+ is designed entirely within the OpenFOAM C++ object-oriented framework, it inherits a number of key features. The code is designed for extensibility and flexibility, so it is aimed first and foremost as an MD research tool, in which new models and test cases can be developed and tested rapidly. Implementing mdFoam+ in OpenFOAM also enables easier development of hybrid methods that couple MD with continuum-based solvers. Setting up MD cases follows the standard OpenFOAM format, as mdFoam+ also relies upon the OpenFOAM dictionary-based directory structure. This ensures that useful pre- and post-processing capabilities provided by OpenFOAM remain available even though the fully Lagrangian nature of an MD simulation is not typical of most OpenFOAM applications. Results show that mdFoam+ compares well to another well-known MD code (e.g. LAMMPS) in terms of benchmark problems, although it also has additional functionality that does not exist in other open-source MD codes.
76 FR 53454 - Privacy Act System of Records
Federal Register 2010, 2011, 2012, 2013, 2014
2011-08-26
... statutory responsibilities of the OIG; and Acting as a repository and source for information necessary to... in matters relating to the statutory responsibilities of the OIG; and 7. Acting as a repository and.... Acting as a repository and source for information necessary to fulfill the reporting requirements of the...
PRay - A graphical user interface for interactive visualization and modification of rayinvr models
NASA Astrophysics Data System (ADS)
Fromm, T.
2016-01-01
PRay is a graphical user interface for the interactive display and editing of velocity models for seismic refraction. It is optimized for editing rayinvr models but can also be used as a dynamic viewer for ray tracing results from other software. The main features are the graphical editing of nodes and fast adjustment of the display (stations and phases). It can be extended by user-defined shell scripts and links to phase-picking software. PRay is open source software written in the scripting language Perl, runs on Unix-like operating systems including Mac OS X, and provides a version-controlled source code repository for community development (https://sourceforge.net/projects/pray-plot-rayinvr/).
Damage-plasticity model of the host rock in a nuclear waste repository
DOE Office of Scientific and Technical Information (OSTI.GOV)
Koudelka, Tomáš; Kruis, Jaroslav, E-mail: kruis@fsv.cvut.cz
The paper describes a damage-plasticity model for modelling the host rock environment of a nuclear waste repository. The Radioactive Waste Repository Authority in the Czech Republic assumes the repository will be located in a granite rock mass which exhibits anisotropic behaviour, where the strength in tension is lower than in compression. In order to describe this phenomenon, the damage-plasticity model is formulated with the help of the Drucker-Prager yield criterion, which can be set to capture the compression behaviour, while tensile stress states are described with the help of a scalar isotropic damage model. The damage-plasticity model was implemented in the SIFEL finite element code and, consequently, the code was used for the simulation of the Äspö Pillar Stability Experiment (APSE), which was performed in order to determine yielding strength under various conditions in granite rocks similar to those in the Czech Republic. The results from the performed analysis are presented and discussed in the paper.
BigBWA: approaching the Burrows-Wheeler aligner to Big Data technologies.
Abuín, José M; Pichel, Juan C; Pena, Tomás F; Amigo, Jorge
2015-12-15
BigBWA is a new tool that uses the Big Data technology Hadoop to boost the performance of the Burrows-Wheeler aligner (BWA). Important reductions in the execution times were observed when using this tool. In addition, BigBWA is fault tolerant and it does not require any modification of the original BWA source code. BigBWA is available at the project GitHub repository: https://github.com/citiususc/BigBWA. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
LittleQuickWarp: an ultrafast image warping tool.
Qu, Lei; Peng, Hanchuan
2015-02-01
Warping images into a standard coordinate space is critical for many image computing related tasks. However, for multi-dimensional and high-resolution images, an accurate warping operation itself is often very expensive in terms of computer memory and computational time. For high-throughput image analysis studies such as brain mapping projects, it is desirable to have high performance image warping tools that are compatible with common image analysis pipelines. In this article, we present LittleQuickWarp, a swift and memory efficient tool that boosts 3D image warping performance dramatically and at the same time has high warping quality similar to the widely used thin plate spline (TPS) warping. Compared to the TPS, LittleQuickWarp can improve the warping speed 2-5 times and reduce the memory consumption 6-20 times. We have implemented LittleQuickWarp as an Open Source plug-in program on top of the Vaa3D system (http://vaa3d.org). The source code and a brief tutorial can be found in the Vaa3D plugin source code repository. Copyright © 2014 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Fraser, Ryan; Gross, Lutz; Wyborn, Lesley; Evans, Ben; Klump, Jens
2015-04-01
Recent investments in HPC, cloud and petascale data stores have dramatically increased the scale and resolution at which earth science challenges can now be tackled. These new infrastructures are highly parallelised, and to fully utilise them and access the large volumes of earth science data now available, a new approach to software stack engineering needs to be developed. The size, complexity and cost of the new infrastructures mean any software deployed has to be reliable, trusted and reusable. Increasingly, software is available via open source repositories, but these usually only enable code to be discovered and downloaded. It is hard for a scientist to judge the suitability and quality of individual codes: rarely is there information on how and where codes can be run, what the critical dependencies are, and in particular, on the version requirements and licensing of the underlying software stack. A trusted software framework is proposed to enable reliable software to be discovered, accessed and then deployed on multiple hardware environments. More specifically, this framework will enable those who generate the software, and those who fund its development, to gain credit for the effort, IP, time and dollars spent, and will facilitate quantification of the impact of individual codes. For scientific users, the framework delivers reviewed and benchmarked scientific software with mechanisms to reproduce results. The trusted framework will have five separate, but connected, components: Register, Review, Reference, Run, and Repeat. 1) The Register component will facilitate discovery of relevant software from multiple open source code repositories. The registration process should capture information about licensing and the hardware environments the code can be run on, define appropriate validation (testing) procedures, and list the critical dependencies. 2) The Review component targets verification of the software, typically against a set of benchmark cases. This will be achieved by linking the code in the software framework to peer review forums such as Mozilla Science or appropriate journals (e.g. the Geoscientific Model Development journal) to help users know which codes to trust. 3) Referencing will be accomplished by linking the software framework to groups such as Figshare or ImpactStory that help disseminate and measure the impact of scientific research, including program code. 4) The Run component will draw on information supplied in the registration process, benchmark cases described in the review, and other relevant information to instantiate the scientific code on the selected environment. 5) The Repeat component will tap into existing provenance workflow engines that automatically capture information relating to a particular run of the software, including identification of all input and output artefacts and all elements and transactions within that workflow. The proposed trusted software framework will enable users to rapidly discover and access reliable code, reduce the time to deploy it, and greatly facilitate sharing, reuse and reinstallation of code. Properly designed, it could scale out to massively parallel systems and be accessed nationally and internationally for multiple use cases, including supercomputer centres, cloud facilities, and local computers.
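As a rough illustration of the metadata the Register component would need to capture, a minimal Python sketch follows; the class and field names are hypothetical and not part of the proposed framework.

    from dataclasses import dataclass, field
    from typing import List

    # Hypothetical sketch of a Register entry for the proposed trusted software
    # framework; the class and field names are illustrative assumptions only.
    @dataclass
    class RegisteredCode:
        name: str
        repository_url: str
        license: str
        hardware_environments: List[str] = field(default_factory=list)  # e.g. "x86_64 cluster", "GPU"
        dependencies: List[str] = field(default_factory=list)           # critical, version-pinned dependencies
        validation_cases: List[str] = field(default_factory=list)       # benchmark cases used by the Review step

    entry = RegisteredCode(
        name="example-solver",
        repository_url="https://example.org/example-solver.git",
        license="GPL-3.0",
        hardware_environments=["x86_64 cluster"],
        dependencies=["numpy>=1.20"],
        validation_cases=["benchmark-01"],
    )
    print(entry.name, entry.license)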
Software Writing Skills for Your Research - Lessons Learned from Workshops in the Geosciences
NASA Astrophysics Data System (ADS)
Hammitzsch, Martin
2016-04-01
Findings presented in scientific papers are based on data and software. Once in a while they come along with data - but not commonly with software. However, the software used to obtain the findings plays a crucial role in the scientific work. Nevertheless, software is rarely seen as publishable. Researchers may thus be unable to reproduce the findings without the software, which conflicts with the principle of reproducibility in science. For both the writing of publishable software and the reproducibility issue, the quality of software is of utmost importance. For many programming scientists the treatment of source code, e.g. code design, version control, documentation, and testing, is associated with additional work that is not covered in the primary research task. This includes the adoption of processes following the software development life cycle. However, the adoption of software engineering rules and best practices has to be recognized and accepted as part of the scientific performance. Most scientists have little incentive to improve code and do not publish code, because software engineering habits are rarely practised by researchers or students. Software engineering skills are not passed on to successors the way paper-writing skills are. Thus it is often felt that the software or code produced is not publishable. The quality of software and its source code has a decisive influence on the quality of research results obtained and their traceability. So establishing best practices from software engineering to serve scientific needs is crucial for the success of scientific software. Even though scientists use existing software and code, e.g. from open source software repositories, only a few contribute their code back into the repositories. Writing and opening code for Open Science means that subsequent users are able to run the code, e.g. through the provision of sufficient documentation, sample data sets, tests and comments, which in turn can be proven by adequate and qualified reviews. This assumes that scientists learn to write and release code and software as they learn to write and publish papers. With this in mind, software could be valued and assessed as a contribution to science. But this requires the relevant skills, which can then be passed to colleagues and followers. Therefore, the GFZ German Research Centre for Geosciences performed three workshops in 2015 to address the passing of software writing skills to young scientists, the next generation of researchers in the Earth, planetary and space sciences. Experiences in running these workshops and the lessons learned will be summarized in this presentation. The workshops have received support and funding from Software Carpentry, a volunteer organization whose goal is to make scientists more productive, and their work more reliable, by teaching them basic computing skills, and from FOSTER (Facilitate Open Science Training for European Research), a two-year, EU-funded (FP7) project whose goal is to produce a Europe-wide training programme that will help to incorporate Open Access approaches into existing research methodologies and to integrate Open Science principles and practice into the current research workflow by targeting young researchers and other stakeholders.
CDinFusion – Submission-Ready, On-Line Integration of Sequence and Contextual Data
Hankeln, Wolfgang; Wendel, Norma Johanna; Gerken, Jan; Waldmann, Jost; Buttigieg, Pier Luigi; Kostadinov, Ivaylo; Kottmann, Renzo; Yilmaz, Pelin; Glöckner, Frank Oliver
2011-01-01
State of the art (DNA) sequencing methods applied in “Omics” studies grant insight into the ‘blueprints’ of organisms from all domains of life. Sequencing is carried out around the globe and the data is submitted to the public repositories of the International Nucleotide Sequence Database Collaboration. However, the context in which these studies are conducted often gets lost, because experimental data, as well as information about the environment are rarely submitted along with the sequence data. If these contextual or metadata are missing, key opportunities of comparison and analysis across studies and habitats are hampered or even impossible. To address this problem, the Genomic Standards Consortium (GSC) promotes checklists and standards to better describe our sequence data collection and to promote the capturing, exchange and integration of sequence data with contextual data. In a recent community effort the GSC has developed a series of recommendations for contextual data that should be submitted along with sequence data. To support the scientific community to significantly enhance the quality and quantity of contextual data in the public sequence data repositories, specialized software tools are needed. In this work we present CDinFusion, a web-based tool to integrate contextual and sequence data in (Multi)FASTA format prior to submission. The tool is open source and available under the Lesser GNU Public License 3. A public installation is hosted and maintained at the Max Planck Institute for Marine Microbiology at http://www.megx.net/cdinfusion. The tool may also be installed locally using the open source code available at http://code.google.com/p/cdinfusion. PMID:21935468
The Open Spectral Database: an open platform for sharing and searching spectral data.
Chalk, Stuart J
2016-01-01
A number of websites make spectral data available for download (typically as JCAMP-DX text files), and one (ChemSpider) also allows users to contribute spectral files. As a result, searching and retrieving such spectral data can be time consuming, and the data can be difficult to reuse if it is compressed in the JCAMP-DX file. What is needed is a single resource that allows submission of JCAMP-DX files, exports the raw data in multiple formats, supports searching based on multiple chemical identifiers, and is open in terms of license and access. To address these issues a new online resource called the Open Spectral Database (OSDB) http://osdb.info/ has been developed and is now available. Built using open source tools, using open code (hosted on GitHub), providing open data, and open to community input about design and functionality, the OSDB is available for anyone to submit spectral data, making it searchable and available to the scientific community. This paper details the concept and coding, internal architecture, export formats, Representational State Transfer (REST) Application Programming Interface and options for submission of data. The OSDB website went live in November 2015. Concurrently, the GitHub repository was made available at https://github.com/stuchalk/OSDB/, and is open for collaborators to join the project, submit issues, and contribute code. The combination of a scripting environment (PHPStorm), a PHP framework (CakePHP), a relational database (MySQL) and a code repository (GitHub) provides all the capabilities needed to easily develop REST-based websites for ingestion, curation and exposure of open chemical data to the community at all levels. It is hoped this software stack (or equivalent ones in other scripting languages) will be leveraged to make more chemical data available for both humans and computers.
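As an illustration of how such a REST interface is typically consumed, a minimal Python sketch follows; the endpoint path and parameter name are hypothetical assumptions, not the documented OSDB API.

    import requests  # third-party; pip install requests

    # Hypothetical sketch of querying a REST endpoint such as the OSDB exposes.
    # The "/api/spectra" path and "inchikey" parameter below are illustrative
    # assumptions, not the documented OSDB API.
    BASE_URL = "http://osdb.info"

    def search_spectra(inchikey):
        response = requests.get(f"{BASE_URL}/api/spectra",
                                params={"inchikey": inchikey}, timeout=30)
        response.raise_for_status()
        return response.json()

    # Example (commented out; requires network access and a valid endpoint):
    # spectra = search_spectra("BSYNRYMUTXBXSQ-UHFFFAOYSA-N")  # aspirin InChIKey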
The SAMI Galaxy Survey: A prototype data archive for Big Science exploration
NASA Astrophysics Data System (ADS)
Konstantopoulos, I. S.; Green, A. W.; Foster, C.; Scott, N.; Allen, J. T.; Fogarty, L. M. R.; Lorente, N. P. F.; Sweet, S. M.; Hopkins, A. M.; Bland-Hawthorn, J.; Bryant, J. J.; Croom, S. M.; Goodwin, M.; Lawrence, J. S.; Owers, M. S.; Richards, S. N.
2015-11-01
We describe the data archive and database for the SAMI Galaxy Survey, an ongoing observational program that will cover ≈3400 galaxies with integral-field (spatially-resolved) spectroscopy. Amounting to some three million spectra, this is the largest sample of its kind to date. The data archive and built-in query engine use the versatile Hierarchical Data Format (HDF5), which precludes the need for external metadata tables and hence the setup and maintenance overhead those carry. The code produces simple outputs that can easily be translated to plots and tables, and the combination of these tools makes for a light system that can handle heavy data. This article acts as a contextual companion to the SAMI Survey Database source code repository, samiDB, which is freely available online and written entirely in Python. We also discuss the decisions related to the selection of tools and the creation of data visualisation modules. It is our aim that the work presented in this article (descriptions, rationale, and source code) will be of use to scientists looking to set up a maintenance-light data archive for a Big Science data load.
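A minimal Python/h5py sketch of the underlying idea (metadata stored as attributes inside the HDF5 file, so no external metadata tables are needed) is shown below; it is illustrative only and not code from samiDB.

    import h5py
    import numpy as np

    # Minimal illustration of the HDF5-as-archive idea (not samiDB itself):
    # metadata live as attributes next to the data, so simple queries need
    # no external metadata tables.
    with h5py.File("archive_demo.h5", "w") as f:
        spectrum = f.create_dataset("galaxy_0001/spectrum", data=np.random.rand(2048))
        spectrum.attrs["redshift"] = 0.05
        spectrum.attrs["field"] = "GAMA"

    with h5py.File("archive_demo.h5", "r") as f:
        # A simple "query": walk the root group and select objects by attribute.
        hits = [name for name in f["/"]
                if f[name]["spectrum"].attrs["redshift"] < 0.1]
        print(hits)  # ['galaxy_0001']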
Gpufit: An open-source toolkit for GPU-accelerated curve fitting.
Przybylski, Adrian; Thiel, Björn; Keller-Findeisen, Jan; Stock, Bernd; Bates, Mark
2017-11-16
We present a general purpose, open-source software library for estimation of non-linear parameters by the Levenberg-Marquardt algorithm. The software, Gpufit, runs on a Graphics Processing Unit (GPU) and executes computations in parallel, resulting in a significant gain in performance. We measured a speed increase of up to 42 times when comparing Gpufit with an identical CPU-based algorithm, with no loss of precision or accuracy. Gpufit is designed such that it is easily incorporated into existing applications or adapted for new ones. Multiple software interfaces, including to C, Python, and Matlab, ensure that Gpufit is accessible from most programming environments. The full source code is published as an open source software repository, making its function transparent to the user and facilitating future improvements and extensions. As a demonstration, we used Gpufit to accelerate an existing scientific image analysis package, yielding significantly improved processing times for super-resolution fluorescence microscopy datasets.
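For orientation, the Levenberg-Marquardt update that such a fitter iterates can be written in a few lines of NumPy; the sketch below is a CPU-side illustration of the algorithm, not Gpufit's CUDA implementation.

    import numpy as np

    # One Levenberg-Marquardt step (CPU-side sketch of the algorithm only).
    # Conventions: residual_fn(p) returns r = y - f(p) and jacobian_fn(p)
    # returns J = df/dp; the step solves
    # (J^T J + damping * diag(J^T J)) delta = J^T r and updates p <- p + delta.
    def lm_step(params, residual_fn, jacobian_fn, damping):
        r = residual_fn(params)          # shape (n_data,)
        J = jacobian_fn(params)          # shape (n_data, n_params)
        JTJ = J.T @ J
        A = JTJ + damping * np.diag(np.diag(JTJ))
        delta = np.linalg.solve(A, J.T @ r)
        return params + delta

    # Tiny demo: one lightly damped step towards fitting y = a*x + b.
    x = np.linspace(0.0, 1.0, 20)
    y = 2.0 * x + 1.0
    residuals = lambda p: y - (p[0] * x + p[1])
    jacobian = lambda p: np.column_stack([x, np.ones_like(x)])
    print(lm_step(np.array([0.0, 0.0]), residuals, jacobian, damping=1e-3))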
NCIP has migrated 132 repositories from the NCI subversion repository to our public NCIP GitHub channel with the goal of facilitating third party contributions to the existing code base. Within the GitHub environment, we are advocating use of the GitHub “fork and pull” model.
Goloborodko, Anton A; Levitsky, Lev I; Ivanov, Mark V; Gorshkov, Mikhail V
2013-02-01
Pyteomics is a cross-platform, open-source Python library providing a rich set of tools for MS-based proteomics. It provides modules for reading LC-MS/MS data, search engine output, protein sequence databases, theoretical prediction of retention times, electrochemical properties of polypeptides, mass and m/z calculations, and sequence parsing. Pyteomics is available under Apache license; release versions are available at the Python Package Index http://pypi.python.org/pyteomics, the source code repository at http://hg.theorchromo.ru/pyteomics, documentation at http://packages.python.org/pyteomics. Pyteomics.biolccc documentation is available at http://packages.python.org/pyteomics.biolccc/. Questions on installation and usage can be addressed to pyteomics mailing list: pyteomics@googlegroups.com.
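A short usage sketch is shown below; the calls follow the mass and parser modules as documented for Pyteomics, though exact signatures may differ between versions.

    from pyteomics import mass, parser

    # Usage sketch for Pyteomics (calls as documented for the library; check
    # the version you install). Monoisotopic mass of a peptide:
    peptide_mass = mass.calculate_mass(sequence='PEPTIDE')

    # In-silico tryptic digestion of a short protein sequence:
    peptides = parser.cleave('MKWVTFISLLLLFSSAYSRGV', parser.expasy_rules['trypsin'])

    print(round(peptide_mass, 3), sorted(peptides))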
MetExploreViz: web component for interactive metabolic network visualization.
Chazalviel, Maxime; Frainay, Clément; Poupin, Nathalie; Vinson, Florence; Merlet, Benjamin; Gloaguen, Yoann; Cottret, Ludovic; Jourdan, Fabien
2017-09-15
MetExploreViz is an open source web component that can be easily embedded in any web site. It provides features dedicated to the visualization of metabolic networks and pathways and thus offers a flexible solution to analyze omics data in a biochemical context. Documentation and a link to the Git code repository (GPL 3.0 license) are available at this URL: http://metexplore.toulouse.inra.fr/metexploreViz/doc/. Tutorial is available at this URL. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
NELS 2.0 - A general system for enterprise wide information management
NASA Technical Reports Server (NTRS)
Smith, Stephanie L.
1993-01-01
NELS, the NASA Electronic Library System, is an information management tool for creating distributed repositories of documents, drawings, and code for use and reuse by the aerospace community. The NELS retrieval engine can load metadata and source files of full text objects, perform natural language queries to retrieve ranked objects, and create links to connect user interfaces. For flexibility, the NELS architecture has layered interfaces between the application program and the stored library information. The session manager provides the interface functions for development of NELS applications. The data manager is an interface between the session manager and the structured data system. The center of the structured data system is the Wide Area Information Server. This system architecture provides access to information across heterogeneous platforms in a distributed environment. There are presently three user interfaces that connect to the NELS engine: an X-Windows interface, an ASCII interface, and the Spatial Data Management System. This paper describes the design and operation of NELS as an information management tool and repository.
JGrass-NewAge hydrological system: an open-source platform for the replicability of science.
NASA Astrophysics Data System (ADS)
Bancheri, Marialaura; Serafin, Francesco; Formetta, Giuseppe; Rigon, Riccardo; David, Olaf
2017-04-01
JGrass-NewAge is an open source semi-distributed hydrological modelling system. It is based on the Object Modeling System framework (OMS version 3), on the JGrasstools and on the GeoTools. OMS3 allows the creation of independent software packages which can be connected at run time into a working modelling solution. These components are available as libraries/dependencies or as repositories to fork in order to add further features. Different tools are adopted to ease the integration, interoperability and use of each package. Most of the components are integrated with Gradle, which represents the state of the art in build systems, especially for Java projects. Continuous integration provides a further layer between the local source code (client side) and the remote repository (server side), and ensures that the source code is built and tested at each commit. Finally, the use of Zenodo makes the code hosted on GitHub unique, citable and traceable, with a defined DOI. Following these standards, each part of the hydrological cycle is implemented in JGrass-NewAge as a component that can be selected, adopted, and connected to obtain a user-customized hydrological model. A variety of modelling solutions are possible, allowing a complete hydrological analysis. Moreover, thanks to the JGrasstools and the GeoTools, data and results can be visualized in a selected GIS. After the geomorphological analysis of the watershed, the spatial interpolation of the meteorological inputs can be performed using both deterministic (IDW) and geostatistical (kriging) algorithms. For the radiation balance, the shortwave and longwave radiation can be estimated, which are, in turn, inputs for the simulation of evapotranspiration according to the Priestley-Taylor and Penman-Monteith formulas. Three degree-day models are implemented for snow melt and SWE. Runoff production can be simulated using two different components, "Adige" and "Embedded Reservoirs". Travel time theory has recently been integrated for a coupled analysis of solute transport. Each component can also be connected to different calibration tools such as LUCA and PSO. Further information about the actual implementation can be found at https://github.com/geoframecomponents, while the OMS projects with the examples, data and results are available at https://github.com/GEOframeOMSProjects.
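To illustrate the deterministic interpolation scheme mentioned above, a generic inverse distance weighting (IDW) estimate can be written compactly in Python; this is a conceptual sketch, not the Java/OMS3 component itself.

    import numpy as np

    # Generic inverse distance weighting (IDW) sketch, illustrating the
    # deterministic interpolator mentioned above (conceptual only; the
    # JGrass-NewAge component is implemented in Java on OMS3).
    def idw(xy_stations, values, xy_target, power=2.0):
        d = np.linalg.norm(xy_stations - xy_target, axis=1)
        if np.any(d == 0):                 # target coincides with a station
            return float(values[np.argmin(d)])
        w = 1.0 / d**power
        return float(np.sum(w * values) / np.sum(w))

    stations = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    rain = np.array([10.0, 14.0, 12.0])
    print(idw(stations, rain, np.array([0.2, 0.2])))  # weighted towards the nearest gauge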
Using the Euclid RTP11.13 Repository in the SEC Environment
2006-03-01
The FCT will start, but when connecting to the Repository it fails because of a wrong user/password combination. We found that the user name and password are hard-coded in the FCT software; it uses defaultEditor@rtp1113.INETI...
LHCb migration from Subversion to Git
NASA Astrophysics Data System (ADS)
Clemencic, M.; Couturier, B.; Closier, J.; Cattaneo, M.
2017-10-01
Due to user demand and to support new development workflows based on code review and multiple development streams, LHCb decided to port its source code management from Subversion to Git, using the CERN GitLab hosting service. Although tools exist for this kind of migration, LHCb specificities and development models required careful planning of the migration, development of migration tools, changes to the development model, and redefinition of the release procedures. Moreover we had to support a hybrid situation with some software projects hosted in Git and others still in Subversion, or even branches of one project hosted in different systems. We present the way we addressed the special LHCb requirements, the technical details of migrating large non-standard Subversion repositories, and how we managed to smoothly migrate the software projects following the schedule of each project manager.
An Open-source Community Web Site To Support Ground-Water Model Testing
NASA Astrophysics Data System (ADS)
Kraemer, S. R.; Bakker, M.; Craig, J. R.
2007-12-01
A community wiki wiki web site has been created as a resource to support ground-water model development and testing. The Groundwater Gourmet wiki is a repository for user-supplied analytical and numerical recipes, howtos, and examples. Members are encouraged to submit analytical solutions, including source code and documentation. A diversity of code snippets is sought in a variety of languages, including Fortran, C, C++, Matlab, and Python. In the spirit of a wiki, all contributions may be edited and altered by other users, and open source licensing is promoted. Community-accepted contributions are graduated into the library of analytic solutions and organized into either a Strack (Groundwater Mechanics, 1989) or Bruggeman (Analytical Solutions of Geohydrological Problems, 1999) classification. The examples section of the wiki is meant to include laboratory experiments (e.g., Hele-Shaw), classical benchmark problems (e.g., the Henry Problem), and controlled field experiments (e.g., the Borden landfill and Cape Cod tracer tests). Although this work was reviewed by EPA and approved for publication, it may not necessarily reflect official Agency policy. Mention of trade names or commercial products does not constitute endorsement or recommendation for use.
SU-E-T-103: Development and Implementation of Web Based Quality Control Software
DOE Office of Scientific and Technical Information (OSTI.GOV)
Studinski, R; Taylor, R; Angers, C
Purpose: Historically, many radiation medicine programs have maintained their Quality Control (QC) test results in paper records or Microsoft Excel worksheets. Both these approaches present significant logistical challenges and are not predisposed to data review and approval. It has been our group's aim to develop and implement web-based software designed not just to record and store QC data in a centralized database, but to provide scheduling and data review tools to help manage a radiation therapy clinic's equipment quality control program. Methods: The software was written in the Python programming language using the Django web framework. In order to promote collaboration and validation from other centres, the code was made open source and is freely available to the public via an online source code repository. The code was written to provide a common user interface for data entry, formalize the review and approval process, and offer automated data trending and process control analysis of test results. Results: As of February 2014, our installation of QATrack+ has 180 tests defined in its database and has collected ∼22 000 test results, all of which have been reviewed and approved by a physicist via QATrack+'s review tools. These results include records for quality control of Elekta accelerators, CT simulators, our brachytherapy programme, TomoTherapy and CyberKnife units. Currently at least 5 other centres are known to be running QATrack+ clinically, forming the start of an international user community. Conclusion: QATrack+ has proven to be an effective tool for collecting radiation therapy QC data, allowing for rapid review and trending of data for a wide variety of treatment units. As free and open source software, all source code, documentation and a bug tracker are available to the public at https://bitbucket.org/tohccmedphys/qatrackplus/.
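A hypothetical Django sketch of the kind of models behind such a QC-tracking application is shown below; the class and field names are illustrative assumptions, not QATrack+'s actual schema.

    from django.db import models

    # Hypothetical sketch of Django models for a QC-tracking application
    # (illustrative field names only, not QATrack+'s actual schema).
    # This code would live in a Django app's models.py.
    class QCTest(models.Model):
        name = models.CharField(max_length=255)
        tolerance = models.FloatField(null=True, blank=True)

    class QCTestResult(models.Model):
        test = models.ForeignKey(QCTest, on_delete=models.CASCADE)
        value = models.FloatField()
        performed = models.DateTimeField(auto_now_add=True)
        reviewed = models.BooleanField(default=False)   # set by a physicist during review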
Sustaining Open Source Communities through Hackathons - An Example from the ASPECT Community
NASA Astrophysics Data System (ADS)
Heister, T.; Hwang, L.; Bangerth, W.; Kellogg, L. H.
2016-12-01
The ecosystem surrounding a successful scientific open source software package combines both social and technical aspects. Much thought has been given to the technology side of writing sustainable software for large infrastructure projects and software libraries, but less to building the human capacity to perpetuate scientific software used in computational modeling. One effective format for building capacity is regular multi-day hackathons. Scientific hackathons bring together a group of science domain users and scientific software contributors to make progress on a specific software package. Innovation comes through the chance to work with established and new collaborations. Especially in domain sciences with small communities, hackathons give geographically distributed scientists an opportunity to connect face-to-face. They foster lively discussions amongst scientists with different expertise, promote new collaborations, and increase transparency in both the technical and scientific aspects of code development. ASPECT is an open source, parallel, extensible finite element code to simulate thermal convection that began development in 2011 under the Computational Infrastructure for Geodynamics. ASPECT hackathons over the past 3 years have grown the number of authors to >50, training new code maintainers in the process. Hackathons begin with leaders establishing project-specific conventions for development, demonstrating the workflow for code contributions, and reviewing relevant technical skills. Each hackathon expands the developer community. Over 20 scientists add >6,000 lines of code during the >1 week event. Participants grow comfortable contributing to the repository and over half continue to contribute afterwards. A high return rate of participants ensures continuity and stability of the group as well as mentoring for novice members. We hope to build other software communities on this model, but anticipate that each will bring its own unique challenges.
NASA Astrophysics Data System (ADS)
Müller, W.; Alkan, H.; Xie, M.; Moog, H.; Sonnenthal, E. L.
2009-12-01
The release and migration of toxic contaminants from the disposed wastes are among the main issues in the long-term safety assessment of geological repositories. In the engineered and geological barriers around the nuclear waste emplacements, chemical interactions between the components of the system may affect the isolation properties considerably. As chemical processes change the transport properties in the near and far field of a nuclear repository, modelling of the transport should also take the chemistry into account. Reactive transport modelling consists of two main components: a code that combines the possible chemical reactions with thermo-hydrogeological processes interactively, and a thermodynamic databank supplying the parameters required for the calculation of the chemical reactions. In the last decade many thermo-hydrogeological codes were upgraded to include the modelling of chemical processes. TOUGHREACT is one of these codes. It is an extension of the well-known simulator TOUGH2 for modelling geoprocesses. The code is developed by LBNL (Lawrence Berkeley National Laboratory, Univ. of California) for the simulation of the multi-phase transport of gas and liquid in porous media including heat transfer. After the release of its first version in 1998, this code has been applied and improved many times in conjunction with considerations for nuclear waste emplacement. A recent version has been extended to calculate ion activities in concentrated salt solutions by applying the Pitzer model. In TOUGHREACT, the incorporated equation-of-state module ECO2N is applied as the EOS module for non-isothermal multiphase flow in a fluid system of H2O-NaCl-CO2. The partitioning of H2O and CO2 between liquid and gas phases is modelled as a function of temperature, pressure, and salinity. This module is applicable to waste repositories that are expected to generate CO2 or that originally contain CO2 in the fluid system. The enhanced TOUGHREACT uses an EQ3/6-formatted database for both Pitzer ion-interaction parameters and thermodynamic equilibrium constants. The reliability of the parameters is as important as the accuracy of the modelling tool. For this purpose the project THEREDA (www.thereda.de) was set up. The project aims at a comprehensive and internally consistent thermodynamic reference database for geochemical modelling of near- and far-field processes occurring in repositories for radioactive wastes in various host rock formations. In the framework of the project, all data necessary to perform thermodynamic equilibrium calculations at elevated temperature in the system of oceanic salts are under revision, and it is expected that related data will be available for download by 2010-03. In this paper the geochemical issues that can play an essential role in the transport of radioactive contaminants within and around waste repositories are discussed. Some generic calculations are given to illustrate the geochemical interactions and their probable effects on the transport properties around HLW emplacements and on CO2-generating and/or CO2-containing repository systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ferrada, J.J.
This report compiles preliminary information that supports the premise that a repository is needed in Latin America and analyzes the nuclear situation (mainly in Argentina and Brazil) in terms of nuclear capabilities, inventories, and regional spent-fuel repositories. The report is based on several sources and summarizes (1) the nuclear capabilities in Latin America and establishes the framework for the need of a permanent repository, (2) the International Atomic Energy Agency (IAEA) approach for a regional spent-fuel repository and describes the support that international institutions are lending to this issue, (3) the current situation in Argentina in order to analyze the Argentinean willingness to find a location for a deep geological repository, and (4) the issues involved in selecting a location for the repository and identifies a potential location. This report then draws conclusions based on an analysis of this information. The focus of this report is mainly on spent fuel and does not elaborate on other radiological waste sources.
Limitations to the use of two-dimensional thermal modeling of a nuclear waste repository
DOE Office of Scientific and Technical Information (OSTI.GOV)
Davis, B.W.
1979-01-04
Thermal modeling of a nuclear waste repository is basic to most waste management predictive models. It is important that the modeling techniques accurately determine the time-dependent temperature distribution of the waste emplacement media. Recent modeling studies show that the time-dependent temperature distribution can be accurately modeled in the far-field using a 2-dimensional (2-D) planar numerical model; however, the near-field cannot be modeled accurately enough by either 2-D axisymmetric or 2-D planar numerical models for repositories in salt. The accuracy limits of 2-D modeling were defined by comparing results from 3-dimensional (3-D) TRUMP modeling with results from both 2-D axisymmetric and 2-D planar models. Both TRUMP and ADINAT were employed as modeling tools. Two-dimensional results from the finite element code ADINAT were compared with 2-D results from the finite difference code TRUMP; they showed almost perfect correspondence in the far-field. This result adds substantially to confidence in future use of ADINAT and its companion stress code ADINA for thermal stress analysis. ADINAT was found to be somewhat sensitive to time step and mesh aspect ratio. 13 figures, 4 tables.
The Future of ECHO: Evaluating Open Source Possibilities
NASA Astrophysics Data System (ADS)
Pilone, D.; Gilman, J.; Baynes, K.; Mitchell, A. E.
2012-12-01
NASA's Earth Observing System ClearingHOuse (ECHO) is a format agnostic metadata repository supporting over 3000 collections and 100M science granules. ECHO exposes FTP and RESTful Data Ingest APIs in addition to both SOAP and RESTful search and order capabilities. Built on top of ECHO is a human facing search and order web application named Reverb. ECHO processes hundreds of orders, tens of thousands of searches, and 1-2M ingest actions each week. As ECHO's holdings, metadata format support, and visibility have increased, the ECHO team has received requests by non-NASA entities for copies of ECHO that can be run locally against their data holdings. ESDIS and the ECHO Team have begun investigations into various deployment and Open Sourcing models that can balance the real constraints faced by the ECHO project with the benefits of providing ECHO capabilities to a broader set of users and providers. This talk will discuss several release and Open Source models being investigated by the ECHO team along with the impacts those models are expected to have on the project. We discuss: - Addressing complex deployment or setup issues for potential users - Models of vetting code contributions - Balancing external (public) user requests versus our primary partners - Preparing project code for public release, including navigating licensing issues related to leveraged libraries - Dealing with non-free project dependencies such as commercial databases - Dealing with sensitive aspects of project code such as database passwords, authentication approaches, security through obscurity, etc. - Ongoing support for the released code including increased testing demands, bug fixes, security fixes, and new features.
BioBlend: automating pipeline analyses within Galaxy and CloudMan.
Sloggett, Clare; Goonasekera, Nuwan; Afgan, Enis
2013-07-01
We present BioBlend, a unified API in a high-level language (python) that wraps the functionality of Galaxy and CloudMan APIs. BioBlend makes it easy for bioinformaticians to automate end-to-end large data analysis, from scratch, in a way that is highly accessible to collaborators, by allowing them to both provide the required infrastructure and automate complex analyses over large datasets within the familiar Galaxy environment. http://bioblend.readthedocs.org/. Automated installation of BioBlend is available via PyPI (e.g. pip install bioblend). Alternatively, the source code is available from the GitHub repository (https://github.com/afgane/bioblend) under the MIT open source license. The library has been tested and is working on Linux, Macintosh and Windows-based systems.
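A minimal usage sketch follows; the calls are from BioBlend's documented Galaxy client, while the server URL and API key are placeholders that must be replaced with real credentials.

    from bioblend.galaxy import GalaxyInstance

    # Minimal BioBlend sketch; the URL and API key below are placeholders for
    # a real Galaxy server and user credentials.
    gi = GalaxyInstance(url='https://usegalaxy.org', key='YOUR_API_KEY')

    histories = gi.histories.get_histories()      # list the user's histories
    workflows = gi.workflows.get_workflows()      # list available workflows
    print(len(histories), len(workflows))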
Apollo: a community resource for genome annotation editing.
Lee, Ed; Harris, Nomi; Gibson, Mark; Chetty, Raymond; Lewis, Suzanna
2009-07-15
Apollo is a genome annotation-editing tool with an easy to use graphical interface. It is a component of the GMOD project, with ongoing development driven by the community. Recent additions to the software include support for the generic feature format version 3 (GFF3), continuous transcriptome data, a full Chado database interface, integration with remote services for on-the-fly BLAST and Primer BLAST analyses, graphical interfaces for configuring user preferences and full undo of all edit operations. Apollo's user community continues to grow, including its use as an educational tool for college and high-school students. Apollo is a Java application distributed under a free and open source license. Installers for Windows, Linux, Unix, Solaris and Mac OS X are available at http://apollo.berkeleybop.org, and the source code is available from the SourceForge CVS repository at http://gmod.cvs.sourceforge.net/gmod/apollo.
Scaling an expert system data mart: more facilities in real-time.
McNamee, L A; Launsby, B D; Frisse, M E; Lehmann, R; Ebker, K
1998-01-01
Clinical Data Repositories are being rapidly adopted by large healthcare organizations as a method of centralizing and unifying clinical data currently stored in diverse and isolated information systems. Once stored in a clinical data repository, healthcare organizations seek to use this centralized data to store, analyze, interpret, and influence clinical care, quality and outcomes. A recent trend in the repository field has been the adoption of data marts--specialized subsets of enterprise-wide data taken from a larger repository designed specifically to answer highly focused questions. A data mart exploits the data stored in the repository, but can use unique structures or summary statistics generated specifically for an area of study. Thus, data marts benefit from the existence of a repository, are less general than a repository, but provide more effective and efficient support for an enterprise-wide data analysis task. In previous work, we described the use of batch processing for populating data marts directly from legacy systems. In this paper, we describe an architecture that uses both primary data sources and an evolving enterprise-wide clinical data repository to create real-time data sources for a clinical data mart to support highly specialized clinical expert systems.
Distributed databases for materials study of thermo-kinetic properties
NASA Astrophysics Data System (ADS)
Toher, Cormac
2015-03-01
High-throughput computational materials science provides researchers with the opportunity to rapidly generate large databases of materials properties. To rapidly add thermal properties to the AFLOWLIB consortium and Materials Project repositories, we have implemented an automated quasi-harmonic Debye model, the Automatic GIBBS Library (AGL). This enables us to screen thousands of materials for thermal conductivity, bulk modulus, thermal expansion and related properties. The search and sort functions of the online database can then be used to identify suitable materials for more in-depth study using more precise computational or experimental techniques. The AFLOW-AGL source code is public domain and will soon be released under the GNU-GPL license.
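The quasi-harmonic Debye approach rests on standard textbook quantities such as the Debye heat capacity. The sketch below is not AGL code; it simply evaluates that integral with SciPy, using an illustrative Debye temperature.

```python
import numpy as np
from scipy.integrate import quad

K_B = 1.380649e-23  # Boltzmann constant, J/K

def debye_heat_capacity(temperature, debye_temperature, n_atoms):
    """Vibrational heat capacity C_V (J/K) of n_atoms in the Debye model."""
    if temperature <= 0.0:
        return 0.0
    x_d = debye_temperature / temperature
    integrand = lambda x: x**4 * np.exp(x) / (np.exp(x) - 1.0) ** 2
    integral, _ = quad(integrand, 0.0, x_d)
    return 9.0 * n_atoms * K_B * (temperature / debye_temperature) ** 3 * integral

# Example: one mole of atoms with an illustrative Debye temperature of 400 K.
print(debye_heat_capacity(300.0, 400.0, 6.022e23))  # approaches 3*N*k_B at high T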
Samorì, Bruno; Zuccheri, Giampaolo
2005-02-11
The nanometer scale is a special place where all sciences meet and develop a particularly strong interdisciplinarity. While biology is a source of inspiration for nanoscientists, chemistry has a central role in turning inspirations and methods from biological systems to nanotechnological use. DNA is the biological molecule that most fascinates nanoscience and nanotechnology. Nature uses DNA not only as a repository of genetic information, but also as a controller of the expression of the genes it contains. Thus, there are codes embedded in the DNA sequence that serve to control recognition processes on the atomic scale, such as base pairing, and others that control processes taking place on the nanoscale. From the chemical point of view, DNA is the supramolecular building block with the highest informational content. Nanoscience therefore has the opportunity to use DNA molecules to increase the level of complexity and efficiency in self-assembling and self-directing processes.
WFIRST: Data/Instrument Simulation Support at IPAC
NASA Astrophysics Data System (ADS)
Laine, Seppo; Akeson, Rachel; Armus, Lee; Bennett, Lee; Colbert, James; Helou, George; Kirkpatrick, J. Davy; Meshkat, Tiffany; Paladini, Roberta; Ramirez, Solange; Wang, Yun; Xie, Joan; Yan, Lin
2018-01-01
As part of WFIRST Science Center preparations, the IPAC Science Operations Center (ISOC) maintains a repository of 1) WFIRST data and instrument simulations, 2) tools to facilitate scientific performance and feasibility studies using the WFIRST, and 3) parameters summarizing the current design and predicted performance of the WFIRST telescope and instruments. The simulation repository provides access for the science community to simulation code, tools, and resulting analyses. Examples of simulation code with ISOC-built web-based interfaces include EXOSIMS (for estimating exoplanet yields in CGI surveys) and the Galaxy Survey Exposure Time Calculator. In the future the repository will provide an interface for users to run custom simulations of a wide range of coronagraph instrument (CGI) observations and sophisticated tools for designing microlensing experiments. We encourage those who are generating simulations or writing tools for exoplanet observations with WFIRST to contact the ISOC team so we can work with you to bring these to the attention of the broader astronomical community as we prepare for the exciting science that will be enabled by WFIRST.
NASA Astrophysics Data System (ADS)
Burgasser, Adam
The NASA Infrared Telescope Facility's (IRTF) SpeX spectrograph has been an essential tool in the discovery and characterization of ultracool dwarf (UCD) stars, brown dwarfs and exoplanets. Over ten years of SpeX data have been collected on these sources, and a repository of low-resolution (R 100) SpeX prism spectra has been maintained by the PI at the SpeX Prism Spectral Libraries website since 2008. As the largest existing collection of NIR UCD spectra, this repository has facilitated a broad range of investigations in UCD, exoplanet, Galactic and extragalactic science, contributing to over 100 publications in the past 6 years. However, this repository remains highly incomplete, has not been uniformly calibrated, lacks sufficient contextual data for observations and sources, and most importantly provides no data visualization or analysis tools for the user. To fully realize the scientific potential of these data for community research, we propose a two-year program to (1) calibrate and expand existing repository and archival data, and make it virtual-observatory compliant; (2) serve the data through a searchable web archive with basic visualization tools; and (3) develop and distribute an open-source, Python-based analysis toolkit for users to analyze the data. These resources will be generated through an innovative, student-centered research model, with undergraduate and graduate students building and validating the analysis tools through carefully designed coding challenges and research validation activities. The resulting data archive, the SpeX Prism Library, will be a legacy resource for IRTF and SpeX, and will facilitate numerous investigations using current and future NASA capabilities. These include deep/wide surveys of UCDs to measure Galactic structure and chemical evolution, and probe UCD populations in satellite galaxies (e.g., JWST, WFIRST); characterization of directly imaged exoplanet spectra (e.g., FINESSE), and development of low-temperature theoretical models of UCD and exoplanet atmospheres. Our program will also serve to validate the IRTF data archive during its development, by reducing and disseminating non-proprietary archival observations of UCDs to the community. The proposed program directly addresses NASA's strategic goals of exploring the origin and evolution of stars and planets that make up our universe, and discovering and studying planets around other stars.
76 FR 81950 - Privacy Act; System of Records
Federal Register 2010, 2011, 2012, 2013, 2014
2011-12-29
... ``Consolidated Data Repository'' (09-90-1000). This system of records is being amended to include records... Repository'' (SORN 09-90-1000). OIG is adding record sources to the system. This system fulfills our..., and investigations of the Medicare and Medicaid programs. SYSTEM NAME: Consolidated Data Repository...
10 CFR 60.22 - Filing and distribution of application.
Code of Federal Regulations, 2010 CFR
2010-01-01
... GEOLOGIC REPOSITORIES Licenses License Applications § 60.22 Filing and distribution of application. (a) An application for a construction authorization for a high-level radioactive waste repository at a geologic repository operations area, and an application for a license to receive and possess source, special nuclear...
A perspective on the proliferation risks of plutonium mines
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lyman, E.S.
1996-05-01
The program of geologic disposal of spent fuel and other plutonium-containing materials is increasingly becoming the target of criticism by individuals who argue that in the future, repositories may become low-cost sources of fissile material for nuclear weapons. This paper attempts to outline a consistent framework for analyzing the proliferation risks of these so-called "plutonium mines" and putting them into perspective. First, it is emphasized that the attractiveness of plutonium in a repository as a source of weapons material depends on its accessibility relative to other sources of fissile material. Then, the notion of a "material production standard" (MPS) is proposed: namely, that the proliferation risks posed by geologic disposal will be acceptable if one can demonstrate, under a number of reasonable scenarios, that the recovery of plutonium from a repository is likely to be as difficult as new production of fissile material. A preliminary analysis suggests that the range of circumstances under which current mined repository concepts would fail to meet this standard is fairly narrow. Nevertheless, a broad application of the MPS may impose severe restrictions on repository design. In this context, the relationship of repository design parameters to ease of recovery is discussed.
Development of Pflotran Code for Waste Isolation Pilot Plant Performance Assessment
NASA Astrophysics Data System (ADS)
Zeitler, T.; Day, B. A.; Frederick, J.; Hammond, G. E.; Kim, S.; Sarathi, R.; Stein, E.
2017-12-01
The Waste Isolation Pilot Plant (WIPP) has been developed by the U.S. Department of Energy (DOE) for the geologic (deep underground) disposal of transuranic (TRU) waste. Containment of TRU waste at the WIPP is regulated by the U.S. Environmental Protection Agency (EPA). The DOE demonstrates compliance with the containment requirements by means of performance assessment (PA) calculations. WIPP PA calculations estimate the probability and consequence of potential radionuclide releases from the repository to the accessible environment for a regulatory period of 10,000 years after facility closure. The long-term performance of the repository is assessed using a suite of sophisticated computational codes. There is a current effort to enhance WIPP PA capabilities through the further development of the PFLOTRAN software, a state-of-the-art massively parallel subsurface flow and reactive transport code. Benchmark testing of the individual WIPP-specific process models implemented in PFLOTRAN (e.g., gas generation, chemistry, creep closure, actinide transport, and waste form) has been performed, including results comparisons for PFLOTRAN and existing WIPP PA codes. Additionally, enhancements to the subsurface hydrologic flow model have been made. Repository-scale testing has also been performed for the modified PFLOTRAN code and detailed results will be presented. Ultimately, improvements to the current computational environment will result in greater detail and flexibility in the repository model due to a move from a two-dimensional calculation grid to a three-dimensional representation. The result of the effort will be a state-of-the-art subsurface flow and transport capability that will serve WIPP PA into the future for use in compliance recertification applications (CRAs) submitted to the EPA. Sandia National Laboratories is a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525. This research is funded by WIPP programs administered by the Office of Environmental Management (EM) of the U.S. Department of Energy. SAND2017-8198A.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bowman, A.W.
1990-04-01
This paper describes an approach to solve air quality problems which frequently occur during iterations of the baseline change process. From a schedule standpoint, it is desirable to perform this evaluation in as short a time as possible, while budgetary pressures limit the size of the staff available to do the work. Without a method in place to deal with baseline change proposal requests, the environmental analysts may not be able to produce the analysis results in the time frame expected. Using a concept called the Rapid Response Air Quality Analysis System (RAAS), the problems of timing and cost become tractable. The system could be adapted to assess other atmospheric pathway impacts, e.g., acoustics or visibility. The air quality analysis system used to perform the environmental assessment (EA) analysis for the Salt Repository Project (part of the Civilian Radioactive Waste Management Program), and later to evaluate the consequences of proposed baseline changes, consists of three components: emission source data files; emission rates contained in spreadsheets; and impact assessment model codes. The spreadsheets contain user-written codes (macros) that calculate emission rates from (1) emission source data (e.g., numbers and locations of sources, detailed operating schedules, and source specifications including horsepower, load factor, and duty cycle); (2) emission factors such as those published by the U.S. Environmental Protection Agency; and (3) control efficiencies.
An ontology based information system for the management of institutional repository's collections
NASA Astrophysics Data System (ADS)
Tsolakidis, A.; Kakoulidis, P.; Skourlas, C.
2015-02-01
In this paper we discuss a simple methodological approach to create and customize institutional repositories for the domain of technological education. The use of the open source software platform DSpace is proposed to build the repository application and provide access to digital resources including research papers, dissertations, administrative documents, educational material, etc. The use of OWL ontologies is also proposed for indexing and accessing the various heterogeneous items stored in the repository. Customization and operation of a platform for the selection and use of terms or parts of similar existing OWL ontologies is also described. This platform could be based on the open source software Protégé, which supports OWL, is widely used, and also supports visualization, SPARQL, etc. The combined use of the OWL platform and the DSpace repository forms a basis for creating customized ontologies, accommodating the semantic metadata of items and facilitating searching.
10 CFR 60.41 - Standards for issuance of a license.
Code of Federal Regulations, 2010 CFR
2010-01-01
... REPOSITORIES Licenses License Issuance and Amendment § 60.41 Standards for issuance of a license. A license to receive and possess source, special nuclear, or byproduct material at a geologic repository operations area may be issued by the Commission upon finding that: (a) Construction of the geologic repository...
10 CFR 60.44 - Changes, tests, and experiments.
Code of Federal Regulations, 2010 CFR
2010-01-01
... REPOSITORIES Licenses License Issuance and Amendment § 60.44 Changes, tests, and experiments. (a)(1) Following authorization to receive and possess source, special nuclear, or byproduct material at a geologic repository operations area, the DOE may (i) make changes in the geologic repository operations area as described in the...
NASA Astrophysics Data System (ADS)
Jara, Daniel; de Dreuzy, Jean-Raynald; Cochepin, Benoit
2017-12-01
Reactive transport modeling contributes to understanding geophysical and geochemical processes in subsurface environments. Operator splitting methods have been proposed as non-intrusive coupling techniques that optimize the use of existing chemistry and transport codes. In this spirit, we propose a coupler relying on external geochemical and transport codes with appropriate operator segmentation that enables possible developments of additional splitting methods. We provide an object-oriented implementation in TReacLab, developed in the MATLAB environment as a free open source framework with an accessible repository. TReacLab contains classical coupling methods, template interfaces and calling functions for two classical geochemical and transport codes (PHREEQC and COMSOL). It is tested on four classical benchmarks with homogeneous and heterogeneous reactions at equilibrium or kinetically controlled. We show that full decoupling to the implementation level has a cost in terms of accuracy compared to more integrated and optimized codes. Use of non-intrusive implementations like TReacLab is still justified for coupling independent transport and chemical software at a minimal development effort, but should be systematically and carefully assessed.
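As an illustration of the sequential (non-iterative) operator-splitting idea such a coupler builds on, here is a toy example written in Python rather than MATLAB; it is not TReacLab code. Advection and a first-order kinetic reaction are advanced one after the other within each time step, mimicking how a coupler delegates each half-step to an external transport or geochemistry code.

```python
import numpy as np

# Toy sequential non-iterative operator splitting on a 1D grid.
nx, dx, dt, nsteps = 100, 1.0, 0.5, 200
velocity, decay_rate = 1.0, 0.01
c = np.zeros(nx)
c[0] = 1.0  # fixed-concentration inlet boundary

for _ in range(nsteps):
    # Transport half-step (explicit upwind advection; CFL = v*dt/dx must stay <= 1).
    c[1:] = c[1:] - velocity * dt / dx * (c[1:] - c[:-1])
    c[0] = 1.0
    # Chemistry half-step (exact solution of dc/dt = -k*c over dt).
    c *= np.exp(-decay_rate * dt)

print(c[:10])
```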
The NCAR Digital Asset Services Hub (DASH): Implementing Unified Data Discovery and Access
NASA Astrophysics Data System (ADS)
Stott, D.; Worley, S. J.; Hou, C. Y.; Nienhouse, E.
2017-12-01
The National Center for Atmospheric Research (NCAR) Directorate created the Data Stewardship Engineering Team (DSET) to plan and implement an integrated single entry point for uniform digital asset discovery and access across the organization in order to improve the efficiency of access, reduce the costs, and establish the foundation for interoperability with other federated systems. This effort supports new policies included in federal funding mandates, NSF data management requirements, and journal citation recommendations. An inventory during the early planning stage identified diverse asset types across the organization that included publications, datasets, metadata, models, images, and software tools and code. The NCAR Digital Asset Services Hub (DASH) is being developed and phased in this year to improve the quality of users' experiences in finding and using these assets. DASH serves to provide engagement, training, search, and support through the following four nodes (see figure).
- DASH Metadata: DASH provides resources for creating and cataloging metadata to the NCAR Dialect, a subset of ISO 19115. NMDEdit, an editor based on a European open source application, has been configured for manual entry of NCAR metadata. CKAN, an open source data portal platform, harvests these XML records (along with records output directly from databases) from a Web Accessible Folder (WAF) on GitHub for validation.
- DASH Search: The NCAR Dialect metadata drives cross-organization search and discovery through CKAN, which provides the display interface of search results. DASH search will establish interoperability by facilitating metadata sharing with other federated systems.
- DASH Consulting: The DASH Data Curation & Stewardship Coordinator assists with Data Management (DM) Plan preparation and advises on Digital Object Identifiers. The coordinator arranges training sessions on the DASH metadata tools and DM planning, and provides one-on-one assistance as requested.
- DASH Repository: A repository is under development for NCAR datasets currently not in existing lab-managed archives. The DASH repository will be under NCAR governance and meet Trustworthy Repositories Audit & Certification (TRAC) requirements.
This poster will highlight the processes, lessons learned, and current status of the DASH effort at NCAR.
Preparing for the Downsizing and Closure of Letterman Army Medical Center: A Case Study
1991-06-17
and closure of Lieutenant Colonel F. William Brown believed in the value of this project, encouraged, and guided me during conceptualization, design...issues directed in the RW document repository were coded within this framework. The mission category was coded 1 if primary or secondary care was affected
Continuous integration for concurrent MOOSE framework and application development on GitHub
Slaughter, Andrew E.; Peterson, John W.; Gaston, Derek R.; ...
2015-11-20
For the past several years, Idaho National Laboratory’s MOOSE framework team has employed modern software engineering techniques (continuous integration, joint application/framework source code repositories, automated regression testing, etc.) in developing closed-source multiphysics simulation software (Gaston et al., Journal of Open Research Software vol. 2, article e10, 2014). In March 2014, the MOOSE framework was released under an open source license on GitHub, significantly expanding and diversifying the pool of current active and potential future contributors on the project. Despite this recent growth, the same philosophy of concurrent framework and application development continues to guide the project’s development roadmap. Several specific practices, including techniques for managing multiple repositories, conducting automated regression testing, and implementing a cascading build process are discussed in this short paper. Furthermore, special attention is given to describing the manner in which these practices naturally synergize with the GitHub API and GitHub-specific features such as issue tracking, Pull Requests, and project forks.
NASA Astrophysics Data System (ADS)
Harman, C. J.
2015-12-01
Even amongst the academic community, new theoretical tools can remain underutilized due to the investment of time and resources required to understand and implement them. This surely limits the frequency that new theory is rigorously tested against data by scientists outside the group that developed it, and limits the impact that new tools could have on the advancement of science. Reducing the barriers to adoption through online education and open-source code can bridge the gap between theory and data, forging new collaborations, and advancing science. A pilot venture aimed at increasing the adoption of a new theory of time-variable transit time distributions was begun in July 2015 as a collaboration between Johns Hopkins University and The Consortium of Universities for the Advancement of Hydrologic Science (CUAHSI). There were four main components to the venture: a public online seminar covering the theory, an open source code repository, a virtual short course designed to help participants apply the theory to their data, and an online forum to maintain discussion and build a community of users. 18 participants were selected for the non-public components based on their responses in an application, and were asked to fill out a course evaluation at the end of the short course, and again several months later. These evaluations, along with participation in the forum and on-going contact with the organizer suggest strengths and weaknesses in this combination of components to assist participants in adopting new tools.
The Global Registry of Biodiversity Repositories: A Call for Community Curation.
Schindel, David E; Miller, Scott E; Trizna, Michael G; Graham, Eileen; Crane, Adele E
2016-01-01
The Global Registry of Biodiversity Repositories is an online metadata resource for biodiversity collections, the institutions that contain them, and associated staff members. The registry provides contact and address information, characteristics of the institutions and collections using controlled vocabularies and free-text descriptions, links to related websites, unique identifiers for each institution and collection record, text fields for loan and use policies, and a variety of other descriptors. Each institution record includes an institutionCode that must be unique, and each collection record must have a collectionCode that is unique within that institution. The registry is populated with records imported from the largest similar registries and more can be harmonized and added. Doing so will require community input and curation and would produce a truly comprehensive and unifying information resource.
The Athabasca University eduSource Project: Building an Accessible Learning Object Repository
ERIC Educational Resources Information Center
Cleveland-Innes, Martha; McGreal, Rory; Anderson, Terry; Friesen, Norm; Ally, Mohamed; Tin, Tony; Graham, Rodger; Moisey, Susan; Petrinjak, Anita; Schafer, Steve
2005-01-01
Athabasca University--Canada's Open University (AU) made the commitment to put all of its courses online as part of its Strategic University Plan. In pursuit of this goal, AU participated in the eduSource project, a pan-Canadian effort to build the infrastructure for an interoperable network of learning object repositories. AU acted as a leader in…
Basaltic Dike Propagation at Yucca Mountain, Nevada, USA
NASA Astrophysics Data System (ADS)
Gaffney, E. S.; Damjanac, B.; Warpinski, N. R.
2004-12-01
We describe simulations of the propagation of basaltic dikes using a 2-dimensional, incompressible hydrofracture code including the effects of the free surface with specific application to potential interactions of rising magma with a nuclear waste repository at Yucca Mountain, Nevada. As the leading edge of the dike approaches the free surface, confinement at the crack tip is reduced and the tip accelerates relative to the magma front. In the absence of either excess confining stress or excess gas pressure in the tip cavity, this leads to an increase of crack-tip velocity by more than an order of magnitude. By casting the results in nondimensional form, they can be applied to a wide variety of intrusive situations. When applied to an alkali basalt intrusion at the proposed high-level nuclear waste repository at Yucca Mountain, the results provide for a description of the subsurface phenomena. For magma rising at 1 m/s and dikes wider than about 0.5 m, the tip of the fissure would already have breached the surface by the time magma arrived at the nominal 300-m repository depth. An approximation of the effect of magma expansion on dike propagation is used to show that removing the restriction of an incompressible magma would result in even greater crack-tip acceleration as the dike approached the surface. A second analysis with a distinct element code indicates that a dike could penetrate the repository even during the first 2000 years after closure during which time heating from radioactive decay of waste would raise the minimum horizontal compressive stress above the vertical stress for about 80 m above and below the repository horizon. Rather than sill formation, the analysis indicates that increased pressure and dike width below the repository cause the crack tip to penetrate the horizon, but much more slowly than under in situ stress conditions. The analysis did not address the effects of either anisotropic joints or heat loss on this result.
LOINC, a universal standard for identifying laboratory observations: a 5-year update.
McDonald, Clement J; Huff, Stanley M; Suico, Jeffrey G; Hill, Gilbert; Leavelle, Dennis; Aller, Raymond; Forrey, Arden; Mercer, Kathy; DeMoor, Georges; Hook, John; Williams, Warren; Case, James; Maloney, Pat
2003-04-01
The Logical Observation Identifier Names and Codes (LOINC) database provides a universal code system for reporting laboratory and other clinical observations. Its purpose is to identify observations in electronic messages such as Health Level Seven (HL7) observation messages, so that when hospitals, health maintenance organizations, pharmaceutical manufacturers, researchers, and public health departments receive such messages from multiple sources, they can automatically file the results in the right slots of their medical records, research, and/or public health systems. For each observation, the database includes a code (of which 25 000 are laboratory test observations), a long formal name, a "short" 30-character name, and synonyms. The database comes with a mapping program called Regenstrief LOINC Mapping Assistant (RELMA(TM)) to assist the mapping of local test codes to LOINC codes and to facilitate browsing of the LOINC results. Both LOINC and RELMA are available at no cost from http://www.regenstrief.org/loinc/. The LOINC medical database carries records for >30 000 different observations. LOINC codes are being used by large reference laboratories and federal agencies, e.g., the CDC and the Department of Veterans Affairs, and are part of the Health Insurance Portability and Accountability Act (HIPAA) attachment proposal. Internationally, they have been adopted in Switzerland, Hong Kong, Australia, and Canada, and by the German national standards organization, the Deutsches Instituts für Normung. Laboratories should include LOINC codes in their outbound HL7 messages so that clinical and research clients can easily integrate these results into their clinical and research repositories. Laboratories should also encourage instrument vendors to deliver LOINC codes in their instrument outputs and demand LOINC codes in HL7 messages they get from reference laboratories to avoid the need to lump so many referral tests under the "send out lab" code.
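To show where a LOINC code travels in an HL7 result message, the short sketch below parses a constructed OBX segment; the segment is illustrative (2345-7 is the commonly cited LOINC code for serum/plasma glucose) and is not drawn from the article.

```python
# A constructed HL7 v2 OBX (observation result) segment; field 3 carries the
# observation identifier as code^text^coding-system, here a LOINC code.
obx = "OBX|1|NM|2345-7^Glucose [Mass/volume] in Serum or Plasma^LN||95|mg/dL|70-99|N|||F"

fields = obx.split("|")
code, name, system = fields[3].split("^")
value, units = fields[5], fields[6]

print(f"coding system: {system}")   # LN denotes LOINC
print(f"LOINC code:    {code}")     # 2345-7
print(f"observation:   {name} = {value} {units}")
```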
Perspectives of the optical coherence tomography community on code and data sharing
NASA Astrophysics Data System (ADS)
Lurie, Kristen L.; Mistree, Behram F. T.; Ellerbee, Audrey K.
2015-03-01
As optical coherence tomography (OCT) grows to be a mature and successful field, it is important for the research community to develop a stronger practice of sharing code and data. A prolific culture of sharing can enable new and emerging laboratories to enter the field, allow research groups to gain new exposure and notoriety, and enable benchmarking of new algorithms and methods. Our long-term vision is to build tools to facilitate a stronger practice of sharing within this community. In line with this goal, our first aim was to understand the perceptions and practices of the community with respect to sharing research contributions (i.e., as code and data). We surveyed 52 members of the OCT community using an online polling system. Our main findings indicate that while researchers infrequently share their code and data, they are willing to contribute their research resources to a shared repository, and they believe that such a repository would benefit both their research and the OCT community at large. We plan to use the results of this survey to design a platform targeted to the OCT research community - an effort that ultimately aims to facilitate a more prolific culture of sharing.
NASA Astrophysics Data System (ADS)
Butkovich, T. R.
1981-08-01
A generic test of the geologic storage of spent-fuel assemblies from an operating nuclear reactor is being made by the Lawrence Livermore National Laboratory at the US Department of Energy's Nevada Test Site. The spent-fuel assemblies were emplaced at a depth of 420 m (1370 ft) below the surface in a typical granite and will be retrieved at a later time. The early time, close-in thermal history of this type of repository is being simulated with spent-fuel and electrically heated canisters in a central drift, with auxiliary heaters in two parallel side drifts. Prior to emplacement of the spent-fuel canister, preliminary calculations were made using a pair of existing finite-element codes. Calculational modeling of a spent-fuel repository requires a code with a multiple capability. The effects of both the mining operation and the thermal load on the existing stress fields and the resultant displacements of the rock around the repository must be calculated. The thermal loading for each point in the rock is affected by heat transfer through conduction, radiation, and normal convection, as well as by ventilation of the drifts. Both the ADINA stress code and the compatible ADINAT heat-flow code were used to perform the calculations because they satisfied the requirements of this project. ADINAT was adapted to calculate radiative and convective heat transfer across the drifts and to model the effects of ventilation in the drifts, while the existing isotropic elastic model was used with the ADINA code. The results of the calculation are intended to provide a base with which to compare temperature, stress, and displacement data taken during the planned 5-y duration of the test. In this way, it will be possible to determine how the existing jointing in the rock influences the results as compared with a homogeneous, isotropic rock mass. Later, new models will be introduced into ADINA to account for the effects of jointing.
NASA Technical Reports Server (NTRS)
Hanley, Lionel
1989-01-01
The Ada Software Repository is a public-domain collection of Ada software and information. The Ada Software Repository is one of several repositories located on the SIMTEL20 Defense Data Network host computer at White Sands Missile Range, and available to any host computer on the network since 26 November 1984. This repository provides a free source for Ada programs and information. The Ada Software Repository is divided into several subdirectories. These directories are organized by topic, and their names and a brief overview of their topics are contained. The Ada Software Repository on SIMTEL20 serves two basic roles: to promote the exchange and use (reusability) of Ada programs and tools (including components) and to promote Ada education.
The Global Registry of Biodiversity Repositories: A Call for Community Curation
Miller, Scott E.; Trizna, Michael G.; Graham, Eileen; Crane, Adele E.
2016-01-01
Abstract The Global Registry of Biodiversity Repositories is an online metadata resource for biodiversity collections, the institutions that contain them, and associated staff members. The registry provides contact and address information, characteristics of the institutions and collections using controlled vocabularies and free-text descriptions, links to related websites, unique identifiers for each institution and collection record, text fields for loan and use policies, and a variety of other descriptors. Each institution record includes an institutionCode that must be unique, and each collection record must have a collectionCode that is unique within that institution. The registry is populated with records imported from the largest similar registries and more can be harmonized and added. Doing so will require community input and curation and would produce a truly comprehensive and unifying information resource. PMID:27660523
The Small Body Geophysical Analysis Tool
NASA Astrophysics Data System (ADS)
Bercovici, Benjamin; McMahon, Jay
2017-10-01
The Small Body Geophysical Analysis Tool (SBGAT) that we are developing aims at providing scientists and mission designers with a comprehensive, easy-to-use, open-source analysis tool. SBGAT is meant for seamless generation of valuable simulated data originating from small-body shape models, combined with advanced shape-modification properties. The current status of SBGAT is as follows: The modular software architecture that was specified in the original SBGAT proposal was implemented in the form of two distinct packages: a dynamic library SBGAT Core containing the data structure and algorithm backbone of SBGAT, and SBGAT Gui which wraps the former inside a VTK/Qt user interface to facilitate user/data interaction. This modular development facilitates maintenance and addition of new features. Note that SBGAT Core can be utilized independently from SBGAT Gui. SBGAT is presently being hosted on a GitHub repository owned by SBGAT’s main developer. This repository is public and can be accessed at https://github.com/bbercovici/SBGAT. Along with the commented code, one can find the code documentation at https://bbercovici.github.io/sbgat-doc/index.html. This code documentation is constantly updated in order to reflect new functionalities. SBGAT’s user’s manual is available at https://github.com/bbercovici/SBGAT/wiki. This document contains a comprehensive tutorial indicating how to retrieve, compile and run SBGAT from scratch. Some of the upcoming development goals are listed hereafter. First, SBGAT's dynamics module will be extended: the PGM algorithm is the only type of analysis method currently implemented. Future work will therefore consist of broadening SBGAT’s capabilities with the Spherical Harmonics Expansion of the gravity field and the calculation of YORP coefficients. Second, synthetic measurements will soon be available within SBGAT. The software should be able to generate synthetic observations of different types (radar, lightcurve, point clouds,...) from the shape model currently manipulated. Finally, shape interaction capabilities will be added to SBGAT Gui, which will be augmented with these functionalities using built-in VTK interaction methods.
Source term evaluation model for high-level radioactive waste repository with decay chain build-up.
Chopra, Manish; Sunny, Faby; Oza, R B
2016-09-18
A source term model based on two-component leach flux concept is developed for a high-level radioactive waste repository. The long-lived radionuclides associated with high-level waste may give rise to the build-up of activity because of radioactive decay chains. The ingrowths of progeny are incorporated in the model using Bateman decay chain build-up equations. The model is applied to different radionuclides present in the high-level radioactive waste, which form a part of decay chains (4n to 4n + 3 series), and the activity of the parent and daughter radionuclides leaching out of the waste matrix is estimated. Two cases are considered: one when only parent is present initially in the waste and another where daughters are also initially present in the waste matrix. The incorporation of in situ production of daughter radionuclides in the source is important to carry out realistic estimates. It is shown that the inclusion of decay chain build-up is essential to avoid underestimation of the radiological impact assessment of the repository. The model can be a useful tool for evaluating the source term of the radionuclide transport models used for the radiological impact assessment of high-level radioactive waste repositories.
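The decay-chain build-up the model incorporates reduces, for a parent-daughter pair, to the classical Bateman solution. The sketch below is that textbook result, not the paper's two-component leach-flux model, and the half-lives are illustrative values rather than data from the study.

```python
import numpy as np

def two_member_chain(n1_0, n2_0, lam1, lam2, t):
    """Bateman solution for a parent/daughter pair (assumes lam1 != lam2):
    dN1/dt = -lam1*N1,  dN2/dt = lam1*N1 - lam2*N2."""
    n1 = n1_0 * np.exp(-lam1 * t)
    ingrowth = n1_0 * lam1 / (lam2 - lam1) * (np.exp(-lam1 * t) - np.exp(-lam2 * t))
    n2 = n2_0 * np.exp(-lam2 * t) + ingrowth
    return n1, n2

# Illustrative half-lives (not tied to a specific nuclide pair): parent 1e4 y, daughter 1e3 y.
lam1, lam2 = np.log(2) / 1.0e4, np.log(2) / 1.0e3
t = np.linspace(0.0, 5.0e4, 6)                        # years
n1, n2 = two_member_chain(1.0e6, 0.0, lam1, lam2, t)  # daughter initially absent
print(np.column_stack((t, n1, n2)))
```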
Repository contributions to Rubus research
USDA-ARS?s Scientific Manuscript database
The USDA National Plant Germplasm System is a nation-wide source for global genetic resources. The National Clonal Germplasm Repository (NCGR) in Corvallis, OR, maintains crops and crop wild relatives for the Willamette Valley including pear, raspberry and blackberry, strawberry, blueberry, gooseber...
NASA Astrophysics Data System (ADS)
Nijssen, B.; Hamman, J.; Bohn, T. J.
2015-12-01
The Variable Infiltration Capacity (VIC) model is a macro-scale semi-distributed hydrologic model. VIC development began in the early 1990s and it has been used extensively, applied from basin to global scales. VIC has been applied in many use cases, including the construction of hydrologic data sets, trend analysis, data evaluation and assimilation, forecasting, coupled climate modeling, and climate change impact analysis. Ongoing applications of the VIC model include the University of Washington's drought monitor and forecast systems, and NASA's land data assimilation systems. The development of VIC version 5.0 focused on reconfiguring the legacy VIC source code to support a wider range of modern modeling applications. The VIC source code has been moved to a public Github repository to encourage participation by the model development community-at-large. The reconfiguration has separated the physical core of the model from the driver, which is responsible for memory allocation, pre- and post-processing and I/O. VIC 5.0 includes four drivers that use the same physical model core: classic, image, CESM, and Python. The classic driver supports legacy VIC configurations and runs in the traditional time-before-space configuration. The image driver includes a space-before-time configuration, netCDF I/O, and uses MPI for parallel processing. This configuration facilitates the direct coupling of streamflow routing, reservoir, and irrigation processes within VIC. The image driver is the foundation of the CESM driver, which couples VIC to CESM's CPL7 and a prognostic atmosphere. Finally, we have added a Python driver that provides access to the functions and datatypes of VIC's physical core from a Python interface. This presentation demonstrates how reconfiguring legacy source code extends the life and applicability of a research model.
NASA Astrophysics Data System (ADS)
Brouwer, Albert; Brown, David; Tomuta, Elena
2017-04-01
To detect nuclear explosions, waveform data from over 240 SHI stations world-wide flows into the International Data Centre (IDC) of the Comprehensive Nuclear-Test-Ban Treaty Organization (CTBTO), located in Vienna, Austria. A complex pipeline of software applications processes this data in numerous ways to form event hypotheses. The software codebase comprises over 2 million lines of code, reflects decades of development, and is subject to frequent enhancement and revision. Since processing must run continuously and reliably, software changes are subjected to thorough testing before being put into production. To overcome the limitations and cost of manual testing, the Continuous Automated Testing System (CATS) has been created. CATS provides an isolated replica of the IDC processing environment, and is able to build and test different versions of the pipeline software directly from code repositories that are placed under strict configuration control. Test jobs are scheduled automatically when code repository commits are made. Regressions are reported. We present the CATS design choices and test methods. Particular attention is paid to how the system accommodates the individual testing of strongly interacting software components that lack test instrumentation.
Rekadwad, Bhagwan N; Gonzalez, Juan M
2017-08-01
A report on 16S rRNA gene sequence re-analysis and digitalization is presented using Lysinibacillus species (as one example) deposited in National Microbial Repositories in India. Lysinibacillus 16S rRNA gene sequences were digitalized to provide quick response (QR) codes, Chaos Game Representation (CGR) and Frequency of Chaos Game Representation (FCGR). GC percentage, phylogenetic analysis, and principal component analysis (PCA) were used to differentiate and reclassify the strains under investigation. Seven reasons supporting our assessment that these Lysinibacillus species deposited in National Microbial Repositories are misclassified are given in this paper. Based on these seven reasons, bacteria deposited in National Microbial Repositories, such as Lysinibacillus and many others, need re-analysis to establish their exact identity. Levels of identity with type strains of related species differ by 2 to 8%, suggesting that reclassification is needed to correctly assign species names to the analyzed Lysinibacillus strains available in National Microbial Repositories.
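For readers unfamiliar with CGR and FCGR, the sketch below implements the standard construction (each base maps to a corner of the unit square and successive points are midpoints toward those corners); it illustrates the technique and is not the authors' pipeline. The corner assignment and the toy sequence are assumptions for the example.

```python
import numpy as np

# Corners of the unit square for each nucleotide (a common CGR convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def chaos_game_representation(sequence):
    """Return the (x, y) CGR points for a DNA sequence."""
    points = np.empty((len(sequence), 2))
    x, y = 0.5, 0.5  # start at the centre of the square
    for i, base in enumerate(sequence.upper()):
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0  # midpoint toward the base's corner
        points[i] = (x, y)
    return points

def fcgr(sequence, k=4):
    """Frequency CGR: count k-mer occurrences on a 2^k x 2^k grid."""
    grid = np.zeros((2**k, 2**k))
    pts = chaos_game_representation(sequence)
    for x, y in pts[k - 1:]:          # each point after k bases encodes one k-mer
        grid[int(y * 2**k), int(x * 2**k)] += 1
    return grid

print(fcgr("ACGTACGTGGCCTTAA", k=2))
```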
Performance Assessments of Generic Nuclear Waste Repositories in Shale
NASA Astrophysics Data System (ADS)
Stein, E. R.; Sevougian, S. D.; Mariner, P. E.; Hammond, G. E.; Frederick, J.
2017-12-01
Simulations of deep geologic disposal of nuclear waste in a generic shale formation showcase Geologic Disposal Safety Assessment (GDSA) Framework, a toolkit for repository performance assessment (PA) whose capabilities include domain discretization (Cubit), multiphysics simulations (PFLOTRAN), uncertainty and sensitivity analysis (Dakota), and visualization (Paraview). GDSA Framework is used to conduct PAs of two generic repositories in shale. The first considers the disposal of 22,000 metric tons heavy metal of commercial spent nuclear fuel. The second considers disposal of defense-related spent nuclear fuel and high level waste. Each PA accounts for the thermal load and radionuclide inventory of applicable waste types, components of the engineered barrier system, and components of the natural barrier system including the host rock shale and underlying and overlying stratigraphic units. Model domains are half-symmetry, gridded with Cubit, and contain between 7 and 22 million grid cells. Grid refinement captures the detail of individual waste packages, emplacement drifts, access drifts, and shafts. Simulations are run in a high performance computing environment on as many as 2048 processes. Equations describing coupled heat and fluid flow and reactive transport are solved with PFLOTRAN, an open-source, massively parallel multiphase flow and reactive transport code. Additional simulated processes include waste package degradation, waste form dissolution, radioactive decay and ingrowth, sorption, solubility, advection, dispersion, and diffusion. Simulations are run to 10^6 y, and radionuclide concentrations are observed within aquifers at a point approximately 5 km downgradient of the repository. Dakota is used to sample likely ranges of input parameters including waste form and waste package degradation rates and properties of engineered and natural materials to quantify uncertainty in predicted concentrations and sensitivity to input parameters. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525. SAND2017-8305 A
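GDSA Framework relies on Dakota for its sampling. Purely as an illustration of the same idea, the sketch below draws a Latin hypercube sample over hypothetical parameter ranges with SciPy; the parameter names and bounds are invented for the example and are not taken from the PA.

```python
import numpy as np
from scipy.stats import qmc

# Hypothetical uncertain inputs: waste-package degradation rate (1/y),
# waste-form dissolution rate (1/y), and buffer permeability (m^2).
lower = np.array([1.0e-6, 1.0e-8, 1.0e-22])
upper = np.array([1.0e-4, 1.0e-6, 1.0e-18])

sampler = qmc.LatinHypercube(d=3, seed=42)
unit_samples = sampler.random(n=100)   # 100 realizations in the unit cube
# Sample on a log scale, as is common for rates and permeabilities.
samples = 10.0 ** qmc.scale(unit_samples, np.log10(lower), np.log10(upper))

for i, (wp_rate, wf_rate, perm) in enumerate(samples[:3]):
    print(f"realization {i}: wp={wp_rate:.2e} wf={wf_rate:.2e} k={perm:.2e}")
```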
The Tropical and Subtropical Germplasm Repositories of The National Germplasm System
USDA-ARS?s Scientific Manuscript database
Germplasm collections are viewed as a source of genetic diversity to support crop improvement and agricultural research, and germplasm conservation efforts. The United States Department of Agriculture's National Plant Germplasm Repository System (NPGS) is responsible for administering plant genetic ...
Fostering Team Awareness in Earth System Modeling Communities
NASA Astrophysics Data System (ADS)
Easterbrook, S. M.; Lawson, A.; Strong, S.
2009-12-01
Existing Global Climate Models are typically managed and controlled at a single site, with varied levels of participation by scientists outside the core lab. As these models evolve to encompass a wider set of earth systems, this central control of the modeling effort becomes a bottleneck. But such models cannot evolve to become fully distributed open source projects unless they address the imbalance in the availability of communication channels: scientists at the core site have access to regular face-to-face communication with one another, while those at remote sites have access to only a subset of these conversations - e.g. formally scheduled teleconferences and user meetings. Because of this imbalance, critical decision making can be hidden from many participants, their code contributions can interact in unanticipated ways, and the community loses awareness of who knows what. We have documented some of these problems in a field study at one climate modeling centre, and started to develop tools to overcome these problems. We report on one such tool, TracSNAP, which analyzes the social network of the scientists contributing code to the model by extracting the data in an existing project code repository. The tool presents the results of this analysis to modelers and model users in a number of ways: recommendation for who has expertise on particular code modules, suggestions for code sections that are related to files being worked on, and visualizations of team communication patterns. The tool is currently available as a plugin for the Trac bug tracking system.
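The repository-mining idea behind a tool like TracSNAP can be illustrated in a few lines of Python. The sketch below is not TracSNAP itself; it links contributors who have touched the same files in a git history, using subprocess and networkx, as a rough proxy for who should be aware of whom.

```python
import subprocess
from collections import defaultdict
from itertools import combinations

import networkx as nx

# Map each file in the repository's history to the set of authors who touched it.
log = subprocess.run(
    ["git", "log", "--name-only", "--pretty=format:@%ae"],
    capture_output=True, text=True, check=True,
).stdout

authors_by_file = defaultdict(set)
author = None
for line in log.splitlines():
    if line.startswith("@"):
        author = line[1:]
    elif line.strip() and author:
        authors_by_file[line.strip()].add(author)

# Two contributors are linked if they have edited the same file; edge weights
# count how many files they share.
graph = nx.Graph()
for authors in authors_by_file.values():
    for a, b in combinations(sorted(authors), 2):
        w = graph.get_edge_data(a, b, {"weight": 0})["weight"]
        graph.add_edge(a, b, weight=w + 1)

print(sorted(graph.edges(data=True), key=lambda e: -e[2]["weight"])[:5])
```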
NASA Astrophysics Data System (ADS)
Horowitz, F. G.; Gaede, O.
2014-12-01
Wavelet multiscale edge analysis of potential fields (a.k.a. "worms") has been known since Moreau et al. (1997) and was independently derived by Hornby et al. (1999). The technique is useful for producing a scale-explicit overview of the structures beneath a gravity or magnetic survey, including establishing the location and estimating the attitude of surface features, as well as incorporating information about the geometric class (point, line, surface, volume, fractal) of the underlying sources — in a fashion much like traditional structural indices from Euler solutions albeit with better areal coverage. Hornby et al. (2002) show that worms form the locally highest concentration of horizontal edges of a given strike — which in conjunction with the results from Mallat and Zhong (1992) induces a (non-unique!) inversion where the worms are physically interpretable as lateral boundaries in a source distribution that produces a close approximation of the observed potential field. The technique has enjoyed widespread adoption and success in the Australian mineral exploration community — including "ground truth" via successfully drilling structures indicated by the worms. Unfortunately, to our knowledge, all implementations of the code to calculate the worms/multiscale edges (including Horowitz' original research code) are either part of commercial software packages, or have copyright restrictions that impede the use of the technique by the wider community. The technique is completely described mathematically in Hornby et al. (1999) along with some later publications. This enables us to re-implement from scratch the code required to calculate and visualize the worms. We are freely releasing the results under an (open source) BSD two-clause software license. A git repository is available at
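A greatly reduced sketch of the core operation behind multiscale edge ("worm") analysis is shown below: upward-continue a gridded field to a series of heights and take the horizontal-gradient magnitude at each. This is not the released code, and it omits the wavelet formalism and the tracking of gradient maxima; the grid spacing, continuation heights, and synthetic data are illustrative.

```python
import numpy as np

def upward_continue(field, dx, height):
    """Upward-continue a gridded potential field using the FFT filter exp(-|k| h)."""
    ny, nx = field.shape
    kx = 2.0 * np.pi * np.fft.fftfreq(nx, d=dx)
    ky = 2.0 * np.pi * np.fft.fftfreq(ny, d=dx)
    k = np.sqrt(kx[None, :] ** 2 + ky[:, None] ** 2)
    return np.real(np.fft.ifft2(np.fft.fft2(field) * np.exp(-k * height)))

def horizontal_gradient_magnitude(field, dx):
    gy, gx = np.gradient(field, dx)
    return np.hypot(gx, gy)

# Ridges of the horizontal gradient at successively greater continuation heights
# trace broader/deeper sources; stacked over heights they form the "worms".
dx = 100.0                                                  # grid spacing in metres
field = np.random.default_rng(0).normal(size=(128, 128))    # stand-in for survey data
for h in (100.0, 200.0, 400.0, 800.0):
    hgm = horizontal_gradient_magnitude(upward_continue(field, dx, h), dx)
    print(h, float(hgm.max()))
```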
CIRMIS Data system. Volume 2. Program listings
DOE Office of Scientific and Technical Information (OSTI.GOV)
Friedrichs, D.R.
1980-01-01
The Assessment of Effectiveness of Geologic Isolation Systems (AEGIS) Program is developing and applying the methodology for assessing the far-field, long-term post-closure safety of deep geologic nuclear waste repositories. AEGIS is being performed by Pacific Northwest Laboratory (PNL) under contract with the Office of Nuclear Waste Isolation (ONWI) for the Department of Energy (DOE). One task within AEGIS is the development of methodology for analysis of the consequences (water pathway) from loss of repository containment as defined by various release scenarios. Analysis of the long-term, far-field consequences of release scenarios requires the application of numerical codes which simulate the hydrologic systems, model the transport of released radionuclides through the hydrologic systems to the biosphere, and, where applicable, assess the radiological dose to humans. The various input parameters required in the analysis are compiled in data systems. The data are organized and prepared by various input subroutines for utilization by the hydrologic and transport codes. The hydrologic models simulate the groundwater flow systems and provide water flow directions, rates, and velocities as inputs to the transport models. Outputs from the transport models are basically graphs of radionuclide concentration in the groundwater plotted against time. After dilution in the receiving surface-water body (e.g., lake, river, bay), these data are the input source terms for the dose models, if dose assessments are required. The dose models calculate radiation dose to individuals and populations. CIRMIS (Comprehensive Information Retrieval and Model Input Sequence) Data System is a storage and retrieval system for model input and output data, including graphical interpretation and display. This is the second of four volumes of the description of the CIRMIS Data System.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Friedrichs, D.R.
1980-01-01
The Assessment of Effectiveness of Geologic Isolation Systems (AEGIS) Program is developing and applying the methodology for assessing the far-field, long-term post-closure safety of deep geologic nuclear waste repositories. AEGIS is being performed by Pacific Northwest Laboratory (PNL) under contract with the Office of Nuclear Waste Isolation (ONWI) for the Department of Energy (DOE). One task within AEGIS is the development of methodology for analysis of the consequences (water pathway) from loss of repository containment as defined by various release scenarios. Analysis of the long-term, far-field consequences of release scenarios requires the application of numerical codes which simulate the hydrologic systems, model the transport of released radionuclides through the hydrologic systems to the biosphere, and, where applicable, assess the radiological dose to humans. The various input parameters required in the analysis are compiled in data systems. The data are organized and prepared by various input subroutines for use by the hydrologic and transport codes. The hydrologic models simulate the groundwater flow systems and provide water flow directions, rates, and velocities as inputs to the transport models. Outputs from the transport models are basically graphs of radionuclide concentration in the groundwater plotted against time. After dilution in the receiving surface-water body (e.g., lake, river, bay), these data are the input source terms for the dose models, if dose assessments are required. The dose models calculate radiation dose to individuals and populations. CIRMIS (Comprehensive Information Retrieval and Model Input Sequence) Data System is a storage and retrieval system for model input and output data, including graphical interpretation and display. This is the fourth of four volumes of the description of the CIRMIS Data System.
Testability, Test Automation and Test Driven Development for the Trick Simulation Toolkit
NASA Technical Reports Server (NTRS)
Penn, John
2014-01-01
This paper describes the adoption of a Test Driven Development approach and a Continuous Integration System in the development of the Trick Simulation Toolkit, a generic simulation development environment for creating high fidelity training and engineering simulations at the NASA Johnson Space Center and many other NASA facilities. It describes the approach, and the significant benefits seen, such as fast, thorough and clear test feedback every time code is checked into the code repository. It also describes an approach that encourages development of code that is testable and adaptable.
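Trick itself is a C/C++ toolkit, but the test-driven rhythm described here is language-agnostic. The sketch below is an illustrative Python unittest example (not Trick code) of the kind of small test a continuous integration job would run on every check-in to the repository; the clamp function and its tests are invented for the example.

```python
import unittest

def clamp(value, low, high):
    """Keep a commanded value inside [low, high]; written after the tests below."""
    return max(low, min(high, value))

class ClampTest(unittest.TestCase):
    # In test-driven development these tests are written first and fail ("red"),
    # then the minimal implementation above is added to make them pass ("green").
    def test_inside_range_is_unchanged(self):
        self.assertEqual(clamp(5, 0, 10), 5)

    def test_values_are_clamped_to_bounds(self):
        self.assertEqual(clamp(-3, 0, 10), 0)
        self.assertEqual(clamp(42, 0, 10), 10)

if __name__ == "__main__":
    unittest.main()  # a CI job runs this suite on every commit
```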
Curatr: a web application for creating, curating and sharing a mass spectral library.
Palmer, Andrew; Phapale, Prasad; Fay, Dominik; Alexandrov, Theodore
2018-04-15
We have developed a web application curatr for the rapid generation of high quality mass spectral fragmentation libraries from liquid-chromatography mass spectrometry datasets. Curatr handles datasets from single or multiplexed standards and extracts chromatographic profiles and potential fragmentation spectra for multiple adducts. An intuitive interface helps users to select high quality spectra that are stored along with searchable molecular information, the provenance of each standard and experimental metadata. Curatr supports exports to several standard formats for use with third party software or submission to repositories. We demonstrate the use of curatr to generate the EMBL Metabolomics Core Facility spectral library http://curatr.mcf.embl.de. Source code and example data are at http://github.com/alexandrovteam/curatr/. palmer@embl.de. Supplementary data are available at Bioinformatics online.
Cluster-lensing: A Python Package for Galaxy Clusters and Miscentering
NASA Astrophysics Data System (ADS)
Ford, Jes; VanderPlas, Jake
2016-12-01
We describe a new open source package for calculating properties of galaxy clusters, including Navarro, Frenk, and White (NFW) halo profiles with and without the effects of cluster miscentering. This pure-Python package, cluster-lensing, provides well-documented and easy-to-use classes and functions for calculating cluster scaling relations, including mass-richness and mass-concentration relations from the literature, as well as the surface mass density Σ(R) and differential surface mass density ΔΣ(R) profiles, probed by weak lensing magnification and shear. Galaxy cluster miscentering is a particular concern for stacked weak lensing shear studies of galaxy clusters, where offsets between the assumed and the true underlying matter distribution can lead to a significant bias in the mass estimates if not accounted for. This software has been developed and released in a public GitHub repository, and is licensed under the permissive MIT license. The cluster-lensing package is archived on Zenodo. Full documentation, source code, and installation instructions are available at http://jesford.github.io/cluster-lensing/.
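As one concrete example of the kind of literature scaling relation the package provides, the snippet below evaluates the Duffy et al. (2008) mass-concentration relation for NFW halos as a standalone calculation. It deliberately avoids guessing the package's own class and method names; consult the cluster-lensing documentation for the actual API and its conventions (mass definitions, units, cosmology).

import numpy as np

def c200_duffy(m200, z):
    """Duffy et al. (2008) c200(M, z) full-sample fit for NFW halos.
    m200 is in units of h^-1 Msun; the pivot mass is 2e12 h^-1 Msun."""
    m_pivot = 2.0e12  # h^-1 Msun
    return 5.71 * (m200 / m_pivot) ** (-0.084) * (1.0 + z) ** (-0.47)

masses = np.logspace(13, 15, 3)      # cluster-scale halo masses, h^-1 Msun
print(c200_duffy(masses, z=0.3))     # concentration decreases with mass and redshift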
NASA Astrophysics Data System (ADS)
Dunagan, S. C.; Herrick, C. G.; Lee, M. Y.
2008-12-01
The Waste Isolation Pilot Plant (WIPP) is located at a depth of 655 m in bedded salt in southeastern New Mexico and is operated by the U.S. Department of Energy as a deep underground disposal facility for transuranic (TRU) waste. The WIPP must comply with the EPA's environmental regulations that require a probabilistic risk analysis of releases of radionuclides due to inadvertent human intrusion into the repository at some time during the 10,000-year regulatory period. Sandia National Laboratories conducts performance assessments (PAs) of the WIPP using a system of computer codes representing the evolution of the underground repository and the emplaced TRU waste in order to demonstrate compliance. One of the important features modeled in a PA is the disturbed rock zone (DRZ) surrounding the emplacement rooms in the repository. The extent and permeability of the DRZ play a significant role in the potential radionuclide release scenarios. We evaluated the phenomena occurring in the repository that affect the DRZ and their potential effects on the extent and permeability of the DRZ. Furthermore, we examined the DRZ's role in determining the performance of the repository. Pressure in the completely sealed repository will be increased by creep closure of the salt and degradation of the TRU waste contents by microbial activity in the repository. An increased pressure in the repository will reduce the extent and permeability of the DRZ. The reduced DRZ extent and permeability will decrease the amount of brine that is available to interact with the waste. Furthermore, the potential for radionuclide release from the repository is dependent on the amount of brine that enters the repository. As a result of these coupled biological-geomechanical-geochemical phenomena, the extent and permeability of the DRZ have a significant impact on the potential radionuclide releases from the repository and, in turn, the repository performance. Sandia is a multi-program laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under Contract DE-AC04-94AL85000. This research is funded by WIPP programs administered by the Office of Environmental Management (EM) of the U.S. Department of Energy.
PipelineDog: a simple and flexible graphic pipeline construction and maintenance tool.
Zhou, Anbo; Zhang, Yeting; Sun, Yazhou; Xing, Jinchuan
2018-05-01
Analysis pipelines are an essential part of bioinformatics research, and ad hoc pipelines are frequently created by researchers for prototyping and proof-of-concept purposes. However, most existing pipeline management systems or workflow engines are too complex for rapid prototyping or for learning the pipeline concept. A lightweight, user-friendly and flexible solution is thus desirable. In this study, we developed a new pipeline construction and maintenance tool, PipelineDog. This is a web-based integrated development environment with a modern web graphical user interface. It offers cross-platform compatibility, project management capabilities, code formatting and error checking functions, and an online repository. It uses an easy-to-read/write script system that encourages code reuse. With the online repository, it also encourages sharing of pipelines, which enhances analysis reproducibility and accountability. For most users, PipelineDog requires no software installation. Overall, this web application provides a way to rapidly create and easily manage pipelines. The PipelineDog web app is freely available at http://web.pipeline.dog. The command line version is available at http://www.npmjs.com/package/pipelinedog and the online repository at http://repo.pipeline.dog. ysun@kean.edu or xing@biology.rutgers.edu or ysun@diagnoa.com. Supplementary data are available at Bioinformatics online.
Automated UMLS-Based Comparison of Medical Forms
Dugas, Martin; Fritz, Fleur; Krumm, Rainer; Breil, Bernhard
2013-01-01
Medical forms are very heterogeneous: on a European scale there are thousands of data items in several hundred different systems. To enable data exchange for clinical care and research purposes there is a need to develop interoperable documentation systems with harmonized forms for data capture. A prerequisite in this harmonization process is comparison of forms. So far – to our knowledge – an automated method for comparison of medical forms is not available. A form contains a list of data items with corresponding medical concepts. An automated comparison needs data types, item names and, especially, items annotated with unique concept codes from medical terminologies. The scope of the proposed method is a comparison of these items by comparing their concept codes (coded in UMLS). Each data item is represented by item name, concept code and value domain. Two items are called identical, if item name, concept code and value domain are the same. Two items are called matching, if only concept code and value domain are the same. Two items are called similar, if their concept codes are the same, but the value domains are different. Based on these definitions an open-source implementation for automated comparison of medical forms in ODM format with UMLS-based semantic annotations was developed. It is available as package compareODM from http://cran.r-project.org. To evaluate this method, it was applied to a set of 7 real medical forms with 285 data items from a large public ODM repository with forms for different medical purposes (research, quality management, routine care). Comparison results were visualized with grid images and dendrograms. Automated comparison of semantically annotated medical forms is feasible. Dendrograms allow a view on clustered similar forms. The approach is scalable for a large set of real medical forms. PMID:23861827
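The classification rules quoted above (identical, matching, similar) are simple enough to state in a few lines of code. The Python sketch below illustrates that logic; it is not the compareODM R implementation, and the example concept code and value domains are placeholders.

from dataclasses import dataclass

@dataclass(frozen=True)
class FormItem:
    name: str
    concept_code: str    # e.g. a UMLS concept unique identifier (CUI)
    value_domain: str    # e.g. "boolean", "integer", a code-list identifier

def compare_items(a: FormItem, b: FormItem) -> str:
    """Classify an item pair using the definitions given in the abstract."""
    if a.concept_code != b.concept_code:
        return "different"
    if a.value_domain != b.value_domain:
        return "similar"     # same concept, different value domains
    if a.name == b.name:
        return "identical"   # name, concept code and value domain all agree
    return "matching"        # concept code and value domain agree, names differ

# Placeholder concept code and domains, for illustration only.
print(compare_items(FormItem("Body weight", "C0005910", "float"),
                    FormItem("Weight", "C0005910", "float")))   # -> "matching"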
3D numerical modelling of the thermal state of deep geological nuclear waste repositories
NASA Astrophysics Data System (ADS)
Butov, R. A.; Drobyshevsky, N. I.; Moiseenko, E. V.; Tokarev, Yu. N.
2017-09-01
One of the important aspects of high-level radioactive waste (HLW) disposal in deep geological repositories is ensuring the integrity of the engineered barriers, which is, among other phenomena, considerably influenced by the thermal loads. As the HLW produce a significant amount of heat, the design of the repository should maintain a balance between the cost-effectiveness of the construction and the sufficiency of the safety margins, including those imposed on the thermal conditions of the barriers. The 3D finite-element computer code FENIA was developed as a tool for simulation of thermal processes in deep geological repositories. Models for mechanical phenomena and groundwater hydraulics will be added later, resulting in a fully coupled thermo-hydro-mechanical (THM) solution. Long-term simulations of the thermal state were performed for two possible layouts of the repository. One was based on the proposed project of the Russian repository, and the other features a larger HLW amount within the same space. The obtained results describe the spatial and temporal evolution of the temperature field inside the repository and in the surrounding rock over 3500 years. These results show that practically all generated heat was ultimately absorbed by the host rock without any significant temperature increase. Still, in the short term, even for the smaller amount of HLW the temperature maximum exceeds 100 °C, and for the larger amount of HLW the local temperature remains above 100 °C for a considerable time. Thus, the substantiation of the long-term stability of the repository would require an extensive study of the material properties and behaviour in order to remove excessive conservatism from the simulations and to reduce the uncertainty of the input data.
Basic repository source term and data sheet report: Lavender Canyon
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
1988-01-01
This report is one of a series describing studies undertaken in support of the US Department of Energy Civilian Radioactive Waste Management (CRWM) Program. This study contains the derivation of values for environmental source terms and resources consumed for a CRWM repository. Estimates include heavy construction equipment; support equipment; shaft-sinking equipment; transportation equipment; and consumption of fuel, water, electricity, and natural gas. Data are presented for construction and operation at an assumed site in Lavender Canyon, Utah. 3 refs; 6 tabs.
dsmcFoam+: An OpenFOAM based direct simulation Monte Carlo solver
NASA Astrophysics Data System (ADS)
White, C.; Borg, M. K.; Scanlon, T. J.; Longshaw, S. M.; John, B.; Emerson, D. R.; Reese, J. M.
2018-03-01
dsmcFoam+ is a direct simulation Monte Carlo (DSMC) solver for rarefied gas dynamics, implemented within the OpenFOAM software framework, and parallelised with MPI. It is open-source and released under the GNU General Public License in a publicly available software repository that includes detailed documentation and tutorial DSMC gas flow cases. This release of the code includes many features not found in standard dsmcFoam, such as molecular vibrational and electronic energy modes, chemical reactions, and subsonic pressure boundary conditions. Since dsmcFoam+ is designed entirely within OpenFOAM's C++ object-oriented framework, it benefits from a number of key features: the code emphasises extensibility and flexibility so it is aimed first and foremost as a research tool for DSMC, allowing new models and test cases to be developed and tested rapidly. All DSMC cases are as straightforward as setting up any standard OpenFOAM case, as dsmcFoam+ relies upon the standard OpenFOAM dictionary based directory structure. This ensures that useful pre- and post-processing capabilities provided by OpenFOAM remain available even though the fully Lagrangian nature of a DSMC simulation is not typical of most OpenFOAM applications. We show that dsmcFoam+ compares well to other well-known DSMC codes and to analytical solutions in terms of benchmark results.
Towards a next generation open-source video codec
NASA Astrophysics Data System (ADS)
Bankoski, Jim; Bultje, Ronald S.; Grange, Adrian; Gu, Qunshan; Han, Jingning; Koleszar, John; Mukherjee, Debargha; Wilkins, Paul; Xu, Yaowu
2013-02-01
Google has recently been developing a next generation open-source video codec called VP9, as part of the experimental branch of the libvpx repository included in the WebM project (http://www.webmproject.org/). Starting from the VP8 video codec released by Google in 2010 as the baseline, a number of enhancements and new tools have been added to improve the coding efficiency. This paper provides a technical overview of the current status of this project along with comparisons against other state-of-the-art video codecs, H.264/AVC and HEVC. The new tools that have been added so far include: larger prediction block sizes up to 64x64, various forms of compound INTER prediction, more modes for INTRA prediction, 1/8-pel motion vectors and 8-tap switchable sub-pel interpolation filters, improved motion reference generation and motion vector coding, improved entropy coding and frame-level entropy adaptation for various symbols, improved loop filtering, incorporation of Asymmetric Discrete Sine Transforms and larger 16x16 and 32x32 DCTs, frame-level segmentation to group similar areas together, etc. Other tools and various bitstream features are being actively worked on as well. The VP9 bitstream is expected to be finalized by early to mid-2013. Results show VP9 to be quite competitive in performance with mainstream state-of-the-art codecs.
ReNE: A Cytoscape Plugin for Regulatory Network Enhancement
Politano, Gianfranco; Benso, Alfredo; Savino, Alessandro; Di Carlo, Stefano
2014-01-01
One of the biggest challenges in the study of biological regulatory mechanisms is the integration, modeling, and analysis of the complex interactions which take place in biological networks. Although post-transcriptional regulatory elements (i.e., miRNAs) are widely investigated in current research, their usage and visualization in biological networks is very limited. Regulatory networks are commonly limited to gene entities. To integrate networks with post-transcriptional regulatory data, researchers are therefore forced to manually resort to specific third-party databases. In this context, we introduce ReNE, a Cytoscape 3.x plugin designed to automatically enrich a standard gene-based regulatory network with more detailed transcriptional, post-transcriptional, and translational data, resulting in an enhanced network that more precisely models the actual biological regulatory mechanisms. ReNE can automatically import a network layout from the Reactome or KEGG repositories, or work with custom pathways described using a standard OWL/XML data format that the Cytoscape import procedure accepts. Moreover, ReNE allows researchers to merge multiple pathways coming from different sources. The merged network structure is normalized to guarantee a consistent and uniform description of the network nodes and edges and to enrich all integrated data with additional annotations retrieved from genome-wide databases like NCBI, thus producing a pathway fully manageable through the Cytoscape environment. The normalized network is then analyzed to include missing transcription factors, miRNAs, and proteins. The resulting enhanced network is still a fully functional Cytoscape network where each regulatory element (transcription factor, miRNA, gene, protein) and regulatory mechanism (up-regulation/down-regulation) is clearly visually identifiable, thus enabling a better visual understanding of its role and effect in the network behavior. The enhanced network produced by ReNE is exportable in multiple formats for further analysis via third-party applications. ReNE can be freely installed from the Cytoscape App Store (http://apps.cytoscape.org/apps/rene) and the full source code is freely available for download through an SVN repository accessible at http://www.sysbio.polito.it/tools_svn/BioInformatics/Rene/releases/. ReNE enhances a network by only integrating data from public repositories, without any inference or prediction. The reliability of the introduced interactions therefore depends only on the reliability of the source data, which is outside the control of the ReNE developers. PMID:25541727
DOE Office of Scientific and Technical Information (OSTI.GOV)
Manteufel, R.D.; Ahola, M.P.; Turner, D.R.
A literature review has been conducted to determine the state of knowledge available in the modeling of coupled thermal (T), hydrologic (H), mechanical (M), and chemical (C) processes relevant to the design and/or performance of the proposed high-level waste (HLW) repository at Yucca Mountain, Nevada. The review focuses on identifying coupling mechanisms between individual processes and assessing their importance (i.e., whether the coupling is important, potentially important, or negligible). The significance of considering THMC-coupled processes lies in whether or not the processes impact the design and/or performance objectives of the repository. A review, such as reported here, is useful in identifying which coupled effects will be important, hence which coupled effects will need to be investigated by the US Nuclear Regulatory Commission in order to assess the assumptions, data, analyses, and conclusions in the design and performance assessment of a geologic repository. Although this work stems from regulatory interest in the design of the geologic repository, it should be emphasized that the repository design implicitly considers all of the repository performance objectives, including those associated with the time after permanent closure. The scope of this review goes beyond previous assessments in that it attempts (with the current state of knowledge) to determine which couplings are important, and to identify which computer codes are currently available to model coupled processes.
The South Australian Department of Mines and Energy Bibliography Retrieval System.
ERIC Educational Resources Information Center
Mannik, Maire
1980-01-01
Described is the South Australian Department of Mines and Energy Bibliography Retrieval System which is a repository for a large amount of geological and related information. Instructions for retrieval are outlined, and the coding information procedures are given. (DS)
A Python library for FAIRer access and deposition to the Metabolomics Workbench Data Repository.
Smelter, Andrey; Moseley, Hunter N B
2018-01-01
The Metabolomics Workbench Data Repository is a public repository of mass spectrometry and nuclear magnetic resonance data and metadata derived from a wide variety of metabolomics studies. The data and metadata for each study are deposited, stored, and accessed via files in the domain-specific 'mwTab' flat file format. In order to improve the accessibility, reusability, and interoperability of the data and metadata stored in 'mwTab' formatted files, we implemented a Python library and package. This Python package, named 'mwtab', is a parser for the domain-specific 'mwTab' flat file format, which provides facilities for reading, accessing, and writing 'mwTab' formatted files. Furthermore, the package provides facilities to validate both the format and the required metadata elements of a given 'mwTab' formatted file. In order to develop the 'mwtab' package we used the official 'mwTab' format specification. We used Git version control along with the Python unit-testing framework and a continuous integration service to run those tests on multiple versions of Python. Package documentation was developed using the Sphinx documentation generator. The 'mwtab' package provides both Python programmatic library interfaces and command-line interfaces for reading, writing, and validating 'mwTab' formatted files. Data and associated metadata are stored within Python dictionary- and list-based data structures, enabling straightforward, 'pythonic' access and manipulation of data and metadata. Also, the package provides facilities to convert 'mwTab' files into a JSON formatted equivalent, enabling easy reusability of the data by all modern programming languages that implement JSON parsers. The 'mwtab' package implements its metadata validation functionality based on a pre-defined JSON schema that can be easily specialized for specific types of metabolomics studies. The library also provides a command-line interface for interconversion between 'mwTab' and JSONized formats in raw text and a variety of compressed binary file formats. The 'mwtab' package is an easy-to-use Python package that provides FAIRer utilization of the Metabolomics Workbench Data Repository. The source code is freely available on GitHub and via the Python Package Index. Documentation includes a 'User Guide', 'Tutorial', and 'API Reference'. The GitHub repository also provides 'mwtab' package unit-tests via a continuous integration service.
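A minimal usage sketch for the 'mwtab' package is shown below, reading a local 'mwTab' file and writing its JSONized equivalent. The call and attribute names (read_files, study_id, analysis_id, write with file_format) reflect our reading of the package documentation and should be treated as assumptions to verify against the 'User Guide' and 'API Reference'.

# Assumed interface: mwtab.read_files() yields MWTabFile objects; the attribute
# and method names used here are unverified assumptions.
import mwtab

for mwfile in mwtab.read_files("ST000001_AN000001.txt"):     # a local mwTab file
    print(mwfile.study_id, mwfile.analysis_id)
    with open("ST000001_AN000001.json", "w") as outfile:
        mwfile.write(outfile, file_format="json")            # JSONized equivalent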
Semantic framework for mapping object-oriented model to semantic web languages
Ježek, Petr; Mouček, Roman
2015-01-01
The article deals with and discusses two main approaches in building semantic structures for electrophysiological metadata. It is the use of conventional data structures, repositories, and programming languages on one hand and the use of formal representations of ontologies, known from knowledge representation, such as description logics or semantic web languages on the other hand. Although knowledge engineering offers languages supporting richer semantic means of expression and technological advanced approaches, conventional data structures and repositories are still popular among developers, administrators and users because of their simplicity, overall intelligibility, and lower demands on technical equipment. The choice of conventional data resources and repositories, however, raises the question of how and where to add semantics that cannot be naturally expressed using them. As one of the possible solutions, this semantics can be added into the structures of the programming language that accesses and processes the underlying data. To support this idea we introduced a software prototype that enables its users to add semantically richer expressions into a Java object-oriented code. This approach does not burden users with additional demands on programming environment since reflective Java annotations were used as an entry for these expressions. Moreover, additional semantics need not to be written by the programmer directly to the code, but it can be collected from non-programmers using a graphic user interface. The mapping that allows the transformation of the semantically enriched Java code into the Semantic Web language OWL was proposed and implemented in a library named the Semantic Framework. This approach was validated by the integration of the Semantic Framework in the EEG/ERP Portal and by the subsequent registration of the EEG/ERP Portal in the Neuroscience Information Framework. PMID:25762923
FRAMES Metadata Reporting Templates for Ecohydrological Observations, version 1.1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Christianson, Danielle; Varadharajan, Charuleka; Christoffersen, Brad
FRAMES is a set of Excel metadata files and package-level descriptive metadata designed to facilitate and improve the capture of desired metadata for ecohydrological observations. The metadata are bundled with data files into a data package and submitted to a data repository (e.g. the NGEE Tropics Data Repository) via a web form. FRAMES standardizes reporting of diverse ecohydrological and biogeochemical data for synthesis across a range of spatiotemporal scales and incorporates many best data science practices. This version of FRAMES supports observations from primarily automated measurements collected by permanently located sensors, including sap flow (tree water use), leaf surface temperature, soil water content, dendrometry (stem diameter growth increment), and solar radiation. Version 1.1 extends the controlled vocabulary and incorporates functionality to facilitate programmatic use of data and FRAMES metadata (R code available at the NGEE Tropics Data Repository).
Disposal of disused sealed radiation sources in Boreholes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vicente, R.
2007-07-01
This paper gives a description of the concept of a geological repository for disposal of disused sealed radiation sources (DSRS) under development at the Institute of Energy and Nuclear Research (IPEN), in Brazil. DSRS represent a significant fraction of the total activity of radioactive wastes to be managed. Most DSRS are collected and temporarily stored at IPEN. As of 2006, the total collected activity is 800 TBq in 7,508 industrial gauge or radiotherapy sources, 7.2 TBq in about 72,000 Americium-241 sources detached from lightning rods, and about 0.5 GBq in 20,857 sources from smoke detectors. The estimated inventory of sealed sources in the country is about 270,000 sources with a total activity of 26 PBq. The proposed repository is designed to receive the total inventory of sealed sources. A description of the pre-disposal facilities at IPEN is also presented. (authors)
Tutorial: Measuring Stellar Atmospheric Parameters with ARES+MOOG
NASA Astrophysics Data System (ADS)
Sousa, Sérgio G.; Andreasen, Daniel T.
The technical aspects of using an Equivalent Width (EW) method for the derivation of spectroscopic stellar parameters with ARES+MOOG are described herein. While the science background to this method can be found in numerous references, the goal here is to provide a user-friendly guide to the several codes and scripts used in the tutorial presented at the School. All the required data have been made available online at the following repository: https://github.com/sousasag/school_codes.
DSpace and customized controlled vocabularies
NASA Astrophysics Data System (ADS)
Skourlas, C.; Tsolakidis, A.; Kakoulidis, P.; Giannakopoulos, G.
2015-02-01
DSpace is an open-source repository application used to provide access to digital resources; it is installed and used by more than 1000 organizations worldwide. A predefined taxonomy of keywords, called a Controlled Vocabulary, can be used for describing and accessing the information items stored in the repository. In this paper, we describe how users can create and customize their own vocabularies. Various heterogeneous items, such as research papers, videos, articles and educational material of the repository, can be indexed in order to provide advanced search functionality using new controlled vocabularies.
DOE Office of Scientific and Technical Information (OSTI.GOV)
St. John, C.M.
1977-04-01
An underground repository containing heat-generating High-Level Waste or Spent Unreprocessed Fuel may be approximated as a finite number of heat sources distributed across the plane of the repository. The resulting temperature, displacement and stress changes may be calculated using analytical solutions, provided linear thermoelasticity is assumed. This report documents a computer program based on this approach and gives results that form the basis for a comparison between the effects of disposing of High-Level Waste and Spent Unreprocessed Fuel.
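The superposition-of-point-sources approach described above can be sketched numerically: the temperature rise from one point source with exponentially decaying power in an infinite conducting medium is the convolution of the 3-D conduction Green's function with the power history, and contributions from many sources add linearly under linear thermoelasticity. The Python sketch below illustrates that calculation; it is not the documented program, and the thermal properties, source power, half-life, and geometry are illustrative assumptions.

import numpy as np
from scipy.integrate import quad

rho_c = 2.0e6        # volumetric heat capacity of the host rock, J/(m^3 K)   (illustrative)
alpha = 1.0e-6       # thermal diffusivity, m^2/s                             (illustrative)
q0 = 500.0           # initial power of one waste package, W                  (illustrative)
year = 3.1557e7      # seconds per year
lam = np.log(2.0) / (30.0 * year)   # decay constant for a 30-year half-life   (illustrative)

def dT_point(r, t):
    """Temperature rise at distance r (m) and time t (s) from one decaying point source:
    convolution of the 3-D conduction Green's function with q(t') = q0*exp(-lam*t')."""
    def integrand(tp):
        tau = t - tp
        green = np.exp(-r * r / (4.0 * alpha * tau)) / (4.0 * np.pi * alpha * tau) ** 1.5
        return q0 * np.exp(-lam * tp) * green / rho_c
    value, _ = quad(integrand, 0.0, t, limit=200)
    return value

# Superpose a line of packages spaced 10 m apart, observed 25 m away after 50 years.
t = 50.0 * year
package_positions = np.arange(-50.0, 51.0, 10.0)
print(sum(dT_point(np.hypot(25.0, x), t) for x in package_positions))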
Zhou, Carol L Ecale
2015-01-01
In order to better define regions of similarity among related protein structures, it is useful to identify the residue-residue correspondences among proteins. Few codes exist for constructing a one-to-many multiple sequence alignment derived from a set of structure or sequence alignments, and a need was evident for creating such a tool for combining pairwise structure alignments that would allow for insertion of gaps in the reference structure. This report describes a new Python code, CombAlign, which takes as input a set of pairwise sequence alignments (which may be structure based) and generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA). The use and utility of CombAlign was demonstrated by generating gapped MSSAs using sets of pairwise structure-based sequence alignments between structure models of the matrix protein (VP40) and pre-small/secreted glycoprotein (sGP) of Reston Ebolavirus and the corresponding proteins of several other filoviruses. The gapped MSSAs revealed structure-based residue-residue correspondences, which enabled identification of structurally similar versus differing regions in the Reston proteins compared to each of the other corresponding proteins. CombAlign is a new Python code that generates a one-to-many, gapped, multiple structure- or sequence-based sequence alignment (MSSA) given a set of pairwise sequence alignments (which may be structure based). CombAlign has utility in assisting the user in distinguishing structurally conserved versus divergent regions on a reference protein structure relative to other closely related proteins. CombAlign was developed in Python 2.6, and the source code is available for download from the GitHub code repository.
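The core bookkeeping behind a one-to-many, gapped MSSA can be illustrated compactly: for each pairwise alignment, count the insertions occurring after each reference residue, take the maximum insertion count per position across all alignments, and pad every aligned sequence (and the reference itself) to that common frame. The Python sketch below is a simplified illustration of this idea, not the CombAlign source code.

from typing import Dict, List, Tuple

def _insertion_counts(ref_aln: str) -> List[int]:
    """counts[i] = gap columns in the aligned reference after reference residue i
    (counts[0] counts insertions before the first residue)."""
    counts = [0]
    for ch in ref_aln:
        if ch == '-':
            counts[-1] += 1
        else:
            counts.append(0)
    return counts

def combine_pairwise(pairwise: Dict[str, Tuple[str, str]]) -> Dict[str, str]:
    """pairwise maps a sequence name to (reference_aligned, other_aligned) strings."""
    counts = {name: _insertion_counts(ref) for name, (ref, _) in pairwise.items()}
    n_slots = len(next(iter(counts.values())))
    widest = [max(c[i] for c in counts.values()) for i in range(n_slots)]

    def pad(ref_aln: str, other_aln: str) -> str:
        out, slot, pending = [], 0, []
        for r, o in zip(ref_aln, other_aln):
            if r == '-':
                pending.append(o)                     # insertion relative to the reference
            else:                                     # close the insertion block, emit residue
                out.extend(pending + ['-'] * (widest[slot] - len(pending)))
                out.append(o)
                pending, slot = [], slot + 1
        out.extend(pending + ['-'] * (widest[slot] - len(pending)))
        return ''.join(out)

    ref_seq = next(iter(pairwise.values()))[0].replace('-', '')
    ref_row = ['-'] * widest[0]
    for i, res in enumerate(ref_seq, start=1):
        ref_row.append(res)
        ref_row.extend(['-'] * widest[i])

    mssa = {'reference': ''.join(ref_row)}
    mssa.update({name: pad(ref, oth) for name, (ref, oth) in pairwise.items()})
    return mssa

print(combine_pairwise({'seq1': ('A-CD', 'AXCD'), 'seq2': ('ACD-', 'ACDY')}))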
Chargemaster maintenance: think 'spring cleaning' all year round.
Barton, Shawn; Lancaster, Dani; Bieker, Mike
2008-11-01
Steps toward maintaining a standardized chargemaster include: Building a corporate chargemaster maintenance team. Developing a core research function. Designating hospital liaisons. Publishing timely reports on facility compliance. Using system codes to identify charges. Selecting chargemaster maintenance software. Developing a standard chargemaster data repository. Educating staff.
Modeling Potential Tephra Dispersal at Yucca Mountain, Nevada
NASA Astrophysics Data System (ADS)
Hooper, D.; Franklin, N.; Adams, N.; Basu, D.
2006-12-01
Quaternary basaltic volcanoes exist within 20 km [12 mi] of the potential radioactive waste repository at Yucca Mountain, Nevada, and future basaltic volcanism at the repository is considered a low-probability, potentially high-consequence event. If radioactive waste was entrained in the conduit of a future volcanic event, tephra and waste could be transported in the resulting eruption plume. During an eruption, basaltic tephra would be dispersed primarily according to the height of the eruption column, particle-size distribution, and structure of the winds aloft. Following an eruption, contaminated tephra-fall deposits would be affected by surface redistribution processes. The Center for Nuclear Waste Regulatory Analyses developed the computer code TEPHRA to calculate atmospheric dispersion and subsequent deposition of tephra and spent nuclear fuel from a potential eruption at Yucca Mountain and to help prepare the U.S. Nuclear Regulatory Commission to review a potential U.S. Department of Energy license application. The TEPHRA transport code uses the Suzuki model to simulate the thermo-fluid dynamics of atmospheric tephra dispersion. TEPHRA models the transport of airborne pyroclasts based on particle diffusion from an eruption column, horizontal diffusion of particles by atmospheric and plume turbulence, horizontal advection by atmospheric circulation, and particle settling by gravity. More recently, TEPHRA was modified to calculate potential tephra deposit distributions using stratified wind fields based on upper atmosphere data from the Nevada Test Site. Wind data are binned into 1-km [0.62-mi]-high intervals with coupled distributions of wind speed and direction produced for each interval. Using this stratified wind field and discretization with respect to height, TEPHRA calculates particle fall and lateral displacement for each interval. This implementation permits modeling of split wind fields. We use a parallel version of the code to calculate expected tephra and high-level waste accumulation at specified points on a two-dimensional spatial grid, thereby simulating a three-dimensional initial deposit. To assess subsequent tephra and high-level waste redistribution and resuspension, modeling grids were devised to measure deposition in eolian and fluvial source regions. The eolian grid covers an area of 2,600 km2 [1,000 mi2] and the fluvial grid encompasses 318 km2 [123 mi2] of the southernmost portion of the Fortymile Wash catchment basin. Because each realization is independent, distributions of tephra and high-level waste reflect anticipated variations in source-term and transport characteristics. This abstract is an independent product of the Center for Nuclear Waste Regulatory Analyses and does not necessarily reflect the view or regulatory position of the U.S. Nuclear Regulatory Commission.
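The layer-by-layer bookkeeping described above (particle fall plus lateral displacement per 1-km wind interval) can be sketched in a few lines. The snippet below is a minimal illustration of that discretization for a single particle with a fixed settling velocity and no turbulent diffusion; it is not the TEPHRA code, and the wind profile, settling velocity, and release height are illustrative.

import numpy as np

def advect_particle(release_height_km, settling_velocity_ms, wind_u_ms, wind_v_ms,
                    layer_thickness_km=1.0):
    """Lateral displacement of one falling particle through stratified wind layers
    (pure advection and gravitational settling; no turbulent diffusion here)."""
    n_layers = int(np.ceil(release_height_km / layer_thickness_km))
    x = y = 0.0
    for k in reversed(range(n_layers)):              # from the top layer down to the ground
        residence_time = layer_thickness_km * 1000.0 / settling_velocity_ms   # seconds in layer k
        x += wind_u_ms[k] * residence_time
        y += wind_v_ms[k] * residence_time
    return x, y                                       # metres east and north of the vent

# Illustrative wind profile: one (u, v) pair per 1-km interval.
u = np.full(12, 8.0)      # eastward component, m/s
v = np.full(12, 2.0)      # northward component, m/s
print(advect_particle(release_height_km=12.0, settling_velocity_ms=1.5,
                      wind_u_ms=u, wind_v_ms=v))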
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, Joon H.; Siegel, Malcolm Dean; Arguello, Jose Guadalupe, Jr.
2011-03-01
This report describes a gap analysis performed in the process of developing the Waste Integrated Performance and Safety Codes (IPSC) in support of the U.S. Department of Energy (DOE) Office of Nuclear Energy Advanced Modeling and Simulation (NEAMS) Campaign. The goal of the Waste IPSC is to develop an integrated suite of computational modeling and simulation capabilities to quantitatively assess the long-term performance of waste forms in the engineered and geologic environments of a radioactive waste storage or disposal system. The Waste IPSC will provide this simulation capability (1) for a range of disposal concepts, waste form types, engineered repository designs, and geologic settings, (2) for a range of time scales and distances, (3) with appropriate consideration of the inherent uncertainties, and (4) in accordance with rigorous verification, validation, and software quality requirements. The gap analyses documented in this report were performed during an initial gap analysis to identify candidate codes and tools to support the development and integration of the Waste IPSC, and during follow-on activities that delved into more detailed assessments of the various codes that were acquired, studied, and tested. The current Waste IPSC strategy is to acquire and integrate the necessary Waste IPSC capabilities wherever feasible, and develop only those capabilities that cannot be acquired or suitably integrated, verified, or validated. The gap analysis indicates that significant capabilities may already exist in the existing THC codes, although there is no single code able to fully account for all physical and chemical processes involved in a waste disposal system. Large gaps exist in modeling chemical processes and their couplings with other processes. The coupling of chemical processes with flow transport and mechanical deformation remains challenging. The data for extreme environments (e.g., for elevated temperature and high ionic strength media) that are needed for repository modeling are severely lacking. In addition, most of the existing reactive transport codes were developed for non-radioactive contaminants, and they need to be adapted to account for radionuclide decay and in-growth. The accessibility of the source codes is generally limited. Because the problems of interest for the Waste IPSC are likely to result in relatively large computational models, a compact memory-usage footprint and a fast/robust solution procedure will be needed. A robust massively parallel processing (MPP) capability will also be required to provide reasonable turnaround times on the analyses that will be performed with the code. A performance assessment (PA) calculation for a waste disposal system generally requires a large number (hundreds to thousands) of model simulations to quantify the effect of model parameter uncertainties on the predicted repository performance. A set of codes for a PA calculation must be sufficiently robust and fast in terms of code execution. A PA system as a whole must be able to provide multiple alternative models for a specific set of physical/chemical processes, so that the users can choose various levels of modeling complexity based on their modeling needs. This requires PA codes, preferably, to be highly modularized. Most of the existing codes have difficulties meeting these requirements.
Based on the gap analysis results, we have made the following recommendations for code selection and code development for the NEAMS Waste IPSC: (1) build fully coupled high-fidelity THCMBR codes using the existing SIERRA codes (e.g., ARIA and ADAGIO) and platform, (2) use DAKOTA to build an enhanced performance assessment system (EPAS), and (3) build a modular code architecture and key code modules for performance assessments. The key chemical calculation modules will be built by expanding the existing CANTERA capabilities as well as by extracting useful components from other existing codes.
MODELING OF THE GROUNDWATER TRANSPORT AROUND A DEEP BOREHOLE NUCLEAR WASTE REPOSITORY
DOE Office of Scientific and Technical Information (OSTI.GOV)
N. Lubchenko; M. Rodríguez-Buño; E.A. Bates
2015-04-01
The concept of disposal of high-level nuclear waste in deep boreholes drilled into crystalline bedrock is gaining renewed interest and consideration as a viable mined repository alternative. A large amount of work on conceptual borehole design and preliminary performance assessment has been performed by researchers at MIT, Sandia National Laboratories, SKB (Sweden), and others. Much of this work relied on analytical derivations or, in a few cases, on weakly coupled models of heat, water, and radionuclide transport in the rock. Detailed numerical models are necessary to account for the large heterogeneity of properties (e.g., permeability and salinity vs. depth, diffusion coefficients, etc.) that would be observed at potential borehole disposal sites. A derivation of the FALCON code (Fracturing And Liquid CONvection) was used for the thermal-hydrologic modeling. This code solves the transport equations in porous media in a fully coupled way. The application leverages the flexibility and strengths of the MOOSE framework, developed by Idaho National Laboratory. The current version simulates heat, fluid, and chemical species transport in a fully coupled way, allowing the rigorous evaluation of candidate repository site performance. This paper mostly focuses on the modeling of a deep borehole repository under realistic conditions, including modeling of a finite array of boreholes surrounded by undisturbed rock. The decay heat generated by the canisters diffuses into the host rock. Water heating can potentially lead to convection on the scale of thousands of years after the emplacement of the fuel. This convection is tightly coupled to the transport of the dissolved salt, which can suppress convection and reduce the release of the radioactive materials to the aquifer. The purpose of this work has been to evaluate the importance of the borehole array spacing and find the conditions under which convective transport can be ruled out as a radionuclide transport mechanism. Preliminary results show that modeling of the borehole array, including the surrounding rock, predicts convective flow in the system with physical velocities of the order of 10^-5 km/yr over 10^5 years. This results in an escape length on the order of kilometers, which is comparable to the repository depth. However, a correct account of the salinity effects reduces the convection velocity and the escape length of the radionuclides from the repository.
NASA Astrophysics Data System (ADS)
Topping, David; Barley, Mark; Bane, Michael K.; Higham, Nicholas; Aumont, Bernard; Dingle, Nicholas; McFiggans, Gordon
2016-03-01
In this paper we describe the development and application of a new web-based facility, UManSysProp (http://umansysprop.seaes.manchester.ac.uk), for automating predictions of molecular and atmospheric aerosol properties. Current facilities include pure component vapour pressures, critical properties, and sub-cooled densities of organic molecules; activity coefficient predictions for mixed inorganic-organic liquid systems; hygroscopic growth factors and CCN (cloud condensation nuclei) activation potential of mixed inorganic-organic aerosol particles; and absorptive partitioning calculations with/without a treatment of non-ideality. The aim of this new facility is to provide a single point of reference for all properties relevant to atmospheric aerosol that have been checked for applicability to atmospheric compounds where possible. The group contribution approach allows users to upload molecular information in the form of SMILES (Simplified Molecular Input Line Entry System) strings and UManSysProp will automatically extract the relevant information for calculations. Built using open-source chemical informatics, and hosted at the University of Manchester, the facilities are provided via a browser and device-friendly web interface, or can be accessed using the user's own code via a JSON API (application program interface). We also provide the source code for all predictive techniques provided on the site, covered by the GNU GPL (General Public License) license to encourage development of a user community. We have released this via a Github repository (doi:10.5281/zenodo.45143). In this paper we demonstrate its use with specific examples that can be simulated using the web-browser interface.
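Programmatic access via the JSON API can be sketched as an ordinary HTTP POST carrying SMILES strings. In the snippet below, the endpoint path, payload field names, and method keyword are hypothetical placeholders rather than the documented interface; only the requests library calls themselves are standard. Consult the UManSysProp API documentation for the real endpoint and parameter names.

import requests

payload = {
    "compounds": ["CCO", "CC(=O)O"],   # SMILES strings (ethanol, acetic acid)
    "temperature": 298.15,             # K
    "method": "nannoolal",             # hypothetical parameter name
}
# Hypothetical endpoint path; see the site's API documentation for the real one.
response = requests.post(
    "http://umansysprop.seaes.manchester.ac.uk/api/vapour_pressure",
    json=payload, timeout=30)
response.raise_for_status()
print(response.json())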
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huff, Kathryn D.
Component level and system level abstraction of detailed computational geologic repository models have resulted in four rapid computational models of hydrologic radionuclide transport at varying levels of detail. Those models are described, as is their implementation in Cyder, a software library of interchangeable radionuclide transport models appropriate for representing natural and engineered barrier components of generic geology repository concepts. A proof of principle demonstration was also conducted in which these models were used to represent the natural and engineered barrier components of a repository concept in a reducing, homogenous, generic geology. This base case demonstrates integration of the Cyder open source library with the Cyclus computational fuel cycle systems analysis platform to facilitate calculation of repository performance metrics with respect to fuel cycle choices. (authors)
Lazarou, Stavros; Vita, Vasiliki; Ekonomou, Lambros
2018-02-01
The data of this article represent a real electricity distribution network at twenty kilovolts (20 kV), the medium voltage level of the Hellenic electricity distribution system [1]. This network has been chosen as suitable for smart grid analysis. It demonstrates moderate penetration of renewable sources and, for part of the time, is capable of reverse power flows. It is suitable for studies of load aggregation, storage, and demand response. It represents a rural line of fifty-five kilometres (55 km) total length, a typical length for this type. It serves forty-five (45) medium to low voltage transformers and twenty-four (24) connections to photovoltaic plants. The total installed load capacity is twelve mega-volt-ampere (12 MVA); however, the maximum observed load is lower. The data are ready to perform load flow simulation in Matpower [2] for the maximum observed load with renewable sources at half production. The simulation results and processed data for creating the source code are also provided in the database available at http://dx.doi.org/10.7910/DVN/1I6MKU.
phylo-node: A molecular phylogenetic toolkit using Node.js.
O'Halloran, Damien M
2017-01-01
Node.js is an open-source and cross-platform environment that provides a JavaScript codebase for back-end server-side applications. JavaScript has been used to develop very fast and user-friendly front-end tools for bioinformatic and phylogenetic analyses. However, no such toolkits are available using Node.js to conduct comprehensive molecular phylogenetic analysis. To address this problem, I have developed phylo-node using Node.js; it provides a stable and scalable toolkit that allows the user to perform diverse molecular and phylogenetic tasks. phylo-node can execute the analysis and process the resulting outputs from a suite of software options that provides tools for read processing and genome alignment, sequence retrieval, multiple sequence alignment, primer design, evolutionary modeling, and phylogeny reconstruction. Furthermore, phylo-node enables the user to deploy server-dependent applications, and also provides simple integration and interoperation with other Node modules and languages using Node inheritance patterns, and a customized piping module to support the production of diverse pipelines. phylo-node is open-source and freely available to all users without sign-up or login requirements. All source code and user guidelines are openly available at the GitHub repository: https://github.com/dohalloran/phylo-node.
The Community as a Source of Pragmatic Input for Learners of Italian: The Multimedia Repository LIRA
ERIC Educational Resources Information Center
Zanoni, Greta
2016-01-01
This paper focuses on community participation within the LIRA project--Lingua/Cultura Italiana in Rete per l'Apprendimento (Italian language and culture for online learning). LIRA is a multimedia repository of e-learning materials aiming at recovering, preserving and developing the linguistic, pragmatic and cultural competences of second and third…
Use of Digital Repositories by Chemistry Researchers: Results of a Survey
ERIC Educational Resources Information Center
Polydoratou, Panayiota
2007-01-01
Purpose: This paper aims to present findings from a survey that aimed to identify the issues around the use and linkage of source and output repositories and the chemistry researchers' expectations about their use. Design/methodology/approach: This survey was performed by means of an online questionnaire and structured interviews with academic and…
10 CFR 2.1003 - Availability of material.
Code of Federal Regulations, 2011 CFR
2011-01-01
... months in advance of submitting its license application for a geologic repository, the NRC shall make... of privilege in § 2.1006, graphic-oriented documentary material that includes raw data, computer runs, computer programs and codes, field notes, laboratory notes, maps, diagrams and photographs, which have been...
10 CFR 2.1003 - Availability of material.
Code of Federal Regulations, 2012 CFR
2012-01-01
... months in advance of submitting its license application for a geologic repository, the NRC shall make... of privilege in § 2.1006, graphic-oriented documentary material that includes raw data, computer runs, computer programs and codes, field notes, laboratory notes, maps, diagrams and photographs, which have been...
The discounting model selector: Statistical software for delay discounting applications.
Gilroy, Shawn P; Franck, Christopher T; Hantula, Donald A
2017-05-01
Original, open-source computer software was developed and validated against established delay discounting methods in the literature. The software executed approximate Bayesian model selection methods on user-supplied temporal discounting data and computed the effective delay 50 (ED50) from the best-performing model. The software was custom-designed to enable behavior analysts to conveniently apply recent statistical methods to temporal discounting data with the aid of a graphical user interface (GUI). The results of independent validation of the approximate Bayesian model selection methods indicated that the program provided results identical to those of the original source paper and its methods. Monte Carlo simulation (n = 50,000) confirmed that the true model was selected most often in each setting. Simulation code and data for this study were posted to an online repository for use by other researchers. The model selection approach was applied to three existing delay discounting data sets from the literature in addition to the data from the source paper. Comparisons of model-selected ED50 were consistent with traditional indices of discounting. Conceptual issues related to the development and use of computer software by behavior analysts and the opportunities afforded by free and open-sourced software are discussed, and a review of possible expansions of this software is provided. © 2017 Society for the Experimental Analysis of Behavior.
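For Mazur's hyperbolic discounting model, V = A/(1 + kD), the effective delay 50 is simply ED50 = 1/k, i.e., the delay at which subjective value falls to half the undiscounted amount. The sketch below fits that single model to illustrative indifference points and reports ED50; it does not reproduce the tool's approximate Bayesian comparison across multiple candidate models.

import numpy as np
from scipy.optimize import curve_fit

def hyperbolic(delay, k):
    """Mazur's hyperbolic model with the amount normalized to 1: V = 1 / (1 + k*D)."""
    return 1.0 / (1.0 + k * delay)

delays = np.array([1.0, 7.0, 30.0, 90.0, 180.0, 365.0])        # days (illustrative)
indifference = np.array([0.95, 0.85, 0.60, 0.40, 0.30, 0.20])  # value / amount (illustrative)

(k_hat,), _ = curve_fit(hyperbolic, delays, indifference, p0=[0.01])
print(f"k = {k_hat:.4f} per day, ED50 = {1.0 / k_hat:.1f} days")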
DataUp: Helping manage and archive data within the researcher's workflow
NASA Astrophysics Data System (ADS)
Strasser, C.
2012-12-01
There are many barriers to data management and sharing among earth and environmental scientists; among the most significant is a lack of knowledge about best practices for data management, metadata standards, and appropriate data repositories for archiving and sharing data. We have developed an open-source add-in for Excel and an open-source web application intended to help researchers overcome these barriers. DataUp helps scientists to (1) determine whether their file is CSV compatible, (2) generate metadata in a standard format, (3) retrieve an identifier to facilitate data citation, and (4) deposit their data into a repository. The researcher does not need a prior relationship with a data repository to use DataUp; the newly implemented ONEShare repository, a DataONE member node, is available for any researcher to archive and share their data. By meeting researchers where they already work, in spreadsheets, DataUp becomes part of the researcher's workflow, and data management and sharing become easier. Future enhancement of DataUp will rely on members of the community adopting and adapting the DataUp tools to meet their unique needs, including connecting to analytical tools, adding new metadata schema, and expanding the list of connected data repositories. DataUp is a collaborative project between Microsoft Research Connections, the University of California's California Digital Library, the Gordon and Betty Moore Foundation, and DataONE.
FracPaQ: A MATLAB™ toolbox for the quantification of fracture patterns
NASA Astrophysics Data System (ADS)
Healy, David; Rizzo, Roberto E.; Cornwell, David G.; Farrell, Natalie J. C.; Watkins, Hannah; Timms, Nick E.; Gomez-Rivas, Enrique; Smith, Michael
2017-02-01
The patterns of fractures in deformed rocks are rarely uniform or random. Fracture orientations, sizes, and spatial distributions often exhibit some kind of order. In detail, relationships may exist among the different fracture attributes, e.g. small fractures dominated by one orientation, larger fractures by another. These relationships are important because the mechanical (e.g. strength, anisotropy) and transport (e.g. fluids, heat) properties of rock depend on these fracture attributes and patterns. This paper describes FracPaQ, a new open source, cross-platform toolbox to quantify fracture patterns, including distributions in fracture attributes and their spatial variation. Software has been developed to quantify fracture patterns from 2-D digital images, such as thin section micrographs, geological maps, outcrop or aerial photographs or satellite images. The toolbox comprises a suite of MATLAB™ scripts based on previously published quantitative methods for the analysis of fracture attributes: orientations, lengths, intensity, density and connectivity. An estimate of permeability in 2-D is made using a parallel plate model. The software provides an objective and consistent methodology for quantifying fracture patterns and their variations in 2-D across a wide range of length scales, rock types and tectonic settings. The implemented methods are inherently scale independent, and a key task where applicable is analysing and integrating quantitative fracture pattern data from micro- to macro-scales. The toolbox was developed in MATLAB™ and the source code is publicly available on GitHub™ and the Mathworks™ FileExchange. The code runs on any computer with MATLAB installed, including PCs with Microsoft Windows, Apple Macs with Mac OS X, and machines running different flavours of Linux. The application, source code and sample input files are available in open repositories in the hope that other developers and researchers will optimise and extend the functionality for the benefit of the wider community.
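The parallel plate model mentioned above is commonly written as the cubic-law estimate k = b^3/(12 s) for a set of parallel fractures of hydraulic aperture b and mean spacing s. The snippet below evaluates that expression with illustrative values; the exact formulation and inputs used by FracPaQ may differ, so treat this as a sketch of the idea rather than the toolbox's calculation.

def parallel_plate_permeability(aperture_m: float, spacing_m: float) -> float:
    """Cubic-law estimate for a set of parallel fractures: k = b**3 / (12 * s)."""
    return aperture_m ** 3 / (12.0 * spacing_m)

# Illustrative values: 0.1 mm hydraulic aperture, 0.5 m mean spacing.
k = parallel_plate_permeability(aperture_m=1.0e-4, spacing_m=0.5)
print(f"equivalent permeability k = {k:.3e} m^2")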
Oceanotron, Scalable Server for Marine Observations
NASA Astrophysics Data System (ADS)
Loubrieu, T.; Bregent, S.; Blower, J. D.; Griffiths, G.
2013-12-01
Ifremer, the French marine institute, is deeply involved in data management for different ocean in-situ observation programs (ARGO, OceanSites, GOSUD, ...) and other European programs aiming at networking ocean in-situ observation data repositories (myOcean, seaDataNet, Emodnet). To capitalize on the effort of implementing advanced data dissemination services (visualization, download with subsetting) for these programs, and more generally for water-column observation repositories, Ifremer decided to develop the oceanotron server (2010). Given the diversity of data repository formats (RDBMS, netCDF, ODV, ...) and the temperamental nature of the standard interoperability interface profiles (OGC/WMS, OGC/WFS, OGC/SOS, OpeNDAP, ...), the server is designed to manage plugins: StorageUnits, which read specific data repository formats (netCDF/OceanSites, RDBMS schema, ODV binary format), and FrontDesks, which receive external requests and return results over interoperable protocols (OGC/WMS, OGC/SOS, OpenDAP). In between, a third type of plugin may be inserted: TransformationUnits, which enable ocean-business-related transformations of the features (for example, conversion of vertical coordinates from pressure in dB to meters below the sea surface). The server is released under an open-source license so that partners can develop their own plugins. Within the myOcean project, the University of Reading has plugged in a WMS implementation as an oceanotron frontdesk. The modules are connected together by sharing the same information model for marine observations (or sampling features: vertical profiles, point series and trajectories), dataset metadata and queries. The shared information model is based on the OGC/Observation & Measurement and Unidata/Common Data Model initiatives. The model is implemented in Java (http://www.ifremer.fr/isi/oceanotron/javadoc/). This inner interoperability level makes it possible to capitalize ocean business expertise in software development without being tied to specific data formats or protocols. Oceanotron is deployed at seven European data centres for marine in-situ observations within myOcean. While additional extensions are still being developed, to promote new collaborative initiatives, work is now under way on continuous and distributed integration (jenkins, maven), shared reference documentation (on alfresco) and code and release dissemination (sourceforge, github).
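The plugin separation described above (storage back-ends, optional transformations, protocol front desks exchanging a shared feature model) can be illustrated with a small object-oriented sketch. The server itself is written in Java; the Python sketch below only mirrors the architecture, and all class and method names are illustrative rather than the oceanotron API.

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List

@dataclass
class VerticalProfile:                 # shared information model (O&M-style sampling feature)
    station_id: str
    depths_m: List[float]
    temperature_c: List[float]

class StorageUnit(ABC):                # reads one repository format
    @abstractmethod
    def read(self, query: str) -> List[VerticalProfile]: ...

class FrontDesk(ABC):                  # answers one interoperable protocol
    @abstractmethod
    def handle(self, request: str, storage: StorageUnit) -> str: ...

class NetCDFStorageUnit(StorageUnit):
    def read(self, query):
        # Placeholder: a real plugin would open netCDF/OceanSites files here.
        return [VerticalProfile("demo", [0.0, 10.0], [14.2, 13.8])]

class OpenDAPFrontDesk(FrontDesk):
    def handle(self, request, storage):
        profiles = storage.read(request)
        return "\n".join(f"{p.station_id}: {p.temperature_c}" for p in profiles)

print(OpenDAPFrontDesk().handle("platform=demo", NetCDFStorageUnit()))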
pyhector: A Python interface for the simple climate model Hector
DOE Office of Scientific and Technical Information (OSTI.GOV)
N Willner, Sven; Hartin, Corinne; Gieseke, Robert
2017-04-01
Pyhector is a Python interface for the simple climate model Hector (Hartin et al. 2015) developed in C++. Simple climate models like Hector can, for instance, be used in the analysis of scenarios within integrated assessment models like GCAM, in the emulation of complex climate models, and in uncertainty analyses. Hector is an open-source, object-oriented, simple global climate carbon-cycle model. Its carbon cycle consists of a one-pool atmosphere, three terrestrial pools which can be broken down into finer biomes or regions, and four carbon pools in the ocean component. The terrestrial carbon cycle includes primary production and respiration fluxes. The ocean carbon cycle circulates carbon via a simplified thermohaline circulation, calculating air-sea fluxes as well as the marine carbonate system (Hartin et al. 2016). The model input is time series of greenhouse gas emissions; as example scenarios for these, the Pyhector package contains the Representative Concentration Pathways (RCPs). These were developed to cover the range of baseline and mitigation emissions scenarios and are widely used in climate change research and model intercomparison projects. Using DataFrames from the Python library Pandas (McKinney 2010) as a data structure for the scenarios simplifies generating and adapting scenarios. Other parameters of the Hector model can easily be modified when running the model. Pyhector can be installed using pip from the Python Package Index. Source code and issue tracker are available in Pyhector's GitHub repository. Documentation is provided through Readthedocs. Usage examples are also contained in the repository as a Jupyter Notebook (Pérez and Granger 2007; Kluyver et al. 2016). Courtesy of the Mybinder project, the example Notebook can also be executed and modified without installing Pyhector locally.
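A minimal usage sketch is shown below, running one of the bundled RCP scenarios and reading the global mean temperature output. The scenario name, the config override, and the output key "temperature.Tgav" follow our recollection of the package's documented examples and should be treated as assumptions to check against the Readthedocs documentation.

import pyhector
from pyhector import rcp45      # one of the bundled RCP scenarios (assumed name)

# Config override and output key are assumptions based on the documented examples.
output = pyhector.run(rcp45, {"core": {"endDate": 2100}})
print(output["temperature.Tgav"].loc[2000:2100].head())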
Taking advantage of continuity of care documents to populate a research repository.
Klann, Jeffrey G; Mendis, Michael; Phillips, Lori C; Goodson, Alyssa P; Rocha, Beatriz H; Goldberg, Howard S; Wattanasin, Nich; Murphy, Shawn N
2015-03-01
Clinical data warehouses have accelerated clinical research, but even with available open source tools, there is a high barrier to entry due to the complexity of normalizing and importing data. The Office of the National Coordinator for Health Information Technology's Meaningful Use Incentive Program now requires that electronic health record systems produce standardized consolidated clinical document architecture (C-CDA) documents. Here, we leverage this data source to create a low-volume, standards-based import pipeline for the Informatics for Integrating Biology and the Bedside (i2b2) clinical research platform. We validate this approach by creating a small repository at Partners Healthcare automatically from C-CDA documents. We designed an i2b2 extension to import C-CDAs into i2b2. It is extensible to other sites with variances in C-CDA format without requiring custom code. We also designed new ontology structures for querying the imported data. We implemented our methodology at Partners Healthcare, where we developed an adapter to retrieve C-CDAs from Enterprise Services. Our current implementation supports demographics, encounters, problems, and medications. We imported approximately 17 000 clinical observations on 145 patients into i2b2 in about 24 min. We were able to perform i2b2 cohort-finding queries and view patient information through SMART apps on the imported data. This low-volume import approach can serve small practices with local access to C-CDAs and will allow patient registries to import patient-supplied C-CDAs. These components will soon be available open source on the i2b2 wiki. Our approach will lower barriers to entry in implementing i2b2 where informatics expertise or data access is limited. © The Author 2014. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
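The importer itself is part of the i2b2 extension described above. As a rough illustration of the kind of parsing involved, the sketch below pulls coded problem entries out of a C-CDA XML document with lxml; the section LOINC code and element paths follow general CDA conventions but are simplified here and are not the authors' actual importer logic.

```python
from lxml import etree

NS = {"hl7": "urn:hl7-org:v3"}
PROBLEM_SECTION_LOINC = "11450-4"  # LOINC code of the "Problem list" section in CDA documents


def extract_problem_codes(ccda_path):
    """Return (code, displayName) pairs from the problem-list section of a C-CDA file."""
    doc = etree.parse(ccda_path)
    problems = []
    for section in doc.findall(".//hl7:section", NS):
        code_el = section.find("hl7:code", NS)
        if code_el is None or code_el.get("code") != PROBLEM_SECTION_LOINC:
            continue
        # Problem observations carry their coded value in <value code=... displayName=...>.
        for value in section.findall(".//hl7:observation/hl7:value", NS):
            problems.append((value.get("code"), value.get("displayName")))
    return problems


if __name__ == "__main__":
    for code, name in extract_problem_codes("patient_ccd.xml"):  # hypothetical file name
        print(code, name)
```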
Seismic Canvas: Evolution as a Data Exploration and Analysis Tool
NASA Astrophysics Data System (ADS)
Kroeger, G. C.
2015-12-01
SeismicCanvas, originally developed as a prototype interactive waveform display and printing application for educational use, has evolved to include significant data exploration and analysis functionality. The most recent version supports data import from a variety of standard file formats including SAC and mini-SEED, as well as search and download capabilities via IRIS/FDSN Web Services. Data processing tools now include removal of means and trends, interactive windowing, filtering, smoothing, tapering, and resampling. Waveforms can be displayed in a free-form canvas or as a record section based on angular or great-circle distance, azimuth or back azimuth. Integrated tau-p code allows the calculation and display of theoretical phase arrivals from a variety of radial Earth models. Waveforms can be aligned by absolute time, event time, picked or theoretical arrival times, and can be stacked after alignment. Interactive measurements include means, amplitudes, time delays, ray parameters and apparent velocities. Interactive picking of an arbitrary list of seismic phases is supported. Bode plots of amplitude and phase spectra and spectrograms can be created from multiple seismograms or selected windows of seismograms. Direct printing is implemented on all supported platforms along with output of high-resolution PDF files. With these added capabilities, the application is now being used as a data exploration tool for research. Coded in C++ and using the cross-platform Qt framework, the most recent version is available as a 64-bit application for Windows 7-10, Mac OS X 10.6-10.11, and most distributions of Linux, and a 32-bit version for Windows XP and 7. With the latest improvements and refactoring of trace display classes, the 64-bit versions have been tested with over 250 million samples and remain responsive in interactive operations. The source code is available under an LGPLv3 license and both source and executables are available through the IRIS SeisCode repository.
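SeismicCanvas itself is a C++/Qt application, but the routine preprocessing steps it lists (mean and trend removal, tapering, filtering) are easy to illustrate. The sketch below applies them to a synthetic trace with NumPy/SciPy and is not code from the application.

```python
import numpy as np
from scipy.signal import butter, detrend, filtfilt, windows

fs = 100.0                                   # sampling rate in Hz
t = np.arange(0, 60, 1 / fs)                 # 60 s synthetic trace
trace = 0.5 * t + np.sin(2 * np.pi * 1.0 * t) + 0.2 * np.random.randn(t.size)

trace = detrend(trace, type="linear")            # remove mean and linear trend
trace *= windows.tukey(trace.size, alpha=0.05)   # 5% cosine taper at both ends

# Zero-phase band-pass filter between 0.5 and 5 Hz.
b, a = butter(4, [0.5 / (fs / 2), 5.0 / (fs / 2)], btype="band")
filtered = filtfilt(b, a, trace)
print(filtered[:5])
```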
78 FR 20332 - Changes in Flood Hazard Determinations
Federal Register 2010, 2011, 2012, 2013, 2014
2013-04-04
...), as shown on the Flood Insurance Rate Maps (FIRMs), and where applicable, in the supporting Flood... a Letter of Map Revision (LOMR), in accordance with Title 44, Part 65 of the Code of Federal... inspection at both the online location and the respective community map repository address listed in the...
Using Semantic Templates to Study Vulnerabilities Recorded in Large Software Repositories
ERIC Educational Resources Information Center
Wu, Yan
2011-01-01
Software vulnerabilities allow an attacker to reduce a system's Confidentiality, Availability, and Integrity by exposing information, executing malicious code, and undermining system functionalities that contribute to the overall system purpose and need. With new vulnerabilities discovered every day in a variety of applications and user environments,…
Pennington, Jeffrey W; Ruth, Byron; Italia, Michael J; Miller, Jeffrey; Wrazien, Stacey; Loutrel, Jennifer G; Crenshaw, E Bryan; White, Peter S
2014-01-01
Biomedical researchers share a common challenge of making complex data understandable and accessible as they seek inherent relationships between attributes in disparate data types. Data discovery in this context is limited by a lack of query systems that efficiently show relationships between individual variables, but without the need to navigate underlying data models. We have addressed this need by developing Harvest, an open-source framework of modular components, and using it for the rapid development and deployment of custom data discovery software applications. Harvest incorporates visualizations of highly dimensional data in a web-based interface that promotes rapid exploration and export of any type of biomedical information, without exposing researchers to underlying data models. We evaluated Harvest with two cases: clinical data from pediatric cardiology and demonstration data from the OpenMRS project. Harvest's architecture and public open-source code offer a set of rapid application development tools to build data discovery applications for domain-specific biomedical data repositories. All resources, including the OpenMRS demonstration, can be found at http://harvest.research.chop.edu.
NASA Astrophysics Data System (ADS)
Mattie, P. D.; Knowlton, R. G.; Arnold, B. W.; Tien, N.; Kuo, M.
2006-12-01
Sandia National Laboratories (Sandia), a U.S. Department of Energy National Laboratory, has over 30 years of experience in radioactive waste disposal and is providing assistance internationally in a number of areas relevant to the safety assessment of radioactive waste disposal systems. International technology transfer efforts are often hampered by small budgets, time schedule constraints, and a lack of experienced personnel in countries with small radioactive waste disposal programs. In an effort to surmount these difficulties, Sandia has developed a system that utilizes a combination of commercially available codes and existing legacy codes for probabilistic safety assessment modeling, facilitating technology transfer and maximizing limited available funding. Numerous codes developed and endorsed by the United States Nuclear Regulatory Commission, and codes developed and maintained by the United States Department of Energy, are generally available to foreign countries after addressing import/export control and copyright requirements. From a programmatic view, it is easier to utilize existing codes than to develop new codes. From an economic perspective, it is not possible for most countries with small radioactive waste disposal programs to maintain complex software that meets the rigors of both domestic regulatory requirements and international peer review. Therefore, revitalization of deterministic legacy codes, as well as adaptation of contemporary deterministic codes, provides a credible and solid computational platform for constructing probabilistic safety assessment models. External model linkage capabilities in GoldSim, and the techniques applied to facilitate this process, will be presented using example applications, including Breach, Leach, and Transport-Multiple Species (BLT-MS), a U.S. NRC-sponsored code that simulates the release and transport of contaminants from a subsurface low-level waste disposal facility. BLT-MS was used in a cooperative technology transfer project between Sandia National Laboratories and Taiwan's Institute of Nuclear Energy Research (INER) for the preliminary assessment of several candidate low-level waste repository sites. Sandia National Laboratories is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy under Contract DE-AC04-94AL85000.
ERIC Educational Resources Information Center
Sutradhar, B.
2006-01-01
Purpose: To describe how an institutional repository (IR) was set up, using open source software, at the Indian Institute of Technology (IIT) in Kharagpur. Members of the IIT can publish their research documents in the IR for online access as well as digital preservation. Material in this IR includes instructional materials, records, data sets,…
JEnsembl: a version-aware Java API to Ensembl data systems.
Paterson, Trevor; Law, Andy
2012-11-01
The Ensembl Project provides release-specific Perl APIs for efficient high-level programmatic access to data stored in various Ensembl database schema. Although Perl scripts are perfectly suited for processing large volumes of text-based data, Perl is not ideal for developing large-scale software applications nor embedding in graphical interfaces. The provision of a novel Java API would facilitate type-safe, modular, object-orientated development of new Bioinformatics tools with which to access, analyse and visualize Ensembl data. The JEnsembl API implementation provides basic data retrieval and manipulation functionality from the Core, Compara and Variation databases for all species in Ensembl and EnsemblGenomes and is a platform for the development of a richer API to Ensembl datasources. The JEnsembl architecture uses a text-based configuration module to provide evolving, versioned mappings from database schema to code objects. A single installation of the JEnsembl API can therefore simultaneously and transparently connect to current and previous database instances (such as those in the public archive) thus facilitating better analysis repeatability and allowing 'through time' comparative analyses to be performed. Project development, released code libraries, Maven repository and documentation are hosted at SourceForge (http://jensembl.sourceforge.net).
Revision history aware repositories of computational models of biological systems.
Miller, Andrew K; Yu, Tommy; Britten, Randall; Cooling, Mike T; Lawson, James; Cowan, Dougal; Garny, Alan; Halstead, Matt D B; Hunter, Peter J; Nickerson, David P; Nunns, Geo; Wimalaratne, Sarala M; Nielsen, Poul M F
2011-01-14
Building repositories of computational models of biological systems ensures that published models are available for both education and further research, and can provide a source of smaller, previously verified models to integrate into a larger model. One problem with earlier repositories has been the limitations in facilities to record the revision history of models. Often, these facilities are limited to a linear series of versions which were deposited in the repository. This is problematic for several reasons. Firstly, there are many instances in the history of biological systems modelling where an 'ancestral' model is modified by different groups to create many different models. With a linear series of versions, if the changes made to one model are merged into another model, the merge appears as a single item in the history. This hides useful revision history information, and also makes further merges much more difficult, as there is no record of which changes have or have not already been merged. In addition, a long series of individual changes made outside of the repository are also all merged into a single revision when they are put back into the repository, making it difficult to separate out individual changes. Furthermore, many earlier repositories only retain the revision history of individual files, rather than of a group of files. This is an important limitation to overcome, because some types of models, such as CellML 1.1 models, can be developed as a collection of modules, each in a separate file. The need for revision history is widely recognised for computer software, and a lot of work has gone into developing version control systems and distributed version control systems (DVCSs) for tracking the revision history. However, to date, there has been no published research on how DVCSs can be applied to repositories of computational models of biological systems. We have extended the Physiome Model Repository software to be fully revision history aware, by building it on top of Mercurial, an existing DVCS. We have demonstrated the utility of this approach, when used in conjunction with the model composition facilities in CellML, to build and understand more complex models. We have also demonstrated the ability of the repository software to present version history to casual users over the web, and to highlight specific versions which are likely to be useful to users. Providing facilities for maintaining and using revision history information is an important part of building a useful repository of computational models, as this information is useful both for understanding the source of and justification for parts of a model, and to facilitate automated processes such as merges. The availability of fully revision history aware repositories, and associated tools, will therefore be of significant benefit to the community.
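The core argument, that a linear version list cannot represent merges between model lineages, can be made concrete with a toy revision graph. The sketch below is a conceptual Python illustration only and is unrelated to the Mercurial-based implementation of the Physiome Model Repository described above.

```python
# Each revision records all of its parents, so a merge keeps both lineages visible,
# unlike a linear list of deposited versions.
history = {
    "v1":        [],                              # ancestral model
    "groupA-v2": ["v1"],                          # modified by group A
    "groupB-v2": ["v1"],                          # independently modified by group B
    "merge-v3":  ["groupA-v2", "groupB-v2"],      # merge records both parents
}


def ancestors(rev, graph):
    """All revisions reachable from `rev`, i.e. the changes already merged in."""
    seen, stack = set(), [rev]
    while stack:
        r = stack.pop()
        for parent in graph[r]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(ancestors("merge-v3", history))   # {'groupA-v2', 'groupB-v2', 'v1'}
```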
DOE Office of Scientific and Technical Information (OSTI.GOV)
Campbell, Michael T.; Safdari, Masoud; Kress, Jessica E.
The project described in this report constructed and exercised an innovative multiphysics coupling toolkit called the Illinois Rocstar MultiPhysics Application Coupling Toolkit (IMPACT). IMPACT is an open-source, flexible, natively parallel infrastructure for coupling multiple uniphysics simulation codes into multiphysics computational systems. IMPACT works with codes written in several high-performance-computing (HPC) programming languages, and is designed from the beginning for HPC multiphysics code development. It is designed to be minimally invasive to the individual physics codes being integrated, and has few requirements on those physics codes for integration. The goal of IMPACT is to provide the support needed to enable coupling existing tools together in unique and innovative ways to produce powerful new multiphysics technologies without extensive modification and rewrite of the physics packages being integrated. There are three major outcomes from this project: 1) construction, testing, application, and open-source release of the IMPACT infrastructure, 2) production of example open-source multiphysics tools using IMPACT, and 3) identification and engagement of interested organizations in the tools and applications resulting from the project. This last outcome represents the incipient development of a user community and application ecosystem being built using IMPACT. Multiphysics coupling standardization can only come from organizations working together to define needs and processes that span the space of necessary multiphysics outcomes, which Illinois Rocstar plans to continue driving toward. The IMPACT system, including source code, documentation, and test problems, is now available through the public GitHub system to anyone interested in multiphysics code coupling. Many of the basic documents explaining the use and architecture of IMPACT are also attached as appendices to this document. Online HTML documentation is available through the GitHub site. There are over 100 unit tests provided that run through the Illinois Rocstar Application Development (IRAD) lightweight testing infrastructure, which is also supplied along with IMPACT. The package as a whole provides an excellent base for developing high-quality multiphysics applications using modern software development practices. To facilitate understanding of how to utilize IMPACT effectively, two multiphysics systems have been developed and are available open source through GitHub. The simpler of the two systems, named ElmerFoamFSI in the repository, is a multiphysics, fluid-structure-interaction (FSI) coupling of the solid mechanics package Elmer with a fluid dynamics module from OpenFOAM. This coupling illustrates how to combine software packages that are unrelated by either author or architecture into a robust, parallel multiphysics system. A more complex multiphysics tool is the Illinois Rocstar Rocstar Multiphysics code, which was rebuilt during the project around IMPACT. Rocstar Multiphysics was already an HPC multiphysics tool, but now that it has been rearchitected around IMPACT, it can be readily expanded to capture new and different physics in the future. In fact, during this project, the Elmer and OpenFOAM tools were also coupled into Rocstar Multiphysics and demonstrated. The full Rocstar Multiphysics codebase is also available on GitHub, and licensed for any organization to use as they wish.
Finally, the new IMPACT product is already being used in several multiphysics code coupling projects for the Air Force, NASA and the Missile Defense Agency, and initial work on expansion of the IMPACT-enabled Rocstar Multiphysics has begun in support of a commercial company. These initiatives promise to expand the interest and reach of IMPACT and Rocstar Multiphysics, ultimately leading to the envisioned standardization and consortium of users that was one of the goals of this project.
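As a conceptual illustration of what an infrastructure like IMPACT orchestrates, the sketch below couples two toy "solvers" by exchanging interface data once per time step. It is a schematic partitioned-coupling loop, not IMPACT's actual API, and all names and equations are invented for the example.

```python
def fluid_step(interface_displacement, dt):
    """Toy fluid solver: returns a traction that depends on the wall position."""
    return -2.0 * interface_displacement


def solid_step(interface_traction, displacement, velocity, dt):
    """Toy structural solver: one explicit step of a damped spring under the fluid load."""
    accel = interface_traction - 4.0 * displacement - 0.5 * velocity
    velocity += dt * accel
    displacement += dt * velocity
    return displacement, velocity


dt, disp, vel = 0.01, 0.1, 0.0
for step in range(1000):
    traction = fluid_step(disp, dt)                   # fluid sees the current interface state
    disp, vel = solid_step(traction, disp, vel, dt)   # structure responds to the fluid load
print(f"final interface displacement: {disp:.4f}")
```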
DendroPy: a Python library for phylogenetic computing.
Sukumaran, Jeet; Holder, Mark T
2010-06-15
DendroPy is a cross-platform library for the Python programming language that provides for object-oriented reading, writing, simulation and manipulation of phylogenetic data, with an emphasis on phylogenetic tree operations. DendroPy uses a splits-hash mapping to perform rapid calculations of tree distances, similarities and shape under various metrics. It contains rich simulation routines to generate trees under a number of different phylogenetic and coalescent models. DendroPy's data simulation and manipulation facilities, in conjunction with its support of a broad range of phylogenetic data formats (NEXUS, Newick, PHYLIP, FASTA, NeXML, etc.), allow it to serve a useful role in various phyloinformatics and phylogeographic pipelines. The stable release of the library is available for download and automated installation through the Python Package Index site (http://pypi.python.org/pypi/DendroPy), while the active development source code repository is available to the public from GitHub (http://github.com/jeetsukumaran/DendroPy).
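A small usage sketch follows, written against the current DendroPy 4 API (method and argument names differ in the 3.x releases contemporary with the paper); the trees are toy examples.

```python
import dendropy
from dendropy.calculate import treecompare
from dendropy.simulate import treesim

# Two toy trees over a shared taxon namespace (required for distance calculations).
tns = dendropy.TaxonNamespace()
t1 = dendropy.Tree.get(data="((A,B),(C,D));", schema="newick", taxon_namespace=tns)
t2 = dendropy.Tree.get(data="((A,C),(B,D));", schema="newick", taxon_namespace=tns)

# Robinson-Foulds (symmetric difference) distance between the two topologies.
print(treecompare.symmetric_difference(t1, t2))

# Simulate a birth-death tree with 10 extant tips.
sim = treesim.birth_death_tree(birth_rate=1.0, death_rate=0.0, num_extant_tips=10)
print(sim.as_string(schema="newick"))
```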
The Particle-in-Cell and Kinetic Simulation Software Center
NASA Astrophysics Data System (ADS)
Mori, W. B.; Decyk, V. K.; Tableman, A.; Fonseca, R. A.; Tsung, F. S.; Hu, Q.; Winjum, B. J.; An, W.; Dalichaouch, T. N.; Davidson, A.; Hildebrand, L.; Joglekar, A.; May, J.; Miller, K.; Touati, M.; Xu, X. L.
2017-10-01
The UCLA Particle-in-Cell and Kinetic Simulation Software Center (PICKSC) aims to support an international community of PIC and plasma kinetic software developers, users, and educators; to increase the use of this software for accelerating the rate of scientific discovery; and to be a repository of knowledge and history for PIC. We discuss progress towards making available and documenting illustrative open-source software programs and distinct production programs; developing and comparing different PIC algorithms; coordinating the development of resources for the educational use of kinetic software; and the outcomes of our first sponsored OSIRIS users workshop. We also welcome input and discussion from anyone interested in using or developing kinetic software, in obtaining access to our codes, in collaborating, in sharing their own software, or in commenting on how PICKSC can better serve the DPP community. Supported by NSF under Grant ACI-1339893 and by the UCLA Institute for Digital Research and Education.
SPECIATE--EPA'S DATABASE OF SPECIATED EMISSION PROFILES
SPECIATE is EPA's repository of Total Organic Compound and Particulate Matter speciated profiles for a wide variety of sources. The profiles in this system are provided for air quality dispersion modeling and as a library for source-receptor and source apportionment type models. ...
SATORI: a system for ontology-guided visual exploration of biomedical data repositories.
Lekschas, Fritz; Gehlenborg, Nils
2018-04-01
The ever-increasing number of biomedical datasets provides tremendous opportunities for re-use, but current data repositories provide limited means of exploration apart from text-based search. Ontological metadata annotations provide context by semantically relating datasets. Visualizing this rich network of relationships can improve the explorability of large data repositories and help researchers find datasets of interest. We developed SATORI, an integrative search and visual exploration interface for the exploration of biomedical data repositories. The design is informed by a requirements analysis through a series of semi-structured interviews. We evaluated the implementation of SATORI in a field study on a real-world data collection. SATORI enables researchers to seamlessly search, browse and semantically query data repositories via two visualizations that are highly interconnected with a powerful search interface. SATORI is an open-source web application, which is freely available at http://satori.refinery-platform.org and integrated into the Refinery Platform. nils@hms.harvard.edu. Supplementary data are available at Bioinformatics online.
Database Resources of the BIG Data Center in 2018
Xu, Xingjian; Hao, Lili; Zhu, Junwei; Tang, Bixia; Zhou, Qing; Song, Fuhai; Chen, Tingting; Zhang, Sisi; Dong, Lili; Lan, Li; Wang, Yanqing; Sang, Jian; Hao, Lili; Liang, Fang; Cao, Jiabao; Liu, Fang; Liu, Lin; Wang, Fan; Ma, Yingke; Xu, Xingjian; Zhang, Lijuan; Chen, Meili; Tian, Dongmei; Li, Cuiping; Dong, Lili; Du, Zhenglin; Yuan, Na; Zeng, Jingyao; Zhang, Zhewen; Wang, Jinyue; Shi, Shuo; Zhang, Yadong; Pan, Mengyu; Tang, Bixia; Zou, Dong; Song, Shuhui; Sang, Jian; Xia, Lin; Wang, Zhennan; Li, Man; Cao, Jiabao; Niu, Guangyi; Zhang, Yang; Sheng, Xin; Lu, Mingming; Wang, Qi; Xiao, Jingfa; Zou, Dong; Wang, Fan; Hao, Lili; Liang, Fang; Li, Mengwei; Sun, Shixiang; Zou, Dong; Li, Rujiao; Yu, Chunlei; Wang, Guangyu; Sang, Jian; Liu, Lin; Li, Mengwei; Li, Man; Niu, Guangyi; Cao, Jiabao; Sun, Shixiang; Xia, Lin; Yin, Hongyan; Zou, Dong; Xu, Xingjian; Ma, Lina; Chen, Huanxin; Sun, Yubin; Yu, Lei; Zhai, Shuang; Sun, Mingyuan; Zhang, Zhang; Zhao, Wenming; Xiao, Jingfa; Bao, Yiming; Song, Shuhui; Hao, Lili; Li, Rujiao; Ma, Lina; Sang, Jian; Wang, Yanqing; Tang, Bixia; Zou, Dong; Wang, Fan
2018-01-01
The BIG Data Center at Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences provides freely open access to a suite of database resources in support of worldwide research activities in both academia and industry. With the vast amounts of omics data generated at ever-greater scales and rates, the BIG Data Center is continually expanding, updating and enriching its core database resources through big-data integration and value-added curation, including BioCode (a repository archiving bioinformatics tool codes), BioProject (a biological project library), BioSample (a biological sample library), Genome Sequence Archive (GSA, a data repository for archiving raw sequence reads), Genome Warehouse (GWH, a centralized resource housing genome-scale data), Genome Variation Map (GVM, a public repository of genome variations), Gene Expression Nebulas (GEN, a database of gene expression profiles based on RNA-Seq data), Methylation Bank (MethBank, an integrated databank of DNA methylomes), and Science Wikis (a series of biological knowledge wikis for community annotations). In addition, three featured web services are provided, viz., BIG Search (search as a service; a scalable inter-domain text search engine), BIG SSO (single sign-on as a service; a user access control system to gain access to multiple independent systems with a single ID and password) and Gsub (submission as a service; a unified submission service for all relevant resources). All of these resources are publicly accessible through the home page of the BIG Data Center at http://bigd.big.ac.cn. PMID:29036542
Naessens, James M; Visscher, Sue L; Peterson, Stephanie M; Swanson, Kristi M; Johnson, Matthew G; Rahman, Parvez A; Schindler, Joe; Sonneborn, Mark; Fry, Donald E; Pine, Michael
2015-01-01
Objective: Assess algorithms for linking patients across de-identified databases without compromising confidentiality. Data Sources/Study Setting: Hospital discharges from 11 Mayo Clinic hospitals during January 2008–September 2012 (assessment and validation data); Minnesota death certificates and hospital discharges from 2009 to 2012 for the entire state (application data). Study Design: Cross-sectional assessment of sensitivity and positive predictive value (PPV) for four linking algorithms tested by identifying readmissions and posthospital mortality on the assessment data, with application to statewide data. Data Collection/Extraction Methods: De-identified claims included patient gender, birthdate, and zip code. Assessment records were matched with institutional sources containing unique identifiers and the last four digits of the Social Security number (SSNL4). Principal Findings: Gender, birthdate, and five-digit zip code identified readmissions with a sensitivity of 98.0 percent and a PPV of 97.7 percent and identified postdischarge mortality with 84.4 percent sensitivity and 98.9 percent PPV. Inclusion of SSNL4 produced nearly perfect identification of readmissions and deaths. When applied statewide, regions bordering states with unavailable hospital discharge data had lower rates. Conclusion: Addition of SSNL4 to administrative data, accompanied by appropriate data use and data release policies, can enable trusted repositories to link data with nearly perfect accuracy without compromising patient confidentiality. States maintaining centralized de-identified databases should add SSNL4 to data specifications. PMID:26073819
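A hedged sketch of the kind of deterministic linkage evaluated in the study is given below; the DataFrame column names and rows are hypothetical, and the real study worked with de-identified claims rather than toy records.

```python
import pandas as pd

# Hypothetical de-identified records: an index admission, its readmission, and an unrelated patient.
discharges = pd.DataFrame([
    {"gender": "F", "birthdate": "1950-03-02", "zip5": "55901", "ssnl4": "1234", "discharge": "2011-01-10"},
    {"gender": "F", "birthdate": "1950-03-02", "zip5": "55901", "ssnl4": "1234", "discharge": "2011-02-01"},
    {"gender": "M", "birthdate": "1948-07-21", "zip5": "55902", "ssnl4": "9876", "discharge": "2011-01-15"},
])


def link(df, keys):
    """Assign a person identifier by exact match on the chosen linking keys."""
    out = df.copy()
    out["person_id"] = out.groupby(keys, sort=False).ngroup()
    return out


# Algorithm 1: gender + birthdate + 5-digit zip; Algorithm 2 adds SSNL4.
print(link(discharges, ["gender", "birthdate", "zip5"])["person_id"].tolist())
print(link(discharges, ["gender", "birthdate", "zip5", "ssnl4"])["person_id"].tolist())
```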
DOE Office of Scientific and Technical Information (OSTI.GOV)
Claiborne, H.C.; Wagner, R.S.; Just, R.A.
1979-12-01
A direct comparison of transient thermal calculations was made with the heat transfer codes HEATING5, THAC-SIP-3D, ADINAT, SINDA, TRUMP, and TRANCO for a hypothetical nuclear waste repository. With the exception of TRUMP and SINDA (actually closer to the earlier CINDA3G version), the other codes agreed to within ±5% for the temperature rises as a function of time. The TRUMP results agreed within ±5% up to about 50 years, where the maximum temperature occurs, and then exhibited oscillatory behavior with up to 25% deviations at longer times. This could have resulted from time steps that were too large or from some unknown system problems. The available version of the SINDA code was not compatible with the IBM compiler without using an alternative method for handling a variable thermal conductivity. The results were about 40% low, but a reasonable agreement was obtained by assuming a uniform thermal conductivity; however, a programming error was later discovered in the alternative method. Some work is required on the IBM version to make it compatible with the system and still use the recommended method of handling variable thermal conductivity. TRANCO can only be run as a 2-D model, and TRUMP and CINDA apparently required longer running times and did not agree in the 2-D case; therefore, only HEATING5, THAC-SIP-3D, and ADINAT were used for the 3-D model calculations. The codes agreed within ±5%; at distances of about 1 ft from the waste canister edge, temperature rises were also close to those predicted by the 3-D model.
Thermo-hydrological and chemical (THC) modeling to support Field Test Design
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stauffer, Philip H.; Jordan, Amy B.; Harp, Dylan Robert
This report summarizes ongoing efforts to simulate coupled thermal-hydrological-chemical (THC) processes occurring within a hypothetical high-level waste (HLW) repository in bedded salt. The report includes work completed since the last project deliverable, “Coupled model for heat and water transport in a high level waste repository in salt”, a Level 2 milestone submitted to DOE in September 2013 (Stauffer et al., 2013). Since the last deliverable, there have been code updates to improve the integration of the salt module with the pre-existing code and development of quality assurance (QA) tests of constitutive functions and precipitation/dissolution reactions. Simulations of bench-scale experiments, both historical and currently in the planning stages, have been performed. Additional simulations have also been performed on the drift-scale model that incorporate new processes, such as an evaporation function to estimate water vapor removal from the crushed salt backfill and isotopic fractionation of water isotopes. Finally, a draft of a journal paper on the importance of clay dehydration on water availability is included as Appendix I.
Benchmarking NNWSI flow and transport codes: COVE 1 results
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hayden, N.K.
1985-06-01
The code verification (COVE) activity of the Nevada Nuclear Waste Storage Investigations (NNWSI) Project is the first step in certification of flow and transport codes used for NNWSI performance assessments of a geologic repository for disposing of high-level radioactive wastes. The goals of the COVE activity are (1) to demonstrate and compare the numerical accuracy and sensitivity of certain codes, (2) to identify and resolve problems in running typical NNWSI performance assessment calculations, and (3) to evaluate computer requirements for running the codes. This report describes the work done for COVE 1, the first step in benchmarking some of the codes. Isothermal calculations for the COVE 1 benchmarking have been completed using the hydrologic flow codes SAGUARO, TRUST, and GWVIP; the radionuclide transport codes FEMTRAN and TRUMP; and the coupled flow and transport code TRACR3D. This report presents the results of three cases of the benchmarking problem solved for COVE 1, a comparison of the results, questions raised regarding sensitivities to modeling techniques, and conclusions drawn regarding the status and numerical sensitivities of the codes. 30 refs.
Rolling Deck to Repository (R2R): Building the Data Pipeline - Initial Results
NASA Astrophysics Data System (ADS)
Arko, R. A.; Clark, P. D.; Rioux, M. A.; McGovern, T. M.; Deering, T. W.; Hagg, R. K.; Payne, A. A.; Fischman, D. E.; Ferrini, V.
2009-12-01
The NSF-funded Rolling Deck to Repository (R2R) project is working with U.S. academic research vessel operators to ensure the documentation and preservation of data from routine “underway” (meteorological, geophysical, and oceanographic) sensor systems. A standard pipeline is being developed in which data are submitted by vessel operators directly to a central repository; inventoried in an integrated fleet-wide catalog; organized into discrete data sets with persistent unique identifiers; associated with essential cruise-level metadata; and delivered to the National Data Centers for archiving and dissemination. Several vessels including Atlantis, Healy, Hugh R. Sharp, Ka'imikai-O-Kanaloa, Kilo Moana, Knorr, Marcus G. Langseth, Melville, Oceanus, Roger Revelle, and Thomas G. Thompson began submitting data and documentation to R2R during the project’s pilot phase, and a repository infrastructure has been established. Cruise metadata, track maps, and data inventories are published at the R2R Web portal, with controlled vocabularies drawn from community standards (e.g. International Council for the Exploration of the Sea (ICES) ship codes). A direct connection has been established to the University-National Oceanographic Laboratory System (UNOLS) Ship Time Request and Scheduling System (STRS) via Web services to synchronize port codes and cruise schedules. A secure portal is being developed where operators may login to upload sailing orders, review data inventories, and create vessel profiles. R2R has established a standard procedure for submission of data to the National Geophysical Data Center (NGDC) that incorporates persistent unique identifiers for cruises, data sets, and individual files, using multibeam data as a test bed. Once proprietary holds are cleared and a data set is delivered to NGDC, the R2R catalog record is updated with the URL for direct download and it becomes immediately available to integration and synthesis projects such as the NSF-funded Global Multi-Resolution Topography (GMRT) synthesis. Similar procedures will be developed for delivery of data to other National Data Centers as appropriate.
Numerical modeling of perched water under Yucca Mountain, Nevada
Hinds, J.J.; Ge, S.; Fridrich, C.J.
1999-01-01
The presence of perched water near the potential high-level nuclear waste repository area at Yucca Mountain, Nevada, has important implications for waste isolation. Perched water occurs because of sharp contrasts in rock properties, in particular between the strongly fractured repository host rock (the Topopah Spring welded tuff) and the immediately underlying vitrophyric (glassy) subunit, in which fractures are sealed by clays that were formed by alteration of the volcanic glass. The vitrophyre acts as a vertical barrier to unsaturated flow throughout much of the potential repository area. Geochemical analyses (Yang et al. 1996) indicate that perched water is relatively young, perhaps younger than 10,000 years. Given the low permeability of the rock matrix, fractures and perhaps fault zones must play a crucial role in unsaturated flow. The geologic setting of the major perched water bodies under Yucca Mountain suggests that faults commonly form barriers to lateral flow at the level of the repository horizon, but may also form important pathways for vertical infiltration from the repository horizon down to the water table. Using the numerical code UNSAT2, two factors believed to influence the perched water system at Yucca Mountain, climate and fault-zone permeability, are explored. The two-dimensional model predicts that the volume of water held within the perched water system may greatly increase under wetter climatic conditions, and that perched water bodies may drain to the water table along fault zones. Modeling results also show fault flow to be significantly attenuated in the Paintbrush Tuff non-welded hydrogeologic unit.
Bleda, Marta; Tarraga, Joaquin; de Maria, Alejandro; Salavert, Francisco; Garcia-Alonso, Luz; Celma, Matilde; Martin, Ainoha; Dopazo, Joaquin; Medina, Ignacio
2012-07-01
During the past years, advances in high-throughput technologies have produced unprecedented growth in the number and size of repositories and databases storing relevant biological data. Today there is more biological information than ever but, unfortunately, the current status of many of these repositories is far from optimal. Some of the most common problems are that the information is spread out across many small databases, that standards frequently differ among repositories, and that some databases are no longer supported or contain overly specific and unconnected information. In addition, data size is increasingly becoming an obstacle when accessing or storing biological data. All these issues make it very difficult to extract and integrate information from different sources, to analyze experiments, or to access and query this information programmatically. CellBase provides a solution to the growing need for integration by easing access to biological data. CellBase implements a set of RESTful web services that query a centralized database containing the most relevant biological data sources. The database is hosted on our servers and is regularly updated. CellBase documentation can be found at http://docs.bioinfo.cipf.es/projects/cellbase.
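Programmatic access of the kind described is ordinarily a single HTTP call. The sketch below uses the requests library against a placeholder base URL and path, since the exact CellBase endpoint routes are not given here and should be taken from the project documentation.

```python
import requests

# Placeholder base URL: substitute the endpoint documented for your CellBase instance.
BASE_URL = "https://example.org/cellbase/webservices/rest"


def get_gene_info(species, gene, version="v4"):
    """Query a CellBase-style REST endpoint and return the decoded JSON response."""
    url = f"{BASE_URL}/{version}/{species}/feature/gene/{gene}/info"  # hypothetical route
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    print(get_gene_info("hsapiens", "BRCA2"))
```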
Tsay, Ming-Yueh; Wu, Tai-Luan; Tseng, Ling-Li
2017-01-01
This study examines the completeness and overlap of coverage in physics of six open access scholarly communication systems, including two search engines (Google Scholar and Microsoft Academic), two aggregate institutional repositories (OAIster and OpenDOAR), and two physics-related open access sources (arXiv.org and Astrophysics Data System). The 2001-2013 Nobel Laureates in Physics served as the sample. Bibliographic records of their publications were retrieved and downloaded from each system, and a computer program was developed to perform the analytical tasks of sorting, comparison, elimination, aggregation and statistical calculation. Quantitative analyses and cross-referencing were performed to determine the completeness and overlap of the coverage of the six open access systems. The results may enable scholars to select an appropriate open access system as an efficient scholarly communication channel, and academic institutions may build institutional repositories or independently create citation index systems in the future. Suggestions on indicators and tools for academic assessment are presented based on the comprehensiveness assessment of each system.
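The sorting, comparison and overlap computations the authors describe reduce to set operations on normalized bibliographic keys. A minimal sketch with made-up records is shown below.

```python
from itertools import combinations


def key(record):
    """Normalize a bibliographic record to a comparison key (title + year here)."""
    return (record["title"].strip().lower(), record["year"])


systems = {
    "Google Scholar": [{"title": "Paper A", "year": 2005}, {"title": "Paper B", "year": 2007}],
    "arXiv.org":      [{"title": "Paper A", "year": 2005}],
    "ADS":            [{"title": "Paper B", "year": 2007}, {"title": "Paper C", "year": 2010}],
}
keyed = {name: {key(r) for r in records} for name, records in systems.items()}
union = set().union(*keyed.values())

# Completeness: share of all known publications covered by each system.
for name, keys in keyed.items():
    print(f"{name}: {len(keys) / len(union):.0%} complete")

# Pairwise overlap between systems.
for a, b in combinations(keyed, 2):
    print(f"{a} & {b}: {len(keyed[a] & keyed[b])} shared records")
```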
Ramke, Jacqueline; Kuper, Hannah; Limburg, Hans; Kinloch, Jennifer; Zhu, Wenhui; Lansingh, Van C; Congdon, Nathan; Foster, Allen; Gilbert, Clare E
2018-02-01
Sources of avoidable waste in ophthalmic epidemiology include duplication of effort, and survey reports remaining unpublished, gaining publication after a long delay, or being incomplete or of poor quality. The aim of this review was to assess these sources of avoidable waste by examining blindness prevalence surveys undertaken in low and middle income countries (LMICs) between 2000 and 2014. On December 1, 2016 we searched MEDLINE, EMBASE and Web of Science databases for cross-sectional blindness prevalence surveys undertaken in LMICs between 2000 and 2014. All surveys listed on the Rapid Assessment of Avoidable Blindness (RAAB) Repository website ("the Repository") were also considered. For each survey we assessed (1) availability of scientific publication, survey report, summary results tables and/or datasets; (2) time to publication from year of survey completion and journal attributes; (3) extent of blindness information reported; and (4) rigour when information was available from two sources (i.e. whether it matched). Of the 279 included surveys (from 68 countries) 186 (67%) used RAAB methodology; 146 (52%) were published in a scientific journal, 57 (20%) were published in a journal and on the Repository, and 76 (27%) were on the Repository only (8% had tables; 19% had no information available beyond registration). Datasets were available for 50 RAABs (18% of included surveys). Time to publication ranged from <1 to 11 years (mean, standard deviation 2.8 ± 1.8 years). The extent of blindness information reported within studies varied (e.g. presenting and best-corrected, unilateral and bilateral); those with both a published report and Repository tables were most complete. For surveys published and with RAAB tables available, discrepancies were found in reporting of participant numbers (14% of studies) and blindness prevalence (15%). Strategies are needed to improve the availability, consistency, and quality of information reported from blindness prevalence surveys, and hence reduce avoidable waste.
A Climate Statistics Tool and Data Repository
NASA Astrophysics Data System (ADS)
Wang, J.; Kotamarthi, V. R.; Kuiper, J. A.; Orr, A.
2017-12-01
Researchers at Argonne National Laboratory and collaborating organizations have generated regional-scale, dynamically downscaled climate model output using Weather Research and Forecasting (WRF) version 3.3.1 at a 12 km horizontal spatial resolution over much of North America. The WRF model is driven by boundary conditions obtained from three independent global-scale climate models and two different future greenhouse gas emission scenarios, named representative concentration pathways (RCPs). The repository of results has a temporal resolution of three hours for all the simulations, includes more than 50 variables, is stored in Network Common Data Form (NetCDF) files, and the data volume is nearly 600 TB. A condensed 800 GB set of NetCDF files was made for selected variables most useful for climate-related planning, including daily precipitation, relative humidity, solar radiation, maximum temperature, minimum temperature, and wind. The WRF model simulations are conducted for three 10-year time periods (1995-2004, 2045-2054, and 2085-2094) and two future scenarios (RCP4.5 and RCP8.5). An open-source tool was coded using Python 2.7.8 and ESRI ArcGIS 10.3.1 programming libraries to parse the NetCDF files, compute summary statistics, and output results as GIS layers. Eight sets of summary statistics were generated as examples for the contiguous U.S. states and much of Alaska, including number of days over 90°F, number of days with a heat index over 90°F, heat waves, monthly and annual precipitation, drought, extreme precipitation, multi-model averages, and model bias. This paper will provide an overview of the project to generate the main and condensed data repositories, describe the Python tool and how to use it, present the GIS results of the computed examples, and discuss some of the ways they can be used for planning. The condensed climate data, Python tool, computed GIS results, and documentation of the work are shared on the Internet.
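The original tool was written for Python 2 with the ArcGIS libraries; as a simplified, library-agnostic sketch, the code below computes one of the listed statistics (days per year above 90°F from daily maximum temperature) with netCDF4 and NumPy. The file name and variable names are hypothetical and will differ from the Argonne archive.

```python
import numpy as np
from netCDF4 import Dataset

THRESHOLD_F = 90.0
THRESHOLD_K = (THRESHOLD_F - 32.0) * 5.0 / 9.0 + 273.15  # 90 F expressed in kelvin

# Hypothetical file of daily maximum temperature with dimensions (time, y, x) in kelvin.
with Dataset("tmax_daily_rcp85_2045-2054.nc") as nc:
    tmax = nc.variables["tmax"][:]          # shape: (n_days, ny, nx)

n_years = tmax.shape[0] // 365              # approximate, ignoring leap days
# Days per grid cell exceeding the threshold, averaged over the simulated years.
hot_days_per_year = (tmax > THRESHOLD_K).sum(axis=0) / n_years
print("domain mean hot days/year:", float(hot_days_per_year.mean()))
```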
Gómez, Alberto; Nieto-Díaz, Manuel; Del Águila, Ángela; Arias, Enrique
2018-05-01
Transparency in science is increasingly a hot topic. Scientists are required to show not only results but also evidence of how they have achieved these results. In experimental studies of spinal cord injury, there are a number of standardized tests, such as the Basso-Beattie-Bresnahan locomotor rating scale for rats and the Basso Mouse Scale for mice, which researchers use to study the pathophysiology of spinal cord injury and to evaluate the effects of experimental therapies. Although the standardized data from the Basso-Beattie-Bresnahan locomotor rating scale and the Basso Mouse Scale are particularly suited for storage and sharing in databases, systems of data acquisition and repositories are still lacking. To the best of our knowledge, both tests are usually conducted manually, with the data being recorded on a paper form, which may be documented with video recordings, before the data are transferred to a spreadsheet for analysis. The data thus obtained are used to compute global scores, which are usually the only information that appears in publications, with a wealth of detail being omitted. This omitted information may be relevant to understanding locomotion deficits or recovery, or even important aspects of treatment effects. Therefore, this paper presents a mobile application to record and share Basso Mouse Scale tests, meeting the following criteria: i) user-friendly; ii) few hardware requirements (only a smartphone or tablet with a camera running the Android operating system); and iii) based on open source software such as SQLite, XML, Java, Android Studio and the Android SDK. The BAMOS app can be downloaded and installed from the Google Market repository and the app code is available at the GitHub repository. The BAMOS app demonstrates that mobile technology constitutes an opportunity to develop tools for aiding spinal cord injury scientists in recording and sharing experimental data. Copyright © 2018 Elsevier Ltd. All rights reserved.
The Fukushima Daiichi Accident Study Information Portal
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shawn St. Germain; Curtis Smith; David Schwieder
This paper presents a description of The Fukushima Daiichi Accident Study Information Portal. The Information Portal was created by the Idaho National Laboratory as part of a joint NRC and DOE project to assess the severe accident modeling capability of the MELCOR analysis code. The Fukushima Daiichi Accident Study Information Portal was created to collect, store, retrieve and validate information and data for use in reconstructing the Fukushima Daiichi accident. In addition to supporting the MELCOR simulations, the Portal will be the main DOE repository for all data, studies and reports related to the accident at the Fukushima Daiichi nuclear power station. The data are stored in a secure (password-protected and encrypted) repository that is searchable and accessible to researchers at diverse locations.
Childhood Vesicoureteral Reflux Studies: Registries and Repositories Sources and Nosology
Chesney, Russell W.; Patters, Andrea B.
2012-01-01
Despite several recent studies, the advisability of antimicrobial prophylaxis and certain imaging studies for urinary tract infections (UTIs) remains controversial. The role of vesicoureteral reflux (VUR) on the severity and re-infection rates for UTIs is also difficult to assess. Registries and repositories of data and biomaterials from clinical studies in children with VUR are valuable. Disease registries are collections of secondary data related to patients with a specific diagnosis, condition or procedure. Registries differ from indices in that they contain more extensive data. A research repository is an entity that receives, stores, processes and/or disseminates specimens (or other materials) as needed. It encompasses the physical location as well as the full range of activities associated with its operation. It may also be referred to as a biorepository. This report provides information about some current registries and repositories that include data and samples from children with VUR. It also describes the heterogeneous nature of the subjects, as some registries and repositories include only data or samples from patients with primary reflux while others also include those from patients with syndromic or secondary reflux. PMID:23044377
Potential benefits of waste transmutation to the U.S. high-level waste repository
DOE Office of Scientific and Technical Information (OSTI.GOV)
Michaels, G.E.
1995-10-01
This paper reexamines the potential benefits of waste transmutation to the proposed U.S. geologic repository at the Yucca Mountain site based on recent progress in the performance assessment for the Yucca Mountain base case of spent fuel emplacement. It is observed that actinides are assumed to have higher solubility than in previous studies and that Np and other actinides now dominate the projected aqueous releases from a Yucca Mountain repository. Actinides are also identified as the dominant source of decay heat in the repository, and the effect of decay heat in perturbing the hydrology, geochemistry, and thermal characteristics of Yucca Mountain is reviewed. It is concluded that the potential for thermally-driven, buoyant, gas-phase flow at Yucca Mountain introduces data and modeling requirements that will increase the costs of licensing the site and may cause the site to be unattractive for geologic disposal of wastes. A transmutation-enabled cold repository is proposed that might allow licensing of a repository to be based upon currently observable characteristics of the Yucca Mountain site.
Shared Medical Imaging Repositories.
Lebre, Rui; Bastião, Luís; Costa, Carlos
2018-01-01
This article describes the implementation of a solution that integrates an ownership concept and access control over medical imaging resources, making it possible to centralize multiple repository instances. The proposed architecture allows permissions to be associated with repository resources and rights to be delegated to third parties. It includes a programmatic interface for managing the proposed services, made available through web services, with the ability to create, read, update and remove all components of the architecture. The resulting work is a role-based access control mechanism that was integrated with the Dicoogle Open-Source Project. The solution has several application scenarios, such as collaborative research platforms and tele-radiology services deployed in the cloud.
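Role-based access control of the kind described can be reduced to a mapping from roles to permitted operations on a repository resource. The short sketch below is a generic illustration with invented roles and is not the Dicoogle plugin's interface.

```python
# Generic role-based check: which operations each role may perform on a repository resource.
ROLE_PERMISSIONS = {
    "owner":       {"create", "read", "update", "delete", "delegate"},
    "researcher":  {"read"},
    "radiologist": {"read", "update"},
}


def is_allowed(role, operation):
    """Return True if the given role is permitted to perform the operation."""
    return operation in ROLE_PERMISSIONS.get(role, set())


assert is_allowed("owner", "delegate")
assert not is_allowed("researcher", "update")
print("access checks passed")
```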
DOE Office of Scientific and Technical Information (OSTI.GOV)
Friedrichs, D.R.; Argo, R.S.
The Assessment of Effectiveness of Geologic Isolation Systems (AEGIS) Program is developing and applying the methodology for assessing the far-field, long-term post-closure safety of deep geologic nuclear waste repositories. AEGIS is being performed by Pacific Northwest Laboratory (PNL) under contract with the Office of Nuclear Waste Isolation (ONWI) for the Department of Energy (DOE). One task within AEGIS is the development of methodology for analysis of the consequences (water pathway) from loss of repository containment as defined by various release scenarios. The various input parameters required in the analysis are compiled in data systems. The data are organized and prepared by various input subroutines for utilization by the hydraulic and transport codes. The hydrologic models simulate the groundwater flow systems and provide water flow directions, rates, and velocities as inputs to the transport models. Outputs from the transport models are basically graphs of radionuclide concentration in the groundwater plotted against time. After dilution in the receiving surface-water body (e.g., lake, river, bay), these data are the input source terms for the dose models, if dose assessments are required. The dose models calculate radiation dose to individuals and populations. CIRMIS (Comprehensive Information Retrieval and Model Input Sequence) Data System, a storage and retrieval system for model input and output data, including graphical interpretation and display, is described. This is the third of four volumes of the description of the CIRMIS Data System.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Friedrichs, D.R.
1980-01-01
The Assessment of Effectiveness of Geologic Isolation Systems (AEGIS) Program is developing and applying the methodology for assessing the far-field, long-term post-closure safety of deep geologic nuclear waste repositories. AEGIS is being performed by Pacific Northwest Laboratory (PNL) under contract with the Office of Nuclear Waste Isolation (ONWI) for the Department of Energy (DOE). One task within AEGIS is the development of methodology for analysis of the consequences (water pathway) from loss of repository containment as defined by various release scenarios. The various input parameters required in the analysis are compiled in data systems. The data are organized and prepared by various input subroutines for use by the hydrologic and transport codes. The hydrologic models simulate the groundwater flow systems and provide water flow directions, rates, and velocities as inputs to the transport models. Outputs from the transport models are basically graphs of radionuclide concentration in the groundwater plotted against time. After dilution in the receiving surface-water body (e.g., lake, river, bay), these data are the input source terms for the dose models, if dose assessments are required. The dose models calculate radiation dose to individuals and populations. CIRMIS (Comprehensive Information Retrieval and Model Input Sequence) Data System, a storage and retrieval system for model input and output data, including graphical interpretation and display, is described. This is the first of four volumes of the description of the CIRMIS Data System.
PeTMbase: A Database of Plant Endogenous Target Mimics (eTMs).
Karakülah, Gökhan; Yücebilgili Kurtoğlu, Kuaybe; Unver, Turgay
2016-01-01
MicroRNAs (miRNAs) are small endogenous RNA molecules that regulate target gene expression at the post-transcriptional level. miRNA activity can also be controlled by a newly discovered regulatory mechanism called endogenous target mimicry (eTM). In target mimicry, eTMs bind to the corresponding miRNAs to block the binding of specific transcripts, leading to increased mRNA expression. Because miRNA-eTM-target-mRNA regulation modules are involved in a wide range of biological processes, an increasing need for a comprehensive eTM database has arisen. Apart from miRSponge, which holds a limited amount of Arabidopsis eTM data, no database or repository for plant eTMs had been developed and released to date. Here, we present an online plant eTM database, called PeTMbase (http://petmbase.org), with a highly efficient search tool. To establish the repository, a number of eTMs were identified using high-throughput RNA-sequencing data from 11 plant species. Each transcriptome library is first mapped to the corresponding plant genome, and long non-coding RNA (lncRNA) transcripts are then characterized. Furthermore, additional lncRNAs retrieved from GREENC and PNRD were incorporated into the lncRNA catalog. Then, utilizing the lncRNA and miRNA sources, a total of 2,728 eTMs were successfully predicted. Our regularly updated database, PeTMbase, provides high-quality information on miRNA:eTM modules and will aid functional genomics studies, particularly on miRNA regulatory networks.
A geo-coded inventory of anophelines in the Afrotropical Region south of the Sahara: 1898-2016
Kyalo, David; Amratia, Punam; Mundia, Clara W.; Mbogo, Charles M.; Coetzee, Maureen; Snow, Robert W.
2017-01-01
Background: Understanding the distribution of anopheline vectors of malaria is an important prelude to the design of national malaria control and elimination programmes. A single, geo-coded continental inventory of anophelines using all available published and unpublished data has not been undertaken since the 1960s. Methods: We have searched African, European and World Health Organization archives to identify unpublished reports on anopheline surveys in 48 sub-Saharan Africa countries. This search was supplemented by identification of reports that formed part of post-graduate theses, conference abstracts, regional insecticide resistance databases and more traditional bibliographic searches of peer-reviewed literature. Finally, a check was made against two recent repositories of dominant malaria vector species locations ( circa 2,500). Each report was used to extract information on the survey dates, village locations (geo-coded to provide a longitude and latitude), sampling methods, species identification methods and all anopheline species found present during the survey. Survey records were collapsed to a single site over time. Results: The search strategy took years and resulted in 13,331 unique, geo-coded survey locations of anopheline vector occurrence between 1898 and 2016. A total of 12,204 (92%) sites reported the presence of 10 dominant vector species/sibling species; 4,473 (37%) of these sites were sampled since 2005. 4,442 (33%) sites reported at least one of 13 possible secondary vector species; 1,107 (25%) of these sites were sampled since 2005. Distributions of dominant and secondary vectors conform to previous descriptions of the ecological ranges of these vectors. Conclusion: We have assembled the largest ever geo-coded database of anophelines in Africa, representing a legacy dataset for future updating and identification of knowledge gaps at national levels. The geo-coded database is available on Harvard Dataverse as a reference source for African national malaria control programmes planning their future control and elimination strategies. PMID:28884158
A Shared Infrastructure for Federated Search Across Distributed Scientific Metadata Catalogs
NASA Astrophysics Data System (ADS)
Reed, S. A.; Truslove, I.; Billingsley, B. W.; Grauch, A.; Harper, D.; Kovarik, J.; Lopez, L.; Liu, M.; Brandt, M.
2013-12-01
The vast amount of science metadata can be overwhelming and highly complex. Comprehensive analysis and sharing of metadata is difficult since institutions often publish to their own repositories. There are many disjoint standards used for publishing scientific data, making it difficult to discover and share information from different sources. Services that publish metadata catalogs often have different protocols, formats, and semantics. The research community is limited by the exclusivity of separate metadata catalogs, and thus it is desirable to have federated search interfaces capable of unified search queries across multiple sources. Aggregation of metadata catalogs also enables users to critique metadata more rigorously. With these motivations in mind, the National Snow and Ice Data Center (NSIDC) and Advanced Cooperative Arctic Data and Information Service (ACADIS) implemented two search interfaces for the community. Both the NSIDC Search and ACADIS Arctic Data Explorer (ADE) use a common infrastructure, which keeps maintenance costs low. The search clients are designed to make OpenSearch requests against Solr, an open-source search platform. Solr applies indexes to specific fields of the metadata, which in this instance optimizes queries containing keywords, spatial bounds, and temporal ranges. NSIDC metadata is reused by both search interfaces, but the ADE also brokers additional sources. Users can quickly find relevant metadata with minimal effort, which ultimately lowers costs for research. This presentation will highlight the reuse of data and code between NSIDC and ACADIS, discuss challenges and milestones for each project, and identify the creation and use of open-source libraries.
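As a rough illustration of the kind of request such an infrastructure handles, the sketch below issues a keyword, temporal, and spatial query against a Solr select endpoint over HTTP. The core name, field names, and URL are placeholder assumptions, not the actual NSIDC/ACADIS schema.

```python
import requests

# Hypothetical Solr endpoint and field names -- not the actual NSIDC/ACADIS schema.
SOLR_URL = "http://localhost:8983/solr/metadata/select"

params = {
    "q": "sea ice extent",                              # free-text keywords
    "fq": [
        "temporal_start:[2010-01-01T00:00:00Z TO *]",   # assumed temporal field
        "spatial_bbox:[60,-180 TO 90,180]",             # assumed spatial field
    ],
    "rows": 20,
    "wt": "json",
}

response = requests.get(SOLR_URL, params=params, timeout=30)
response.raise_for_status()
for doc in response.json()["response"]["docs"]:
    print(doc.get("id"), doc.get("title"))
```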
Siderits, Richard; Yates, Stacy; Rodriguez, Arelis; Lee, Tina; Rimmer, Cheryl; Roche, Mark
2011-01-01
Quick Response (QR) codes are standard in supply management and are seen with increasing frequency in advertisements. They are now regularly present in healthcare informatics and education. These 2-dimensional square bar codes, originally designed by the Toyota car company, are free of license and have a published international standard. The codes can be generated by free online software and the resulting images incorporated into presentations. The images can be scanned by "smart" phones and tablets using either the iOS or Android platforms, which links the device with the information represented by the QR code (uniform resource locator or URL, online video, text, v-calendar entries, short message service [SMS], and formatted text). Once linked to the device, the information can be viewed at any time after the original presentation, saved in the device or to a Web-based "cloud" repository, printed, or shared with others via email or Bluetooth file transfer. This paper describes how we use QR codes in our tumor board presentations, discusses their benefits and how QR codes differ from plain Web links, and explains how QR codes facilitate the distribution of educational content.
Designing and maintaining an effective chargemaster.
Abbey, D C
2001-03-01
The chargemaster is the central repository of charges and associated coding information used to develop claims. But this simple description belies the chargemaster's true complexity. The chargemaster's role in the coding process differs from department to department, and not all codes provided on a claim form are necessarily included in the chargemaster, as codes for complex services may need to be developed and reviewed by coding staff. In addition, with the rise of managed care, the chargemaster increasingly is being used to track utilization of supplies and services. To ensure that the chargemaster performs all of its functions effectively, hospitals should appoint a chargemaster coordinator, supported by a chargemaster review team, to oversee the design and maintenance of the chargemaster. Important design issues that should be considered include the principle of "form follows function," static versus dynamic coding, how modifiers should be treated, how charges should be developed, how to incorporate physician fee schedules into the chargemaster, the interface between the chargemaster and cost reports, and how to include statistical information for tracking utilization.
SPECIATE Version 4.4 Database Development Documentation
SPECIATE is the U.S. Environmental Protection Agency’s (EPA) repository of volatile organic gas and particulate matter (PM) speciation profiles of air pollution sources. Some of the many uses of these source profiles include: (1) creating speciated emissions inventories for regi...
SPECIATE 4.2: speciation Database Development Documentation
SPECIATE is the U.S. Environmental Protection Agency's (EPA) repository of volatile organic gas and particulate matter (PM) speciation profiles of air pollution sources. Among the many uses of speciation data, these source profiles are used to: (1) create speciated emissions inve...
JEnsembl: a version-aware Java API to Ensembl data systems
Paterson, Trevor; Law, Andy
2012-01-01
Motivation: The Ensembl Project provides release-specific Perl APIs for efficient high-level programmatic access to data stored in various Ensembl database schema. Although Perl scripts are perfectly suited for processing large volumes of text-based data, Perl is not ideal for developing large-scale software applications nor embedding in graphical interfaces. The provision of a novel Java API would facilitate type-safe, modular, object-orientated development of new Bioinformatics tools with which to access, analyse and visualize Ensembl data. Results: The JEnsembl API implementation provides basic data retrieval and manipulation functionality from the Core, Compara and Variation databases for all species in Ensembl and EnsemblGenomes and is a platform for the development of a richer API to Ensembl datasources. The JEnsembl architecture uses a text-based configuration module to provide evolving, versioned mappings from database schema to code objects. A single installation of the JEnsembl API can therefore simultaneously and transparently connect to current and previous database instances (such as those in the public archive) thus facilitating better analysis repeatability and allowing ‘through time’ comparative analyses to be performed. Availability: Project development, released code libraries, Maven repository and documentation are hosted at SourceForge (http://jensembl.sourceforge.net). Contact: jensembl-develop@lists.sf.net, andy.law@roslin.ed.ac.uk, trevor.paterson@roslin.ed.ac.uk PMID:22945789
A collection of open source applications for mass spectrometry data mining.
Gallardo, Óscar; Ovelleiro, David; Gay, Marina; Carrascal, Montserrat; Abian, Joaquin
2014-10-01
We present several bioinformatics applications for the identification and quantification of phosphoproteome components by MS. These applications include a front-end graphical user interface that combines several Thermo RAW formats to MASCOT™ Generic Format extractors (EasierMgf), two graphical user interfaces for search engines OMSSA and SEQUEST (OmssaGui and SequestGui), and three applications, one for the management of databases in FASTA format (FastaTools), another for the integration of search results from up to three search engines (Integrator), and another one for the visualization of mass spectra and their corresponding database search results (JsonVisor). These applications were developed to solve some of the common problems found in proteomic and phosphoproteomic data analysis and were integrated in the workflow for data processing and feeding on our LymPHOS database. Applications were designed modularly and can be used standalone. These tools are written in Perl and Python programming languages and are supported on Windows platforms. They are all released under an Open Source Software license and can be freely downloaded from our software repository hosted at GoogleCode. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
SMART-on-FHIR implemented over i2b2
Mandel, Joshua C; Klann, Jeffery G; Wattanasin, Nich; Mendis, Michael; Chute, Christopher G; Mandl, Kenneth D; Murphy, Shawn N
2017-01-01
We have developed an interface to serve patient data from Informatics for Integrating Biology and the Bedside (i2b2) repositories in the Fast Healthcare Interoperability Resources (FHIR) format, referred to as a SMART-on-FHIR cell. The cell serves FHIR resources on a per-patient basis, and supports the “substitutable” modular third-party applications (SMART) OAuth2 specification for authorization of client applications. It is implemented as an i2b2 server plug-in, consisting of 6 modules: authentication, REST, i2b2-to-FHIR converter, resource enrichment, query engine, and cache. The source code is freely available as open source. We tested the cell by accessing resources from a test i2b2 installation, demonstrating that a SMART app can be launched from the cell that accesses patient data stored in i2b2. We successfully retrieved demographics, medications, labs, and diagnoses for test patients. The SMART-on-FHIR cell will enable i2b2 sites to provide simplified but secure data access in FHIR format, and will spur innovation and interoperability. Further, it transforms i2b2 into an apps platform. PMID:27274012
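To illustrate the client side of such a cell, here is a minimal sketch of retrieving per-patient FHIR resources over REST, assuming an OAuth2 bearer token has already been obtained through the SMART authorization flow. The base URL, patient ID, and token are hypothetical, and the resource names follow generic FHIR conventions rather than i2b2-specific details.

```python
import requests

# Assumed values -- replace with the cell's real base URL and a token obtained
# through the SMART OAuth2 authorization flow; both are placeholders here.
FHIR_BASE = "https://example.org/i2b2-fhir/api"
ACCESS_TOKEN = "replace-with-oauth2-token"
PATIENT_ID = "12345"

headers = {
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "Accept": "application/fhir+json",
}

# Standard FHIR REST reads: patient demographics, then a per-patient medication search.
patient = requests.get(f"{FHIR_BASE}/Patient/{PATIENT_ID}", headers=headers).json()
meds = requests.get(
    f"{FHIR_BASE}/MedicationRequest",
    headers=headers,
    params={"patient": PATIENT_ID},
).json()

print(patient.get("name"))
print(len(meds.get("entry", [])), "medication entries")
```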
Ensemble Eclipse: A Process for Prefab Development Environment for the Ensemble Project
NASA Technical Reports Server (NTRS)
Wallick, Michael N.; Mittman, David S.; Shams, Khawaja, S.; Bachmann, Andrew G.; Ludowise, Melissa
2013-01-01
This software simplifies the process of setting up an Eclipse IDE programming environment for members of the cross-NASA-center project, Ensemble. It achieves this by assembling all the necessary add-ons and custom tools/preferences. This software is unique in that it allows developers in the Ensemble Project (approximately 20 to 40 at any time) across multiple NASA centers to set up a development environment almost instantly and work on Ensemble software. The software automatically includes the source code repositories and other vital information and settings. The Eclipse IDE is an open-source development framework. The NASA (Ensemble-specific) version of the software includes Ensemble-specific plug-ins as well as settings for the Ensemble project. This software saves developers the time and hassle of setting up a programming environment, making sure that everything is set up correctly for Ensemble development. Existing software (i.e., standard Eclipse) requires an intensive setup process that is both time-consuming and error-prone. This software is built once by a single user and tested, allowing other developers to simply download and use it.
Reuse: A knowledge-based approach
NASA Technical Reports Server (NTRS)
Iscoe, Neil; Liu, Zheng-Yang; Feng, Guohui
1992-01-01
This paper describes our research in automating the reuse process through the use of application domain models. Application domain models are explicit formal representations of the application knowledge necessary to understand, specify, and generate application programs. Furthermore, they provide a unified repository for the operational structure, rules, policies, and constraints of a specific application area. In our approach, domain models are expressed in terms of a transaction-based meta-modeling language. This paper has described in detail the creation and maintenance of hierarchical structures. These structures are created through a process that includes reverse engineering of data models with supplementary enhancement from application experts. Source code is also reverse engineered but is not a major source of domain model instantiation at this time. In the second phase of the software synthesis process, program specifications are interactively synthesized from an instantiated domain model. These specifications are currently integrated into a manual programming process but will eventually be used to derive executable code with mechanically assisted transformations. This research is performed within the context of programming-in-the-large types of systems. Although our goals are ambitious, we are implementing the synthesis system in an incremental manner through which we can realize tangible results. The client/server architecture is capable of supporting 16 simultaneous X/Motif users and tens of thousands of attributes and classes. Domain models have been partially synthesized from five different application areas. As additional domain models are synthesized and additional knowledge is gathered, we will inevitably add to and modify our representation. However, our current experience indicates that it will scale and expand to meet our modeling needs.
SPECIATE 4.4: The Bridge Between Emissions Characterization and Modeling
SPECIATE is the U.S. Environmental Protection Agency’s (EPA) repository of volatile organic gas and particulate matter (PM) speciation profiles of air pollution sources. Some of the many uses of these source profiles include: (1) creating speciated emissions inventories for...
The Development and Uses of EPA's SPECIATE Database
SPECIATE is the U.S. Environmental Protection Agency's (EPA) repository of volatile organic compounds (VOC) and particulate matter (PM) speciation profiles of air pollution sources. These source profiles can be used to (l) provide input to chemical mass balance (CMB) receptor mod...
The Classification and Evaluation of Computer-Aided Software Engineering Tools
1990-09-01
...years, a rapid series of new approaches have been adopted, including information engineering, entity-relationship modeling, and automatic code generation...support true information sharing among tools and automated consistency checking. Moreover, the repository must record and manage the relationships and...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schaller, A.; Skanata, D.
1995-12-31
The site selection approach for a radioactive waste disposal facility, which is under way in Croatia, is presented in the paper. The approach is based on the application of relevant terrestrial and technical criteria in the site selection process. The basic documentation used for this purpose comprises regional planning documents prepared by the Regional Planning Institute of Croatia. The principal result of the research described in the paper is a proposal of several potential areas suitable for siting a radioactive waste repository. All relevant conclusions are based on both groups of data: generic and field-measured. Out of a dozen potential areas, four have been chosen by the authors as representative. The comparative analysis presented was made by means of the VISA II computer code, developed by V. Belton and SPV Software Products. The code was donated to the APO by the IAEA. The main objective of the paper is to initiate and facilitate further discussion of possible ways of evaluating and comparing potential areas for siting a radioactive waste repository in this country, as well as to provide additional contributions to the current site selection process in the Republic of Croatia.
Breytenbach, Amelia; Lourens, Antoinette; Marsh, Susan
2013-04-26
The history of veterinary science in South Africa can only be appreciated, studied, researched and passed on to coming generations if historical sources are readily available. In most countries, material and sources with historical value are often difficult to locate, dispersed over a large area and not part of the conventional book and journal literature. The Faculty of Veterinary Science of the University of Pretoria and its library has access to a large collection of historical sources. The collection consists of photographs, photographic slides, documents, proceedings, posters, audio-visual material, postcards and other memorabilia. Other institutions in the country are also approached if relevant sources are identified in their collections. The University of Pretoria's institutional repository, UPSpace, was launched in 2006. This provided the Jotello F. Soga Library with the opportunity to fill the repository with relevant digitised collections of diverse heritage and learning resources that can contribute to the long-term preservation and accessibility of historical veterinary sources. These collections are available for use not only by historians and researchers in South Africa but also elsewhere in Africa and the rest of the world. Important historical collections such as the Arnold Theiler collection, the Jotello F. Soga collection and collections of the Onderstepoort Journal of Veterinary Research and the Journal of the South African Veterinary Association are highlighted. The benefits of an open access digital repository, the importance of collaboration across the veterinary community and other prerequisites for the sustainability of a digitisation project and the importance of metadata to enhance accessibility are covered.
NASA Astrophysics Data System (ADS)
Gao, M.; Huang, S. T.; Wang, P.; Zhao, Y. A.; Wang, H. B.
2016-11-01
The geological disposal of high-level radioactive waste (hereinafter "geological disposal") is a long-term, complex, and systematic scientific project. The data and information resources generated during research and development (R&D) provide significant support for the design of the geological disposal system and lay a foundation for assessing the long-term stability and safety of the repository site. However, the data produced by research and engineering work during repository siting are complicated (multi-source, multi-dimensional, and changeable), and the requirements for data accuracy and comprehensive application have become much higher than before, so the data model design of the geo-information database for the disposal repository faces serious challenges. In this paper, the data resources of the pre-selected repository areas are comprehensively surveyed and systematically analysed. Based on a deep understanding of the application requirements, the work provides a solution to the key technical problems, including a reasonable classification system for multi-source data entities, their complex logical relations, and effective physical storage structures. The solution goes beyond the data classification and conventional spatial data organization models applied in traditional industry and organizes and integrates the data around data entities and spatial relationships that are independent, complete, and significant for HLW geological disposal applications. Reasonable, feasible, and flexible conceptual, logical, and physical data models have been established to ensure effective integration, and to facilitate application development, of multi-source data in the pre-selected areas for geological disposal.
Wu, Tai-luan; Tseng, Ling-li
2017-01-01
This study examines the completeness and overlap of coverage in physics of six open access scholarly communication systems, including two search engines (Google Scholar and Microsoft Academic), two aggregate institutional repositories (OAIster and OpenDOAR), and two physics-related open sources (arXiv.org and Astrophysics Data System). The 2001–2013 Nobel Laureates in Physics served as the sample. Bibliographic records of their publications were retrieved and downloaded from each system, and a computer program was developed to perform the analytical tasks of sorting, comparison, elimination, aggregation and statistical calculations. Quantitative analyses and cross-referencing were performed to determine the completeness and overlap of the system coverage of the six open access systems. The results may enable scholars to select an appropriate open access system as an efficient scholarly communication channel, and academic institutions may build institutional repositories or independently create citation index systems in the future. Suggestions on indicators and tools for academic assessment are presented based on the comprehensiveness assessment of each system. PMID:29267327
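The sorting, de-duplication, and overlap calculations described above can be sketched with simple set operations; the normalization rule and the record lists below are made-up placeholders, not the study's program or data.

```python
import re
from itertools import combinations

def normalize(title: str) -> str:
    """Crude record key: lower-cased title with punctuation and whitespace collapsed."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

# Hypothetical downloads: system name -> list of record titles (placeholders).
systems = {
    "Google Scholar": ["Paper A", "Paper B", "Paper C"],
    "arXiv": ["Paper B", "Paper C"],
    "ADS": ["Paper A", "Paper C", "Paper D"],
}

keys = {name: {normalize(t) for t in titles} for name, titles in systems.items()}
union = set().union(*keys.values())

# Completeness: share of all known unique records covered by each system.
for name, k in keys.items():
    print(f"{name}: {len(k) / len(union):.0%} of {len(union)} unique records")

# Pairwise overlap between systems.
for a, b in combinations(keys, 2):
    print(f"{a} / {b} overlap: {len(keys[a] & keys[b])} records")
```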
Thermal Analysis of a Nuclear Waste Repository in Argillite Host Rock
NASA Astrophysics Data System (ADS)
Hadgu, T.; Gomez, S. P.; Matteo, E. N.
2017-12-01
Disposal of high-level nuclear waste in a geological repository requires analysis of heat distribution as a result of decay heat. Such an analysis supports design of repository layout to define repository footprint as well as provide information of importance to overall design. The analysis is also used in the study of potential migration of radionuclides to the accessible environment. In this study, thermal analysis for high-level waste and spent nuclear fuel in a generic repository in argillite host rock is presented. The thermal analysis utilized both semi-analytical and numerical modeling in the near field of a repository. The semi-analytical method looks at heat transport by conduction in the repository and surroundings. The results of the simulation method are temperature histories at selected radial distances from the waste package. A 3-D thermal-hydrologic numerical model was also conducted to study fluid and heat distribution in the near field. The thermal analysis assumed a generic geological repository at 500 m depth. For the semi-analytical method, a backfilled closed repository was assumed with basic design and material properties. For the thermal-hydrologic numerical method, a repository layout with disposal in horizontal boreholes was assumed. The 3-D modeling domain covers a limited portion of the repository footprint to enable a detailed thermal analysis. A highly refined unstructured mesh was used with increased discretization near heat sources and at intersections of different materials. All simulations considered different parameter values for properties of components of the engineered barrier system (i.e. buffer, disturbed rock zone and the host rock), and different surface storage times. Results of the different modeling cases are presented and include temperature and fluid flow profiles in the near field at different simulation times. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525. SAND2017-8295 A.
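The semi-analytical conduction approach can be illustrated with the classical point-source solution: the sketch below superposes instantaneous point-source responses for an exponentially decaying heat load in an infinite, homogeneous medium. The property values, heat output, and half-life are placeholders, not the study's parameters, and the real analysis accounts for geometry and boundaries that this toy model ignores.

```python
import numpy as np

# Placeholder rock properties and heat load -- illustrative values only.
k = 2.0                      # thermal conductivity, W/(m K)
rho_cp = 2.2e6               # volumetric heat capacity, J/(m3 K)
alpha = k / rho_cp           # thermal diffusivity, m2/s

q0 = 1.0e3                   # initial decay-heat output of one package, W
year = 3.155e7               # seconds per year
lam = np.log(2.0) / (30.0 * year)   # assumed ~30-year effective half-life

def delta_T(r, t, n=4000):
    """Temperature rise at radius r (m) and time t (s) for an exponentially
    decaying point source in an infinite homogeneous medium, obtained by
    superposing instantaneous point-source solutions."""
    tp = np.linspace(0.0, t, n, endpoint=False)   # emission times
    tau = t - tp                                  # elapsed time since each emission
    q = q0 * np.exp(-lam * tp)                    # source strength at emission, W
    kernel = np.exp(-r**2 / (4.0 * alpha * tau)) / (4.0 * np.pi * alpha * tau) ** 1.5
    return np.sum(q * kernel) * (t / n) / rho_cp  # simple rectangle-rule integral

for t_yr in (10, 50, 100, 500):
    print(f"t = {t_yr:4d} yr   dT(r = 5 m) = {delta_T(5.0, t_yr * year):6.2f} K")
```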
NASA Astrophysics Data System (ADS)
Zeitler, T.; Kirchner, T. B.; Hammond, G. E.; Park, H.
2014-12-01
The Waste Isolation Pilot Plant (WIPP) has been developed by the U.S. Department of Energy (DOE) for the geologic (deep underground) disposal of transuranic (TRU) waste. Containment of TRU waste at the WIPP is regulated by the U.S. Environmental Protection Agency (EPA). The DOE demonstrates compliance with the containment requirements by means of performance assessment (PA) calculations. WIPP PA calculations estimate the probability and consequence of potential radionuclide releases from the repository to the accessible environment for a regulatory period of 10,000 years after facility closure. The long-term performance of the repository is assessed using a suite of sophisticated computational codes. In a broad modernization effort, the DOE has overseen the transfer of these codes to modern hardware and software platforms. Additionally, there is a current effort to establish new performance assessment capabilities through the further development of the PFLOTRAN software, a state-of-the-art massively parallel subsurface flow and reactive transport code. Improvements to the current computational environment will result in greater detail in the final models due to the parallelization afforded by the modern code. Parallelization will allow for relatively faster calculations, as well as a move from a two-dimensional calculation grid to a three-dimensional grid. The result of the modernization effort will be a state-of-the-art subsurface flow and transport capability that will serve WIPP PA into the future. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000. This research is funded by WIPP programs administered by the Office of Environmental Management (EM) of the U.S Department of Energy.
Using FLOSS Project Metadata in the Undergraduate Classroom
NASA Astrophysics Data System (ADS)
Squire, Megan; Duvall, Shannon
This paper describes our efforts to use the large amounts of data available from public repositories of free, libre, and open source software (FLOSS) in our undergraduate classrooms to teach concepts that would have previously been taught using other types of data from other sources.
Pollock, David W.
1986-01-01
Many parts of the Great Basin have thick zones of unsaturated alluvium which might be suitable for disposing of high-level radioactive wastes. A mathematical model accounting for the coupled transport of energy, water (vapor and liquid), and dry air was used to analyze one-dimensional, vertical transport above and below an areally extensive repository. Numerical simulations were conducted for a hypothetical repository containing spent nuclear fuel and located 100 m below land surface. Initial steady state downward water fluxes of zero (hydrostatic) and 0.0003 m yr−1 were considered in an attempt to bracket the likely range in natural water flux. Predicted temperatures within the repository peaked after approximately 50 years and declined slowly thereafter in response to the decreasing intensity of the radioactive heat source. The alluvium near the repository experienced a cycle of drying and rewetting in both cases. The extent of the dry zone was strongly controlled by the mobility of liquid water near the repository under natural conditions. In the case of initial hydrostatic conditions, the dry zone extended approximately 10 m above and 15 m below the repository. For the case of a natural flux of 0.0003 m yr−1, the relative permeability of water near the repository was initially more than 30 times the value under hydrostatic conditions; consequently, the dry zone extended only about 2 m above and 5 m below the repository. In both cases a significant perturbation in liquid saturation levels persisted for several hundred years. This analysis illustrates the extreme sensitivity of model predictions to initial conditions and parameters, such as relative permeability and moisture characteristic curves, that are often poorly known.
NASA Technical Reports Server (NTRS)
Ancheta, T. C., Jr.
1976-01-01
A method of using error-correcting codes to obtain data compression, called syndrome-source-coding, is described in which the source sequence is treated as an error pattern whose syndrome forms the compressed data. It is shown that syndrome-source-coding can achieve arbitrarily small distortion with the number of compressed digits per source digit arbitrarily close to the entropy of a binary memoryless source. A 'universal' generalization of syndrome-source-coding is formulated which provides robustly effective distortionless coding of source ensembles. Two examples are given, comparing the performance of noiseless universal syndrome-source-coding to (1) run-length coding and (2) Lynch-Davisson-Schalkwijk-Cover universal coding for an ensemble of binary memoryless sources.
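A toy sketch of the idea, using the Hamming(7,4) parity-check matrix: a sparse 7-bit source block is compressed to its 3-bit syndrome, and the decoder recovers it by looking up the minimum-weight pattern (coset leader) with that syndrome. This is only an illustration of the principle; the paper's analysis covers general codes, distortion, and the universal generalization.

```python
import itertools
import numpy as np

# Parity-check matrix of the Hamming(7,4) code: 3 x 7 over GF(2).
H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]], dtype=int)

def syndrome(block):
    """Compress a 7-bit source block to its 3-bit syndrome."""
    return tuple(H.dot(block) % 2)

# Decoder table: for each syndrome, the minimum-weight 7-bit pattern that
# produces it (the coset leader), found by scanning patterns by increasing weight.
coset_leader = {}
for weight in range(8):
    for ones in itertools.combinations(range(7), weight):
        e = np.zeros(7, dtype=int)
        e[list(ones)] = 1
        coset_leader.setdefault(syndrome(e), e)

# A sparse source block; blocks with at most one '1' are recovered exactly here.
source_block = np.array([0, 0, 0, 0, 1, 0, 0])
compressed = syndrome(source_block)        # 3 bits stored instead of 7
recovered = coset_leader[compressed]

print("compressed syndrome:", compressed)
print("exact recovery:", bool(np.array_equal(recovered, source_block)))
```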
Rekadwad, Bhagwan N; Khobragade, Chandrahasya N
2016-06-01
Microbiologists are routinely engaged in the isolation, identification and comparison of isolated bacteria to assess their novelty. 16S rRNA sequences of Bacillus pumilus isolated from Lonar Crater Lake (19° 58' N; 76° 31' E), India, were retrieved from the NCBI repository, and quick response (QR) codes were generated for the sequences (FASTA format and full GenBank information). The Bacillus pumilus 16S rRNA gene sequences were also used to generate CGR, FCGR and PCA representations, which can be used for visual comparison and evaluation, respectively. The hyperlinked QR codes, CGR, FCGR and PCA of all the isolates are made available to users on a portal, https://sites.google.com/site/bhagwanrekadwad/. The generated digital data help to evaluate and compare any Bacillus pumilus strain, minimize laboratory effort and avoid misinterpretation of the species.
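As an illustration of how such sequence QR codes can be produced, the sketch below uses the open-source qrcode Python package; the FASTA record and output file names are placeholders, not actual Lonar Lake isolate data.

```python
import qrcode  # pip install qrcode[pil]

# Placeholder FASTA record -- not an actual Lonar Lake isolate sequence.
fasta_record = (
    ">Bacillus_pumilus_isolate_X 16S rRNA gene, partial sequence\n"
    "AGAGTTTGATCCTGGCTCAGGACGAACGCTGGCGGCGTGCCTAATACATGCAAGTCGAGCG"
)

# Encode the FASTA text itself as a QR image ...
qrcode.make(fasta_record).save("bacillus_pumilus_16S_qr.png")

# ... or encode a hyperlink so a phone scan opens the full GenBank record.
qrcode.make("https://www.ncbi.nlm.nih.gov/nuccore/").save("genbank_link_qr.png")
```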
Childhood vesicoureteral reflux studies: registries and repositories sources and nosology.
Chesney, Russell W; Patters, Andrea B
2013-12-01
Despite several recent studies, the advisability of antimicrobial prophylaxis and certain imaging studies for urinary tract infections (UTIs) remains controversial. The role of vesicoureteral reflux (VUR) on the severity and re-infection rates for UTIs is also difficult to assess. Registries and repositories of data and biomaterials from clinical studies in children with VUR are valuable. Disease registries are collections of secondary data related to patients with a specific diagnosis, condition or procedure. Registries differ from indices in that they contain more extensive data. A research repository is an entity that receives, stores, processes and/or disseminates specimens (or other materials) as needed. It encompasses the physical location as well as the full range of activities associated with its operation. It may also be referred to as a biorepository. This report provides information about some current registries and repositories that include data and samples from children with VUR. It also describes the heterogeneous nature of the subjects, as some registries and repositories include only data or samples from patients with primary reflux while others also include those from patients with syndromic or secondary reflux. Copyright © 2012 Journal of Pediatric Urology Company. All rights reserved.
ERIC Educational Resources Information Center
O'Neill, Edward T.; Lavoie, Brian F.; Bennett, Rick; Staples, Thornton; Wayland, Ross; Payette, Sandra; Dekkers, Makx; Weibel, Stuart; Searle, Sam; Thompson, Dave; Rudner, Lawrence M.
2003-01-01
Includes five articles that examine key trends in the development of the public Web: size and growth, internationalization, and metadata usage; Flexible Extensible Digital Object and Repository Architecture (Fedora) for use in digital libraries; developments in the Dublin Core Metadata Initiative (DCMI); the National Library of New Zealand Te Puna…
The new on-line Czech Food Composition Database.
Machackova, Marie; Holasova, Marie; Maskova, Eva
2013-10-01
The new on-line Czech Food Composition Database (FCDB) was launched on http://www.czfcdb.cz in December 2010 as the main freely available channel for dissemination of Czech food composition data. The application is based on a compiled FCDB documented according to the EuroFIR standardised procedure for full value documentation and indexing of foods by the LanguaL™ Thesaurus. A content management system was implemented for administration of the website and for performing data export (comma-separated values or EuroFIR XML transport package formats) by a compiler. Reference(s) are provided for each published value, with links to available freely accessible on-line sources of data (e.g. full texts, EuroFIR Document Repository, on-line national FCDBs). LanguaL™ codes are displayed within each food record as searchable keywords of the database. A photo (or a photo gallery) is used as a visual descriptor of a food item. The application is searchable on foods, components, food groups, alphabet and a multi-field advanced search. Copyright © 2013 Elsevier Ltd. All rights reserved.
Weichelt, Bryan; Salzwedel, Marsha; Heiberger, Scott; Lee, Barbara C
2018-05-22
The AgInjuryNews system and dataset are a news report repository and information source for agricultural safety professionals, policymakers, journalists, and law enforcement officials. AgInjuryNews was designed as a primary storage and retrieval system that allows users to: identify agricultural injury/fatality events; identify injury agents and emerging issues; provide safety messages for media in anticipation of trends; and raise awareness and knowledge of agricultural injuries and prevention strategies. Data are primarily collected through Google Alerts and a digital media subscription service. Articles are screened, reviewed, coded, and entered into the system. As of January 1, 2018, the system contained 3028 unique incidents. Of those, 650 involved youth, and 1807 were fatalities. The system also had registered 329 users from 39 countries. AgInjuryNews combines injury reports into one dataset and may be the most current and comprehensive publicly available collection of news reports on agricultural injuries and deaths. © 2018 Wiley Periodicals, Inc.
Crowley, Rebecca S; Castine, Melissa; Mitchell, Kevin; Chavan, Girish; McSherry, Tara; Feldman, Michael
2010-01-01
The authors report on the development of the Cancer Tissue Information Extraction System (caTIES)--an application that supports collaborative tissue banking and text mining by leveraging existing natural language processing methods and algorithms, grid communication and security frameworks, and query visualization methods. The system fills an important need for text-derived clinical data in translational research such as tissue-banking and clinical trials. The design of caTIES addresses three critical issues for informatics support of translational research: (1) federation of research data sources derived from clinical systems; (2) expressive graphical interfaces for concept-based text mining; and (3) regulatory and security model for supporting multi-center collaborative research. Implementation of the system at several Cancer Centers across the country is creating a potential network of caTIES repositories that could provide millions of de-identified clinical reports to users. The system provides an end-to-end application of medical natural language processing to support multi-institutional translational research programs.
A new code for modelling the near field diffusion releases from the final disposal of nuclear waste
NASA Astrophysics Data System (ADS)
Vopálka, D.; Vokál, A.
2003-01-01
As in other countries, the canisters with spent nuclear fuel produced during the operation of the WWER reactors at the Czech power plants are planned to be disposed of in an underground repository. The canisters will be surrounded by compacted bentonite that will retard the migration of safety-relevant radionuclides into the host rock. A new code was developed that enables modelling of the transport of critical radionuclides from the canister through the bentonite layer in cylindrical geometry. The code solves the diffusion equation for various types of initial and boundary conditions by means of the finite difference method and can take into account the non-linear shape of the sorption isotherm. A comparison of the code reported here with the PAGODA code, which is based on an analytical solution of the transport equation, was made for the 4N+3 actinide chain, which includes 239Pu. A simple parametric study of the releases of 239Pu, 129I, and 14C into the geosphere is discussed.
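A bare-bones explicit finite-difference sketch of radial diffusion with first-order decay, the kind of near-field transport the abstract describes, is shown below. The geometry, parameters, boundary treatment, and the omission of non-linear sorption are simplifying assumptions for illustration, not the scheme implemented in the code reported here.

```python
import numpy as np

# Placeholder parameters -- illustrative only, not the bentonite data used in the paper.
D = 1.0e-11                 # apparent diffusion coefficient, m2/s
lam = 7.0e-10               # decay constant, 1/s (long-lived nuclide)
r_in, r_out = 0.4, 0.75     # canister surface and bentonite/rock interface radii, m
nr = 70                     # radial grid points
dt = 2.0e4                  # time step, s (well below the explicit stability limit)

r = np.linspace(r_in, r_out, nr)
dr = r[1] - r[0]
C = np.zeros(nr)
C[0] = 1.0                  # fixed relative concentration at the canister surface

steps = int(50 * 3.15e7 / dt)       # march forward ~50 years
for _ in range(steps):
    # cylindrical Laplacian C'' + C'/r with centred differences on interior nodes
    lap = (C[2:] - 2.0 * C[1:-1] + C[:-2]) / dr**2 \
        + (C[2:] - C[:-2]) / (2.0 * dr * r[1:-1])
    C[1:-1] += dt * (D * lap - lam * C[1:-1])
    C[0], C[-1] = 1.0, 0.0          # fixed inner value, zero concentration at the rock

print("relative concentration profile:", np.round(C[::10], 4))
```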
SPECIATE 4.3: Addendum to SPECIATE 4.2--Speciation database development documentation
SPECIATE is the U.S. Environmental Protection Agency's (EPA) repository of volatile organic gas and particulate matter (PM) speciation profiles of air pollution sources. Among the many uses of speciation data, these source profiles are used to: (1) create speciated emissions inve...
ACToR Chemical Structure processing using Open Source ChemInformatics Libraries (FutureToxII)
ACToR (Aggregated Computational Toxicology Resource) is a centralized database repository developed by the National Center for Computational Toxicology (NCCT) at the U.S. Environmental Protection Agency (EPA). Free and open source tools were used to compile toxicity data from ove...
Rekadwad, Bhagwan N; Khobragade, Chandrahasya N
2016-03-01
16S rRNA sequences of morphologically and biochemically identified 21 thermophilic bacteria isolated from Unkeshwar hot springs (19°85'N and 78°25'E), Dist. Nanded (India) has been deposited in NCBI repository. The 16S rRNA gene sequences were used to generate QR codes for sequences (FASTA format and full Gene Bank information). Diversity among the isolates is compared with known isolates and evaluated using CGR, FCGR and PCA i.e. visual comparison and evaluation respectively. Considerable biodiversity was observed among the identified bacteria isolated from Unkeshwar hot springs. The hyperlinked QR codes, CGR, FCGR and PCA of all the isolates are made available to the users on a portal https://sites.google.com/site/bhagwanrekadwad/.
The Privacy and Security Implications of Open Data in Healthcare.
Kobayashi, Shinji; Kane, Thomas B; Paton, Chris
2018-04-22
The International Medical Informatics Association (IMIA) Open Source Working Group (OSWG) initiated a group discussion to discuss current privacy and security issues in the open data movement in the healthcare domain from the perspective of the OSWG membership. Working group members independently reviewed the recent academic and grey literature and sampled a number of current large-scale open data projects to inform the working group discussion. This paper presents an overview of open data repositories and a series of short case reports to highlight relevant issues present in the recent literature concerning the adoption of open approaches to sharing healthcare datasets. Important themes that emerged included data standardisation, the inter-connected nature of the open source and open data movements, and how publishing open data can impact on the ethics, security, and privacy of informatics projects. The open data and open source movements in healthcare share many common philosophies and approaches including developing international collaborations across multiple organisations and domains of expertise. Both movements aim to reduce the costs of advancing scientific research and improving healthcare provision for people around the world by adopting open intellectual property licence agreements and codes of practice. Implications of the increased adoption of open data in healthcare include the need to balance the security and privacy challenges of opening data sources with the potential benefits of open data for improving research and healthcare delivery. Georg Thieme Verlag KG Stuttgart.
Cryptanalysis of the Sodark Family of Cipher Algorithms
2017-09-01
...software project for building three-bit LUT circuit representations of S-boxes is available as a GitHub repository [40]. It contains several improvements...The...second- and third-generation automatic link establishment (ALE) systems for high frequency radios. Radios utilizing ALE technology are in use by a...
Extending software repository hosting to code review and testing
NASA Astrophysics Data System (ADS)
Gonzalez Alvarez, A.; Aparicio Cotarelo, B.; Lossent, A.; Andersen, T.; Trzcinska, A.; Asbury, D.; Høimyr, N.; Meinhard, H.
2015-12-01
We will describe how CERN's services around Issue Tracking and Version Control have evolved, and what the plans for the future are. We will describe the services' main design, integration and structure, giving special attention to the new requirements from the community of users in terms of collaboration and integration tools, and how we address this challenge when defining new services based on GitLab for collaboration to replace our current Gitolite service, and Code Review and Jenkins for Continuous Integration. These new services complement the existing ones to create a new global "development tool stack" where each working group can place its particular development work-flow.
Momota, Ryusuke; Ohtsuka, Aiji
2018-01-01
Anatomy is the science and art of understanding the structure of the body and its components in relation to the functions of the whole-body system. Medicine is based on a deep understanding of anatomy, but quite a few introductory-level learners are overwhelmed by the sheer amount of anatomical terminology that must be understood, so they regard anatomy as a dull and dense subject. To help them learn anatomical terms in a more contextual way, we started a new open-source project, the Network of Anatomical Texts (NAnaTex), which visualizes relationships of body components by integrating text-based anatomical information using Cytoscape, a network visualization software platform. Here, we present a network of bones and muscles produced from literature descriptions. As this network is primarily text-based and does not require any programming knowledge, it is easy to implement new functions or provide extra information by making changes to the original text files. To facilitate collaborations, we deposited the source code files for the network into the GitHub repository ( https://github.com/ryusukemomota/nanatex ) so that anybody can participate in the evolution of the network and use it for their own non-profit purposes. This project should help not only introductory-level learners but also professional medical practitioners, who could use it as a quick reference.
EPA’s SPECIATE 4.4 Database: Bridging Data Sources and Data Users
SPECIATE is the U.S. Environmental Protection Agency's (EPA)repository of volatile organic gas and particulate matter (PM) speciation profiles for air pollution sources. EPA released SPECIATE 4.4 in early 2014 and, in total, the SPECIATE 4.4 database includes 5,728 PM, VOC, total...
PathVisio 3: an extendable pathway analysis toolbox.
Kutmon, Martina; van Iersel, Martijn P; Bohler, Anwesha; Kelder, Thomas; Nunes, Nuno; Pico, Alexander R; Evelo, Chris T
2015-02-01
PathVisio is commonly used pathway editing, visualization and analysis software. Biological pathways have been used by biologists for many years to describe the detailed steps in biological processes. These powerful visual representations help researchers to better understand, share and discuss knowledge. Since the first publication of PathVisio in 2008, the original paper has been cited more than 170 times and PathVisio has been used in many different biological studies. As an online editor, PathVisio is also integrated into the community-curated pathway database WikiPathways. Here we present the third version of PathVisio with the newest additions and improvements to the application. The core features of PathVisio are pathway drawing, advanced data visualization and pathway statistics. Additionally, PathVisio 3 introduces a powerful new extension system that allows other developers to contribute additional functionality in the form of plugins without changing the core application. PathVisio can be downloaded from http://www.pathvisio.org and in 2014 PathVisio 3 was downloaded over 5,500 times. There are already more than 15 plugins available in the central plugin repository. PathVisio is a freely available, open-source tool published under the Apache 2.0 license (http://www.apache.org/licenses/LICENSE-2.0). It is implemented in Java and thus runs on all major operating systems. The code repository is available at http://svn.bigcat.unimaas.nl/pathvisio. The support mailing list for users is available on https://groups.google.com/forum/#!forum/wikipathways-discuss and for developers on https://groups.google.com/forum/#!forum/wikipathways-devel.
Rodrigues, J M; Trombert-Paviot, B; Baud, R; Wagner, J; Meusnier-Carriot, F
1998-01-01
GALEN has developed a language-independent common reference model based on a medically oriented ontology, together with practical tools and techniques for managing healthcare terminology, including natural language processing. GALEN-IN-USE is the current phase, which applies the modelling and the tools to the development or updating of coding systems for surgical procedures in different national coding centres co-operating within the European Federation of Coding Centres (EFCC) to create a language-independent knowledge repository for multicultural Europe. We used an integrated set of artificial intelligence terminology tools, the CLAssification Manager workbench, to process French professional medical language rubrics into intermediate dissections and then into the Grail reference ontology model representation. From this language-independent concept model representation we generate controlled French natural language. The French national coding centre is then able to retrieve the initial professional rubrics with different categories of concepts, to compare the professional language proposed by expert clinicians with the French generated controlled vocabulary, and to finalize the linguistic labels of the coding system in relation to the meanings of the conceptual system structure.
Double Compton and Cyclo-Synchrotron in Super-Eddington Discs, Magnetized Coronae, and Jets
NASA Astrophysics Data System (ADS)
McKinney, Jonathan C.; Chluba, Jens; Wielgus, Maciek; Narayan, Ramesh; Sadowski, Aleksander
2017-05-01
Black hole accretion discs accreting near the Eddington rate are dominated by bremsstrahlung cooling, but above the Eddington rate, the double Compton process can dominate in radiation-dominated regions, while the cyclo-synchrotron can dominate in strongly magnetized regions like a corona or a jet. We present an extension to the general relativistic radiation magnetohydrodynamic code harmrad to account for emission and absorption by thermal cyclo-synchrotron, double Compton, bremsstrahlung, low-temperature opal opacities, as well as Thomson and Compton scattering. The harmrad code and associated analysis and visualization codes have been made open-source and are publicly available at the github repository website. We approximate the radiation field as a Bose-Einstein distribution and evolve it using the radiation number-energy-momentum conservation equations in order to track photon hardening. We perform various simulations to study how these extensions affect the radiative properties of magnetically arrested discs accreting at Eddington to super-Eddington rates. We find that double Compton dominates bremsstrahlung in the disc within a radius of r ˜ 15rg (gravitational radii) at hundred times the Eddington accretion rate, and within smaller radii at lower accretion rates. Double Compton and cyclo-synchrotron regulate radiation and gas temperatures in the corona, while cyclo-synchrotron regulates temperatures in the jet. Interestingly, as the accretion rate drops to Eddington, an optically thin corona develops whose gas temperature of T ˜ 109K is ˜100 times higher than the disc's blackbody temperature. Our results show the importance of double Compton and synchrotron in super-Eddington discs, magnetized coronae and jets.
Characterize Framework for Igneous Activity at Yucca Mountain, Nevada
DOE Office of Scientific and Technical Information (OSTI.GOV)
F. Perry; B. Youngs
2000-11-06
The purpose of this Analysis/Model Report (AMR) is twofold. (1) The first is to present a conceptual framework of igneous activity in the Yucca Mountain region (YMR) consistent with the volcanic and tectonic history of this region and the assessment of this history by experts who participated in the Probabilistic Volcanic Hazard Analysis (PVHA) (CRWMS M&O 1996). Conceptual models presented in the PVHA are summarized and extended in areas in which new information has been presented. Alternative conceptual models are discussed as well as their impact on probability models. The relationship between volcanic source zones defined in the PVHA and structural features of the YMR is described based on discussions in the PVHA and studies presented since the PVHA. (2) The second purpose of the AMR is to present probability calculations based on PVHA outputs. Probability distributions are presented for the length and orientation of volcanic dikes within the repository footprint and for the number of eruptive centers located within the repository footprint (conditional on the dike intersecting the repository). The probability of intersection of a basaltic dike within the repository footprint was calculated in the AMR "Characterize Framework for Igneous Activity at Yucca Mountain, Nevada" (CRWMS M&O 2000g) based on the repository footprint known as the Enhanced Design Alternative [EDA II, Design B (CRWMS M&O 1999a; Wilkins and Heath 1999)]. Then, the "Site Recommendation Design Baseline" (CRWMS M&O 2000a) initiated a change in the repository design, which is described in the "Site Recommendation Subsurface Layout" (CRWMS M&O 2000b). Consequently, the probability of intersection of a basaltic dike within the repository footprint has also been calculated for the current repository footprint, which is called the 70,000 Metric Tons of Uranium (MTU) No-Backfill Layout (CRWMS M&O 2000b). The calculations for both footprints are presented in this AMR. In addition, the probability of an eruptive center(s) forming within the repository footprint is calculated and presented in this AMR for both repository footprint designs. This latter type of calculation was not included in the PVHA.
Chełkowski, Tadeusz; Gloor, Peter; Jemielniak, Dariusz
2016-01-01
While researchers are becoming increasingly interested in studying the OSS phenomenon, there are still few studies analyzing larger samples of projects to investigate the structure of activities among OSS developers. The significant amount of information gathered in publicly available open-source software repositories and mailing-list archives offers an opportunity to analyze project structures and participant involvement. In this article, using commit data from the repositories of 263 Apache projects (nearly all of them), we show that although OSS development is often described as collaborative, it in fact predominantly relies on radically solitary input and individual, non-collaborative contributions. We also show, in the first published study of this magnitude, that the engagement of contributors follows a power-law distribution.
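One hedged way to check such a claim is the discrete maximum-likelihood (Hill-type) estimator for the power-law exponent applied to per-contributor commit counts, sketched below on made-up numbers rather than the Apache data.

```python
import numpy as np

# Made-up per-contributor commit counts for one project -- not the Apache data.
commits = np.array([1, 1, 1, 2, 2, 3, 3, 4, 5, 6, 8, 11, 15, 40, 120, 900])

x_min = 3                                      # assumed lower cut-off of the tail
tail = commits[commits >= x_min].astype(float)

# Discrete-data approximation of the maximum-likelihood power-law exponent
# (Clauset-style estimator with the x_min - 1/2 correction).
alpha = 1.0 + len(tail) / np.sum(np.log(tail / (x_min - 0.5)))
print(f"estimated power-law exponent: {alpha:.2f}")

# Concentration of activity: share of all commits from the single busiest contributor.
print(f"top contributor share: {commits.max() / commits.sum():.0%}")
```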
Syndrome source coding and its universal generalization
NASA Technical Reports Server (NTRS)
Ancheta, T. C., Jr.
1975-01-01
A method of using error-correcting codes to obtain data compression, called syndrome-source-coding, is described in which the source sequence is treated as an error pattern whose syndrome forms the compressed data. It is shown that syndrome-source-coding can achieve arbitrarily small distortion with the number of compressed digits per source digit arbitrarily close to the entropy of a binary memoryless source. A universal generalization of syndrome-source-coding is formulated which provides robustly effective, distortionless coding of source ensembles.
NASA Astrophysics Data System (ADS)
Keane, C. M.; Tahirkheli, S.
2017-12-01
Data repositories, especially in the geosciences, have been focused on the management of large quantities of born-digital data and facilitating its discovery and use. Unfortunately, born-digital data, even with its immense scale today, represents only the most recent data acquisitions, leaving a large proportion of the historical data record of the science "out in the cold." Additionally, the data record in the peer-reviewed literature, whether captured directly in the literature or through the journal data archive, represents only a fraction of the reliable data collected in the geosciences. Federal and state agencies, state surveys, and private companies, collect vast amounts of geoscience information and data that is not only reliable and robust, but often the only data representative of specific spatial and temporal conditions. Likewise, even some academic publications, such as senior theses, are unique sources of data, but generally do not have wide discoverability nor guarantees of longevity. As more of these `grey' sources of information and data are born-digital, they become increasingly at risk for permanent loss, not to mention poor discoverability. Numerous studies have shown that grey literature across all disciplines, including geosciences, disappears at a rate of about 8% per year. AGI has been working to develop systems to both improve the discoverability and the preservation of the geoscience grey literature by coupling several open source platforms from the information science community. We will detail the rationale, the technical and legal frameworks for these systems, and the long-term strategies for improving access, use, and stability of these critical data sources.
Methods for Probabilistic Radiological Dose Assessment at a High-Level Radioactive Waste Repository.
NASA Astrophysics Data System (ADS)
Maheras, Steven James
Methods were developed to assess and evaluate the uncertainty in offsite and onsite radiological dose at a high-level radioactive waste repository, in order to show reasonable assurance that compliance with applicable regulatory requirements will be achieved. Uncertainty in offsite dose was assessed by employing a stochastic precode in conjunction with Monte Carlo simulation using an offsite radiological dose assessment code. Uncertainty in onsite dose was assessed by employing a discrete-event simulation model of repository operations in conjunction with an occupational radiological dose assessment model. Complementary cumulative distribution functions of offsite and onsite dose were used to illustrate reasonable assurance. Offsite dose analyses were performed for iodine-129, cesium-137, strontium-90, and plutonium-239. Complementary cumulative distribution functions of offsite dose were constructed; offsite dose was lognormally distributed with a range of about two orders of magnitude. The plutonium-239 results, however, were not lognormally distributed and exhibited a range of less than one order of magnitude. Onsite dose analyses were performed for the preliminary inspection, receiving and handling, and underground areas of the repository. Complementary cumulative distribution functions of onsite dose were constructed and exhibited ranges of less than one order of magnitude. A preliminary sensitivity analysis of the receiving and handling areas was conducted using a regression metamodel. Sensitivity coefficients and partial correlation coefficients were used as measures of sensitivity. Model output was most sensitive to parameters related to cask handling operations and showed little sensitivity to parameters related to cask inspections.
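The following sketch shows, under assumed lognormal dose parameters and an assumed comparison threshold, how a complementary cumulative distribution function can be built from Monte Carlo samples; it is illustrative only and uses no values from the assessment.

```python
# Hedged sketch: build a complementary cumulative distribution function (CCDF)
# of dose from Monte Carlo samples.  The lognormal parameters, units, and the
# threshold are hypothetical placeholders, not values from the assessment.
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Hypothetical offsite dose output: lognormal with roughly a two-order-of-
# magnitude spread between the 1st and 99th percentiles.
dose = rng.lognormal(mean=np.log(1e-3), sigma=1.0, size=n)   # assumed units: mSv/yr

# CCDF curve: P(dose > d) evaluated at each sorted sample value.
sorted_dose = np.sort(dose)
exceedance = 1.0 - np.arange(1, n + 1) / n    # pair (sorted_dose, exceedance) for plotting

threshold = 1e-2                              # hypothetical comparison level, mSv/yr
print(f"P(dose > {threshold} mSv/yr) = {np.mean(dose > threshold):.3f}")
print(f"dose exceeded with 5% probability: {np.percentile(dose, 95):.2e} mSv/yr")
```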
XNAT Central: Open sourcing imaging research data.
Herrick, Rick; Horton, William; Olsen, Timothy; McKay, Michael; Archie, Kevin A; Marcus, Daniel S
2016-01-01
XNAT Central is a publicly accessible medical imaging data repository based on the XNAT open-source imaging informatics platform. It hosts a wide variety of research imaging data sets. The primary motivation for creating XNAT Central was to provide a central repository to host and provide access to a wide variety of neuroimaging data. In this capacity, XNAT Central hosts a number of data sets from research labs and investigative efforts from around the world, including the OASIS Brains imaging studies, the NUSDAST study of schizophrenia, and more. Over time, XNAT Central has expanded to include imaging data from many different fields of research, including oncology, orthopedics, cardiology, and animal studies, but continues to emphasize neuroimaging data. Through the use of XNAT's DICOM metadata extraction capabilities, XNAT Central provides a searchable repository of imaging data that can be referenced by groups, labs, or individuals working in many different areas of research. The future development of XNAT Central will be geared towards greater ease of use as a reference library of heterogeneous neuroimaging data and associated synthetic data. It will also become a tool for making data available supporting published research and academic articles. Copyright © 2015 Elsevier Inc. All rights reserved.
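XNAT has its own internal DICOM pipeline; the stand-alone sketch below only illustrates the general idea of extracting DICOM header metadata into a searchable index, using the pydicom library and a hypothetical input folder.

```python
# Hedged, stand-alone illustration of DICOM metadata extraction for a
# searchable index (not XNAT's actual implementation).
from pathlib import Path
import pydicom

index = []
for path in Path("/data/incoming").rglob("*.dcm"):        # hypothetical folder
    ds = pydicom.dcmread(path, stop_before_pixels=True)   # read header only
    index.append({
        "file": str(path),
        "modality": ds.get("Modality", ""),
        "study_desc": ds.get("StudyDescription", ""),
        "series_desc": ds.get("SeriesDescription", ""),
        "study_date": ds.get("StudyDate", ""),
    })

# 'index' could now be loaded into a database and queried by modality,
# description, date, etc.
print(f"indexed {len(index)} DICOM headers")
```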
Seven [Data] Habits of Highly Successful Researchers
NASA Astrophysics Data System (ADS)
Kinkade, D.; Shepherd, A.; Saito, M. A.; Wiebe, P. H.; Ake, H.; Biddle, M.; Copley, N. J.; Rauch, S.; Switzer, M. E.; York, A.
2017-12-01
Navigating the landscape of open science and data sharing can be daunting for the long-tail scientist. From satisfying funder requirements and ensuring proper attribution for their work, to determining the best repository for data management and archiving, there are several facets to be considered. Yet there is no single source of guidance for investigators who may be using multiple research funding models. What role can existing repositories play to help facilitate a more effective data sharing workflow? The Biological and Chemical Oceanography Data Management Office (BCO-DMO) is a domain-specific repository occupying the niche between funder and investigator. The office works closely with its stakeholders to develop and provide guidance, services, and tools that assist researchers in meeting their data sharing needs, from determining whether BCO-DMO is the appropriate repository to manage an investigator's project data to ensuring that the investigator is able to fulfill funder requirements. The goal is to relieve the investigator of the more difficult aspects of data management and data sharing, while simultaneously educating them in better data management practices that will streamline the process of conducting open research in the future. This presentation will provide an overview of the BCO-DMO repository, highlighting some of the services and guidance the office provides to its community.
Do We Really Know how Much it Costs to Construct High Performance Buildings?
DOE Office of Scientific and Technical Information (OSTI.GOV)
Livingston, Olga V.; Dillon, Heather E.; Halverson, Mark A.
2012-08-31
Understanding the cost of energy efficient construction is critical to decision makers in building design, code development, and energy analysis. How much does it cost to upgrade from R-13 to R-19 in a building wall? How much do low-e windows really cost? Can we put a dollar figure on commissioning? Answers to these questions have a fuzzy nature, based on educated guesses and industry lore. The response depends on location, perspective, bulk buying, and hand waving. This paper explores the development of a web tool intended to serve as a publicly available repository of building component costs. In 2011 the U.S. Department of Energy (DOE) funded the launch of a web tool called the Building Component Cost Community (BC3), dedicated to publishing building component costs from documented sources, actively gathering verifiable cost data from the users, and collecting feedback from a wide range of participants on the quality of the posted cost data. The updated BC3 database, available at http://bc3.pnnl.gov, went live on April 30, 2012. BC3 serves as the ultimate source of the energy-related component costs for DOE's residential code development activities, including cost-effectiveness analyses. The paper discusses BC3 objectives, structure, functionality and the current content of the database. It aims to facilitate a dialog about the lack of verifiable transparent cost data, as well as introduce a web tool that helps to address the problem. The questions posed above will also be addressed by this paper, but they have to be resolved by the user community by providing feedback and cost data to the BC3 database, thus increasing transparency and removing information asymmetry.
NASA Astrophysics Data System (ADS)
McLaughlin, B. D.; Pawloski, A. W.
2015-12-01
Modern development practices require the ability to quickly and easily host an application. Small projects cannot afford to maintain a large staff for infrastructure maintenance. Rapid prototyping fosters innovation. However, maintaining the integrity of data and systems demands care, particularly in a government context. The extensive data holdings that make up much of the value of NASA's EOSDIS (Earth Observing System Data and Information System) are stored in a number of locations, across a wide variety of applications, ranging from small prototypes to large computationally-intensive operational processes. However, it is increasingly difficult for an application to implement the required security controls, perform required registrations and inventory entries, ensure logging, monitoring, patching, and then ensure that all these activities continue for the life of that application, let alone five, or ten, or fifty applications. This process often takes weeks or months to complete and requires expertise in a variety of different domains such as security, systems administration, development, etc. NGAP, the Next Generation Application Platform, is tackling this problem by investigating, automating, and resolving many of the repeatable policy hurdles that a typical application must overcome. This platform provides a relatively simple and straightforward process by which applications can commit source code to a repository and then deploy that source code to a cloud-based infrastructure, all while meeting NASA's policies for security, governance, inventory, reliability, and availability. While there is still work for the application owner for any application hosting, NGAP handles a significant portion of that work. This talk will discuss areas where we have made significant progress, areas that are complex or must remain human-intensive, and areas where we are still striving to improve this application deployment and hosting pipeline.
NASA Astrophysics Data System (ADS)
Rakowsky, N.; Harig, S.; Androsov, A.; Fuchs, A.; Immerz, A.; Schröter, J.; Hiller, W.
2012-04-01
Starting in 2005, the GITEWS project (German-Indonesian Tsunami Early Warning System) established from scratch a fully operational tsunami warning system at BMKG in Jakarta. Numerical simulations of prototypic tsunami scenarios play a decisive role in a priori risk assessment for coastal regions and in the early warning process itself. The repositories currently contain 3470 regional tsunami scenarios for GITEWS and 1780 Indian Ocean-wide scenarios in support of Indonesia as a Regional Tsunami Service Provider (RTSP), all computed with the non-linear shallow water model TsunAWI. It is based on a finite element discretisation, employs unstructured grids with high resolution along the coast, and includes inundation. This contribution gives an overview of the model itself, the enhancement of the model physics, and the experiences gained during the process of establishing an operational code suited for thousands of model runs. Technical aspects like computation time, disk space needed for each scenario in the repository, or post-processing techniques have a much larger impact than they had in the beginning when TsunAWI started as a research code. Of course, careful testing on artificial benchmarks and real events remains essential, but furthermore, quality control for the large number of scenarios becomes an important issue.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
Some of the major technical questions associated with the burial of radioactive high-level wastes in geologic formations are related to the thermal environments generated by the waste and the impact of this dissipated heat on the surrounding environment. The design of a high level waste storage facility must be such that the temperature variations that occur do not adversely affect operating personnel and equipment. The objective of this investigation was to assist OWI by determining the thermal environment that would be experienced by personnel and equipment in a waste storage facility in salt. Particular emphasis was placed on determining the maximum floor and air temperatures with and without ventilation in the first 30 years after waste emplacement. The assumed facility design differs somewhat from those previously analyzed and reported, but many of the previous parametric surveys are useful for comparison. In this investigation a number of 2-dimensional and 3-dimensional simulations of the heat flow in a repository have been performed on the HEATING5 and TRUMP heat transfer codes. The representative repository constructs used in the simulations are described, as well as the computational models and computer codes. Results of the simulations are presented and discussed. Comparisons are made between the recent results and those from previous analyses. Finally, a summary of study limitations, comparisons, and conclusions is given.
Rekadwad, Bhagwan N.; Khobragade, Chandrahasya N.
2015-01-01
16S rRNA sequences of 21 morphologically and biochemically identified thermophilic bacteria isolated from Unkeshwar hot springs (19°85′N and 78°25′E), Dist. Nanded (India), have been deposited in the NCBI repository. The 16S rRNA gene sequences were used to generate QR codes for the sequences (FASTA format and full GenBank information). Diversity among the isolates was compared with that of known isolates and evaluated using CGR, FCGR and PCA, i.e. for visual comparison and evaluation, respectively. Considerable biodiversity was observed among the identified bacteria isolated from Unkeshwar hot springs. The hyperlinked QR codes, CGR, FCGR and PCA of all the isolates are made available to users on a portal: https://sites.google.com/site/bhagwanrekadwad/. PMID:26793757
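As an illustration of the QR-code step (not necessarily the tool the authors used), the following sketch encodes a link to a sequence record into a QR image with the open-source Python 'qrcode' package; the accession number and URL are placeholders.

```python
# Hedged sketch: generate a QR code image that encodes a link to a 16S rRNA
# record.  The accession number and URL below are placeholders, not values
# from the dataset described above.
import qrcode

accession = "XX000000"                                          # placeholder accession
payload = f"https://www.ncbi.nlm.nih.gov/nuccore/{accession}"   # link to the record

img = qrcode.make(payload)        # returns a PIL image of the QR code
img.save(f"{accession}_qr.png")   # scannable image, printable on sample labels or figures
```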
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lao, Lang L.; St John, Holger; Staebler, Gary M.
This report describes the work done under U.S. Department of Energy grant number DE-FG02-07ER54935 for the period ending July 31, 2010. The goal of this project was to provide predictive transport analysis to the PTRANSP code. Our contribution to this effort consisted of three parts: (a) a predictive solver suitable for use with highly non-linear transport models and installation of the turbulent confinement models GLF23 and TGLF, (b) an interface of this solver with the PTRANSP code, and (c) initial development of an EPED1 edge pedestal model interface with PTRANSP. PTRANSP has been installed locally on this cluster by importing a complete PTRANSP build environment that always contains the proper version of the libraries and other object files that PTRANSP requires. The GCNMP package and its interface code have been added to the SVN repository at PPPL.
NASA Astrophysics Data System (ADS)
Pohlmann, K. F.; Zhu, J.; Ye, M.; Carroll, R. W.; Chapman, J. B.; Russell, C. E.; Shafer, D. S.
2006-12-01
Yucca Mountain (YM), Nevada has been recommended as a deep geological repository for the disposal of spent fuel and high-level radioactive waste. If YM is licensed as a repository by the Nuclear Regulatory Commission, it will be important to identify the potential for radionuclides to migrate from underground nuclear testing areas located on the Nevada Test Site (NTS) to the hydraulically downgradient repository area to ensure that monitoring does not incorrectly attribute repository failure to radionuclides originating from other sources. In this study, we use the Death Valley Regional Flow System (DVRFS) model developed by the U.S. Geological Survey to investigate potential groundwater migration pathways and associated travel times from the NTS to the proposed YM repository area. Using results from the calibrated DVRFS model and the particle tracking post-processing package MODPATH we modeled three-dimensional groundwater advective pathways in the NTS and YM region. Our study focuses on evaluating the potential for groundwater pathways between the NTS and YM withdrawal area and whether travel times for advective flow along these pathways coincide with the prospective monitoring time frame at the proposed repository. We include uncertainty in effective porosity as this is a critical variable in the determination of time for radionuclides to travel from the NTS region to the YM withdrawal area. Uncertainty in porosity is quantified through evaluation of existing site data and expert judgment and is incorporated in the model through Monte Carlo simulation. Since porosity information is limited for this region, the uncertainty is quite large and this is reflected in the results as a large range in simulated groundwater travel times.
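A minimal sketch of the porosity uncertainty propagation described above is given below; the path length, Darcy flux, and porosity range are illustrative placeholders rather than DVRFS model values.

```python
# Hedged sketch: propagate effective-porosity uncertainty through an advective
# travel-time estimate along a single flow path.  Path length, Darcy flux, and
# the porosity range are illustrative placeholders, not DVRFS model values.
import numpy as np

rng = np.random.default_rng(1)
n_sims = 5_000

path_length = 30_000.0                      # m, hypothetical NTS-to-YM path
darcy_flux = 0.05                           # m/yr, hypothetical

# Wide porosity uncertainty: log-uniform between 1e-4 and 1e-1 (assumed).
porosity = 10 ** rng.uniform(-4, -1, n_sims)

# Seepage velocity v = q / n_e, so travel time t = L / v = L * n_e / q.
travel_time = path_length * porosity / darcy_flux          # years

lo, hi = np.percentile(travel_time, [5, 95])
print(f"5th-95th percentile travel time: {lo:,.0f} - {hi:,.0f} yr")
```

Because porosity enters the travel time linearly, the several-orders-of-magnitude uncertainty in porosity translates directly into a correspondingly large range of simulated travel times.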
DOE Office of Scientific and Technical Information (OSTI.GOV)
F. Perry; R. Youngs
The purpose of this scientific analysis report is threefold: (1) Present a conceptual framework of igneous activity in the Yucca Mountain region (YMR) consistent with the volcanic and tectonic history of this region and the assessment of this history by experts who participated in the probabilistic volcanic hazard analysis (PVHA) (CRWMS M&O 1996 [DIRS 100116]). Conceptual models presented in the PVHA are summarized and applied in areas in which new information has been presented. Alternative conceptual models are discussed, as well as their impact on probability models. The relationship between volcanic source zones defined in the PVHA and structural features of the YMR is described based on discussions in the PVHA and studies presented since the PVHA. (2) Present revised probability calculations based on PVHA outputs for a repository footprint proposed in 2003 (BSC 2003 [DIRS 162289]), rather than the footprint used at the time of the PVHA. This analysis report also calculates the probability of one or more eruptive centers forming within the repository footprint using information developed in the PVHA. Probability distributions are presented for the length and orientation of volcanic dikes located within the repository footprint and for the number of eruptive centers (conditional on a dike intersecting the repository) located within the repository footprint. (3) Document sensitivity studies that analyze how the presence of potentially buried basaltic volcanoes may affect the computed frequency of intersection of the repository footprint by a basaltic dike. These sensitivity studies are prompted by aeromagnetic data collected in 1999, indicating the possible presence of previously unrecognized buried volcanoes in the YMR (Blakely et al. 2000 [DIRS 151881]; O'Leary et al. 2002 [DIRS 158468]). The results of the sensitivity studies are for informational purposes only and are not to be used for purposes of assessing repository performance.
Naessens, James M; Visscher, Sue L; Peterson, Stephanie M; Swanson, Kristi M; Johnson, Matthew G; Rahman, Parvez A; Schindler, Joe; Sonneborn, Mark; Fry, Donald E; Pine, Michael
2015-08-01
Assess algorithms for linking patients across de-identified databases without compromising confidentiality. Hospital discharges from 11 Mayo Clinic hospitals during January 2008-September 2012 (assessment and validation data). Minnesota death certificates and hospital discharges from 2009 to 2012 for entire state (application data). Cross-sectional assessment of sensitivity and positive predictive value (PPV) for four linking algorithms tested by identifying readmissions and posthospital mortality on the assessment data with application to statewide data. De-identified claims included patient gender, birthdate, and zip code. Assessment records were matched with institutional sources containing unique identifiers and the last four digits of Social Security number (SSNL4). Gender, birthdate, and five-digit zip code identified readmissions with a sensitivity of 98.0 percent and a PPV of 97.7 percent and identified postdischarge mortality with 84.4 percent sensitivity and 98.9 percent PPV. Inclusion of SSNL4 produced nearly perfect identification of readmissions and deaths. When applied statewide, regions bordering states with unavailable hospital discharge data had lower rates. Addition of SSNL4 to administrative data, accompanied by appropriate data use and data release policies, can enable trusted repositories to link data with nearly perfect accuracy without compromising patient confidentiality. States maintaining centralized de-identified databases should add SSNL4 to data specifications. © Health Research and Educational Trust.
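The sketch below illustrates one deterministic linking rule (gender + birthdate + five-digit zip) and the sensitivity/PPV bookkeeping against a gold-standard match set; the field names and toy records are hypothetical.

```python
# Hedged sketch of one deterministic linking rule (gender + birthdate + 5-digit
# zip) and of sensitivity / positive-predictive-value bookkeeping.  Field names,
# record layout, and the gold-standard match set are hypothetical.
def key(rec):
    return (rec["gender"], rec["birthdate"], rec["zip5"])

def link(discharges, deaths):
    """Return the set of (discharge_id, death_id) pairs whose keys agree."""
    by_key = {}
    for d in deaths:
        by_key.setdefault(key(d), []).append(d["id"])
    return {(r["id"], did) for r in discharges for did in by_key.get(key(r), [])}

def sensitivity_ppv(candidate_pairs, true_pairs):
    tp = len(candidate_pairs & true_pairs)
    sensitivity = tp / len(true_pairs) if true_pairs else 0.0
    ppv = tp / len(candidate_pairs) if candidate_pairs else 0.0
    return sensitivity, ppv

# toy example
discharges = [{"id": "d1", "gender": "F", "birthdate": "1950-02-01", "zip5": "55901"}]
deaths = [{"id": "m1", "gender": "F", "birthdate": "1950-02-01", "zip5": "55901"}]
print(sensitivity_ppv(link(discharges, deaths), true_pairs={("d1", "m1")}))  # (1.0, 1.0)
```

Adding a field such as the last four digits of the Social Security number simply extends the key, which is how the study's near-perfect linkage is obtained.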
NASA Astrophysics Data System (ADS)
Strasser, C.; Borda, S.; Cruse, P.; Kunze, J.
2013-12-01
There are many barriers to data management and sharing among earth and environmental scientists; among the most significant are a lack of knowledge about best practices for data management, metadata standards, or appropriate data repositories for archiving and sharing data. Last year we developed an open source web application, DataUp, to help researchers overcome these barriers. DataUp helps scientists to (1) determine whether their file is CSV compatible, (2) generate metadata in a standard format, (3) retrieve an identifier to facilitate data citation, and (4) deposit their data into a repository. With funding from the NSF via a supplemental grant to the DataONE project, we are working to improve upon DataUp. Our main goal for DataUp 2.0 is to ensure organizations and repositories are able to adopt and adapt DataUp to meet their unique needs, including connecting to analytical tools, adding new metadata schema, and expanding the list of connected data repositories. DataUp is a collaborative project between the California Digital Library, DataONE, the San Diego Supercomputing Center, and Microsoft Research Connections.
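As a hedged illustration of the first DataUp step, the snippet below checks whether a file parses as a consistent delimited (CSV-compatible) table using only the Python standard library; DataUp itself is a web application and does not necessarily implement the check this way.

```python
# Hedged sketch: "is this file CSV-compatible?" using only the standard library.
import csv

def looks_like_csv(path, sample_bytes=4096):
    try:
        with open(path, newline="", encoding="utf-8", errors="replace") as fh:
            sample = fh.read(sample_bytes)
            dialect = csv.Sniffer().sniff(sample)      # raises csv.Error if not delimited
            fh.seek(0)
            widths = {len(row) for row in csv.reader(fh, dialect)}
        return len(widths) == 1                        # every row has the same column count
    except (OSError, csv.Error):
        return False

print(looks_like_csv("observations.csv"))              # hypothetical file name
```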
gemcWeb: A Cloud Based Nuclear Physics Simulation Software
NASA Astrophysics Data System (ADS)
Markelon, Sam
2017-09-01
gemcWeb allows users to run nuclear physics simulations from the web. Because it is completely device agnostic, scientists can run simulations from anywhere with an Internet connection. Having a full user system, gemcWeb allows users to revisit and revise their projects, and to share configurations and results with collaborators. gemcWeb is based on the simulation software gemc, which is in turn based on the standard Geant4 toolkit. gemcWeb requires no C++, gemc, or Geant4 knowledge. A simple but powerful GUI allows users to configure their project from geometries and configurations stored on the deployment server. Simulations are then run on the server, with results being posted to the user and then securely stored. Python based and open-source, the main version of gemcWeb is hosted internally at Jefferson National Laboratory and used by the CLAS12 and Electron-Ion Collider Project groups. However, as the software is open-source and hosted as a GitHub repository, an instance can be deployed on the open web or on any institution's intranet. An instance can be configured to host experiments specific to an institution, and the code base can be modified by any individual or group. Special thanks to: Maurizio Ungaro, PhD, creator of gemc; Markus Diefenthaler, PhD, advisor; and Kyungseon Joo, PhD, advisor.
Muir, Dylan R; Kampa, Björn M
2014-01-01
Two-photon calcium imaging of neuronal responses is an increasingly accessible technology for probing population responses in cortex at single cell resolution, and with reasonable and improving temporal resolution. However, analysis of two-photon data is usually performed using ad-hoc solutions. To date, no publicly available software exists for straightforward analysis of stimulus-triggered two-photon imaging experiments. In addition, the increasing data rates of two-photon acquisition systems imply increasing cost of computing hardware required for in-memory analysis. Here we present a Matlab toolbox, FocusStack, for simple and efficient analysis of two-photon calcium imaging stacks on consumer-level hardware, with minimal memory footprint. We also present a Matlab toolbox, StimServer, for generation and sequencing of visual stimuli, designed to be triggered over a network link from a two-photon acquisition system. FocusStack is compatible out of the box with several existing two-photon acquisition systems, and is simple to adapt to arbitrary binary file formats. Analysis tools such as stack alignment for movement correction, automated cell detection and peri-stimulus time histograms are already provided, and further tools can be easily incorporated. Both packages are available as publicly-accessible source-code repositories.
NASA GSFC Tin Whisker Homepage http://nepp.nasa.gov/whisker
NASA Technical Reports Server (NTRS)
Shaw, Harry
2000-01-01
The NASA GSFC Tin Whisker Homepage provides general information and GSFC Code 562 experimentation results regarding the well known phenomenon of tin whisker formation from pure tin plated substrates. The objective of this www site is to provide a central repository for information pertaining to this phenomenon and to provide status of the GSFC experiments to understand the behavior of tin whiskers in space environments. The Tin Whisker www site is produced by Code 562. This www site does not provide information pertaining to patented or proprietary information. All of the information contained in this www site is at the level of that produced by industry and university researchers and is published at international conferences.
Doing Your Science While You're in Orbit
NASA Astrophysics Data System (ADS)
Green, Mark L.; Miller, Stephen D.; Vazhkudai, Sudharshan S.; Trater, James R.
2010-11-01
Large-scale neutron facilities such as the Spallation Neutron Source (SNS) located at Oak Ridge National Laboratory need easy-to-use access to Department of Energy Leadership Computing Facilities and experiment repository data. The Orbiter thick- and thin-client and its supporting Service Oriented Architecture (SOA) based services (available at https://orbiter.sns.gov) consist of standards-based components that are reusable and extensible for accessing high performance computing, data and computational grid infrastructure, and cluster-based resources easily from a user configurable interface. The primary Orbiter system goals consist of (1) developing infrastructure for the creation and automation of virtual instrumentation experiment optimization, (2) developing user interfaces for thin- and thick-client access, (3) provide a prototype incorporating major instrument simulation packages, and (4) facilitate neutron science community access and collaboration. The secure Orbiter SOA authentication and authorization is achieved through the developed Virtual File System (VFS) services, which use Role-Based Access Control (RBAC) for data repository file access, thin-and thick-client functionality and application access, and computational job workflow management. The VFS Relational Database Management System (RDMS) consists of approximately 45 database tables describing 498 user accounts with 495 groups over 432,000 directories with 904,077 repository files. Over 59 million NeXus file metadata records are associated to the 12,800 unique NeXus file field/class names generated from the 52,824 repository NeXus files. Services that enable (a) summary dashboards of data repository status with Quality of Service (QoS) metrics, (b) data repository NeXus file field/class name full text search capabilities within a Google like interface, (c) fully functional RBAC browser for the read-only data repository and shared areas, (d) user/group defined and shared metadata for data repository files, (e) user, group, repository, and web 2.0 based global positioning with additional service capabilities are currently available. The SNS based Orbiter SOA integration progress with the Distributed Data Analysis for Neutron Scattering Experiments (DANSE) software development project is summarized with an emphasis on DANSE Central Services and the Virtual Neutron Facility (VNF). Additionally, the DANSE utilization of the Orbiter SOA authentication, authorization, and data transfer services best practice implementations are presented.
Knowledge repositories for multiple uses
NASA Technical Reports Server (NTRS)
Williamson, Keith; Riddle, Patricia
1991-01-01
In the life cycle of a complex physical device or part, for example, the docking bay door of the Space Station, there are many uses for knowledge about the device or part. The same piece of knowledge might serve several uses. Given the quantity and complexity of the knowledge that must be stored, it is critical to maintain the knowledge in one repository, in one form. At the same time, because of the quantity and complexity of knowledge that must be used in life cycle applications such as cost estimation, re-design, and diagnosis, it is critical to automate such knowledge uses. For each specific use, a knowledge base must be available and must be in a form that promotes the efficient performance of that knowledge base. However, without a single source knowledge repository, the cost of maintaining consistent knowledge between multiple knowledge bases increases dramatically; as facts and descriptions change, they must be updated in each individual knowledge base. A use-neutral representation of a hydraulic system for the F-111 aircraft was developed. The ability to derive portions of four different knowledge bases from this use-neutral representation is demonstrated: one knowledge base is for re-design of the device using a model-based reasoning problem solver; two knowledge bases, at different levels of abstraction, are for diagnosis using a model-based reasoning solver; and one knowledge base is for diagnosis using an associational reasoning problem solver. It was shown how updates issued against the single source use-neutral knowledge repository can be propagated to the underlying knowledge bases.
Rekadwad, Bhagwan N.; Khobragade, Chandrahasya N.
2016-01-01
Microbiologists are routinely engaged in the isolation, identification and comparison of isolated bacteria to assess their novelty. 16S rRNA sequences of Bacillus pumilus were retrieved from the NCBI repository and used to generate QR codes for the sequences (FASTA format and full GenBank information). These 16S rRNA sequences were used to generate quick response (QR) codes for Bacillus pumilus isolated from Lonar Crater Lake (19° 58′ N; 76° 31′ E), India. Bacillus pumilus 16S rRNA gene sequences were also used to generate CGR, FCGR and PCA, which can be used for visual comparison and evaluation, respectively. The hyperlinked QR codes, CGR, FCGR and PCA of all the isolates are made available to users on a portal: https://sites.google.com/site/bhagwanrekadwad/. The generated digital data help to evaluate and compare any Bacillus pumilus strain, minimize laboratory effort and avoid misinterpretation of the species. PMID:27141529
The ZPIC educational code suite
NASA Astrophysics Data System (ADS)
Calado, R.; Pardal, M.; Ninhos, P.; Helm, A.; Mori, W. B.; Decyk, V. K.; Vieira, J.; Silva, L. O.; Fonseca, R. A.
2017-10-01
Particle-in-Cell (PIC) codes are used in almost all areas of plasma physics, such as fusion energy research, plasma accelerators, space physics, ion propulsion, and plasma processing, among many other areas. In this work, we present the ZPIC educational code suite, a new initiative to foster training in plasma physics using computer simulations. Leveraging our expertise and experience from the development and use of the OSIRIS PIC code, we have developed a suite of 1D/2D fully relativistic electromagnetic PIC codes, as well as a 1D electrostatic code. These codes are self-contained and require only a standard laptop/desktop computer with a C compiler to be run. The output files are written in a new file format called ZDF that can be easily read using the supplied routines in a number of languages, such as Python and IDL. The code suite also includes a number of example problems that can be used to illustrate several textbook and advanced plasma mechanisms, including instructions for parameter space exploration. We also invite contributions to this repository of test problems, which will be made freely available to the community provided the input files comply with the format defined by the ZPIC team. The code suite is freely available and hosted on GitHub at https://github.com/zambzamb/zpic. Work partially supported by PICKSC.
BigMouth: a multi-institutional dental data repository
Walji, Muhammad F; Kalenderian, Elsbeth; Stark, Paul C; White, Joel M; Kookal, Krishna K; Phan, Dat; Tran, Duong; Bernstam, Elmer V; Ramoni, Rachel
2014-01-01
Few oral health databases are available for research and the advancement of evidence-based dentistry. In this work we developed a centralized data repository derived from electronic health records (EHRs) at four dental schools participating in the Consortium of Oral Health Research and Informatics. A multi-stakeholder committee developed a data governance framework that encouraged data sharing while allowing control of contributed data. We adopted the i2b2 data warehousing platform and mapped data from each institution to a common reference terminology. We realized that dental EHRs urgently need to adopt common terminologies. While all used the same treatment code set, only three of the four sites used a common diagnostic terminology, and there were wide discrepancies in how medical and dental histories were documented. BigMouth was successfully launched in August 2012 with data on 1.1 million patients, and made available to users at the contributing institutions. PMID:24993547
Piloting a Deceased Subject Integrated Data Repository and Protecting Privacy of Relatives
Huser, Vojtech; Kayaalp, Mehmet; Dodd, Zeyno A.; Cimino, James J.
2014-01-01
Use of deceased subject Electronic Health Records can be an important piloting platform for informatics or biomedical research. The existing legal framework allows such research under less strict de-identification criteria; however, the privacy of non-decedents must be protected. We report on the creation of the deceased subject Integrated Data Repository (dsIDR) at the National Institutes of Health Clinical Center and a pilot methodology to remove secondary protected health information or identifiable information (secondary PxI; information about persons other than the primary patient). We characterize the available structured coded data in dsIDR and report the estimated frequencies of secondary PxI, ranging from 12.9% (sensitive token presence) to 1.1% (using stricter criteria). Federating decedent EHR data from multiple institutions can address sample size limitations, and our pilot study provides lessons learned and a methodology that can be adopted by other institutions. PMID:25954378
Managing Digital Archives Using Open Source Software Tools
NASA Astrophysics Data System (ADS)
Barve, S.; Dongare, S.
2007-10-01
This paper describes the use of open source software tools such as MySQL and PHP for creating database-backed websites. Such websites offer many advantages over ones built from static HTML pages. The paper discusses how OSS tools are used and the benefits they bring, and how, after the successful implementation of these tools, the library took the initiative to implement an institutional repository using the DSpace open source software.
NASA Astrophysics Data System (ADS)
De Vecchi, Daniele; Dell'Acqua, Fabio
2016-04-01
The EU FP7 MARSITE project aims at assessing the "state of the art" of seismic risk evaluation and management at the European level, as a starting point to move a "step forward" towards new concepts of risk mitigation and management by long-term monitoring activities carried out both on land and at sea. Spaceborne Earth Observation (EO) is one of the means through which MARSITE is accomplishing this commitment, whose importance is growing as a consequence of the operational unfolding of the Copernicus initiative. Sentinel-2 data, with its open-data policy, represents an unprecedented opportunity to access global spaceborne multispectral data for various purposes including risk monitoring. In the framework of the EU FP7 projects MARSITE, RASOR and SENSUM, our group has developed a suite of geospatial software tools to automatically extract risk-related features from EO data, especially on the exposure and vulnerability side of the "risk equation" [1]. Examples are the extent of a built-up area or the distribution of building density. These tools are available open-source as QGIS plug-ins [2] and their source code can be freely downloaded from GitHub [3]. A test case on the risk-prone megacity of Istanbul has been set up, and preliminary results will be presented in this paper. The output of the algorithms can be incorporated into a risk modeling process, whose output is very useful to stakeholders and decision makers who intend to assess and mitigate the risk level across the giant urban agglomerate.
Keywords - Remote Sensing, Copernicus, Istanbul megacity, seismic risk, multi-risk, exposure, open-source
References:
[1] Harb, M.M.; De Vecchi, D.; Dell'Acqua, F., "Physical Vulnerability Proxies from Remote Sensing: Reviewing, Implementing and Disseminating Selected Techniques," IEEE Geoscience and Remote Sensing Magazine, vol. 3, no. 1, pp. 20-33, March 2015. doi: 10.1109/MGRS.2015.2398672
[2] SENSUM QGIS plugin, 2016, available online at: https://plugins.qgis.org/plugins/sensum_eo_tools/
[3] SENSUM QGIS code repository, 2016, available online at: https://github.com/SENSUM-project/sensum_rs_qgis
DNA Repair and Ethnic Differences in Prostate Cancer Risk
2006-03-01
Georgetown University for processing. Each sample is centrifuged and the blood components are separated into serum, clot, buffy coat, and plasma ... within 4 hours of reception. The processed, aliquoted, and bar-coded samples are stored in a repository at GUH at -80°C. The slow growth of prostate ... completeness. Daily backups are performed to protect data against accidental destruction or corruption. Blood samples are processed within 24 hours of sample
Method and system of integrating information from multiple sources
Alford, Francine A [Livermore, CA; Brinkerhoff, David L [Antioch, CA
2006-08-15
A system and method of integrating information from multiple sources in a document centric application system. A plurality of application systems are connected through an object request broker to a central repository. The information may then be posted on a webpage. An example of an implementation of the method and system is an online procurement system.
Simonaitis, Linas; McDonald, Clement J
2009-10-01
The utility of National Drug Codes (NDCs) and drug knowledge bases (DKBs) in the organization of prescription records from multiple sources was studied. The master files of most pharmacy systems include NDCs and local codes to identify the products they dispense. We obtained a large sample of prescription records from seven different sources. These records carried a national product code or a local code that could be translated into a national product code via their formulary master. We obtained mapping tables from five DKBs. We measured the degree to which the DKB mapping tables covered the national product codes carried in or associated with the sample of prescription records. Considering the total prescription volume, DKBs covered 93.0-99.8% of the product codes from three outpatient sources and 77.4-97.0% of the product codes from four inpatient sources. Among the in-patient sources, invented codes explained 36-94% of the noncoverage. Outpatient pharmacy sources rarely invented codes, which comprised only 0.11-0.21% of their total prescription volume, compared with inpatient pharmacy sources for which invented codes comprised 1.7-7.4% of their prescription volume. The distribution of prescribed products was highly skewed, with 1.4-4.4% of codes accounting for 50% of the message volume and 10.7-34.5% accounting for 90% of the message volume. DKBs cover the product codes used by outpatient sources sufficiently well to permit automatic mapping. Changes in policies and standards could increase coverage of product codes used by inpatient sources.
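The coverage figures above amount to a volume-weighted share of product codes found in a DKB mapping table; the sketch below shows that calculation on hypothetical stand-in data.

```python
# Hedged sketch of volume-weighted coverage: the share of prescription volume
# whose product code appears in a drug-knowledge-base mapping table.  The
# input structures and codes are hypothetical stand-ins.
from collections import Counter

def volume_coverage(prescriptions, dkb_codes):
    """prescriptions: iterable of NDC strings, one per dispensed record;
    dkb_codes: set of NDCs covered by the drug knowledge base."""
    volume = Counter(prescriptions)
    total = sum(volume.values())
    covered = sum(n for code, n in volume.items() if code in dkb_codes)
    return covered / total if total else 0.0

# toy example: one invented local code accounts for 10% of the volume
rx = ["11111-2222-33"] * 90 + ["LOCAL-123"] * 10
print(volume_coverage(rx, dkb_codes={"11111-2222-33"}))   # 0.9
```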
Practices in Code Discoverability: Astrophysics Source Code Library
NASA Astrophysics Data System (ADS)
Allen, A.; Teuben, P.; Nemiroff, R. J.; Shamir, L.
2012-09-01
Here we describe the Astrophysics Source Code Library (ASCL), which takes an active approach to sharing astrophysics source code. ASCL's editor seeks out both new and old peer-reviewed papers that describe methods or experiments that involve the development or use of source code, and adds entries for the found codes to the library. This approach ensures that source codes are added without requiring authors to actively submit them, resulting in a comprehensive listing that covers a significant number of the astrophysics source codes used in peer-reviewed studies. The ASCL now has over 340 codes in it and continues to grow. In 2011, the ASCL has on average added 19 codes per month. An advisory committee has been established to provide input and guide the development and expansion of the new site, and a marketing plan has been developed and is being executed. All ASCL source codes have been used to generate results published in or submitted to a refereed journal and are freely available either via a download site or from an identified source. This paper provides the history and description of the ASCL. It lists the requirements for including codes, examines the advantages of the ASCL, and outlines some of its future plans.
Bio-repository of post-clinical test samples at the national cancer center hospital (NCCH) in Tokyo.
Furuta, Koh; Yokozawa, Karin; Takada, Takako; Kato, Hoichi
2009-08-01
We established the Bio-repository at the National Cancer Center Hospital in October 2002. The main purpose of this article is to show the importance and usefulness of a bio-repository of post-clinical test samples, not only for translational cancer research but also for routine clinical oncology, by introducing the experience of setting up such a facility. We regard a post-clinical test sample not as left-over waste, but rather as frozen evidence of a patient's pathological condition at a particular point in time. We can decode most, if not all, of the laboratory data from a post-clinical test sample. As a result, the bio-repository is able to provide not only the samples but potentially all related laboratory data upon request. The areas of sample coverage are the following: sera after routine blood tests; sera after cross-match tests for transfusion; serum or plasma submitted by the physician at a patient's clinically important time period; and samples collected by the individual investigator. The formats of stored samples are plasma or serum, dried blood spot (DBS) and buffy coat. So far, 150 218 plasmas or sera, 35 253 DBS and 536 buffy coats have been registered for our bio-repository system. We arranged to provide samples to various concerned parties under strict legal and ethical agreements. Although the number of utilized samples was initially limited, inquiries for sample utilization are now increasing steadily from both research and clinical sources. Further efforts to increase the benefits of the repository are intended.
Input Files and Procedures for Analysis of SMA Hybrid Composite Beams in MSC.Nastran and ABAQUS
NASA Technical Reports Server (NTRS)
Turner, Travis L.; Patel, Hemant D.
2005-01-01
A thermoelastic constitutive model for shape memory alloys (SMAs) and SMA hybrid composites (SMAHCs) was recently implemented in the commercial codes MSC.Nastran and ABAQUS. The model is implemented and supported within the core of the commercial codes, so no user subroutines or external calculations are necessary. The model and resulting structural analysis has been previously demonstrated and experimentally verified for thermoelastic, vibration and acoustic, and structural shape control applications. The commercial implementations are described in related documents cited in the references, where various results are also shown that validate the commercial implementations relative to a research code. This paper is a companion to those documents in that it provides additional detail on the actual input files and solution procedures and serves as a repository for ASCII text versions of the input files necessary for duplication of the available results.
Neuhaus, Philipp; Doods, Justin; Dugas, Martin
2015-01-01
Automatic coding of medical terms is an important, but highly complicated and laborious task. To compare and evaluate different strategies a framework with a standardized web-interface was created. Two UMLS mapping strategies are compared to demonstrate the interface. The framework is a Java Spring application running on a Tomcat application server. It accepts different parameters and returns results in JSON format. To demonstrate the framework, a list of medical data items was mapped by two different methods: similarity search in a large table of terminology codes versus search in a manually curated repository. These mappings were reviewed by a specialist. The evaluation shows that the framework is flexible (due to standardized interfaces like HTTP and JSON), performant and reliable. Accuracy of automatically assigned codes is limited (up to 40%). Combining different semantic mappers into a standardized Web-API is feasible. This framework can be easily enhanced due to its modular design.
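To illustrate what a standardized HTTP/JSON mapping interface looks like from the client side, here is a hedged sketch of a request against a hypothetical endpoint; the URL, parameter names, and response shape are assumptions, not the framework's actual API.

```python
# Hedged sketch of a client call against a standardized HTTP/JSON mapping
# interface.  The endpoint URL and parameter names are hypothetical.
import json
import urllib.parse
import urllib.request

base = "https://mapping.example.org/api/map"          # hypothetical endpoint
params = urllib.parse.urlencode({
    "term": "myocardial infarction",                  # item to be coded
    "strategy": "similarity",                         # e.g. table search vs. curated repo
})

with urllib.request.urlopen(f"{base}?{params}") as resp:
    candidates = json.load(resp)                      # assumed: JSON list of candidate codes

for c in candidates:
    print(c)                                          # candidates for specialist review
```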
Introduction to geospatial semantics and technology workshop handbook
Varanka, Dalia E.
2012-01-01
The workshop is a tutorial on introductory geospatial semantics with hands-on exercises using standard Web browsers. The workshop is divided into two sections, general semantics on the Web and specific examples of geospatial semantics using data from The National Map of the U.S. Geological Survey and the Open Ontology Repository. The general semantics section includes information and access to publicly available semantic archives. The specific session includes information on geospatial semantics with access to semantically enhanced data for hydrography, transportation, boundaries, and names. The Open Ontology Repository offers open-source ontologies for public use.
OpenCMISS: a multi-physics & multi-scale computational infrastructure for the VPH/Physiome project.
Bradley, Chris; Bowery, Andy; Britten, Randall; Budelmann, Vincent; Camara, Oscar; Christie, Richard; Cookson, Andrew; Frangi, Alejandro F; Gamage, Thiranja Babarenda; Heidlauf, Thomas; Krittian, Sebastian; Ladd, David; Little, Caton; Mithraratne, Kumar; Nash, Martyn; Nickerson, David; Nielsen, Poul; Nordbø, Oyvind; Omholt, Stig; Pashaei, Ali; Paterson, David; Rajagopal, Vijayaraghavan; Reeve, Adam; Röhrle, Oliver; Safaei, Soroush; Sebastián, Rafael; Steghöfer, Martin; Wu, Tim; Yu, Ting; Zhang, Heye; Hunter, Peter
2011-10-01
The VPH/Physiome Project is developing the model encoding standards CellML (cellml.org) and FieldML (fieldml.org) as well as web-accessible model repositories based on these standards (models.physiome.org). Freely available open source computational modelling software is also being developed to solve the partial differential equations described by the models and to visualise results. The OpenCMISS code (opencmiss.org), described here, has been developed by the authors over the last six years to replace the CMISS code that has supported a number of organ system Physiome projects. OpenCMISS is designed to encompass multiple sets of physical equations and to link subcellular and tissue-level biophysical processes into organ-level processes. In the Heart Physiome project, for example, the large deformation mechanics of the myocardial wall need to be coupled to both ventricular flow and embedded coronary flow, and the reaction-diffusion equations that govern the propagation of electrical waves through myocardial tissue need to be coupled with equations that describe the ion channel currents that flow through the cardiac cell membranes. In this paper we discuss the design principles and distributed memory architecture behind the OpenCMISS code. We also discuss the design of the interfaces that link the sets of physical equations across common boundaries (such as fluid-structure coupling), or between spatial fields over the same domain (such as coupled electromechanics), and the concepts behind CellML and FieldML that are embodied in the OpenCMISS data structures. We show how all of these provide a flexible infrastructure for combining models developed across the VPH/Physiome community. Copyright © 2011 Elsevier Ltd. All rights reserved.
mORCA: ubiquitous access to life science web services.
Diaz-Del-Pino, Sergio; Trelles, Oswaldo; Falgueras, Juan
2018-01-16
Technical advances in mobile devices such as smartphones and tablets have produced an extraordinary increase in their use around the world, and these devices have become part of our daily lives. The possibility of carrying these devices in a pocket, particularly mobile phones, has enabled ubiquitous access to Internet resources. Furthermore, in the life sciences world there has been a vast proliferation of data types and services that end up being offered as Web Services. This suggests the need for research into mobile clients to deal with life sciences applications for effective usage and exploitation. Analysing the current features in existing bioinformatics applications managing Web Services, we have devised, implemented, and deployed an easy-to-use, web-based, lightweight mobile client. This client is able to browse, select, compose parameters, invoke, and monitor the execution of Web Services stored in catalogues or central repositories. The client is also able to handle huge amounts of data through external storage mounts. In addition, we present a validation use case, which illustrates the usage of the application while executing, monitoring, and exploring the results of a registered workflow. The software is available in the Apple Store and Android Market, and the source code is publicly available on GitHub. Mobile devices are becoming increasingly important in the scientific world due to their strong potential impact on scientific applications. Bioinformatics should not fall behind this trend. We present an original software client that deals with the intrinsic limitations of such devices and propose different guidelines to provide location-independent access to computational resources in bioinformatics and biomedicine. Its modular design makes it easily expandable with the inclusion of new repositories, tools, types of visualization, etc.
Molecular hydrogen: An abundant energy source for bacterial activity in nuclear waste repositories
NASA Astrophysics Data System (ADS)
Libert, M.; Bildstein, O.; Esnault, L.; Jullien, M.; Sellier, R.
A thorough understanding of the energy sources used by microbial systems in the deep terrestrial subsurface is essential, since the extreme conditions for life in deep biospheres may serve as a model for possible life in a nuclear waste repository. In this respect, H2 is known as one of the most energetic substrates for deep terrestrial subsurface environments. This hydrogen is produced by abiotic and biotic processes, but its concentration in natural systems is usually maintained at very low levels due to hydrogen-consuming bacteria. A significant amount of H2 gas will be produced within deep nuclear waste repositories, essentially from the corrosion of metallic components. This will consequently improve the conditions for microbial activity in this specific environment. This paper discusses different study cases with experimental results to illustrate the fact that microorganisms are able to use hydrogen for redox processes (reduction of O2, NO3-, Fe(III)) in several waste disposal conditions. Consequences of microbial activity include alteration of groundwater chemistry and shifts in geochemical equilibria, gas production or consumption, biocorrosion, and potential modifications of confinement properties. In order to quantify the impact of hydrogen bacteria, the next step will be to determine the kinetic rates of the reactions under realistic conditions.
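For concreteness, the block below lists standard textbook stoichiometries for hydrogen-oxidising reactions with the electron acceptors named above (O2, nitrate, Fe(III)); they are generic reactions, not values or results from the study.

```latex
% Illustrative hydrogen-oxidising redox reactions for the electron acceptors
% named above (standard stoichiometries, not taken from the study itself):
\begin{align*}
  \mathrm{2\,H_2 + O_2} &\rightarrow \mathrm{2\,H_2O} \\
  \mathrm{5\,H_2 + 2\,NO_3^- + 2\,H^+} &\rightarrow \mathrm{N_2 + 6\,H_2O} \\
  \mathrm{H_2 + 2\,Fe^{3+}} &\rightarrow \mathrm{2\,Fe^{2+} + 2\,H^+}
\end{align*}
```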
NASA Astrophysics Data System (ADS)
Kaláb, Zdeněk; Šílený, Jan; Lednická, Markéta
2017-07-01
This paper deals with the seismic stability of the survey areas of potential sites for the deep geological repository of spent nuclear fuel in the Czech Republic. The basic source of data on historical earthquakes up to 1990 was the seismic website [1-]. The most intense earthquake described in the historical period occurred on September 15, 1590 in the Niederroesterreich region (Austria); its reported intensity is Io = 8-9. The source of contemporary seismic data for the period from 1991 to the end of 2014 was the website [11]. Based on the databases and a literature review, it may be stated that since 1900 no earthquake exceeding magnitude 5.1 has originated in the territory of the Czech Republic. In order to evaluate seismicity and to assess the impact of seismic effects at the depths of a hypothetical deep geological repository over the next time period, the neo-deterministic method was selected as an extension of the probabilistic method. Each of the seven survey areas was assessed by neo-deterministic evaluation of the seismic wave field excited by selected individual events and by determining the maximum loading. Results of the seismological database studies and of the neo-deterministic analysis of the Čihadlo locality are presented.
Facilitating Internet-Scale Code Retrieval
ERIC Educational Resources Information Center
Bajracharya, Sushil Krishna
2010-01-01
Internet-Scale code retrieval deals with the representation, storage, and access of relevant source code from a large amount of source code available on the Internet. Internet-Scale code retrieval systems support common emerging practices among software developers related to finding and reusing source code. In this dissertation we focus on some…
Rock and Core Repository Coming Digital
NASA Astrophysics Data System (ADS)
Maicher, Doris; Fleischer, Dirk; Czerniak, Andreas
2016-04-01
In times of whole city centres being available at a mouse click in 3D to virtually walk through, reality sometimes becomes neglected. That scientific sample collections are not digitised down to the essence of molecules, isotopes and electrons seems unbelievable to the rising generation of scientists. Just like any other geological institute, the Helmholtz Centre for Ocean Research GEOMAR has accumulated thousands of specimens. The samples, collected mainly during marine expeditions, date back as far as 1964. Today GEOMAR houses a central geological sample collection of at least 17 000 m of sediment core and more than 4 500 boxes with hard rock samples and refined sample specimens. This repository, having been dormant, missed the onset of the interconnected digital age. Physical samples without barcodes, QR codes or RFID tags need to be migrated and reconnected, urgently. In our use case, GEOMAR opted for the International Geo Sample Number (IGSN) as the persistent identifier. Consequently, the software CurationDIS by smartcube GmbH was selected as the central component of this project. The software is designed to handle acquisition and administration of sample material and sample archiving in storage places. In addition, the software allows direct embedding of IGSN. We plan to adopt IGSN as a future asset, while for the initial inventory of our sample material, simple but unique QR codes act as "bridging identifiers" during the process. Currently we are compiling an overview of the broad variety of sample types and their associated data. QR-coding of the boxes of rock samples and sediment cores is near completion, delineating their location in the repository and linking a particular sample to any information available about the object. Planning is in progress to streamline the flow from receiving new samples to their curation to sharing samples and information publicly. Additionally, interface planning for linkage to the GEOMAR databases OceanRep (publications) and OSIS (expeditions), as well as for external data retrieval, is in the pipeline. Looking ahead, implementing IGSN, and taking on board lessons learned from earlier generations, will enable compliance with our institute's open science policy. It will also allow newly collected samples to be registered during ship expeditions, so that they receive their "birth certificate" promptly in this ever faster revolving scientific world.
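A minimal sketch of the "bridging identifier" idea: generating a QR label that points a scanned sample box to its landing page. It uses the Python qrcode package; the base URL and sample identifier scheme are invented for illustration, not GEOMAR's actual curation system.

```python
import qrcode

# Illustrative only: base URL and identifier scheme are placeholders.
REPOSITORY_BASE = "https://example.org/samples/"

def make_label(sample_id, outfile):
    """Encode a repository landing-page URL for one sample box as a QR code image."""
    img = qrcode.make(REPOSITORY_BASE + sample_id)
    img.save(outfile)

make_label("CORE-1964-0001", "core_1964_0001.png")
```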
Burchill, C; Roos, L L; Fergusson, P; Jebamani, L; Turner, K; Dueck, S
2000-01-01
Comprehensive data available in the Canadian province of Manitoba since 1970 have aided study of the interaction between population health, health care utilization, and structural features of the health care system. Given a complex linked database and many ongoing projects, better organization of available epidemiological, institutional, and technical information was needed. The Manitoba Centre for Health Policy and Evaluation wished to develop a knowledge repository to handle data, document research Methods, and facilitate both internal communication and collaboration with other sites. This evolving knowledge repository consists of both public and internal (restricted access) pages on the World Wide Web (WWW). Information can be accessed using an indexed logical format or queried to allow entry at user-defined points. The main topics are: Concept Dictionary, Research Definitions, Meta-Index, and Glossary. The Concept Dictionary operationalizes concepts used in health research using administrative data, outlining the creation of complex variables. Research Definitions specify the codes for common surgical procedures, tests, and diagnoses. The Meta-Index organizes concepts and definitions according to the Medical Sub-Heading (MeSH) system developed by the National Library of Medicine. The Glossary facilitates navigation through the research terms and abbreviations in the knowledge repository. An Education Resources heading presents a web-based graduate course using substantial amounts of material in the Concept Dictionary, a lecture in the Epidemiology Supercourse, and material for Manitoba's Regional Health Authorities. Confidential information (including Data Dictionaries) is available on the Centre's internal website. Use of the public pages has increased dramatically since January 1998, with almost 6,000 page hits from 250 different hosts in May 1999. More recently, the number of page hits has averaged around 4,000 per month, while the number of unique hosts has climbed to around 400. This knowledge repository promotes standardization and increases efficiency by placing concepts and associated programming in the Centre's collective memory. Collaboration and project management are facilitated.
Burchill, Charles; Fergusson, Patricia; Jebamani, Laurel; Turner, Ken; Dueck, Stephen
2000-01-01
Background Comprehensive data available in the Canadian province of Manitoba since 1970 have aided study of the interaction between population health, health care utilization, and structural features of the health care system. Given a complex linked database and many ongoing projects, better organization of available epidemiological, institutional, and technical information was needed. Objective The Manitoba Centre for Health Policy and Evaluation wished to develop a knowledge repository to handle data, document research methods, and facilitate both internal communication and collaboration with other sites. Methods This evolving knowledge repository consists of both public and internal (restricted access) pages on the World Wide Web (WWW). Information can be accessed using an indexed logical format or queried to allow entry at user-defined points. The main topics are: Concept Dictionary, Research Definitions, Meta-Index, and Glossary. The Concept Dictionary operationalizes concepts used in health research using administrative data, outlining the creation of complex variables. Research Definitions specify the codes for common surgical procedures, tests, and diagnoses. The Meta-Index organizes concepts and definitions according to the Medical Sub-Heading (MeSH) system developed by the National Library of Medicine. The Glossary facilitates navigation through the research terms and abbreviations in the knowledge repository. An Education Resources heading presents a web-based graduate course using substantial amounts of material in the Concept Dictionary, a lecture in the Epidemiology Supercourse, and material for Manitoba's Regional Health Authorities. Confidential information (including Data Dictionaries) is available on the Centre's internal website. Results Use of the public pages has increased dramatically since January 1998, with almost 6,000 page hits from 250 different hosts in May 1999. More recently, the number of page hits has averaged around 4,000 per month, while the number of unique hosts has climbed to around 400. Conclusions This knowledge repository promotes standardization and increases efficiency by placing concepts and associated programming in the Centre's collective memory. Collaboration and project management are facilitated. PMID:11720929
Preliminary safety evaluation of an aircraft impact on a near-surface radioactive waste repository
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lo Frano, R.; Forasassi, G.; Pugliese, G.
2013-07-01
The aircraft impact accident has become very significant in the design of nuclear facilities, particularly after the tragic September 2001 events, which raised public concern about the potential damaging effects that the impact of a large civilian airplane could have on safety-relevant structures. The aim of this study is therefore to preliminarily evaluate the global response and the structural effects induced by the impact of a military or commercial airplane (currently considered a 'beyond design basis' event) on a near-surface radioactive waste (RW) disposal facility. The safety evaluation was carried out according to international safety and design guidelines and in agreement with the stress-test requirements for the security track. To this end, a layout and a scheme of a possible near-surface repository, such as that of El Cabril, were taken into account. In order to perform a preliminary but reliable analysis of such a large-scale structure and to determine the structural effects induced by these types of impulsive loads, a realistic, yet still tractable, numerical model with suitable material characteristics was implemented by means of FEM codes. In the structural analyses carried out, the RW repository was considered a 'robust' target, owing to its thick walls and main constitutive materials (steel and reinforced concrete). In addition, to adequately represent the dynamic response of the repository under crashing, relevant physical phenomena (i.e. penetration, spalling, etc.) were simulated and analysed. The preliminary assessment of the effects induced by the dynamic/impulsive loads generally allowed verification of the residual strength capability of the repository considered. The preliminary results obtained highlighted a remarkable potential to withstand the impact of a military/large commercial aircraft, even in the presence of ongoing progressive concrete failure (some penetration and spalling of the concrete wall) in the impacted area. (authors)
deepTools2: a next generation web server for deep-sequencing data analysis.
Ramírez, Fidel; Ryan, Devon P; Grüning, Björn; Bhardwaj, Vivek; Kilpert, Fabian; Richter, Andreas S; Heyne, Steffen; Dündar, Friederike; Manke, Thomas
2016-07-08
We present an update to our Galaxy-based web server for processing and visualizing deeply sequenced data. Its core tool set, deepTools, allows users to perform complete bioinformatic workflows ranging from quality controls and normalizations of aligned reads to integrative analyses, including clustering and visualization approaches. Since we first described our deepTools Galaxy server in 2014, we have implemented new solutions for many requests from the community and our users. Here, we introduce significant enhancements and new tools to further improve data visualization and interpretation. deepTools continues to be open to all users and freely available as a web service at deeptools.ie-freiburg.mpg.de. The new deepTools2 suite can be easily deployed within any Galaxy framework via the toolshed repository, and we also provide source code for command-line usage under Linux and Mac OS X. A public and documented API for access to deepTools functionality is also available. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
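For command-line use outside Galaxy, one deepTools step can be scripted as below. This is a sketch only: the file names are placeholders, and the bamCoverage option names are quoted as commonly documented and should be checked against `bamCoverage --help` for the installed version.

```python
import subprocess

# Generate a bigWig coverage track from an aligned BAM file with deepTools.
subprocess.run(
    ["bamCoverage",
     "-b", "aligned_reads.bam",   # input BAM of aligned reads (placeholder name)
     "-o", "coverage.bw",         # output bigWig coverage track
     "--binSize", "50"],          # bin width in base pairs
    check=True,
)
```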
mORCA: sailing bioinformatics world with mobile devices.
Díaz-Del-Pino, Sergio; Falgueras, Juan; Perez-Wohlfeil, Esteban; Trelles, Oswaldo
2018-03-01
Nearly 10 years have passed since the first mobile apps appeared. Given that bioinformatics is a web-based world and that mobile devices are endowed with web browsers, it seemed natural that bioinformatics would transition from personal computers to mobile devices, but nothing could be further from the truth. The transition demands new paradigms, designs and novel implementations. Through an in-depth analysis of the requirements of existing bioinformatics applications, we designed and deployed an easy-to-use web-based lightweight mobile client. This client is able to browse, select, automatically compose interface parameters, invoke services and monitor the execution of Web Services using the service's metadata stored in catalogs or repositories. mORCA is available at http://bitlab-es.com/morca/app as a web app. It is also available in the App Store by Apple and the Play Store by Google. The software will be available for at least 2 years. Contact: ortrelles@uma.es. Source code, final web app, training material and documentation are available at http://bitlab-es.com/morca. © The Author(s) 2017. Published by Oxford University Press.
Cloud4Psi: cloud computing for 3D protein structure similarity searching.
Mrozek, Dariusz; Małysiak-Mrozek, Bożena; Kłapciński, Artur
2014-10-01
Popular methods for 3D protein structure similarity searching, especially those that generate high-quality alignments such as Combinatorial Extension (CE) and Flexible structure Alignment by Chaining Aligned fragment pairs allowing Twists (FATCAT) are still time consuming. As a consequence, performing similarity searching against large repositories of structural data requires increased computational resources that are not always available. Cloud computing provides huge amounts of computational power that can be provisioned on a pay-as-you-go basis. We have developed the cloud-based system that allows scaling of the similarity searching process vertically and horizontally. Cloud4Psi (Cloud for Protein Similarity) was tested in the Microsoft Azure cloud environment and provided good, almost linearly proportional acceleration when scaled out onto many computational units. Cloud4Psi is available as Software as a Service for testing purposes at: http://cloud4psi.cloudapp.net/. For source code and software availability, please visit the Cloud4Psi project home page at http://zti.polsl.pl/dmrozek/science/cloud4psi.htm. © The Author 2014. Published by Oxford University Press.
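To illustrate the "scale out" idea of distributing similarity searching across computational units, here is a toy Python sketch using a local process pool. The similarity function is a stand-in for a real aligner such as CE or FATCAT, and this is not Cloud4Psi's actual Azure implementation.

```python
from concurrent.futures import ProcessPoolExecutor

def similarity(pair):
    """Toy stand-in for a structural alignment score (not a real aligner)."""
    query, target = pair
    return target, len(set(query) & set(target))

def search(query, repository, workers=4):
    """Distribute pairwise comparisons across worker processes and rank results."""
    pairs = [(query, target) for target in repository]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sorted(pool.map(similarity, pairs), key=lambda r: r[1], reverse=True)

if __name__ == "__main__":
    print(search("1ABC", ["1XYZ", "2ABD", "3QRS"]))
```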
DOE Office of Scientific and Technical Information (OSTI.GOV)
Doucet, Mathieu; Hobson, Tanner C.; Ferraz Leal, Ricardo Miguel
The Django Remote Submission (DRS) is a Django (Django, n.d.) application to manage long-running job submission, including starting the job, saving logs, and storing results. It is an independent project available as a standalone pypi package (PyPi, n.d.) and can be easily integrated into any Django project. The source code is freely available as a GitHub repository (django-remote-submission, n.d.). To run jobs in the background, DRS takes advantage of Celery (Celery, n.d.), a powerful asynchronous job queue used for running tasks in the background, and the Redis Server (Redis, n.d.), an in-memory data structure store. Celery uses brokers to pass messages between a Django project and the Celery workers; Redis is the message broker of DRS. In addition, DRS provides real-time monitoring of the progress of jobs and their associated logs. Through the Django Channels project (Channels, n.d.) and the use of Web Sockets, it is possible to asynchronously display the job status and the live job output (standard output and standard error) on a web page.
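A minimal sketch of the Celery + Redis pattern that DRS builds on: Redis acts as the message broker, and a worker executes submitted jobs in the background. The broker URL and the task body are placeholders, not DRS's actual task code.

```python
from celery import Celery

# Redis is used both as the message broker and as the result backend here.
app = Celery("remote_jobs",
             broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/1")

@app.task
def run_remote_job(command):
    """Stand-in for a long-running job; returns its 'log' output."""
    return f"finished: {command}"

# On the submitting side (e.g. inside a Django view):
#   result = run_remote_job.delay("simulate --input run42.cfg")
#   ...later...
#   print(result.get(timeout=600))
```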
Cloud4Psi: cloud computing for 3D protein structure similarity searching
Mrozek, Dariusz; Małysiak-Mrozek, Bożena; Kłapciński, Artur
2014-01-01
Summary: Popular methods for 3D protein structure similarity searching, especially those that generate high-quality alignments such as Combinatorial Extension (CE) and Flexible structure Alignment by Chaining Aligned fragment pairs allowing Twists (FATCAT) are still time consuming. As a consequence, performing similarity searching against large repositories of structural data requires increased computational resources that are not always available. Cloud computing provides huge amounts of computational power that can be provisioned on a pay-as-you-go basis. We have developed the cloud-based system that allows scaling of the similarity searching process vertically and horizontally. Cloud4Psi (Cloud for Protein Similarity) was tested in the Microsoft Azure cloud environment and provided good, almost linearly proportional acceleration when scaled out onto many computational units. Availability and implementation: Cloud4Psi is available as Software as a Service for testing purposes at: http://cloud4psi.cloudapp.net/. For source code and software availability, please visit the Cloud4Psi project home page at http://zti.polsl.pl/dmrozek/science/cloud4psi.htm. Contact: dariusz.mrozek@polsl.pl PMID:24930141
Doucet, Mathieu; Hobson, Tanner C.; Ferraz Leal, Ricardo Miguel
2017-08-01
The Django Remote Submission (DRS) is a Django (Django, n.d.) application to manage long-running job submission, including starting the job, saving logs, and storing results. It is an independent project available as a standalone pypi package (PyPi, n.d.) and can be easily integrated into any Django project. The source code is freely available as a GitHub repository (django-remote-submission, n.d.). To run jobs in the background, DRS takes advantage of Celery (Celery, n.d.), a powerful asynchronous job queue used for running tasks in the background, and the Redis Server (Redis, n.d.), an in-memory data structure store. Celery uses brokers to pass messages between a Django project and the Celery workers; Redis is the message broker of DRS. In addition, DRS provides real-time monitoring of the progress of jobs and their associated logs. Through the Django Channels project (Channels, n.d.) and the use of Web Sockets, it is possible to asynchronously display the job status and the live job output (standard output and standard error) on a web page.
Researcher-library collaborations: Data repositories as a service for researchers.
Gordon, Andrew S; Millman, David S; Steiger, Lisa; Adolph, Karen E; Gilmore, Rick O
New interest has arisen in organizing, preserving, and sharing the raw materials, the data and metadata, that undergird the published products of research. Library and information scientists have valuable expertise to bring to bear in the effort to create larger, more diverse, and more widely used data repositories. However, for libraries to be maximally successful in providing the research data management and preservation services required of a successful data repository, librarians must work closely with researchers and learn about their data management workflows. Databrary is a data repository that is closely linked to the needs of a specific scholarly community: researchers who use video as a main source of data to study child development and learning. The project's success to date is a result of its focus on community outreach and providing services for scholarly communication, engaging institutional partners, offering services for data curation with the guidance of closely involved information professionals, and the creation of a strong technical infrastructure. Databrary plans to improve its curation tools that allow researchers to deposit their own data, enhance the user-facing feature set, increase integration with library systems, and implement strategies for long-term sustainability.
The Biological Reference Repository (BioR): a rapid and flexible system for genomics annotation.
Kocher, Jean-Pierre A; Quest, Daniel J; Duffy, Patrick; Meiners, Michael A; Moore, Raymond M; Rider, David; Hossain, Asif; Hart, Steven N; Dinu, Valentin
2014-07-01
The Biological Reference Repository (BioR) is a toolkit for annotating variants. BioR stores public and user-specific annotation sources in indexed JSON-encoded flat files (catalogs). The BioR toolkit provides the functionality to combine and retrieve annotation from these catalogs via the command-line interface. Several catalogs from commonly used annotation sources and instructions for creating user-specific catalogs are provided. Commands from the toolkit can be combined with other UNIX commands for advanced annotation processing. We also provide instructions for the development of custom annotation pipelines. The package is implemented in Java and makes use of external tools written in Java and Perl. The toolkit can be executed on Mac OS X 10.5 and above or any Linux distribution. The BioR application, quickstart, and user guide documents and many biological examples are available at http://bioinformaticstools.mayo.edu. © The Author 2014. Published by Oxford University Press.
ERIC Educational Resources Information Center
Pardos, Zachary A.; Whyte, Anthony; Kao, Kevin
2016-01-01
In this paper, we address issues of transparency, modularity, and privacy with the introduction of an open source, web-based data repository and analysis tool tailored to the Massive Open Online Course community. The tool integrates data request/authorization and distribution workflow features as well as provides a simple analytics module upload…
Code of Federal Regulations, 2010 CFR
2010-10-01
... light sources used in motor vehicle headlighting systems. This part also serves as a repository for... standardized sealed beam units used in motor vehicle headlighting systems. § 564.2 Purposes. The purposes of... manufacturing specifications of standardized sealed beam headlamp units used on motor vehicles so that all...
NASA Astrophysics Data System (ADS)
Jarboe, N.; Minnett, R.; Koppers, A.; Constable, C.; Tauxe, L.; Jonestrask, L.
2017-12-01
The Magnetics Information Consortium (MagIC) supports an online database for the paleomagnetic, geomagnetic, and rock magnetic communities (https://earthref.org/MagIC). Researchers can upload data into the archive and download data as selected with a sophisticated search system. MagIC has completed the transition from an Oracle-backed, Perl-based, server-oriented website to an ElasticSearch-backed, Meteor-based thick-client technology stack. Using JavaScript on both the server and the client increases code reuse and allows many computational operations to be offloaded to the client for faster response. On-the-fly data validation, column header suggestion, and online spreadsheet editing are some of the new features available with the new system. The 3.0 data model, method codes, and vocabulary lists can be browsed via the MagIC website and more easily updated. Source code for MagIC is publicly available on GitHub (https://github.com/earthref/MagIC). The MagIC file format is natively compatible with the PmagPy (https://github.com/PmagPy/PmagPy) paleomagnetic analysis software. MagIC files can now be downloaded from the database and viewed and interpreted in the PmagPy GUI-based tool, pmag_gui. Changes or interpretations of the data can then be saved by pmag_gui in the MagIC 3.0 data format and easily uploaded to the MagIC database. The rate of new contributions to the database has been increasing, with many labs contributing measurement-level data for the first time in the last year. Over a dozen file format conversion scripts are available for translating non-MagIC measurement data files into the MagIC format for easy uploading. We will continue to work with more labs until the whole community has a manageable workflow for contributing their measurement-level data. MagIC will continue to provide a global repository for archiving and retrieving paleomagnetic and rock magnetic data and, with the new system in place, will be able to respond more quickly to the community's requests for changes and improvements.
Git as an Encrypted Distributed Version Control System
2015-03-01
options. The algorithm uses AES-256 counter mode with an IV derived from an SHA-1-HMAC hash (this is nearly identical to the GCM mode discussed earlier...built into the internal structure of Git. Every file in a Git repository is checksummed with a SHA-1 hash, a one-way function with arbitrarily long...implementation. Git-encrypt calls OpenSSL cryptography library command-line functions. The default cipher used is AES-256 Electronic Code Book (ECB), which is
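As a small illustration of the checksumming mentioned above, the Python sketch below reproduces Git's content addressing: every object id is the SHA-1 of a short type-and-length header followed by the file bytes, the same value `git hash-object <file>` prints for a blob.

```python
import hashlib

def git_blob_sha1(data: bytes) -> str:
    """Compute the Git blob object id: SHA-1 over 'blob <size>\\0' + content."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()

print(git_blob_sha1(b"hello\n"))  # matches `printf 'hello\n' | git hash-object --stdin`
```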
Joint source-channel coding for motion-compensated DCT-based SNR scalable video.
Kondi, Lisimachos P; Ishtiaq, Faisal; Katsaggelos, Aggelos K
2002-01-01
In this paper, we develop an approach toward joint source-channel coding for motion-compensated DCT-based scalable video coding and transmission. A framework for the optimal selection of the source and channel coding rates over all scalable layers is presented such that the overall distortion is minimized. The algorithm utilizes universal rate distortion characteristics which are obtained experimentally and show the sensitivity of the source encoder and decoder to channel errors. The proposed algorithm allocates the available bit rate between scalable layers and, within each layer, between source and channel coding. We present the results of this rate allocation algorithm for video transmission over a wireless channel using the H.263 Version 2 signal-to-noise ratio (SNR) scalable codec for source coding and rate-compatible punctured convolutional (RCPC) codes for channel coding. We discuss the performance of the algorithm with respect to the channel conditions, coding methodologies, layer rates, and number of layers.
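To make the rate-allocation framing concrete, here is a toy brute-force allocator: each scalable layer offers a few operating points of (source rate, channel rate, expected distortion), and one point per layer is chosen to minimize total distortion under a total bit-rate budget. The numbers are invented, and the additive-distortion model is a simplification of the paper's approach, which uses experimentally obtained universal rate-distortion characteristics with H.263+ and RCPC codes.

```python
from itertools import product

# Per-layer operating points: (source_rate_kbps, channel_rate_kbps, expected_distortion)
layers = [
    [(64, 32, 10.0), (96, 32, 7.5), (96, 64, 6.8)],   # base layer options
    [(32, 16, 4.0), (64, 16, 2.9), (64, 32, 2.5)],    # enhancement layer options
]

def allocate(layers, budget_kbps):
    """Exhaustively pick one operating point per layer minimizing total distortion."""
    best = None
    for choice in product(*layers):
        rate = sum(s + c for s, c, _ in choice)
        dist = sum(d for _, _, d in choice)
        if rate <= budget_kbps and (best is None or dist < best[1]):
            best = (choice, dist)
    return best

print(allocate(layers, budget_kbps=224))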
Preliminary evaluation of solution-mining intrusion into a salt-dome repository
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
1981-06-01
This report is the product of the work of an ONWI task force to evaluate inadvertent human intrusion into a salt dome repository by solution mining. It summarizes the work in the following areas: a general review of the levels of defense that could reduce both the likelihood and the potential consequences of human intrusion into a salt dome repository; evaluation of a hypothetical intrusion scenario and its consequences; and recommendations for further studies. The conclusions of this task force report can be summarized as follows: (1) it is not possible at present to establish with certainty that solution mining is credible as a human-intrusion event; the likelihood of such an intrusion will depend on the effectiveness of the preventive measures; (2) an example analysis based on the realistic approach is presented in this report; it concluded that the radiological consequences are strongly dependent upon the mode of radionuclide release from the waste form, time after emplacement, package design, impurities in the host salt, the amount of the repository intercepted, the solution mining cavity form, the length of time over which solution mining occurs, the proportion of the contaminated salt source used for human consumption compared to other sources, and the method of salt purification for culinary purposes; (3) worst-case scenarios examined in other studies suggest considerable potential for exposures to man, while preliminary evaluations of more realistic cases suggest significantly reduced potential consequences. Mathematical model applications to process systems, guided by more advanced assumptions about human intrusion into geomedia, will shed more light on the potential for concern and the degree to which mitigative measures will be required.
Warehousing re-annotated cancer genes for biomarker meta-analysis.
Orsini, M; Travaglione, A; Capobianco, E
2013-07-01
Translational research in cancer genomics assigns a fundamental role to bioinformatics in support of candidate gene prioritization, with regard to both biomarker discovery and target identification for drug development. Efforts in both directions rely on the existence and constant update of large repositories of gene expression data and omics records obtained from a variety of experiments. Users who interactively interrogate such repositories may have problems retrieving sample fields that carry limited associated information, due for instance to incomplete entries or sometimes unusable files. Cancer-specific data sources present similar problems. Given that source integration usually improves data quality, one of the objectives is keeping the computational complexity sufficiently low to allow optimal assimilation and mining of all the information. In particular, the aim of integrating intra-omics data can be to improve the exploration of gene co-expression landscapes, while the aim of integrating inter-omics sources can be that of establishing genotype-phenotype associations. Both integrations are relevant to cancer biomarker meta-analysis, as the proposed study demonstrates. Our approach is based on re-annotating cancer-specific data available at the EBI's ArrayExpress repository and building a data warehouse aimed at biomarker discovery and validation studies. Cancer genes are organized by tissue, with biomedical and clinical evidence combined to increase reproducibility and consistency of results. For better comparative evaluation, multiple queries have been designed to efficiently address all types of experiments and platforms, and to allow retrieval of sample-related information, such as cell line, disease state and clinical aspects. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
The Astrophysics Source Code Library: An Update
NASA Astrophysics Data System (ADS)
Allen, Alice; Nemiroff, R. J.; Shamir, L.; Teuben, P. J.
2012-01-01
The Astrophysics Source Code Library (ASCL), founded in 1999, takes an active approach to sharing astrophysical source code. The ASCL's editor seeks out both new and old peer-reviewed papers that describe methods or experiments involving the development or use of source code, and adds entries for the codes found to the library. This approach ensures that source codes are added without requiring authors to actively submit them, resulting in a comprehensive listing that covers a significant number of the astrophysics source codes used in peer-reviewed studies. The ASCL moved to a new location in 2010; it now holds over 300 codes and continues to grow. In 2011, the ASCL (http://asterisk.apod.com/viewforum.php?f=35) has added an average of 19 new codes per month; we encourage scientists to submit their codes for inclusion. An advisory committee has been established to provide input and guide the development and expansion of the new site, and a marketing plan has been developed and is being executed. All ASCL source codes have been used to generate results published in or submitted to a refereed journal and are freely available either via a download site or from an identified source. This presentation covers the history of the ASCL, examines its current state and benefits, describes the means of and requirements for including codes, and outlines future plans.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Neuhauser, K.S.; Cashwell, J.W.; Reardon, P.C.
1986-12-31
This paper discusses the relative national environmental impacts of transporting nuclear wastes to each of the nine candidate repository sites in the United States. Several of the potential sites are closely clustered and, for the purpose of distance and routing calculations, are treated as a single location. These are: Cypress Creek Dome and Richton Dome in Mississippi (Gulf Interior Region), the Deaf Smith County and Swisher County sites in Texas (Permian Basin), and the Davis Canyon and Lavender Canyon sites in Utah (Paradox Basin). The remaining sites are: Vacherie Dome, Louisiana; Yucca Mountain, Nevada; and the Hanford Reservation, Washington. For compatibility with both the repository system authorized by the NWPA and the MRS option, two separate scenarios were analyzed. In brief, they are (1) shipment of spent fuel and high-level wastes (HLW) directly from waste generators to a repository (Reference Case) and (2) shipment of spent fuel to a Monitored Retrievable Storage (MRS) facility and then to a repository. Between 17 and 38 truck accident fatalities, between 1.4 and 7.7 rail accident fatalities, and between 0.22 and 12 radiological health effects can be expected to occur as a result of radioactive material transportation during the 26-year operating period of the first repository. During the same period in the United States, about 65,000 total deaths from truck accidents and about 32,000 total deaths from rail accidents would occur; also, an estimated 58,300 cancer fatalities are predicted to occur in the United States during a 26-year period from exposure to background radiation alone (not including medical and other manmade sources). The risks reported here are upper limits and are small by comparison with the "natural background" of risks of the same type. 3 refs., 6 tabs.
NASA Astrophysics Data System (ADS)
Downs, R. R.; Chen, R. S.; de Sherbinin, A. M.
2017-12-01
Growing recognition of the importance of sharing scientific data more widely and openly has refocused attention on the state of data repositories, including both discipline- or topic-oriented data centers and institutional repositories. Data creators often have several alternatives for depositing and disseminating their natural, social, health, or engineering science data. In selecting a repository for their data, data creators and other stakeholders such as their funding agencies may wish to consider the user community or communities served, the type and quality of data products already offered, and the degree of data stewardship and associated services provided. Some data repositories serve general communities, e.g., those in their host institution or region, whereas others tailor their services to particular scientific disciplines or topical areas. Some repositories are selective when acquiring data and conduct extensive curation and reviews to ensure that data products meet quality standards. Many repositories have secured credentials and established a track record for providing trustworthy, high quality data and services. The NASA Socioeconomic Data and Applications Center (SEDAC) serves users interested in human-environment interactions, including researchers, students, and applied users from diverse sectors. SEDAC is selective when choosing data for dissemination, conducting several reviews of data products and services prior to release. SEDAC works with data producers to continually improve the quality of its open data products and services. As a Distributed Active Archive Center (DAAC) of the NASA Earth Observing System Data and Information System, SEDAC is committed to improving the accessibility, interoperability, and usability of its data in conjunction with data available from other DAACs, as well as other relevant data sources. SEDAC is certified as a Regular Member of the International Council for Science World Data System (ICSU-WDS).
Wang, Lei; Alpert, Kathryn I; Calhoun, Vince D; Cobia, Derin J; Keator, David B; King, Margaret D; Kogan, Alexandr; Landis, Drew; Tallis, Marcelo; Turner, Matthew D; Potkin, Steven G; Turner, Jessica A; Ambite, Jose Luis
2016-01-01
SchizConnect (www.schizconnect.org) is built to address the issue of multiple data repositories in schizophrenia neuroimaging studies. It includes a level of mediation, translating across data sources, so that the user can place one query (e.g. for diffusion images from male individuals with schizophrenia) and find out from across the participating data sources how many datasets there are, as well as download the imaging and related data. The current version handles the Data Usage Agreements across different studies, as well as interpreting database-specific terminologies into a common framework. New data repositories can also be mediated to bring immediate access to existing datasets. Compared with centralized, upload-based data sharing models, SchizConnect is a unique, virtual database with a focus on schizophrenia and related disorders that can mediate live data as information is updated at each data source. It is our hope that SchizConnect can facilitate testing new hypotheses through aggregated datasets, promoting discovery related to the mechanisms underlying schizophrenic dysfunction. Copyright © 2015 Elsevier Inc. All rights reserved.
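A toy Python sketch of the mediation idea only: a query expressed in a common vocabulary is translated into each source's own terminology before being dispatched. The source names and term mappings are invented, not SchizConnect's actual data model.

```python
# Common-vocabulary terms mapped to per-source query fragments (all hypothetical).
TERM_MAP = {
    "source_a": {"diagnosis:schizophrenia": "dx_code = 'SZ'",
                 "modality:dti": "scan_type = 'DTI'"},
    "source_b": {"diagnosis:schizophrenia": "group == 'SCZ'",
                 "modality:dti": "series == 'diffusion'"},
}

def mediate(common_terms):
    """Translate a list of common query terms into per-source query fragments."""
    return {src: [mapping[t] for t in common_terms if t in mapping]
            for src, mapping in TERM_MAP.items()}

print(mediate(["diagnosis:schizophrenia", "modality:dti"]))
```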
Authorship Attribution of Source Code
ERIC Educational Resources Information Center
Tennyson, Matthew F.
2013-01-01
Authorship attribution of source code is the task of deciding who wrote a program, given its source code. Applications include software forensics, plagiarism detection, and determining software ownership. A number of methods for the authorship attribution of source code have been presented in the past. A review of those existing methods is…
Scalable Metadata Management for a Large Multi-Source Seismic Data Repository
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gaylord, J. M.; Dodge, D. A.; Magana-Zook, S. A.
In this work, we implemented the key metadata management components of a scalable seismic data ingestion framework to address limitations in our existing system, and to position it for anticipated growth in volume and complexity.
MEASURING THE ACUTE TOXICITY OF ESTUARINE SEDIMENTS
Estuarine sediments frequently are repositories and sources of anthropogenic contaminants. Toxicity is one method of assessing the environmental quality of sediments, yet because of the extreme range of salinities that characterize estuaries few infaunal organisms have both the p...
Baili, Paolo; Torresani, Michele; Agresti, Roberto; Rosito, Giuseppe; Daidone, Maria Grazia; Veneroni, Silvia; Cavallo, Ilaria; Funaro, Francesco; Giunco, Marco; Turco, Alberto; Amash, Hade; Scavo, Antonio; Minicozzi, Pamela; Bella, Francesca; Meneghini, Elisabetta; Sant, Milena
2015-01-01
In clinical research, many potentially useful variables are available via the routine activity of cancer center-based clinical registries (CCCR). We present the experience of the breast cancer clinical registry at Fondazione IRCCS "Istituto Nazionale dei Tumori" to give an example of how a CCCR can be planned, implemented, and used. Five criteria were taken into consideration while planning our CCCR: (a) available clinical and administrative databases ought to be exploited to the maximum extent; (b) open source software should be used; (c) a Web-based interface must be designed; (d) CCCR data must be compatible with population-based cancer registry data; (e) CCCR must be an open system, able to be connected with other data repositories. The amount of work needed for the implementation of a CCCR is inversely linked with the amount of available coded data: the fewer data are available in the input databases as coded variables, the more work will be necessary, for information technology staff, text mining analysis, and registrars (for collecting data from clinical records). A cancer registry in a comprehensive cancer center can be used for several research aspects, such as estimate of the number of cases needed for clinical studies, assessment of biobank specimens with specific characteristics, evaluation of clinical practice and adhesion to clinical guidelines, comparative studies between clinical and population sets of patients, studies on cancer prognosis, and studies on cancer survivorship.
Schroedinger’s code: Source code availability and transparency in astrophysics
NASA Astrophysics Data System (ADS)
Ryan, PW; Allen, Alice; Teuben, Peter
2018-01-01
Astronomers use software for their research, but how many of the codes they use are available as source code? We examined a sample of 166 papers from 2015 for clearly identified software use, then searched for source code for the software packages mentioned in these research papers. We categorized the software to indicate whether source code is available for download and whether there are restrictions to accessing it, and, if source code was not available, whether some other form of the software, such as a binary, was. Over 40% of the source code for the software used in our sample was not available for download. As URLs have often been used as proxy citations for software, we also extracted URLs from one journal's 2015 research articles, removed those from certain long-term, reliable domains, and tested the remainder to determine what percentage of these URLs were still accessible in September and October 2017.
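A minimal sketch of the kind of link check described above: take candidate URLs extracted from articles and record which ones still respond. The URL list here is illustrative; the study's own sampling and exclusion rules were more involved.

```python
import requests

def check(urls):
    """Return an HTTP status code or exception name for each candidate URL."""
    results = {}
    for url in urls:
        try:
            r = requests.head(url, allow_redirects=True, timeout=10)
            results[url] = r.status_code
        except requests.RequestException as exc:
            results[url] = type(exc).__name__
    return results

print(check(["https://ascl.net", "http://example.invalid/old-code"]))
```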
Bridging the Gap: Need for a Data Repository to Support Vaccine Prioritization Efforts*
Madhavan, Guruprasad; Phelps, Charles; Sangha, Kinpritma; Levin, Scott; Rappuoli, Rino
2015-01-01
As the mechanisms for discovery, development, and delivery of new vaccines become increasingly complex, strategic planning and priority setting have become ever more crucial. Traditional single value metrics such as disease burden or cost-effectiveness no longer suffice to rank vaccine candidates for development. The Institute of Medicine—in collaboration with the National Academy of Engineering—has developed a novel software system to support vaccine prioritization efforts. The Strategic Multi-Attribute Ranking Tool for Vaccines—SMART Vaccines—allows decision makers to specify their own value structure, selecting from among 28 pre-defined and up to 7 user-defined attributes relevant to the ranking of vaccine candidates. Widespread use of SMART Vaccines will require compilation of a comprehensive data repository for numerous relevant populations—including their demographics, disease burdens and associated treatment costs, as well as characterizing performance features of potential or existing vaccines that might be created, improved, or deployed. While the software contains preloaded data for a modest number of populations, a large gap exists between the existing data and a comprehensive data repository necessary to make full use of SMART Vaccines. While some of these data exist in disparate sources and forms, constructing a data repository will require much new coordination and focus. Finding strategies to bridge the gap to a comprehensive data repository remains the most important task in bringing SMART Vaccines to full fruition, and to support strategic vaccine prioritization efforts in general. PMID:26022565
A strategy to establish Food Safety Model Repositories.
Plaza-Rodríguez, C; Thoens, C; Falenski, A; Weiser, A A; Appel, B; Kaesbohrer, A; Filter, M
2015-07-02
Transferring the knowledge of predictive microbiology into real world food manufacturing applications is still a major challenge for the whole food safety modelling community. To facilitate this process, a strategy for creating open, community driven and web-based predictive microbial model repositories is proposed. These collaborative model resources could significantly improve the transfer of knowledge from research into commercial and governmental applications and also increase efficiency, transparency and usability of predictive models. To demonstrate the feasibility, predictive models of Salmonella in beef previously published in the scientific literature were re-implemented using an open source software tool called PMM-Lab. The models were made publicly available in a Food Safety Model Repository within the OpenML for Predictive Modelling in Food community project. Three different approaches were used to create new models in the model repositories: (1) all information relevant for model re-implementation is available in a scientific publication, (2) model parameters can be imported from tabular parameter collections and (3) models have to be generated from experimental data or primary model parameters. All three approaches were demonstrated in the paper. The sample Food Safety Model Repository is available via: http://sourceforge.net/projects/microbialmodelingexchange/files/models and the PMM-Lab software can be downloaded from http://sourceforge.net/projects/pmmlab/. This work also illustrates that a standardized information exchange format for predictive microbial models, as the key component of this strategy, could be established by adoption of resources from the Systems Biology domain. Copyright © 2015. Published by Elsevier B.V.
Virtual Labs (Science Gateways) as platforms for Free and Open Source Science
NASA Astrophysics Data System (ADS)
Lescinsky, David; Car, Nicholas; Fraser, Ryan; Friedrich, Carsten; Kemp, Carina; Squire, Geoffrey
2016-04-01
The Free and Open Source Software (FOSS) movement promotes community engagement in software development, as well as provides access to a range of sophisticated technologies that would be prohibitively expensive if obtained commercially. However, as geoinformatics and eResearch tools and services become more dispersed, it becomes more complicated to identify and interface between the many required components. Virtual Laboratories (VLs, also known as Science Gateways) simplify the management and coordination of these components by providing a platform linking many, if not all, of the steps in particular scientific processes. These enable scientists to focus on their science, rather than the underlying supporting technologies. We describe a modular, open source, VL infrastructure that can be reconfigured to create VLs for a wide range of disciplines. Development of this infrastructure has been led by CSIRO in collaboration with Geoscience Australia and the National Computational Infrastructure (NCI) with support from the National eResearch Collaboration Tools and Resources (NeCTAR) and the Australian National Data Service (ANDS). Initially, the infrastructure was developed to support the Virtual Geophysical Laboratory (VGL), and has subsequently been repurposed to create the Virtual Hazards Impact and Risk Laboratory (VHIRL) and the reconfigured Australian National Virtual Geophysics Laboratory (ANVGL). During each step of development, new capabilities and services have been added and/or enhanced. We plan on continuing to follow this model using a shared, community code base. The VL platform facilitates transparent and reproducible science by providing access to both the data and methodologies used during scientific investigations. This is further enhanced by the ability to set up and run investigations using computational resources accessed through the VL. Data is accessed using registries pointing to catalogues within public data repositories (notably including the NCI National Environmental Research Data Interoperability Platform), or by uploading data directly from user supplied addresses or files. Similarly, scientific software is accessed through registries pointing to software repositories (e.g., GitHub). Runs are configured by using or modifying default templates designed by subject matter experts. After the appropriate computational resources are identified by the user, Virtual Machines (VMs) are spun up and jobs are submitted to service providers (currently the NeCTAR public cloud or Amazon Web Services). Following completion of the jobs the results can be reviewed and downloaded if desired. By providing a unified platform for science, the VL infrastructure enables sophisticated provenance capture and management. The source of input data (including both collection and queries), user information, software information (version and configuration details) and output information are all captured and managed as a VL resource which can be linked to output data sets. This provenance resource provides a mechanism for publication and citation for Free and Open Source Science.
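A sketch of the kind of provenance record described above, capturing input data sources, user, software version and configuration, compute provider, and outputs as a single linkable resource. The field names and values are illustrative, not the VL platform's actual schema.

```python
import json
from datetime import datetime, timezone

# Hypothetical provenance record for one VL run (all identifiers are placeholders).
provenance = {
    "captured_at": datetime.now(timezone.utc).isoformat(),
    "user": {"id": "jsmith", "institution": "Example University"},
    "inputs": [{"catalogue": "https://example.org/csw",
                "query": "magnetics AND survey=2015"}],
    "software": {"name": "inversion-tool", "version": "2.3.1",
                 "configuration": {"mesh_size": 500, "iterations": 40}},
    "compute": {"provider": "nectar", "vm_flavour": "m2.large"},
    "outputs": [{"uri": "https://example.org/results/run-0042/model.nc"}],
}
print(json.dumps(provenance, indent=2))
```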
Automated Report Generation for Research Data Repositories: From i2b2 to PDF.
Thiemann, Volker S; Xu, Tingyan; Röhrig, Rainer; Majeed, Raphael W
2017-01-01
We developed an automated toolchain to generate reports from i2b2 data. It is based on free open source software and runs on a Java application server. It is successfully used in an ED registry project. The solution is highly configurable and portable to other projects based on i2b2 or compatible factual data sources.
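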
A Semantically Enabled Metadata Repository for Solar Irradiance Data Products
NASA Astrophysics Data System (ADS)
Wilson, A.; Cox, M.; Lindholm, D. M.; Nadiadi, I.; Traver, T.
2014-12-01
The Laboratory for Atmospheric and Space Physics (LASP) has been conducting research in atmospheric and space science for over 60 years and providing the associated data products to the public. LASP has a long history, in particular, of making space-based measurements of the solar irradiance, which serve as crucial input to several areas of scientific research, including solar-terrestrial interactions, atmospheric science, and climate. LISIRD, the LASP Interactive Solar Irradiance Data Center, serves these datasets to the public, including solar spectral irradiance (SSI) and total solar irradiance (TSI) data. The LASP extended metadata repository, LEMR, is a database of information about the datasets served by LASP, such as parameters, uncertainties, temporal and spectral ranges, current version, alerts, etc. It serves as the definitive, single source of truth for that information. The database is populated with information garnered via web forms and automated processes. Dataset owners keep the information current and verified for datasets under their purview. This information can be pulled dynamically for many purposes. Web sites such as LISIRD can include this information in web page content as it is rendered, ensuring users get current, accurate information. It can also be pulled to create metadata records in various metadata formats, such as SPASE (for heliophysics) and ISO 19115. Once these records are made available to the appropriate registries, our data will be discoverable by users coming in via those organizations. The database is implemented as an RDF triplestore, a collection of subject-predicate-object triples in which each entity is identifiable by a URI. This capability, coupled with SPARQL-over-HTTP read access, enables semantic queries over the repository contents. To create the repository we leveraged VIVO, an open source semantic web application, to manage and create new ontologies and populate repository content. A variety of ontologies were used in creating the triplestore, including ontologies that come with VIVO, such as FOAF. Also, the W3C DCAT ontology was integrated and extended to describe properties of our data products that we needed to capture, such as spectral range. The presentation will describe the architecture, ontology issues, and tools used to create LEMR, and plans for its evolution.
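A sketch of a semantic query against a triplestore that exposes SPARQL over HTTP, using the Python SPARQLWrapper library. The endpoint URL is hypothetical and the query uses generic DCAT/Dublin Core terms rather than LEMR's actual vocabulary.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical SPARQL endpoint; generic DCAT/DC terms stand in for the real ontology.
sparql = SPARQLWrapper("https://example.org/lemr/sparql")
sparql.setQuery("""
    PREFIX dcat: <http://www.w3.org/ns/dcat#>
    PREFIX dct:  <http://purl.org/dc/terms/>
    SELECT ?dataset ?title WHERE {
        ?dataset a dcat:Dataset ;
                 dct:title ?title .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["dataset"]["value"], "-", row["title"]["value"])
```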
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ferreira, Eduardo G.A.; Marumo, Julio T.; Vicente, Roberto
2012-07-01
Portland cement materials are widely used as engineered barriers in repositories for radioactive waste. The capacity of such barriers to prevent the disposed radionuclides from entering the biosphere in the long term depends on the service life of those materials. Thus, the performance assessment of structural materials under the environmental conditions prevailing in the environs of repositories is a matter of interest. The durability of the cement paste foreseen as backfill in a deep borehole for disposal of disused sealed radioactive sources is being investigated as part of the development of the repository concept. Results are intended to be part of the body of evidence in the safety case of the proposed disposal technology. This paper presents the results of X-ray diffraction (XRD) analysis of cement paste exposed to varying temperatures and simulated groundwater after the samples received the radiation dose that the cement paste will accumulate until complete decay of the radioactive sources. The XRD analysis of cement paste samples carried out in this work allowed us to observe some differences between cement paste specimens that were subjected to different treatments. Cluster analysis of the results was able to group the tested samples according to the applied treatments. Mineralogical differences, however, are tenuous and, apart from ettringite, are hardly observed. The absence of ettringite in all seven specimens that were kept in dry storage at high temperature could hardly have occurred through natural variations in the composition of hydrated cement paste, because ettringite is observed in all tested specimens except those seven. Therefore this absence is certainly the result of the treatments and could be explained by the decomposition of ettringite. Although the temperature of decomposition is about 110-120 deg. C, ettringite may initially decompose to meta-ettringite, an amorphous compound, above 50 deg. C in the absence of water. No influence of irradiation on the mineralogical composition was observed, whether the treatment was analyzed individually or under possible synergistic effects with other treatments. However, the radiation dose to which the specimens were exposed is only a fraction of the dose accumulated in cement paste until complete decay of some sources. Therefore, in the short term, the conditions deemed to prevail in the repository environment may not influence the properties of cement paste at detectable levels. Under the conditions presented in this work, it is not possible to predict the long-term evolution of these properties. (authors)
The BioGRID Interaction Database: 2011 update
Stark, Chris; Breitkreutz, Bobby-Joe; Chatr-aryamontri, Andrew; Boucher, Lorrie; Oughtred, Rose; Livstone, Michael S.; Nixon, Julie; Van Auken, Kimberly; Wang, Xiaodong; Shi, Xiaoqi; Reguly, Teresa; Rust, Jennifer M.; Winter, Andrew; Dolinski, Kara; Tyers, Mike
2011-01-01
The Biological General Repository for Interaction Datasets (BioGRID) is a public database that archives and disseminates genetic and protein interaction data from model organisms and humans (http://www.thebiogrid.org). BioGRID currently holds 347 966 interactions (170 162 genetic, 177 804 protein) curated from both high-throughput data sets and individual focused studies, as derived from over 23 000 publications in the primary literature. Complete coverage of the entire literature is maintained for budding yeast (Saccharomyces cerevisiae), fission yeast (Schizosaccharomyces pombe) and thale cress (Arabidopsis thaliana), and efforts to expand curation across multiple metazoan species are underway. The BioGRID houses 48 831 human protein interactions that have been curated from 10 247 publications. Current curation drives are focused on particular areas of biology to enable insights into conserved networks and pathways that are relevant to human health. The BioGRID 3.0 web interface contains new search and display features that enable rapid queries across multiple data types and sources. An automated Interaction Management System (IMS) is used to prioritize, coordinate and track curation across international sites and projects. BioGRID provides interaction data to several model organism databases, resources such as Entrez-Gene and other interaction meta-databases. The entire BioGRID 3.0 data collection may be downloaded in multiple file formats, including PSI MI XML. Source code for BioGRID 3.0 is freely available without any restrictions. PMID:21071413
Share Repository Framework: Component Specification and Ontology
2008-04-23
Palantir Technologies has created one such software application to support the DoD intelligence community by providing robust capabilities for...managing data from various sources. The Palantir tool is based on user-defined ontologies and supports multiple representation and analysis tools
EPAs SPECIATE 4.4 Database: Development and Uses
SPECIATE is the U.S. Environmental Protection Agency’s (EPA) repository of source category-specific particulate matter (PM), volatile organic gas, and other gas speciation profiles of air pollutant emissions. Abt Associates, Inc. developed SPECIATE 4.4 through a collaborat...
SPECIATE - EPA'S DATABASE OF SPECIATED EMISSION PROFILES
SPECIATE is the U.S. Environmental Protection Agency's (EPA) repository of total organic compound (TOC) and particulate matter (PM) speciation profiles for emissions from air pollution sources. The data base has recently been updated and an associated report has recently been re...
Caetano-Anollés, Gustavo; Wang, Minglei; Caetano-Anollés, Derek
2013-01-01
The genetic code shapes the genetic repository. Its origin has puzzled molecular scientists for over half a century and remains a long-standing mystery. Here we show that the origin of the genetic code is tightly coupled to the history of aminoacyl-tRNA synthetase enzymes and their interactions with tRNA. A timeline of evolutionary appearance of protein domain families derived from a structural census in hundreds of genomes reveals the early emergence of the ‘operational’ RNA code and the late implementation of the standard genetic code. The emergence of codon specificities and amino acid charging involved tight coevolution of aminoacyl-tRNA synthetases and tRNA structures as well as episodes of structural recruitment. Remarkably, amino acid and dipeptide compositions of single-domain proteins appearing before the standard code suggest archaic synthetases with structures homologous to catalytic domains of tyrosyl-tRNA and seryl-tRNA synthetases were capable of peptide bond formation and aminoacylation. Results reveal that genetics arose through coevolutionary interactions between polypeptides and nucleic acid cofactors as an exacting mechanism that favored flexibility and folding of the emergent proteins. These enhancements of phenotypic robustness were likely internalized into the emerging genetic system with the early rise of modern protein structure. PMID:23991065
Teresa E. Jordan
2015-11-15
This collection of files is part of a larger dataset uploaded in support of Low Temperature Geothermal Play Fairway Analysis for the Appalachian Basin (GPFA-AB, DOE Project DE-EE0006726). Phase 1 of the GPFA-AB project identified potential Geothermal Play Fairways within the Appalachian basin of Pennsylvania, West Virginia and New York. This was accomplished through analysis of 4 key criteria or ‘risks’: thermal quality, natural reservoir productivity, risk of seismicity, and heat utilization. Each of these analyses represents a distinct project task, with the fifth task encompassing combination of the 4 risk factors. Supporting data for all five tasks has been uploaded into the Geothermal Data Repository node of the National Geothermal Data System (NGDS). This submission comprises the data for Thermal Quality Analysis (project task 1) and includes all of the necessary shapefiles, rasters, datasets, code, and references to code repositories that were used to create the thermal resource and risk factor maps as part of the GPFA-AB project. The identified Geothermal Play Fairways are also provided with the larger dataset. Figures (.png) are provided as examples of the shapefiles and rasters. The regional standardized 1 square km grid used in the project is also provided as points (cell centers), polygons, and as a raster. Two ArcGIS toolboxes are available: 1) RegionalGridModels.tbx for creating resource and risk factor maps on the standardized grid, and 2) ThermalRiskFactorModels.tbx for use in making the thermal resource maps and cross sections. These toolboxes contain “item description” documentation for each model within the toolbox, and for the toolbox itself. This submission also contains three R scripts: 1) AddNewSeisFields.R to add seismic risk data to attribute tables of seismic risk, 2) StratifiedKrigingInterpolation.R for the interpolations used in the thermal resource analysis, and 3) LeaveOneOutCrossValidation.R for the cross validations used in the thermal interpolations. Some file descriptions make reference to various 'memos'. These are contained within the final report submitted October 16, 2015. Each zipped file in the submission contains an 'about' document describing the full Thermal Quality Analysis content available, along with key sources, authors, citation, use guidelines, and assumptions, with the specific file(s) contained within the .zip file highlighted.
Two-step web-mining approach to study geology/geophysics-related open-source software projects
NASA Astrophysics Data System (ADS)
Behrends, Knut; Conze, Ronald
2013-04-01
Geology/geophysics is a highly interdisciplinary science, overlapping with, for instance, physics, biology and chemistry. In today's software-intensive work environments, geoscientists often encounter new open-source software from scientific fields that are only remotely related to their own field of expertise. We show how web-mining techniques can help to carry out systematic discovery and evaluation of such software. In a first step, we downloaded ~500 abstracts (each consisting of ~1 kb UTF-8 text) from agu-fm12.abstractcentral.com. This web site hosts the abstracts of all publications presented at AGU Fall Meeting 2012, the world's largest annual geology/geophysics conference. All abstracts belonged to the category "Earth and Space Science Informatics", an interdisciplinary label cross-cutting many disciplines such as "deep biosphere", "atmospheric research", and "mineral physics". Each publication was represented by a highly structured record with ~20 short data attributes, the largest of these being the unstructured "abstract" field. We processed texts of the abstracts with the statistics software "R" to build a corpus and a term-document matrix. Using the R package "tm", we applied text-mining techniques to filter data and develop hypotheses about software-development activities happening in various geology/geophysics fields. By analyzing the term-document matrix with basic techniques (e.g., word frequencies, co-occurrences, weighting) as well as more complex methods (clustering, classification), several key pieces of information were extracted. For example, text-mining can be used to identify scientists who are also developers of open-source scientific software, and the names of their programming projects and codes can also be identified. In a second step, based on the intermediate results found by processing the conference abstracts, any new hypotheses can be tested in another web-mining subproject: by merging the dataset with open data from github.com and stackoverflow.com. These popular, developer-centric websites have powerful application-programmer interfaces, and follow an open-data policy. In this regard, these sites offer a web-accessible reservoir of information that can be tapped to study questions such as: which open-source software projects are prominent in the various geoscience fields? What are the most popular programming languages? How are they trending? Are there any interesting temporal patterns in committer activities? How large are programming teams and how do they change over time? What free software packages exist in the vast realms of related fields? Does the software from these fields have capabilities that might still be useful to me as a researcher, or can help me perform my work better? Are there any open-source projects that might be commercially interesting? This evaluation strategy reveals programming projects that tend to be new. As many important legacy codes are not hosted on open-source code repositories, the presented search method might overlook some older projects.
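The core of the first step is building a term-document matrix from abstract texts and inspecting simple statistics such as word frequencies. The sketch below illustrates that step in Python with scikit-learn rather than the R/tm pipeline the authors describe; the toy list of abstracts and the package choice are assumptions for illustration, not the authors' actual code.

# Minimal sketch: build a term-document matrix from abstract texts and
# inspect aggregate word frequencies, analogous to the R/tm workflow above.
# The toy `abstracts` list stands in for the ~500 downloaded AGU abstracts.
from sklearn.feature_extraction.text import CountVectorizer

abstracts = [
    "We release an open-source Python code for borehole data processing.",
    "A new seismic inversion package, hosted on GitHub, is presented.",
    "Atmospheric sensor metadata are archived in a community repository.",
]

# Build the term-document matrix, dropping common English stop words.
vectorizer = CountVectorizer(stop_words="english")
tdm = vectorizer.fit_transform(abstracts)   # shape: (n_docs, n_terms)

# Aggregate term frequencies across all documents and list the top terms.
freqs = tdm.sum(axis=0).A1
terms = vectorizer.get_feature_names_out()
for term, count in sorted(zip(terms, freqs), key=lambda t: -t[1])[:10]:
    print(f"{term}: {count}")

From a matrix like this, co-occurrence and clustering analyses of the kind described in the abstract can be layered on top.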
Measuring Diagnoses: ICD Code Accuracy
O'Malley, Kimberly J; Cook, Karon F; Price, Matt D; Wildes, Kimberly Raiford; Hurdle, John F; Ashton, Carol M
2005-01-01
Objective To examine potential sources of errors at each step of the described inpatient International Classification of Diseases (ICD) coding process. Data Sources/Study Setting The use of disease codes from the ICD has expanded from classifying morbidity and mortality information for statistical purposes to diverse sets of applications in research, health care policy, and health care finance. By describing a brief history of ICD coding, detailing the process for assigning codes, identifying where errors can be introduced into the process, and reviewing methods for examining code accuracy, we help code users more systematically evaluate code accuracy for their particular applications. Study Design/Methods We summarize the inpatient ICD diagnostic coding process from patient admission to diagnostic code assignment. We examine potential sources of errors at each step and offer code users a tool for systematically evaluating code accuracy. Principal Findings Main error sources along the “patient trajectory” include amount and quality of information at admission, communication among patients and providers, the clinician's knowledge and experience with the illness, and the clinician's attention to detail. Main error sources along the “paper trail” include variance in the electronic and written records, coder training and experience, facility quality-control efforts, and unintentional and intentional coder errors, such as misspecification, unbundling, and upcoding. Conclusions By clearly specifying the code assignment process and heightening their awareness of potential error sources, code users can better evaluate the applicability and limitations of codes for their particular situations. ICD codes can then be used in the most appropriate ways. PMID:16178999
NASA Technical Reports Server (NTRS)
Penn, John M.
2013-01-01
This paper describes the adoption of a Test Driven Development approach and a Continuous Integration System in the development of the Trick Simulation Toolkit, a generic simulation development environment for creating high fidelity training and engineering simulations at the NASA/Johnson Space Center and many other NASA facilities. It describes what was learned and the significant benefits seen, such as fast, thorough, and clear test feedback every time code is checked in to the code repository. It also describes a system that encourages development of code that is much more flexible, maintainable, and reliable. The Trick Simulation Toolkit development environment provides a common architecture for user-defined simulations. Trick builds executable simulations using user-supplied simulation-definition files (S_define) and user-supplied "model code". For each Trick-based simulation, Trick automatically provides job scheduling, checkpoint / restore, data-recording, interactive variable manipulation (variable server), and an input-processor. Also included are tools for plotting recorded data and various other supporting tools and libraries. Trick is written in C/C++ and Java and supports both Linux and MacOSX. Prior to adopting this new development approach, Trick testing consisted primarily of running a few large simulations, with the hope that their complexity and scale would exercise most of Trick's code and expose any recently introduced bugs. Unsurprisingly, this approach yielded inconsistent results. It was obvious that a more systematic, thorough approach was required. After seeing examples of some Java-based projects that used the JUnit test framework, similar test frameworks for C and C++ were sought. Several were found, all clearly inspired by JUnit. Googletest, a freely available open-source testing framework, was selected as the most appropriate and capable. The new approach was implemented while rewriting the Trick memory management component, to eliminate a fundamental design flaw. The benefits became obvious almost immediately, not just in the correctness of the individual functions and classes but also in the correctness and flexibility being added to the overall design. Creating code to be testable, and testing it as it was created, resulted not only in better working code, but also in better-organized, flexible, and readable (i.e., articulate) code. This was, in essence, the Test-Driven Development (TDD) methodology created by Kent Beck. Seeing the benefits of Test Driven Development, other Trick components were refactored to make them more testable and tests were designed and implemented for them.
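The workflow described above writes a small unit test alongside each function and runs the whole suite on every check-in. The sketch below illustrates that red-green pattern with Python's unittest rather than googletest (the C/C++ framework the Trick team actually adopted); allocate_block is a hypothetical stand-in, not Trick code.

# Analogous sketch of the test-first workflow, using Python's unittest
# instead of googletest. `allocate_block` is a hypothetical stand-in for a
# memory-management routine; it is not part of the Trick code base.
import unittest


def allocate_block(size):
    """Return a zero-initialized buffer, rejecting non-positive sizes."""
    if size <= 0:
        raise ValueError("size must be positive")
    return bytearray(size)


class AllocateBlockTest(unittest.TestCase):
    def test_returns_requested_size(self):
        self.assertEqual(len(allocate_block(16)), 16)

    def test_rejects_zero_size(self):
        with self.assertRaises(ValueError):
            allocate_block(0)


if __name__ == "__main__":
    # In a continuous integration setup, the suite runs on every check-in.
    unittest.main()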
UNIX programmer's environment and configuration control
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arnold, T.R.; Wyatt, P.W.
1993-12-31
A package of UNIX utilities has been developed which unites the advantages of the public domain utility "imake" and a configuration control system. The "imake" utility is portable. It allows a user to make Makefiles on a wide variety of platforms without worrying about the machine-dependent idiosyncrasies of the UNIX utility "make." Makefiles are a labor-saving device for compiling and linking complicated programs, and "imake" is a labor-saving device for making Makefiles, as well as other useful software (like a program's internal dependencies on included files). This "Environment," which has been developed around "imake," allows a programmer to manage a complicated project consisting of multiple executables which may each link with multiple user-created libraries. The configuration control aspect consists of a directory hierarchy (a baseline) which is mirrored in a developer's workspace. The workspace includes a minimum of files copied from the baseline; it employs soft links into the baseline wherever possible. The utilities are a multi-tiered suite of Bourne shell scripts to copy or check out sources, check them back in, import new sources (sources which are not in the baseline) and link them appropriately, create new low-level directories and link them, compare with the baseline, update Makefiles with minimal effort, and handle dependencies. The directory hierarchy utilizes a single source repository, which is mirrored in the baseline and in a workspace for several platform architectures. The system was originally written to support C code on Sun-4's and RS6000's. It has now been extended to support FORTRAN as well as C on SGI and Cray YMP platforms as well as Sun-4's and RS6000's.
MetaboLights: An Open-Access Database Repository for Metabolomics Data.
Kale, Namrata S; Haug, Kenneth; Conesa, Pablo; Jayseelan, Kalaivani; Moreno, Pablo; Rocca-Serra, Philippe; Nainala, Venkata Chandrasekhar; Spicer, Rachel A; Williams, Mark; Li, Xuefei; Salek, Reza M; Griffin, Julian L; Steinbeck, Christoph
2016-03-24
MetaboLights is the first general purpose, open-access database repository for cross-platform and cross-species metabolomics research at the European Bioinformatics Institute (EMBL-EBI). Based upon the open-source ISA framework, MetaboLights provides Metabolomics Standard Initiative (MSI) compliant metadata and raw experimental data associated with metabolomics experiments. Users can upload their study datasets into the MetaboLights Repository. These studies are then automatically assigned a stable and unique identifier (e.g., MTBLS1) that can be used for publication reference. The MetaboLights Reference Layer associates metabolites with metabolomics studies in the archive and is extensively annotated with data fields such as structural and chemical information, NMR and MS spectra, target species, metabolic pathways, and reactions. The database is manually curated with no specific release schedules. MetaboLights is also recommended by journals for metabolomics data deposition. This unit provides a guide to using MetaboLights, downloading experimental data, and depositing metabolomics datasets using user-friendly submission tools. Copyright © 2016 John Wiley & Sons, Inc.
Long-term retrievability and safeguards for immobilized weapons plutonium in geologic storage
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peterson, P.F.
1996-05-01
If plutonium is not ultimately used as an energy source, the quantity of excess weapons plutonium (w-Pu) that would go into a US repository will be small compared to the quantity of plutonium contained in the commercial spent fuel in the repository, and the US repository(ies) will likely be only one (or two) locations out of many around the world where commercial spent fuel will be stored. Therefore excess weapons plutonium creates a small perturbation to the long-term (over 200,000 yr) global safeguard requirements for spent fuel. There are details in the differences between spent fuel and immobilized w-Pu waste forms (i.e. chemical separation methods, utility for weapons, nuclear testing requirements), but these are sufficiently small to be unlikely to play a significant role in any US political decision to rebuild weapons inventories, or to change the long-term risks of theft by subnational groups.
A standard-enabled workflow for synthetic biology.
Myers, Chris J; Beal, Jacob; Gorochowski, Thomas E; Kuwahara, Hiroyuki; Madsen, Curtis; McLaughlin, James Alastair; Mısırlı, Göksel; Nguyen, Tramy; Oberortner, Ernst; Samineni, Meher; Wipat, Anil; Zhang, Michael; Zundel, Zach
2017-06-15
A synthetic biology workflow is composed of data repositories that provide information about genetic parts, sequence-level design tools to compose these parts into circuits, visualization tools to depict these designs, genetic design tools to select parts to create systems, and modeling and simulation tools to evaluate alternative design choices. Data standards enable the ready exchange of information within such a workflow, allowing repositories and tools to be connected from a diversity of sources. The present paper describes one such workflow that utilizes, among others, the Synthetic Biology Open Language (SBOL) to describe genetic designs, the Systems Biology Markup Language to model these designs, and SBOL Visual to visualize these designs. We describe how a standard-enabled workflow can be used to produce different types of design information, with multiple repositories and software tools exchanging information using a variety of data standards. Recently, the ACS Synthetic Biology journal has recommended the use of SBOL in their publications. © 2017 The Author(s); published by Portland Press Limited on behalf of the Biochemical Society.
An overview of platforms for cloud based development.
Fylaktopoulos, G; Goumas, G; Skolarikis, M; Sotiropoulos, A; Maglogiannis, I
2016-01-01
This paper provides an overview of the state of the art technologies for software development in cloud environments. The surveyed systems cover the whole spectrum of cloud-based development including integrated programming environments, code repositories, software modeling, composition and documentation tools, and application management and orchestration. In this work we evaluate the existing cloud development ecosystem based on a wide number of characteristics like applicability (e.g. programming and database technologies supported), productivity enhancement (e.g. editor capabilities, debugging tools), support for collaboration (e.g. repository functionality, version control) and post-development application hosting and we compare the surveyed systems. The conducted survey proves that software engineering in the cloud era has made its initial steps showing potential to provide concrete implementation and execution environments for cloud-based applications. However, a number of important challenges need to be addressed for this approach to be viable. These challenges are discussed in the article, while a conclusion is drawn that although several steps have been made, a compact and reliable solution does not yet exist.
Using RDF and Git to Realize a Collaborative Metadata Repository.
Stöhr, Mark R; Majeed, Raphael W; Günther, Andreas
2018-01-01
The German Center for Lung Research (DZL) is a research network with the aim of researching respiratory diseases. The participating study sites' register data differs in terms of software and coding system as well as data field coverage. To perform meaningful consortium-wide queries through one single interface, a uniform conceptual structure is required covering the DZL common data elements. No single existing terminology includes all our concepts. Potential candidates such as LOINC and SNOMED only cover specific subject areas or are not granular enough for our needs. To achieve a broadly accepted and complete ontology, we developed a platform for collaborative metadata management. The DZL data management group formulated detailed requirements regarding the metadata repository and the user interfaces for metadata editing. Our solution builds upon existing standard technologies allowing us to meet those requirements. Its key parts are RDF and the distributed version control system Git. We developed a software system to publish updated metadata automatically and immediately after performing validation tests for completeness and consistency.
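The key mechanism above is publishing metadata updates only after automated validation, with Git providing versioning and distribution. The sketch below shows that publish-after-validation idea in Python using rdflib for parsing and a subprocess call to Git; the file name and the "every subject needs an rdfs:label" completeness rule are illustrative assumptions, not the DZL system's actual checks.

# Sketch: validate an RDF metadata file and, if it passes, commit and push
# it with Git, loosely mirroring the publish-after-validation step above.
# The file name and the labeling rule are assumptions for illustration.
import subprocess
from rdflib import Graph, RDFS

METADATA_FILE = "dzl_metadata.ttl"  # hypothetical path inside the Git working copy


def validate(path):
    g = Graph()
    g.parse(path, format="turtle")          # fails loudly on syntax errors
    # Completeness check: every subject should carry an rdfs:label.
    unlabeled = [s for s in set(g.subjects())
                 if (s, RDFS.label, None) not in g]
    return len(unlabeled) == 0


def publish(path, message="Update metadata after validation"):
    subprocess.run(["git", "add", path], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)
    subprocess.run(["git", "push"], check=True)


if __name__ == "__main__":
    if validate(METADATA_FILE):
        publish(METADATA_FILE)
    else:
        print("Validation failed; metadata not published.")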
Development of performance assessment methodology for nuclear waste isolation in geologic media
NASA Astrophysics Data System (ADS)
Bonano, E. J.; Chu, M. S. Y.; Cranwell, R. M.; Davis, P. A.
The burial of nuclear wastes in deep geologic formations as a means for their disposal is an issue of significant technical and social impact. The analysis of the processes involved can be performed only with reliable mathematical models and computer codes as opposed to conducting experiments because the time scales associated are on the order of tens of thousands of years. These analyses are concerned primarily with the migration of radioactive contaminants from the repository to the environment accessible to humans. Modeling of this phenomenon depends on a large number of other phenomena taking place in the geologic porous and/or fractured medium. These are ground-water flow, physicochemical interactions of the contaminants with the rock, heat transfer, and mass transport. Once the radionuclides have reached the accessible environment, the pathways to humans and health effects are estimated. A performance assessment methodology for a potential high-level waste repository emplaced in a basalt formation has been developed for the U.S. Nuclear Regulatory Commission.
Thayer, Erin K.; Rathkey, Daniel; Miller, Marissa Fuqua; Palmer, Ryan; Mejicano, George C.; Pusic, Martin; Kalet, Adina; Gillespie, Colleen; Carney, Patricia A.
2016-01-01
Issue Medical educators and educational researchers continue to improve their processes for managing medical student and program evaluation data using sound ethical principles. This is becoming even more important as curricular innovations are occurring across undergraduate and graduate medical education. Dissemination of findings from this work is critical, and peer-reviewed journals often require an institutional review board (IRB) determination. Approach IRB data repositories, originally designed for the longitudinal study of biological specimens, can be applied to medical education research. The benefits of such an approach include obtaining expedited review for multiple related studies within a single IRB application and allowing for more flexibility when conducting complex longitudinal studies involving large datasets from multiple data sources and/or institutions. In this paper, we inform educators and educational researchers on our analysis of the use of the IRB data repository approach to manage ethical considerations as part of best practices for amassing, pooling, and sharing data for educational research, evaluation, and improvement purposes. Implications Fostering multi-institutional studies while following sound ethical principles in the study of medical education is needed, and the IRB data repository approach has many benefits, especially for longitudinal assessment of complex multi-site data. PMID:27443407
NASA Astrophysics Data System (ADS)
Joyce, Steven; Hartley, Lee; Applegate, David; Hoek, Jaap; Jackson, Peter
2014-09-01
Forsmark in Sweden has been proposed as the site of a geological repository for spent high-level nuclear fuel, to be located at a depth of approximately 470 m in fractured crystalline rock. The safety assessment for the repository has required a multi-disciplinary approach to evaluate the impact of hydrogeological and hydrogeochemical conditions close to the repository and in a wider regional context. Assessing the consequences of potential radionuclide releases requires quantitative site-specific information concerning the details of groundwater flow on the scale of individual waste canister locations (1-10 m) as well as details of groundwater flow and composition on the scale of groundwater pathways between the facility and the surface (500 m to 5 km). The purpose of this article is to provide an illustration of multi-scale modeling techniques and the results obtained when combining aspects of local-scale flows in fractures around a potential contaminant source with regional-scale groundwater flow and transport subject to natural evolution of the system. The approach set out is novel, as it incorporates both different scales of model and different levels of detail, combining discrete fracture network and equivalent continuous porous medium representations of fractured bedrock.
Wu, Huiqun; Wei, Yufang; Shang, Yujuan; Shi, Wei; Wang, Lei; Li, Jingjing; Sang, Aimin; Shi, Lili; Jiang, Kui; Dong, Jiancheng
2018-06-06
Type 2 diabetes mellitus (T2DM) is a common chronic disease, and the fragmented data collected through separate vendors make continuous management of DM patients difficult. The lack of a standard for these fragmented data also makes further phenotyping of diabetic patients difficult. Traditional T2DM data repositories only support data collection from T2DM patients, lack phenotyping ability, and rely on standalone database designs, limiting the secondary usage of these valuable data. To solve these issues, we proposed a novel, standards-based T2DM data repository framework. This repository can integrate data from various sources, and it can be used as a standardized record for further data transfer as well as integration. Phenotyping was conducted based on clinical guidelines with a KNIME workflow. To evaluate the phenotyping performance of the proposed system, data were collected from a local community by healthcare providers and then tested using the phenotyping algorithms. The results indicated that the proposed system could detect DR cases with an average accuracy of about 82.8%. Furthermore, these results show promising potential for addressing fragmented data. The proposed system has integrating and phenotyping abilities, which could be used for diabetes research in future studies.
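The phenotyping step above applies guideline-derived rules to integrated records (the paper implements this as a KNIME workflow). The sketch below illustrates the same idea as plain Python rules; all field names, thresholds, and the DR criterion are hypothetical examples, not the study's actual algorithm.

# Illustrative rule-based phenotyping over integrated T2DM records,
# standing in for the guideline-driven KNIME workflow described above.
# Field names and cut-offs are hypothetical, not the study's criteria.
def phenotype(record):
    """Assign simple phenotype labels to one integrated patient record."""
    labels = []
    if record.get("hba1c", 0.0) >= 6.5 or record.get("fasting_glucose", 0.0) >= 7.0:
        labels.append("T2DM")
    # Hypothetical diabetic-retinopathy (DR) flag from an eye-exam finding.
    if record.get("retinal_exam") == "microaneurysms":
        labels.append("DR")
    return labels


patients = [
    {"id": 1, "hba1c": 7.2, "retinal_exam": "normal"},
    {"id": 2, "hba1c": 6.9, "retinal_exam": "microaneurysms"},
    {"id": 3, "fasting_glucose": 5.4, "retinal_exam": "normal"},
]

for p in patients:
    print(p["id"], phenotype(p))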
Operational rate-distortion performance for joint source and channel coding of images.
Ruf, M J; Modestino, J W
1999-01-01
This paper describes a methodology for evaluating the operational rate-distortion behavior of combined source and channel coding schemes with particular application to images. In particular, we demonstrate use of the operational rate-distortion function to obtain the optimum tradeoff between source coding accuracy and channel error protection under the constraint of a fixed transmission bandwidth for the investigated transmission schemes. Furthermore, we develop information-theoretic bounds on performance for specific source and channel coding systems and demonstrate that our combined source-channel coding methodology applied to different schemes results in operational rate-distortion performance which closely approach these theoretical limits. We concentrate specifically on a wavelet-based subband source coding scheme and the use of binary rate-compatible punctured convolutional (RCPC) codes for transmission over the additive white Gaussian noise (AWGN) channel. Explicit results for real-world images demonstrate the efficacy of this approach.
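The tradeoff at the heart of this methodology is that, for a fixed channel budget, a stronger channel code leaves fewer bits for the source coder but fewer residual errors. The sketch below illustrates that operational search over candidate channel code rates; the distortion and residual-error models are toy placeholders, not the paper's wavelet/RCPC system.

# Toy illustration of the operational rate-distortion tradeoff: for a fixed
# transmission budget, try several candidate channel code rates, compute the
# source bits left over, and keep the allocation with the lowest expected
# distortion. Distortion and residual-error models are placeholders.
import math

TOTAL_BITS = 16384                       # assumed fixed channel budget per image
CHANNEL_RATES = [1/3, 1/2, 2/3, 4/5]     # candidate channel code rates


def source_distortion(source_bits):
    # Placeholder D(R): distortion decays exponentially with source bits.
    return math.exp(-source_bits / 4000.0)


def residual_error_prob(code_rate):
    # Placeholder: weaker protection (higher code rate) leaves more errors.
    return 1e-4 * math.exp(8.0 * (code_rate - 1/3))


def expected_distortion(code_rate):
    source_bits = TOTAL_BITS * code_rate
    p_err = residual_error_prob(code_rate)
    # If the block is corrupted, assume maximal distortion (= 1.0).
    return (1 - p_err) * source_distortion(source_bits) + p_err * 1.0


for r in CHANNEL_RATES:
    print(f"code rate {r:.2f}: E[D] = {expected_distortion(r):.4e}")
print("best channel code rate:", min(CHANNEL_RATES, key=expected_distortion))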
Native American Art and Culture: Documentary Resources.
ERIC Educational Resources Information Center
Lawrence, Deirdre
1992-01-01
Presents a brief overview of the evolution of documentary material of Native American cultures and problems confronted by researchers in locating relevant information. Bibliographic sources for research are discussed and a directory of major repositories of Native American art documentation is provided. (EA)
Feasibility of Combining Common Data Elements Across Studies to Test a Hypothesis.
Corwin, Elizabeth J; Moore, Shirley M; Plotsky, Andrea; Heitkemper, Margaret M; Dorsey, Susan G; Waldrop-Valverde, Drenna; Bailey, Donald E; Docherty, Sharron L; Whitney, Joanne D; Musil, Carol M; Dougherty, Cynthia M; McCloskey, Donna J; Austin, Joan K; Grady, Patricia A
2017-05-01
The purpose of this article is to describe the outcomes of a collaborative initiative to share data across five schools of nursing in order to evaluate the feasibility of collecting common data elements (CDEs) and developing a common data repository to test hypotheses of interest to nursing scientists. This initiative extended work already completed by the National Institute of Nursing Research CDE Working Group that successfully identified CDEs related to symptoms and self-management, with the goal of supporting more complex, reproducible, and patient-focused research. Two exemplars describing the group's efforts are presented. The first highlights a pilot study wherein data sets from various studies by the represented schools were collected retrospectively, and merging of the CDEs was attempted. The second exemplar describes the methods and results of an initiative at one school that utilized a prospective design for the collection and merging of CDEs. Methods for identifying a common symptom to be studied across schools and for collecting the data dictionaries for the related data elements are presented for the first exemplar. The processes for defining and comparing the concepts and acceptable values, and for evaluating the potential to combine and compare the data elements are also described. Presented next are the steps undertaken in the second exemplar to prospectively identify CDEs and establish the data dictionaries. Methods for common measurement and analysis strategies are included. Findings from the first exemplar indicated that without plans in place a priori to ensure the ability to combine and compare data from disparate sources, doing so retrospectively may not be possible, and as a result hypothesis testing across studies may be prohibited. Findings from the second exemplar, however, indicated that a plan developed prospectively to combine and compare data sets is feasible and conducive to merged hypothesis testing. Although challenges exist in combining CDEs across studies into a common data repository, a prospective, well-designed protocol for identifying, coding, and comparing CDEs is feasible and supports the development of a common data repository and the testing of important hypotheses to advance nursing science. Incorporating CDEs across studies will increase sample size and improve data validity, reliability, transparency, and reproducibility, all of which will increase the scientific rigor of the study and the likelihood of impacting clinical practice and patient care. © 2017 Sigma Theta Tau International.
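The prospective approach described above hinges on a shared data dictionary that maps each site's local variable names onto the agreed common data elements before pooling. The sketch below shows that mapping-and-concatenation step with pandas; the study names, variables, and values are hypothetical, not the participating schools' data.

# Minimal sketch of pooling a common data element (here, a fatigue score)
# across two studies by mapping site-specific variable names onto a shared
# CDE name before concatenation. All names and values are hypothetical.
import pandas as pd

study_a = pd.DataFrame({"subj": [1, 2], "fatigue_vas": [34, 70]})
study_b = pd.DataFrame({"pid": [10, 11], "FATIGUE_SCORE": [55, 12]})

# Per-study data dictionaries: local variable name -> common data element.
dictionary = {
    "study_a": {"subj": "subject_id", "fatigue_vas": "cde_fatigue"},
    "study_b": {"pid": "subject_id", "FATIGUE_SCORE": "cde_fatigue"},
}

frames = []
for name, df in [("study_a", study_a), ("study_b", study_b)]:
    harmonized = df.rename(columns=dictionary[name])
    harmonized["study"] = name
    frames.append(harmonized)

pooled = pd.concat(frames, ignore_index=True)
print(pooled)
# The pooled frame can now support hypothesis tests across studies.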
An open real-time tele-stethoscopy system.
Foche-Perez, Ignacio; Ramirez-Payba, Rodolfo; Hirigoyen-Emparanza, German; Balducci-Gonzalez, Fernando; Simo-Reigadas, Francisco-Javier; Seoane-Pascual, Joaquin; Corral-Peñafiel, Jaime; Martinez-Fernandez, Andres
2012-08-23
Acute respiratory infections are the leading cause of childhood mortality. The lack of physicians in rural areas of developing countries makes their correct diagnosis and treatment difficult. The staff of rural health facilities (health-care technicians) may not be qualified to distinguish respiratory diseases by auscultation. For this reason, the goal of this project is the development of a tele-stethoscopy system that allows a physician to receive real-time cardio-respiratory sounds from a remote auscultation, as well as video images showing where the technician is placing the stethoscope on the patient's body. A real-time wireless stethoscopy system was designed. The initial requirements were: 1) The system must send audio and video synchronously over IP networks, not requiring an Internet connection; 2) It must preserve the quality of cardiorespiratory sounds, allowing the binaural pieces and the chestpiece of standard stethoscopes to be adapted; and 3) Cardiorespiratory sounds should be recordable at both sides of the communication. In order to verify the diagnostic capacity of the system, a clinical validation with eight specialists has been designed. In a preliminary test, twelve patients have been auscultated by all the physicians using the tele-stethoscopy system, versus local auscultation using a traditional stethoscope. The system must allow the physician to listen to cardiac (systolic and diastolic murmurs, gallop sound, arrhythmias) and respiratory (rhonchi, rales and crepitations, wheeze, diminished and bronchial breath sounds, pleural friction rub) sounds. The design, development and initial validation of the real-time wireless tele-stethoscopy system are described in detail. The system was conceived from scratch as open-source, low-cost and designed in such a way that many universities and small local companies in developing countries may manufacture it. Only free open-source software has been used in order to minimize manufacturing costs and look for alliances to support its improvement and adaptation. The microcontroller firmware code, the computer software code and the PCB schematics are available for free download in a subversion repository hosted on SourceForge. It has been shown that real-time tele-stethoscopy, together with a videoconference system that allows a remote specialist to oversee the auscultation, may be a very helpful tool in rural areas of developing countries.
Software development infrastructure for the HYBRID modeling and simulation project
DOE Office of Scientific and Technical Information (OSTI.GOV)
Epiney, Aaron S.; Kinoshita, Robert A.; Kim, Jong Suk
One of the goals of the HYBRID modeling and simulation project is to assess the economic viability of hybrid systems in a market that contains renewable energy sources like wind. The idea is that it is possible for the nuclear plant to sell non-electric energy cushions, which absorb (at least partially) the volatility introduced by the renewable energy sources. This system is currently modeled in the Modelica programming language. To assess the economics of the system, an optimization procedure tries to find the minimal cost of electricity production. The RAVEN code is used as a driver for the whole problem. It is assumed that at this stage, the HYBRID modeling and simulation framework can be classified as non-safety “research and development” software. The associated quality level is Quality Level 3 software. This imposes low requirements on quality control, testing and documentation. The quality level could change as the application development continues. Despite the low quality requirement level, a workflow for the HYBRID developers has been defined that includes a coding standard and some documentation and testing requirements. The repository performs automated unit testing of contributed models. The automated testing is achieved via an open-source Python package called BuildingsPy from Lawrence Berkeley National Lab. BuildingsPy runs Modelica simulation tests using Dymola in an automated manner and generates and runs unit tests from Modelica scripts written by developers. In order to assure effective communication between the different national laboratories, a biweekly videoconference has been set up, where developers can report their progress and issues. In addition, periodic face-to-face meetings are organized to discuss high-level strategy decisions with management. A second means of communication is the developer email list. This is a list to which everybody can send emails that will be received by the collective of the developers and managers involved in the project. Thirdly, to exchange documents quickly, a SharePoint directory has been set up. SharePoint allows teams and organizations to intelligently share and collaborate on content from anywhere.
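The repository hook described above discovers contributed model test scripts and runs them automatically on each contribution (in the project itself this is done through BuildingsPy driving Dymola). The sketch below shows only the general shape of such a hook with a generic runner; the directory layout, the simulator command name, and the pass/fail convention are assumptions for illustration, not the HYBRID project's actual configuration.

# Generic sketch of an automated model-test hook of the kind described
# above. The directory layout and the simulator invocation are placeholders;
# the real project uses BuildingsPy to drive Dymola.
import os
import pathlib
import subprocess
import sys

TEST_DIR = pathlib.Path("Resources/Scripts/tests")          # assumed script location
SIMULATOR_CMD = os.environ.get("SIMULATOR_CMD", "dymola")    # assumed command name


def run_all_tests():
    failures = []
    for script in sorted(TEST_DIR.glob("*.mos")):
        # Assumed convention: a non-zero exit code marks a failed model test.
        result = subprocess.run([SIMULATOR_CMD, str(script)])
        if result.returncode != 0:
            failures.append(script.name)
    return failures


if __name__ == "__main__":
    failed = run_all_tests()
    if failed:
        print("Failed model tests:", ", ".join(failed))
        sys.exit(1)
    print("All model tests passed.")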
Continuation of research into language concepts for the mission support environment: Source code
NASA Technical Reports Server (NTRS)
Barton, Timothy J.; Ratner, Jeremiah M.
1991-01-01
Research into language concepts for the Mission Control Center is presented, along with the associated source code. The file contains the routines that allow source code files to be created and compiled. The build process assumes that all elements and the COMP exist in the current directory, and it places as much code generation as possible on the preprocessor. A summary is given of the source files as used and/or manipulated by the build routine.
The Experiment Factory: Standardizing Behavioral Experiments.
Sochat, Vanessa V; Eisenberg, Ian W; Enkavi, A Zeynep; Li, Jamie; Bissett, Patrick G; Poldrack, Russell A
2016-01-01
The administration of behavioral and experimental paradigms for psychology research is hindered by lack of a coordinated effort to develop and deploy standardized paradigms. While several frameworks (Mason and Suri, 2011; McDonnell et al., 2012; de Leeuw, 2015; Lange et al., 2015) have provided infrastructure and methods for individual research groups to develop paradigms, missing is a coordinated effort to develop paradigms linked with a system to easily deploy them. This disorganization leads to redundancy in development, divergent implementations of conceptually identical tasks, disorganized and error-prone code lacking documentation, and difficulty in replication. The ongoing reproducibility crisis in psychology and neuroscience research (Baker, 2015; Open Science Collaboration, 2015) highlights the urgency of this challenge: reproducible research in behavioral psychology is conditional on deployment of equivalent experiments. A large, accessible repository of experiments for researchers to develop collaboratively is most efficiently accomplished through an open source framework. Here we present the Experiment Factory, an open source framework for the development and deployment of web-based experiments. The modular infrastructure includes experiments, virtual machines for local or cloud deployment, and an application to drive these components and provide developers with functions and tools for further extension. We release this infrastructure with a deployment (http://www.expfactory.org) that researchers are currently using to run a set of over 80 standardized web-based experiments on Amazon Mechanical Turk. By providing open source tools for both deployment and development, this novel infrastructure holds promise to bring reproducibility to the administration of experiments, and accelerate scientific progress by providing a shared community resource of psychological paradigms.
DataMed - an open source discovery index for finding biomedical datasets.
Chen, Xiaoling; Gururaj, Anupama E; Ozyurt, Burak; Liu, Ruiling; Soysal, Ergin; Cohen, Trevor; Tiryaki, Firat; Li, Yueling; Zong, Nansu; Jiang, Min; Rogith, Deevakar; Salimi, Mandana; Kim, Hyeon-Eui; Rocca-Serra, Philippe; Gonzalez-Beltran, Alejandra; Farcas, Claudiu; Johnson, Todd; Margolis, Ron; Alter, George; Sansone, Susanna-Assunta; Fore, Ian M; Ohno-Machado, Lucila; Grethe, Jeffrey S; Xu, Hua
2018-01-13
Finding relevant datasets is important for promoting data reuse in the biomedical domain, but it is challenging given the volume and complexity of biomedical data. Here we describe the development of an open source biomedical data discovery system called DataMed, with the goal of promoting the building of additional data indexes in the biomedical domain. DataMed, which can efficiently index and search diverse types of biomedical datasets across repositories, is developed through the National Institutes of Health-funded biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) consortium. It consists of 2 main components: (1) a data ingestion pipeline that collects and transforms original metadata information to a unified metadata model, called DatA Tag Suite (DATS), and (2) a search engine that finds relevant datasets based on user-entered queries. In addition to describing its architecture and techniques, we evaluated individual components within DataMed, including the accuracy of the ingestion pipeline, the prevalence of the DATS model across repositories, and the overall performance of the dataset retrieval engine. Our manual review shows that the ingestion pipeline could achieve an accuracy of 90% and core elements of DATS had varied frequency across repositories. On a manually curated benchmark dataset, the DataMed search engine achieved an inferred average precision of 0.2033 and a precision at 10 (P@10, the number of relevant results in the top 10 search results) of 0.6022, by implementing advanced natural language processing and terminology services. Currently, we have made the DataMed system publicly available as an open source package for the biomedical community. © The Author 2018. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com
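The ingestion pipeline's job is to map each repository's native metadata onto the unified DATS model before indexing. The sketch below shows that kind of mapping for a single hypothetical source record, using only a few DATS-like fields; the source field names and this reduced subset of the schema are assumptions for illustration, as the real DATS model and pipeline are considerably richer.

# Sketch of the ingestion idea: map one repository's native metadata record
# onto a small DATS-like structure. The source field names and the subset of
# DATS fields shown here are illustrative.
import json

source_record = {           # hypothetical native record from one repository
    "study_title": "RNA-seq of mouse liver",
    "summary": "Expression profiling under high-fat diet.",
    "organism": "Mus musculus",
    "accession": "SRC-000123",
}


def to_dats(record, repository_name):
    return {
        "title": record["study_title"],
        "description": record["summary"],
        "identifier": {"identifier": record["accession"],
                       "identifierSource": repository_name},
        "isAbout": [{"name": record["organism"]}],
    }


print(json.dumps(to_dats(source_record, "ExampleRepo"), indent=2))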
ArrayWiki: an enabling technology for sharing public microarray data repositories and meta-analyses
Stokes, Todd H; Torrance, JT; Li, Henry; Wang, May D
2008-01-01
Background A survey of microarray databases reveals that most of the repository contents and data models are heterogeneous (i.e., data obtained from different chip manufacturers), and that the repositories provide only basic biological keywords linking to PubMed. As a result, it is difficult to find datasets using research context or analysis parameters information beyond a few keywords. For example, to reduce the "curse-of-dimension" problem in microarray analysis, the number of samples is often increased by merging array data from different datasets. Knowing chip data parameters such as pre-processing steps (e.g., normalization, artefact removal, etc.), and knowing any previous biological validation of the dataset is essential due to the heterogeneity of the data. However, most of the microarray repositories do not have meta-data information in the first place, and do not have a mechanism to add or insert this information. Thus, there is a critical need to create "intelligent" microarray repositories that (1) enable update of meta-data with the raw array data, and (2) provide standardized archiving protocols to minimize bias from the raw data sources. Results To address the problems discussed, we have developed a community maintained system called ArrayWiki that unites disparate meta-data of microarray meta-experiments from multiple primary sources with four key features. First, ArrayWiki provides a user-friendly knowledge management interface in addition to a programmable interface using standards developed by Wikipedia. Second, ArrayWiki includes automated quality control processes (caCORRECT) and novel visualization methods (BioPNG, Gel Plots), which provide extra information about data quality unavailable in other microarray repositories. Third, it provides a user-curation capability through the familiar Wiki interface. Fourth, ArrayWiki provides users with simple text-based searches across all experiment meta-data, and exposes data to search engine crawlers (Semantic Agents) such as Google to further enhance data discovery. Conclusions Microarray data and meta information in ArrayWiki are distributed and visualized using a novel and compact data storage format, BioPNG. Also, they are open to the research community for curation, modification, and contribution. By making a small investment of time to learn the syntax and structure common to all sites running MediaWiki software, domain scientists and practitioners can all contribute to make better use of microarray technologies in research and medical practices. ArrayWiki is available at . PMID:18541053
Cove benchmark calculations using SAGUARO and FEMTRAN
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eaton, R.R.; Martinez, M.J.
1986-10-01
Three small-scale, time-dependent, benchmarking calculations have been made using the finite element codes SAGUARO, to determine hydraulic head and water velocity profiles, and FEMTRAN, to predict the solute transport. Sand and hard rock porous materials were used. Time scales for the problems, which ranged from tens of hours to thousands of years, have posed no particular difficulty for the two codes. Studies have been performed to determine the effects of computational mesh, boundary conditions, velocity formulation and SAGUARO/FEMTRAN code-coupling on water and solute transport. Results showed that mesh refinement improved mass conservation. Varying the drain-tile size in COVE 1N had a weak effect on the rate at which the tile field drained. Excellent agreement with published COVE 1N data was obtained for the hydrological field and reasonable agreement for the solute-concentration predictions. The question remains whether these types of calculations can be carried out on repository-scale problems using material characteristic curves representing tuff with fractures.
Measuring diagnoses: ICD code accuracy.
O'Malley, Kimberly J; Cook, Karon F; Price, Matt D; Wildes, Kimberly Raiford; Hurdle, John F; Ashton, Carol M
2005-10-01
To examine potential sources of errors at each step of the described inpatient International Classification of Diseases (ICD) coding process. The use of disease codes from the ICD has expanded from classifying morbidity and mortality information for statistical purposes to diverse sets of applications in research, health care policy, and health care finance. By describing a brief history of ICD coding, detailing the process for assigning codes, identifying where errors can be introduced into the process, and reviewing methods for examining code accuracy, we help code users more systematically evaluate code accuracy for their particular applications. We summarize the inpatient ICD diagnostic coding process from patient admission to diagnostic code assignment. We examine potential sources of errors at each step and offer code users a tool for systematically evaluating code accuracy. Main error sources along the "patient trajectory" include amount and quality of information at admission, communication among patients and providers, the clinician's knowledge and experience with the illness, and the clinician's attention to detail. Main error sources along the "paper trail" include variance in the electronic and written records, coder training and experience, facility quality-control efforts, and unintentional and intentional coder errors, such as misspecification, unbundling, and upcoding. By clearly specifying the code assignment process and heightening their awareness of potential error sources, code users can better evaluate the applicability and limitations of codes for their particular situations. ICD codes can then be used in the most appropriate ways.
Li, Yanjie; Polak, Urszula; Clark, Amanda D; Bhalla, Angela D; Chen, Yu-Yun; Li, Jixue; Farmer, Jennifer; Seyer, Lauren; Lynch, David; Butler, Jill S; Napierala, Marek
2016-08-01
Friedreich's ataxia (FRDA) represents a rare neurodegenerative disease caused by expansion of GAA trinucleotide repeats in the first intron of the FXN gene. The number of GAA repeats in FRDA patients varies from approximately 60 to <1000 and is tightly correlated with age of onset and severity of the disease symptoms. The heterogeneity of Friedreich's ataxia stresses the need for a large cohort of patient samples to conduct studies addressing the mechanism of disease pathogenesis or evaluate novel therapeutic candidates. Herein, we report the establishment and characterization of an FRDA fibroblast repository, which currently includes 50 primary cell lines derived from FRDA patients and seven lines from mutation carriers. These cells are also a source for generating induced pluripotent stem cell (iPSC) lines by reprogramming, as well as disease-relevant neuronal, cardiac, and pancreatic cells that can then be differentiated from the iPSCs. All FRDA and carrier lines are derived using a standard operating procedure and characterized to confirm mutation status, as well as expression of FXN mRNA and protein. Consideration and significance of creating disease-focused cell line and tissue repositories, especially in the context of rare and heterogeneous disorders, are presented. Although the economic aspect of creating and maintaining such repositories is important, the benefits of easy access to a collection of well-characterized cell lines for the purpose of drug discovery or disease mechanism studies overshadow the associated costs. Importantly, all FRDA fibroblast cell lines collected in our repository are available to the scientific community.
EPA’s SPECIATE 4.4 Database - Development and Uses
SPECIATE is the EPA's repository of TOG, PM, and Other Gases speciation profiles of air pollution sources. It includes weight fractions of both organic species and PM and provides data in consistent units. Species include metals, ions, elements, and organic and inorganic compound...
Scrubchem: Building Bioactivity Datasets from Pubchem Bioassay Data (SOT)
The PubChem Bioassay database is a non-curated public repository with data from 64 sources, including: ChEMBL, BindingDb, DrugBank, EPA Tox21, NIH Molecular Libraries Screening Program, and various other academic, government, and industrial contributors. Methods for extracting th...
Automated Student Model Improvement
ERIC Educational Resources Information Center
Koedinger, Kenneth R.; McLaughlin, Elizabeth A.; Stamper, John C.
2012-01-01
Student modeling plays a critical role in developing and improving instruction and instructional technologies. We present a technique for automated improvement of student models that leverages the DataShop repository, crowd sourcing, and a version of the Learning Factors Analysis algorithm. We demonstrate this method on eleven educational…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hofmann, R.B.
1995-09-01
Analogs are used to understand complex or poorly understood phenomena for which little data may be available at the actual repository site. Earthquakes are complex phenomena, and they can have a large number of effects on the natural system, as well as on engineered structures. Instrumental data close to the source of large earthquakes are rarely obtained. The rare events for which measurements are available may be used, with modifications, as analogs for potential large earthquakes at sites where no earthquake data are available. In the following, several examples of nuclear reactor and liquefied natural gas facility siting are discussed. A potential use of analog earthquakes is proposed for a high-level nuclear waste (HLW) repository.
HEPData: a repository for high energy physics data
NASA Astrophysics Data System (ADS)
Maguire, Eamonn; Heinrich, Lukas; Watt, Graeme
2017-10-01
The Durham High Energy Physics Database (HEPData) has been built up over the past four decades as a unique open-access repository for scattering data from experimental particle physics papers. It comprises data points underlying several thousand publications. Over the last two years, the HEPData software has been completely rewritten using modern computing technologies as an overlay on the Invenio v3 digital library framework. The software is open source with the new site available at https://hepdata.net now replacing the previous site at http://hepdata.cedar.ac.uk. In this write-up, we describe the development of the new site and explain some of the advantages it offers over the previous platform.
Wang, Lei; Alpert, Kathryn I.; Calhoun, Vince D.; Cobia, Derin J.; Keator, David B.; King, Margaret D.; Kogan, Alexandr; Landis, Drew; Tallis, Marcelo; Turner, Matthew D.; Potkin, Steven G.; Turner, Jessica A.; Ambite, Jose Luis
2015-01-01
SchizConnect (www.schizconnect.org) is built to address the issues of multiple data repositories in schizophrenia neuroimaging studies. It includes a level of mediation—translating across data sources—so that the user can place one query, e.g. for diffusion images from male individuals with schizophrenia, and find out from across participating data sources how many datasets there are, as well as downloading the imaging and related data. The current version handles the Data Usage Agreements across different studies, as well as interpreting database-specific terminologies into a common framework. New data repositories can also be mediated to bring immediate access to existing datasets. Compared with centralized, upload data sharing models, SchizConnect is a unique, virtual database with a focus on schizophrenia and related disorders that can mediate live data as information are being updated at each data source. It is our hope that SchizConnect can facilitate testing new hypotheses through aggregated datasets, promoting discovery related to the mechanisms underlying schizophrenic dysfunction. PMID:26142271
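The mediation layer described above translates one user query into each source's local terminology before federating the results. The sketch below illustrates that translate-and-aggregate pattern; the source names, term mappings, and counts are made up, and the fake in-memory tables stand in for live queries against remote databases (this is not the SchizConnect implementation).

# Minimal sketch of query mediation: one common query term is translated
# into each source's local vocabulary and the per-source counts are
# aggregated. Source names, term mappings, and counts are made up.
COMMON_TO_LOCAL = {
    "diffusion_mri": {"source_a": "DTI", "source_b": "dwi_scan"},
    "schizophrenia": {"source_a": "SCZ", "source_b": "295.x"},
}

FAKE_SOURCES = {   # stands in for live queries against remote databases
    "source_a": {("DTI", "SCZ"): 112},
    "source_b": {("dwi_scan", "295.x"): 87},
}


def mediated_count(modality, diagnosis):
    total = 0
    for source, table in FAKE_SOURCES.items():
        local_key = (COMMON_TO_LOCAL[modality][source],
                     COMMON_TO_LOCAL[diagnosis][source])
        hits = table.get(local_key, 0)
        print(f"{source}: {hits} matching datasets")
        total += hits
    return total


print("total:", mediated_count("diffusion_mri", "schizophrenia"))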
NASA Astrophysics Data System (ADS)
Niknam, Taher; Kavousifard, Abdollah; Tabatabaei, Sajad; Aghaei, Jamshid
2011-10-01
In this paper a new multiobjective modified honey bee mating optimization (MHBMO) algorithm is presented to investigate the distribution feeder reconfiguration (DFR) problem considering renewable energy sources (RESs) (photovoltaics, fuel cell and wind energy) connected to the distribution network. The objective functions of the problem to be minimized are the electrical active power losses, the voltage deviations, the total electrical energy costs and the total emissions of RESs and substations. During the optimization process, the proposed algorithm finds a set of non-dominated (Pareto) optimal solutions which are stored in an external memory called repository. Since the objective functions investigated are not the same, a fuzzy clustering algorithm is utilized to keep the size of the repository within specified limits. Moreover, a fuzzy-based decision maker is adopted to select the 'best' compromise solution among the non-dominated optimal solutions of the multiobjective optimization problem. In order to see the feasibility and effectiveness of the proposed algorithm, two standard distribution test systems are used as case studies.
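The external repository at the heart of this scheme keeps only solutions that are not Pareto-dominated by any other stored solution. The sketch below illustrates that dominance filter for minimization objectives; the objective vectors are toy values, and the fuzzy clustering and fuzzy decision-maker steps of the actual algorithm are omitted.

# Toy illustration of the external repository of non-dominated (Pareto)
# solutions used in the multiobjective scheme above. Objective vectors are
# (losses, voltage deviation, cost, emissions) with made-up values.
def dominates(a, b):
    """True if solution a is at least as good as b in every objective
    (minimization) and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_repository(repository, candidate):
    if any(dominates(member, candidate) for member in repository):
        return repository                      # candidate is dominated
    # Drop members the candidate dominates, then add it.
    repository = [m for m in repository if not dominates(candidate, m)]
    repository.append(candidate)
    return repository


repo = []
for solution in [(4.1, 0.03, 120.0, 9.5),
                 (3.8, 0.05, 118.0, 9.9),
                 (4.5, 0.02, 125.0, 9.1),
                 (4.2, 0.04, 126.0, 9.8)]:   # dominated by the first entry
    repo = update_repository(repo, solution)

print("non-dominated repository:", repo)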
NASA Astrophysics Data System (ADS)
Allen, Alice; Teuben, Peter J.; Ryan, P. Wesley
2018-05-01
We examined software usage in a sample set of astrophysics research articles published in 2015 and searched for the source codes for the software mentioned in these research papers. We categorized the software to indicate whether the source code is available for download and whether there are restrictions to accessing it, and if the source code is not available, whether some other form of the software, such as a binary, is. We also extracted hyperlinks from one journal’s 2015 research articles, as links in articles can serve as an acknowledgment of software use and lead to the data used in the research, and tested them to determine which of these URLs are still accessible. For our sample of 715 software instances in the 166 articles we examined, we were able to categorize 418 records according to whether source code was available and found that 285 unique codes were used, 58% of which offered the source code for download. Of the 2558 hyperlinks extracted from 1669 research articles, at best, 90% of them were available over our testing period.
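The link-checking part of such a study reduces to extracting URLs from article text and probing each one over HTTP. The sketch below shows a simplified version of that step; the sample text is made up and the handling of redirects, retries, and repeated testing periods is far simpler than the authors' actual procedure.

# Simplified sketch of extracting hyperlinks from article text and testing
# whether they are still reachable. The sample text is made up.
import re
import requests

article_text = (
    "Our pipeline is available at https://github.com/example/pipeline and "
    "archived data can be found at http://example.org/data/release1."
)

URL_PATTERN = re.compile(r"https?://[^\s)\"']+")


def check_links(text, timeout=10):
    results = {}
    for url in URL_PATTERN.findall(text):
        url = url.rstrip(".,;")   # strip trailing sentence punctuation
        try:
            resp = requests.head(url, allow_redirects=True, timeout=timeout)
            results[url] = resp.status_code < 400
        except requests.RequestException:
            results[url] = False
    return results


for url, ok in check_links(article_text).items():
    print(("OK   " if ok else "DEAD ") + url)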
Master Metadata Repository and Metadata-Management System
NASA Technical Reports Server (NTRS)
Armstrong, Edward; Reed, Nate; Zhang, Wen
2007-01-01
A master metadata repository (MMR) software system manages the storage and searching of metadata pertaining to data from national and international satellite sources of the Global Ocean Data Assimilation Experiment (GODAE) High Resolution Sea Surface Temperature Pilot Project (GHRSST-PP). These sources produce a total of hundreds of data files daily, each file classified as one of more than ten data products representing global sea-surface temperatures. The MMR is a relational database wherein the metadata are divided into granule-level records [denoted file records (FRs)] for individual satellite files and collection-level records [denoted data set descriptions (DSDs)] that describe metadata common to all the files from a specific data product. FRs and DSDs adhere to the NASA Directory Interchange Format (DIF). The FRs and DSDs are contained in separate subdatabases linked by a common field. The MMR is configured in MySQL database software with custom Practical Extraction and Reporting Language (PERL) programs to validate and ingest the metadata records. The database contents are converted into the Federal Geographic Data Committee (FGDC) standard format by use of the Extensible Markup Language (XML). A Web interface enables users to search for availability of data from all sources.
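The granule-level/collection-level split described above can be illustrated with a small relational sketch; the MMR itself uses MySQL and Perl, and the table and column names here are hypothetical rather than the actual GHRSST-PP schema. Python's built-in sqlite3 module is used purely to keep the example self-contained.

    # Illustrative only: collection-level records (DSDs) and granule-level
    # records (FRs) stored in separate tables linked by a common field.
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE dsd (dsd_id TEXT PRIMARY KEY, product_name TEXT, spatial_resolution TEXT);
    CREATE TABLE fr  (file_name TEXT PRIMARY KEY,
                      dsd_id TEXT REFERENCES dsd(dsd_id),
                      start_time TEXT, stop_time TEXT);
    """)
    con.execute("INSERT INTO dsd VALUES ('AVHRR17_G', 'AVHRR GAC SST', '9 km')")
    con.execute("INSERT INTO fr VALUES ('20070101-AVHRR17_G.nc', 'AVHRR17_G', "
                "'2007-01-01T00:00Z', '2007-01-01T23:59Z')")

    # Join granule-level records with their collection-level description.
    for row in con.execute("SELECT fr.file_name, dsd.product_name "
                           "FROM fr JOIN dsd USING (dsd_id)"):
        print(row)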
The igmspec database of public spectra probing the intergalactic medium
NASA Astrophysics Data System (ADS)
Prochaska, J. X.
2017-04-01
We describe v02 of igmspec, a database of publicly available ultraviolet, optical, and near-infrared spectra that probe the intergalactic medium (IGM). This database, a child of the specdb repository in the specdb github organization, comprises 403 277 unique sources and 434 686 spectra obtained with the world's greatest observatories. All of these data are distributed in a single ≈ 25GB HDF5 file maintained at the University of California Observatories and the University of California, Santa Cruz. The specdb software package includes Python scripts and modules for searching the source catalog and spectral datasets, and software links to the linetools package for spectral analysis. The repository also includes software to generate private spectral datasets that are compliant with International Virtual Observatory Alliance (IVOA) protocols and a Python-based interface for IVOA Simple Spectral Access queries. Future versions of igmspec will ingest other sources (e.g. gamma-ray burst afterglows) and other surveys as they become publicly available. The overall goal is to include every spectrum that effectively probes the IGM. Future databases of specdb may include publicly available galaxy spectra (exgalspec) and published supernovae spectra (snspec). The community is encouraged to join the effort on github: https://github.com/specdb.
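A hedged sketch of what reading such a single-file HDF5 database can look like with h5py; the group and field names below ('catalog', 'RA', 'DEC') and the file path are assumptions for illustration, not the actual igmspec layout, which is normally accessed through the specdb Python package.

    # Open a single-file HDF5 spectral database and select sources near a
    # position. Names are hypothetical; use specdb for the real igmspec file.
    import h5py
    import numpy as np

    with h5py.File("igmspec_db_v02.hdf5", "r") as f:      # illustrative path
        catalog = f["catalog"][...]                       # structured array of sources
        near = (np.abs(catalog["RA"] - 150.0) < 1.0) & \
               (np.abs(catalog["DEC"] - 2.0) < 1.0)
        print(f"{near.sum()} sources near the requested position")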
Automated Concurrent Blackboard System Generation in C++
NASA Technical Reports Server (NTRS)
Kaplan, J. A.; McManus, J. W.; Bynum, W. L.
1999-01-01
In his 1992 Ph.D. thesis, "Design and Analysis Techniques for Concurrent Blackboard Systems", John McManus defined several performance metrics for concurrent blackboard systems and developed a suite of tools for creating and analyzing such systems. These tools allow a user to analyze a concurrent blackboard system design and predict the performance of the system before any code is written. The design can be modified until simulated performance is satisfactory. Then, the code generator can be invoked to generate automatically all of the code required for the concurrent blackboard system except for the code implementing the functionality of each knowledge source. We have completed the port of the source code generator and a simulator for a concurrent blackboard system. The source code generator generates the necessary C++ source code to implement the concurrent blackboard system using Parallel Virtual Machine (PVM) running on a heterogeneous network of UNIX(trademark) workstations. The concurrent blackboard simulator uses the blackboard specification file to predict the performance of the concurrent blackboard design. The only part of the source code for the concurrent blackboard system that the user must supply is the code implementing the functionality of the knowledge sources.
Performance Assessment of a Generic Repository in Bedded Salt for DOE-Managed Nuclear Waste
NASA Astrophysics Data System (ADS)
Stein, E. R.; Sevougian, S. D.; Hammond, G. E.; Frederick, J. M.; Mariner, P. E.
2016-12-01
A mined repository in salt is one of the concepts under consideration for disposal of DOE-managed defense-related spent nuclear fuel (SNF) and high level waste (HLW). Bedded salt is a favorable medium for disposal of nuclear waste due to its low permeability, high thermal conductivity, and ability to self-heal. Sandia's Generic Disposal System Analysis framework is used to assess the ability of a generic repository in bedded salt to isolate radionuclides from the biosphere. The performance assessment considers multiple waste types of varying thermal load and radionuclide inventory, the engineered barrier system comprising the waste packages, backfill, and emplacement drifts, and the natural barrier system formed by a bedded salt deposit and the overlying sedimentary sequence (including an aquifer). The model simulates disposal of nearly the entire inventory of DOE-managed, defense-related SNF (excluding Naval SNF) and HLW in a half-symmetry domain containing approximately 6 million grid cells. Grid refinement captures the detail of 25,200 individual waste packages in 180 disposal panels, associated access halls, and 4 shafts connecting the land surface to the repository. Equations describing coupled heat and fluid flow and reactive transport are solved numerically with PFLOTRAN, a massively parallel flow and transport code. Simulated processes include heat conduction and convection, waste package failure, waste form dissolution, radioactive decay and ingrowth, sorption, solubility limits, advection, dispersion, and diffusion. Simulations are run to 1 million years, and radionuclide concentrations are observed within an aquifer at a point approximately 4 kilometers downgradient of the repository. The software package DAKOTA is used to sample likely ranges of input parameters including waste form dissolution rates and properties of engineered and natural materials in order to quantify uncertainty in predicted concentrations and sensitivity to input parameters. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.
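A minimal sketch of the kind of parameter sampling step described above; this is not the DAKOTA workflow itself, and the parameter names and ranges are illustrative assumptions. Each sampled row would drive one flow-and-transport realization.

    # Latin hypercube sampling of uncertain inputs over assumed ranges.
    import numpy as np

    rng = np.random.default_rng(0)
    ranges = {"log10_waste_form_dissolution_rate": (-8.0, -5.0),   # illustrative
              "log10_salt_permeability": (-22.0, -18.0)}           # illustrative

    def latin_hypercube(n, bounds, rng):
        samples = {}
        for name, (lo, hi) in bounds.items():
            strata = (np.arange(n) + rng.random(n)) / n   # one point per stratum
            rng.shuffle(strata)                           # decorrelate the variables
            samples[name] = lo + strata * (hi - lo)
        return samples

    print(latin_hypercube(5, ranges, rng))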
You've Written a Cool Astronomy Code! Now What Do You Do with It?
NASA Astrophysics Data System (ADS)
Allen, Alice; Accomazzi, A.; Berriman, G. B.; DuPrie, K.; Hanisch, R. J.; Mink, J. D.; Nemiroff, R. J.; Shamir, L.; Shortridge, K.; Taylor, M. B.; Teuben, P. J.; Wallin, J. F.
2014-01-01
Now that you've written a useful astronomy code for your soon-to-be-published research, you have to figure out what you want to do with it. Our suggestion? Share it! This presentation highlights the means and benefits of sharing your code. Make your code citable -- submit it to the Astrophysics Source Code Library and have it indexed by ADS! The Astrophysics Source Code Library (ASCL) is a free online registry of source codes of interest to astronomers and astrophysicists. With over 700 codes, it is continuing its rapid growth, with an average of 17 new codes a month. The editors seek out codes for inclusion; indexing by ADS improves the discoverability of codes and provides a way to cite codes as separate entries, especially codes without papers that describe them.
Greiver, Michelle; Wintemute, Kimberly; Aliarzadeh, Babak; Martin, Ken; Khan, Shahriar; Jackson, Dave; Leggett, Jannet; Lambert-Lanning, Anita; Siu, Maggie
2016-10-12
Consistent and standardized coding for chronic conditions is associated with better care; however, coding may currently be limited in electronic medical records (EMRs) used in Canadian primary care. Objectives: To implement data management activities in a community-based primary care organisation and to evaluate the effects on coding for chronic conditions. Fifty-nine family physicians in Toronto, Ontario, belonging to a single primary care organisation, participated in the study. The organisation implemented a central analytical data repository containing their EMR data extracted, cleaned, standardized and returned by the Canadian Primary Care Sentinel Surveillance Network (CPCSSN), a large validated primary care EMR-based database. They used reporting software provided by CPCSSN to identify selected chronic conditions, and standardized codes were then added back to the EMR. We studied four chronic conditions (diabetes, hypertension, chronic obstructive pulmonary disease and dementia). We compared changes in coding over six months for physicians in the organisation with changes for 315 primary care physicians participating in CPCSSN across Canada. Chronic disease coding within the organisation increased significantly more than in other primary care sites. The adjusted difference in the increase of coding was 7.7% (95% confidence interval 7.1%-8.2%, p < 0.01). The use of standard codes, consisting of the most common diagnostic codes for each condition in the CPCSSN database, increased by 8.9% more (95% CI 8.3%-9.5%, p < 0.01). Data management activities were associated with an increase in standardized coding for chronic conditions. Exploring requirements to scale and spread this approach in Canadian primary care organisations may be worthwhile.
EPA’s SPECIATE 4.4 Database: Development and Uses
SPECIATE is the U.S. Environmental Protection Agency's (EPA) repository of volatile organic gas and particulate matter (PM) speciation profiles for air pollution sources. EPA released SPECIATE 4.4 in early 2014 and, in total, the SPECIATE 4.4 database includes 5,728 PM, VOC, total...
17 CFR 45.6 - Legal entity identifiers
Code of Federal Regulations, 2013 CFR
2013-04-01
... applied to swap data repositories by part 49 of this chapter. (4) Open Source. The schema for the legal... Section 45.6 Commodity and Securities Exchanges COMMODITY FUTURES TRADING COMMISSION SWAP DATA... to the jurisdiction of the Commission shall be identified in all recordkeeping and all swap data...
17 CFR 45.6 - Legal entity identifiers
Code of Federal Regulations, 2012 CFR
2012-04-01
... applied to swap data repositories by part 49 of this chapter. (4) Open Source. The schema for the legal... Section 45.6 Commodity and Securities Exchanges COMMODITY FUTURES TRADING COMMISSION SWAP DATA... to the jurisdiction of the Commission shall be identified in all recordkeeping and all swap data...
ACToR: Aggregated Computational Toxicology Resource (T)
The EPA Aggregated Computational Toxicology Resource (ACToR) is a set of databases compiling information on chemicals in the environment from a large number of public and in-house EPA sources. ACToR has 3 main goals: (1) To serve as a repository of public toxicology information ...
10 CFR 60.3 - License required.
Code of Federal Regulations, 2011 CFR
2011-01-01
... 10 Energy 2 2011-01-01 2011-01-01 false License required. 60.3 Section 60.3 Energy NUCLEAR REGULATORY COMMISSION (CONTINUED) DISPOSAL OF HIGH-LEVEL RADIOACTIVE WASTES IN GEOLOGIC REPOSITORIES General Provisions § 60.3 License required. (a) DOE shall not receive or possess source, special nuclear, or...
10 CFR 60.3 - License required.
Code of Federal Regulations, 2014 CFR
2014-01-01
... 10 Energy 2 2014-01-01 2014-01-01 false License required. 60.3 Section 60.3 Energy NUCLEAR REGULATORY COMMISSION (CONTINUED) DISPOSAL OF HIGH-LEVEL RADIOACTIVE WASTES IN GEOLOGIC REPOSITORIES General Provisions § 60.3 License required. (a) DOE shall not receive or possess source, special nuclear, or...
10 CFR 60.3 - License required.
Code of Federal Regulations, 2013 CFR
2013-01-01
... 10 Energy 2 2013-01-01 2013-01-01 false License required. 60.3 Section 60.3 Energy NUCLEAR REGULATORY COMMISSION (CONTINUED) DISPOSAL OF HIGH-LEVEL RADIOACTIVE WASTES IN GEOLOGIC REPOSITORIES General Provisions § 60.3 License required. (a) DOE shall not receive or possess source, special nuclear, or...
SPECIATE 4.0: SPECIATION DATABASE DEVELOPMENT DOCUMENTATION--FINAL REPORT
SPECIATE is the U.S. EPA's repository of total organic compounds (TOC) and particulate matter (PM) speciation profiles of air pollution sources. This report documents how EPA developed the SPECIATE 4.0 database that replaces the prior version, SPECIATE 3.2. SPECIATE 4.0 includes ...
WholeCellSimDB: a hybrid relational/HDF database for whole-cell model predictions
Karr, Jonathan R.; Phillips, Nolan C.; Covert, Markus W.
2014-01-01
Mechanistic ‘whole-cell’ models are needed to develop a complete understanding of cell physiology. However, extracting biological insights from whole-cell models requires running and analyzing large numbers of simulations. We developed WholeCellSimDB, a database for organizing whole-cell simulations. WholeCellSimDB was designed to enable researchers to search simulation metadata to identify simulations for further analysis, and quickly slice and aggregate simulation results data. In addition, WholeCellSimDB enables users to share simulations with the broader research community. The database uses a hybrid relational/hierarchical data format architecture to efficiently store and retrieve both simulation setup metadata and results data. WholeCellSimDB provides a graphical Web-based interface to search, browse, plot and export simulations; a JavaScript Object Notation (JSON) Web service to retrieve data for Web-based visualizations; a command-line interface to deposit simulations; and a Python API to retrieve data for advanced analysis. Overall, we believe WholeCellSimDB will help researchers use whole-cell models to advance basic biological science and bioengineering. Database URL: http://www.wholecellsimdb.org Source code repository URL: http://github.com/CovertLab/WholeCellSimDB PMID:25231498
Meyer, Michael J; Geske, Philip; Yu, Haiyuan
2016-05-15
Biological sequence databases are integral to efforts to characterize and understand biological molecules and share biological data. However, when analyzing these data, scientists are often left holding disparate biological currency: molecular identifiers from different databases. For downstream applications that require converting the identifiers themselves, there are many resources available, but analyzing associated loci and variants can be cumbersome if data is not given in a form amenable to particular analyses. Here we present BISQUE, a web server and customizable command-line tool for converting molecular identifiers and their contained loci and variants between different database conventions. BISQUE uses a graph traversal algorithm to generalize the conversion process for residues in the human genome, genes, transcripts and proteins, allowing for conversion across classes of molecules and in all directions through an intuitive web interface and a URL-based web service. BISQUE is freely available via the web using any major web browser (http://bisque.yulab.org/). Source code is available in a public GitHub repository (https://github.com/hyulab/BISQUE). Contact: haiyuan.yu@cornell.edu. Supplementary data are available at Bioinformatics online.
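A small Python sketch of identifier conversion by graph traversal in the spirit of the approach described above; it is not BISQUE's code, and the cross-reference table is a tiny illustrative stand-in for the database-backed graph the server uses.

    # Breadth-first search over a graph whose nodes are (namespace, identifier)
    # pairs and whose edges are known cross-references.
    from collections import deque

    xrefs = {  # illustrative cross-references
        ("ensembl_gene", "ENSG00000141510"): [("hgnc_symbol", "TP53")],
        ("hgnc_symbol", "TP53"): [("uniprot", "P04637")],
    }

    def convert(source, target_namespace):
        seen, queue = {source}, deque([source])
        while queue:
            node = queue.popleft()
            if node[0] == target_namespace:
                return node
            for nxt in xrefs.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return None

    print(convert(("ensembl_gene", "ENSG00000141510"), "uniprot"))  # ('uniprot', 'P04637')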
DOE Office of Scientific and Technical Information (OSTI.GOV)
Theodore Larrieu, Christopher Slominski, Michele Joyce
2011-03-01
With the inauguration of the CEBAF Element Database (CED) in Fall 2010, Jefferson Lab computer scientists have taken a step toward the eventual goal of a model-driven accelerator. Once fully populated, the database will be the primary repository of information used for everything from generating lattice decks to booting control computers to building controls screens. A requirement influencing the CED design is that it provide access to not only present, but also future and past configurations of the accelerator. To accomplish this, an introspective database schema was designed that allows new elements, types, and properties to be defined on-the-fly with no changes to table structure. Used in conjunction with Oracle Workspace Manager, it allows users to query data from any time in the database history with the same tools used to query the present configuration. Users can also check out workspaces to use as staging areas for upcoming machine configurations. All access to the CED is through a well-documented Application Programming Interface (API) that is translated automatically from original C++ source code into native libraries for scripting languages such as perl, php, and TCL, making access to the CED easy and ubiquitous.
Duda, Stephanie; Fahim, Christine; Szatmari, Peter; Bennett, Kathryn
2017-07-01
Innovative strategies that facilitate the use of high quality practice guidelines (PG) are needed. Accordingly, repositories designed to simplify access to PGs have been proposed as a critical component of the network of linked interventions needed to drive increased PG implementation. The National Guideline Clearinghouse (NGC) is a free, international online repository. We investigated whether it is a trustworthy source of child and youth anxiety and depression PGs. English language PGs published between January 2009 and February 2016 relevant to anxiety or depression in children and adolescents (≤ 18 years of age) were eligible. Two trained raters assessed PG quality using Appraisal of Guidelines for Research and Evaluation (AGREE II). Scores on at least three AGREE II domains (stakeholder involvement, rigor of development, and editorial independence) were used to designate PGs as: i) minimum quality (≥ 50%); and ii) high quality (≥ 70%). Eight eligible PGs were identified (depression, n=6; anxiety and depression, n=1; social anxiety disorder, n=1). Four of eight PGs met minimum quality criteria; three of four met high quality criteria. At present, NGC users without the time and special skills required to evaluate PG quality may unknowingly choose flawed PGs to guide decisions about child and youth anxiety and depression. The recent NGC decision to explore the inclusion of PG quality profiles based on Institute of Medicine standards provides needed leadership that can strengthen PG repositories, prevent harm and wasted resources, and build PG developer capacity.
Development of DKB ETL module in case of data conversion
NASA Astrophysics Data System (ADS)
Kaida, A. Y.; Golosova, M. V.; Grigorieva, M. A.; Gubin, M. Y.
2018-05-01
Modern scientific experiments involve the production of huge volumes of data that require new approaches to data processing and storage. These data, as well as their processing and storage, are accompanied by a substantial amount of additional information, called metadata, distributed over multiple informational systems and repositories and having a complicated, heterogeneous structure. Gathering these metadata for experiments in the field of high energy nuclear physics (HENP) is a complex issue, requiring solutions outside the box. One of the tasks is to integrate metadata from different repositories into some kind of central storage. During the integration process, metadata taken from original source repositories go through several processing steps: metadata aggregation, transformation according to the current data model and loading into the general storage in a standardized form. The Data Knowledge Base (DKB), an R&D project of the ATLAS experiment at the LHC, is aimed at providing fast and easy access to significant information about LHC experiments for the scientific community. The data integration subsystem being developed for the DKB project can be represented as a number of particular pipelines, arranging data flow from data sources to the main DKB storage. The data transformation process, represented by a single pipeline, can be considered as a number of successive data transformation steps, where each step is implemented as an individual program module. This article outlines the specifics of the program modules used in the dataflow and describes one of the modules developed and integrated into the data integration subsystem of the DKB.
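A schematic Python sketch of the pipeline-of-modules idea described above; it is not the DKB code, and the record fields and names are hypothetical. Each step is an independent callable, so steps can be developed and swapped individually, mirroring the one-module-per-transformation-step design.

    # A pipeline as a chain of small transformation steps over metadata records.
    def aggregate(record):
        record.setdefault("sources", []).append("production_system")  # hypothetical source
        return record

    def transform(record):
        record["dataset"] = record.pop("datasetname", None)  # map to the target data model
        return record

    def load(record, storage):
        storage.append(record)                               # stand-in for the DKB storage
        return record

    def run_pipeline(records, storage):
        for rec in records:
            load(transform(aggregate(rec)), storage)

    storage = []
    run_pipeline([{"datasetname": "data15_13TeV.example.dataset"}], storage)
    print(storage)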
The random energy model in a magnetic field and joint source channel coding
NASA Astrophysics Data System (ADS)
Merhav, Neri
2008-09-01
We demonstrate that there is an intimate relationship between the magnetic properties of Derrida’s random energy model (REM) of spin glasses and the problem of joint source-channel coding in Information Theory. In particular, typical patterns of erroneously decoded messages in the coding problem have “magnetization” properties that are analogous to those of the REM in certain phases, where the non-uniformity of the distribution of the source in the coding problem plays the role of an external magnetic field applied to the REM. We also relate the ensemble performance (random coding exponents) of joint source-channel codes to the free energy of the REM in its different phases.
2014-06-01
User Manual and Source Code for a LAMMPS Implementation of Constant Energy Dissipative Particle Dynamics (DPD-E), by James P. Larentzos... Army Research Laboratory, Aberdeen Proving Ground, MD 21005-5069, ARL-SR-290, June 2014. Dates covered: September 2013–February 2014.
MTpy - Python Tools for Magnetotelluric Data Processing and Analysis
NASA Astrophysics Data System (ADS)
Krieger, Lars; Peacock, Jared; Thiel, Stephan; Inverarity, Kent; Kirkby, Alison; Robertson, Kate; Soeffky, Paul; Didana, Yohannes
2014-05-01
We present the Python package MTpy, which provides functions for the processing, analysis, and handling of magnetotelluric (MT) data sets. MT is a relatively immature and not widely applied geophysical method in comparison to other geophysical techniques such as seismology. As a result, the data processing within the academic MT community is not thoroughly standardised and is often based on a loose collection of software, adapted to the respective local specifications. We have developed MTpy to overcome problems that arise from missing standards, and to provide a simplification of the general handling of MT data. MTpy is written in Python, and the open-source code is freely available from a GitHub repository. The setup follows the modular approach of successful geoscience software packages such as GMT or ObsPy. It contains sub-packages and modules for the various tasks within the standard work-flow of MT data processing and interpretation. In order to allow the inclusion of already existing and well established software, MTpy does not only provide pure Python classes and functions, but also wrapping command-line scripts to run standalone tools, e.g. modelling and inversion codes. Our aim is to provide a flexible framework, which is open for future dynamic extensions. MTpy has the potential to promote the standardisation of processing procedures and at the same time be a versatile supplement for existing algorithms. Here, we introduce the concept and structure of MTpy, and we illustrate the workflow of MT data processing, interpretation, and visualisation utilising MTpy on example data sets collected over different regions of Australia and the USA.
Problem Management Module: An Innovative System to Improve Problem List Workflow
Hodge, Chad M.; Kuttler, Kathryn G.; Bowes, Watson A.; Narus, Scott P.
2014-01-01
Electronic problem lists are essential to modern health record systems, with a primary goal to serve as the repository of a patient’s current health issues. Additionally, coded problems can be used to drive downstream activities such as decision support, evidence-based medicine, billing, and cohort generation for research. Meaningful Use also requires use of a coded problem list. Over the course of three years, Intermountain Healthcare developed a problem management module (PMM) that provided innovative functionality to improve clinical workflow and boost problem list adoption, e.g. smart search, user customizable views, problem evolution, and problem timelines. In 23 months of clinical use, clinicians entered over 70,000 health issues, the percentage of free-text items dropped to 1.2%, completeness of problem list items increased by 14%, and more collaborative habits were initiated. PMID:25954372
Moderate-temperature zeolitic alteration in a cooling pyroclastic deposit
Levy, S.S.; O'Neil, J.R.
1989-01-01
The locally zeolitized Topopah Spring Member of the Paintbrush Tuff (13 Myr.), Yucca Mountain, Nevada, U.S.A., is part of a thick sequence of zeolitized pyroclastic units. Most of the zeolitized units are nonwelded tuffs that were altered during low-temperature diagenesis, but the distribution and textural setting of zeolite (heulandite-clinoptilolite) and smectite in the densely welded Topopah Spring tuff suggest that these hydrous minerals formed while the tuff was still cooling after pyroclastic emplacement and welding. The hydrous minerals are concentrated within a transition zone between devitrified tuff in the central part of the unit and underlying vitrophyre. Movement of liquid and convected heat along fractures from the devitrified tuff to the vitrophyre caused local devitrification and hydrous mineral crystallization. Oxygen isotope geothermometry of cogenetic quartz confirms the nondiagenetic moderate-temperature origin of the hydrous minerals at temperatures of approximately 40-100 °C, assuming a meteoric water source. The Topopah Spring tuff is under consideration for emplacement of a high-level nuclear waste repository. The natural rock alteration of the cooling pyroclastic deposit may be a good natural analog for repository-induced hydrothermal alteration. As a result of repository thermal loading, temperatures in the Topopah Spring vitrophyre may rise sufficiently to duplicate the inferred temperatures of natural zeolitic alteration. Heated water moving downward from the repository into the vitrophyre may contribute to new zeolitic alteration.
An Investigation of Generic Structures of Pakistani Doctoral Thesis Acknowledgements
ERIC Educational Resources Information Center
Rofess, Sakander; Mahmood, Muhammad Asim
2015-01-01
This paper investigates Pakistani doctoral thesis acknowledgements from a genre analysis perspective. A corpus of 235 PhD thesis acknowledgements written in English was taken from Pakistani doctoral theses collected from eight different disciplines. The HEC Research Repository of Pakistan was used as a data source. The theses written by Pakistani…
US EPA's SPECIATE 4.4 Database: Development and Uses
SPECIATE is the U.S. Environmental Protection Agency’s (EPA) repository of volatile organic gas and particulate matter (PM) speciation profiles of air pollution sources. EPA released SPECIATE 4.4 in early 2014 and, in total, the SPECIATE 4.4 database includes 5,728 PM, volatile o...
36 CFR 1290.3 - Sources of assassination records and additional records and information.
Code of Federal Regulations, 2011 CFR
2011-07-01
... service with a government agency, office, or entity; (f) Persons, including individuals and corporations... Government; (b) Agencies, offices, and entities of the executive, legislative, and judicial branches of state and local governments; (c) Record repositories and archives of Federal, state, and local governments...
The Amistad Research Center: Documenting the African American Experience.
ERIC Educational Resources Information Center
Chepesiuk, Ron
1993-01-01
Describes the Amistad Research Center housed at Tulane University which is a repository of primary documents on African-American history. Topics addressed include the development and growth of the collection; inclusion of the American Missionary Association archives; sources of support; civil rights; and collecting for the future. (LRW)
AN OPEN-SOURCE COMMUNITY WEB SITE TO SUPPORT GROUND-WATER MODEL TESTING
A community wiki wiki web site has been created as a resource to support ground-water model development and testing. The Groundwater Gourmet wiki is a repository for user supplied analytical and numerical recipes, how-to's, and examples. Members are encouraged to submit analyti...
OER Use in Intermediate Language Instruction: A Case Study
ERIC Educational Resources Information Center
Godwin-Jones, Robert
2017-01-01
This paper reports on a case study in the experimental use of Open Educational Resources (OERs) in intermediate level language instruction. The resources come from three sources: the instructor, the students, and open content repositories. The objective of this action research project was to provide student-centered learning materials, enhance…
Intelligent resource discovery using ontology-based resource profiles
NASA Technical Reports Server (NTRS)
Hughes, J. Steven; Crichton, Dan; Kelly, Sean; Crichton, Jerry; Tran, Thuy
2004-01-01
Successful resource discovery across heterogeneous repositories is strongly dependent on the semantic and syntactic homogeneity of the associated resource descriptions. Ideally, resource descriptions are easily extracted from pre-existing standardized sources, expressed using standard syntactic and semantic structures, and managed and accessed within a distributed, flexible, and scaleable software framework.
USDA-ARS?s Scientific Manuscript database
Seedlings from seven open-pollinated selections of Chinese wingnut (Pterocarya stenoptera) (WN) representing collections of the USDA-ARS National Clonal Germplasm Repository at Davis CA and the University of California at Davis were evaluated as rootstocks for resistance to Phytophthora cinnamomi an...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Engeman, J. K.; Girardot, C. L.; Harlow, D. G.
2012-12-20
This report contains reference materials cited in RPP-ASMT-53793, Tank 241-AY-102 Leak Assessment Report, that were obtained from the National Archives Federal Records Repository in Seattle, Washington, or from other sources including the Hanford Site's Integrated Data Management System database (IDMS).
House dust is a repository for environmental pollutants that may accumulate indoors from both internal and external sources over long periods of time. Dust and tracked-in soil accumulate most efficiently in carpets, and the pollutants associated with it may present an exposure...
NASA Astrophysics Data System (ADS)
Hadgu, T.; Kalinina, E.; Klise, K. A.; Wang, Y.
2015-12-01
Numerical modeling of disposal of nuclear waste in a deep geologic repository in fractured crystalline rock requires robust characterization of fractures. Various methods for fracture representation in granitic rocks exist. In this study we used the fracture continuum model (FCM) to characterize fractured rock for use in the simulation of flow and transport in the far field of a generic nuclear waste repository located at 500 m depth. The FCM approach is a stochastic method that maps the permeability of discrete fractures onto a regular grid. The method generates permeability fields using field observations of fracture sets. The original method described in McKenna and Reeves (2005) was designed for vertical fractures. The method has since been extended to incorporate fully three-dimensional representations of anisotropic permeability, multiple independent fracture sets, arbitrary fracture dips and orientations, and spatial correlation (Kalinina et al. 2012, 2014). For this study the numerical code PFLOTRAN (Lichtner et al., 2015) has been used to model flow and transport. PFLOTRAN solves a system of generally nonlinear partial differential equations describing multiphase, multicomponent and multiscale reactive flow and transport in porous materials. The code is designed to run on massively parallel computing architectures as well as workstations and laptops (e.g. Hammond et al., 2011). Benchmark tests were conducted to simulate flow and transport in a specified model domain. Distributions of fracture parameters were used to generate a selected number of realizations. For each realization, the FCM method was used to generate a permeability field of the fractured rock. The PFLOTRAN code was then used to simulate flow and transport in the domain. Simulation results and analysis are presented. The results indicate that the FCM approach is a viable method to model fractured crystalline rocks. The FCM is a computationally efficient way to generate realistic representations of complex fracture systems. This approach is of interest for nuclear waste disposal models applied over large domains.
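An illustrative numpy sketch of the basic idea behind mapping discrete fractures onto a continuum grid; it is not the FCM code, and the grid size, fracture counts, and permeability values are assumptions chosen only to make the example self-contained.

    # Stamp randomly oriented fracture traces onto a regular 2-D grid by
    # raising cell permeability along each trace.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 100
    k = np.full((n, n), 1e-20)             # background permeability, m^2 (illustrative)

    for _ in range(20):                     # 20 random fractures
        x0, y0 = rng.integers(0, n, 2)
        angle = rng.uniform(0, np.pi)
        length = rng.integers(10, 40)
        for s in range(length):
            i = int(x0 + s * np.cos(angle))
            j = int(y0 + s * np.sin(angle))
            if 0 <= i < n and 0 <= j < n:
                k[i, j] = 1e-15             # fracture-enhanced permeability (illustrative)

    print(f"fraction of fractured cells: {(k > 1e-16).mean():.2%}")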
Astronomy education and the Astrophysics Source Code Library
NASA Astrophysics Data System (ADS)
Allen, Alice; Nemiroff, Robert J.
2016-01-01
The Astrophysics Source Code Library (ASCL) is an online registry of source codes used in refereed astrophysics research. It currently lists nearly 1,200 codes and covers all aspects of computational astrophysics. How can this resource be of use to educators and to the graduate students they mentor? The ASCL serves as a discovery tool for codes that can be used for one's own research. Graduate students can also investigate existing codes to see how common astronomical problems are approached numerically in practice, and use these codes as benchmarks for their own solutions to these problems. Further, they can deepen their knowledge of software practices and techniques through examination of others' codes.
Practical management of heterogeneous neuroimaging metadata by global neuroimaging data repositories
Neu, Scott C.; Crawford, Karen L.; Toga, Arthur W.
2012-01-01
Rapidly evolving neuroimaging techniques are producing unprecedented quantities of digital data at the same time that many research studies are evolving into global, multi-disciplinary collaborations between geographically distributed scientists. While networked computers have made it almost trivial to transmit data across long distances, collecting and analyzing this data requires extensive metadata if the data is to be maximally shared. Though it is typically straightforward to encode text and numerical values into files and send content between different locations, it is often difficult to attach context and implicit assumptions to the content. As the number of and geographic separation between data contributors grows to national and global scales, the heterogeneity of the collected metadata increases and conformance to a single standardization becomes implausible. Neuroimaging data repositories must then not only accumulate data but must also consolidate disparate metadata into an integrated view. In this article, using specific examples from our experiences, we demonstrate how standardization alone cannot achieve full integration of neuroimaging data from multiple heterogeneous sources and why a fundamental change in the architecture of neuroimaging data repositories is needed instead. PMID:22470336
Huang, Haiyan; Liu, Chun-Chi; Zhou, Xianghong Jasmine
2010-04-13
The rapid accumulation of gene expression data has offered unprecedented opportunities to study human diseases. The National Center for Biotechnology Information Gene Expression Omnibus is currently the largest database that systematically documents the genome-wide molecular basis of diseases. However, thus far, this resource has been far from fully utilized. This paper describes the first study to transform public gene expression repositories into an automated disease diagnosis database. Particularly, we have developed a systematic framework, including a two-stage Bayesian learning approach, to achieve the diagnosis of one or multiple diseases for a query expression profile along a hierarchical disease taxonomy. Our approach, including standardizing cross-platform gene expression data and heterogeneous disease annotations, allows analyzing both sources of information in a unified probabilistic system. A high level of overall diagnostic accuracy was shown by cross validation. It was also demonstrated that the power of our method can increase significantly with the continued growth of public gene expression repositories. Finally, we showed how our disease diagnosis system can be used to characterize complex phenotypes and to construct a disease-drug connectivity map.
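A deliberately simplified Python sketch of scoring a query profile against per-disease references and rolling the result up a disease taxonomy; this generic nearest-centroid stand-in is not the authors' two-stage Bayesian approach, and the taxonomy, centroids, and profile length are fabricated for illustration only.

    # Toy hierarchical "diagnosis": softmax over distances to disease centroids,
    # then aggregation of leaf probabilities up to parent categories.
    import numpy as np

    taxonomy = {"leukemia": "cancer", "lymphoma": "cancer", "asthma": "respiratory"}
    centroids = {d: np.random.default_rng(i).normal(size=50)   # fake reference profiles
                 for i, d in enumerate(taxonomy)}

    def diagnose(query):
        scores = {d: -np.linalg.norm(query - c) for d, c in centroids.items()}
        w = np.exp(np.array(list(scores.values())) - max(scores.values()))
        leaf_post = dict(zip(scores, w / w.sum()))
        parent_post = {}
        for leaf, p in leaf_post.items():
            parent_post[taxonomy[leaf]] = parent_post.get(taxonomy[leaf], 0.0) + p
        return leaf_post, parent_post

    print(diagnose(np.zeros(50)))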
DOE Office of Scientific and Technical Information (OSTI.GOV)
WANG,YIFENG; XU,HUIFANG
Correctly identifying the possible alteration products and accurately predicting their occurrence in a repository-relevant environment are the key for the source-term calculation in a repository performance assessment. Uraninite in uranium deposits has long been used as a natural analog to spent fuel in a repository because of their chemical and structural similarity. In this paper, a SEM/AEM investigation has been conducted on a partially altered uraninite sample from a uranium ore deposit of Shinkolobwe of Congo. The mineral formation sequences were identified: uraninite → uranyl hydrates → uranyl silicates → Ca-uranyl silicates, or uraninite → uranyl silicates → Ca-uranyl silicates. Reaction-path calculations were conducted for the oxidative dissolution of spent fuel in a representative Yucca Mountain groundwater. The predicted sequence is in general consistent with the SEM observations. The calculations also show that uranium carbonate minerals are unlikely to become major solubility-controlling mineral phases in a Yucca Mountain environment. Some discrepancies between model predictions and field observations are observed. Those discrepancies may result from poorly constrained thermodynamic data for uranyl silicate minerals.
ACToR: Aggregated Computational Toxicology Resource (T) ...
The EPA Aggregated Computational Toxicology Resource (ACToR) is a set of databases compiling information on chemicals in the environment from a large number of public and in-house EPA sources. ACToR has 3 main goals: (1) To serve as a repository of public toxicology information on chemicals of interest to the EPA, and in particular to be a central source for the testing data on all chemicals regulated by all EPA programs; (2) To be a source of in vivo training data sets for building in vitro to in vivo computational models; (3) To serve as a central source of chemical structure and identity information for the ToxCast™ and Tox21 programs. There are 4 main databases, all linked through a common set of chemical information and a common structure linking chemicals to assay data: the public ACToR system (available at http://actor.epa.gov), the ToxMiner database holding ToxCast and Tox21 data, along with results from statistical analyses on these data; the Tox21 chemical repository which is managing the ordering and sample tracking process for the larger Tox21 project; and the public version of ToxRefDB. The public ACToR system contains information on ~500K compounds with toxicology, exposure and chemical property information from >400 public sources. The web site is visited by ~1,000 unique users per month and generates ~1,000 page requests per day on average. The databases are built on open source technology, which has allowed us to export them to a number of col...
An open-source framework for large-scale, flexible evaluation of biomedical text mining systems.
Baumgartner, William A; Cohen, K Bretonnel; Hunter, Lawrence
2008-01-29
Improved evaluation methodologies have been identified as a necessary prerequisite to the improvement of text mining theory and practice. This paper presents a publicly available framework that facilitates thorough, structured, and large-scale evaluations of text mining technologies. The extensibility of this framework and its ability to uncover system-wide characteristics by analyzing component parts as well as its usefulness for facilitating third-party application integration are demonstrated through examples in the biomedical domain. Our evaluation framework was assembled using the Unstructured Information Management Architecture. It was used to analyze a set of gene mention identification systems involving 225 combinations of system, evaluation corpus, and correctness measure. Interactions between all three were found to affect the relative rankings of the systems. A second experiment evaluated gene normalization system performance using as input 4,097 combinations of gene mention systems and gene mention system-combining strategies. Gene mention system recall is shown to affect gene normalization system performance much more than does gene mention system precision, and high gene normalization performance is shown to be achievable with remarkably low levels of gene mention system precision. The software presented in this paper demonstrates the potential for novel discovery resulting from the structured evaluation of biomedical language processing systems, as well as the usefulness of such an evaluation framework for promoting collaboration between developers of biomedical language processing technologies. The code base is available as part of the BioNLP UIMA Component Repository on SourceForge.net.
RGG: A general GUI Framework for R scripts
Visne, Ilhami; Dilaveroglu, Erkan; Vierlinger, Klemens; Lauss, Martin; Yildiz, Ahmet; Weinhaeusel, Andreas; Noehammer, Christa; Leisch, Friedrich; Kriegner, Albert
2009-01-01
Background R is the leading open source statistics software with a vast number of biostatistical and bioinformatical analysis packages. To exploit the advantages of R, extensive scripting/programming skills are required. Results We have developed a software tool called R GUI Generator (RGG) which enables the easy generation of Graphical User Interfaces (GUIs) for the programming language R by adding a few Extensible Markup Language (XML) tags. RGG consists of an XML-based GUI definition language and a Java-based GUI engine. GUIs are generated at runtime from defined GUI tags that are embedded into the R script. User GUI input is returned to the R code and replaces the XML tags. RGG files can be developed using any text editor. The current version of RGG is available as stand-alone software (RGGRunner) and as a plug-in for JGR. Conclusion RGG is a general GUI framework for R that has the potential to introduce R statistics (R packages, built-in functions and scripts) to users with limited programming skills and helps to bridge the gap between R developers and GUI-dependent users. RGG aims to abstract the GUI development from individual GUI toolkits by using an XML-based GUI definition language. Thus RGG can be easily integrated in any software. The RGG project further includes the development of a web-based repository for RGG-GUIs. RGG is an open source project licensed under the Lesser General Public License (LGPL) and can be downloaded freely. PMID:19254356
A Data Management Framework for Real-Time Water Quality Monitoring
NASA Astrophysics Data System (ADS)
Mulyono, E.; Yang, D.; Craig, M.
2007-12-01
CSU East Bay operates two in-situ, near-real-time water quality monitoring stations in San Francisco Bay as a member of the Center for Integrative Coastal Ocean Observation, Research, and Education (CICORE) and the Central and Northern California Ocean Observing System (CeNCOOS). We have been operating stations at Dumbarton Pier and San Leandro Marina for the past two years. At each station, a sonde measures seven water quality parameters every six minutes. During the first year of operation, we retrieved data from the sondes every few weeks by visiting the sites and uploading data to a handheld logger. Last year we implemented a telemetry system utilizing a cellular CDMA modem to transfer data from the field to our data center on an hourly basis. Data from each station are initially stored in monthly files in native format. We import data from these files into a SQL database every hour. The database is handled by Django, an open source web framework. Django provides a user-friendly web user interface (UI) to administer the data. We utilized parts of the Django UI for our database web front-end, which allows users to access our database via the World Wide Web and perform basic queries. We also serve our data to other aggregating sites, including the central CICORE website and NOAA's National Data Buoy Center (NDBC). Since Django is written in Python, it allows us to integrate other Python modules into our software, such as the Matplotlib library for scientific graphics. We store our code in a Subversion repository, which keeps track of software revisions. Code is tested using Python's unittest and doctest modules within Django's testing facility, which warns us when our code modifications cause other parts of the software to break. During the past two years of data acquisition, we have incrementally updated our data model to accommodate changes in physical hardware, including equipment moves, instrument replacements, and sensor upgrades that affected data format.
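A hedged sketch of how six-minute sonde readings might be modelled with the Django ORM; the model and field names are hypothetical, and the snippet belongs in a Django app's models.py rather than running standalone, since it needs a configured Django project.

    # Hypothetical Django models for stations and their time-stamped readings.
    from django.db import models

    class Station(models.Model):
        name = models.CharField(max_length=64)           # e.g. "Dumbarton Pier"

    class Reading(models.Model):
        station = models.ForeignKey(Station, on_delete=models.CASCADE)
        timestamp = models.DateTimeField(db_index=True)  # six-minute sampling interval
        parameter = models.CharField(max_length=32)      # e.g. "salinity"
        value = models.FloatField()

        class Meta:
            unique_together = ("station", "timestamp", "parameter")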
The HydroServer Platform for Sharing Hydrologic Data
NASA Astrophysics Data System (ADS)
Tarboton, D. G.; Horsburgh, J. S.; Schreuders, K.; Maidment, D. R.; Zaslavsky, I.; Valentine, D. W.
2010-12-01
The CUAHSI Hydrologic Information System (HIS) is an internet based system that supports sharing of hydrologic data. HIS consists of databases connected using the Internet through Web services, as well as software for data discovery, access, and publication. The HIS system architecture is comprised of servers for publishing and sharing data, a centralized catalog to support cross-server data discovery and a desktop client to access and analyze data. This paper focuses on HydroServer, the component developed for sharing and publishing space-time hydrologic datasets. A HydroServer is a computer server that contains a collection of databases, web services, tools, and software applications that allow data producers to store, publish, and manage the data from an experimental watershed or project site. HydroServer is designed to permit publication of data as part of a distributed national/international system, while still locally managing access to the data. We describe the HydroServer architecture and software stack, including tools for managing and publishing time series data for fixed point monitoring sites as well as spatially distributed, GIS datasets that describe a particular study area, watershed, or region. HydroServer adopts a standards-based approach to data publication, relying on accepted and emerging standards for data storage and transfer. CUAHSI-developed HydroServer code is free, with community code development managed through the CodePlex open source code repository and development system. There is some reliance on widely used commercial software for general purpose and standard data publication capability. The sharing of data in a common format is one way to stimulate interdisciplinary research and collaboration. It is anticipated that the growing, distributed network of HydroServers will facilitate cross-site comparisons and large scale studies that synthesize information from diverse settings, making the network as a whole greater than the sum of its parts in advancing hydrologic research. Details of the CUAHSI HIS can be found at http://his.cuahsi.org, and the HydroServer CodePlex site http://hydroserver.codeplex.com.
Data processing with microcode designed with source coding
McCoy, James A; Morrison, Steven E
2013-05-07
Programming for a data processor to execute a data processing application is provided using microcode source code. The microcode source code is assembled to produce microcode that includes digital microcode instructions with which to signal the data processor to execute the data processing application.
Adaptive distributed source coding.
Varodayan, David; Lin, Yao-Chung; Girod, Bernd
2012-05-01
We consider distributed source coding in the presence of hidden variables that parameterize the statistical dependence among sources. We derive the Slepian-Wolf bound and devise coding algorithms for a block-candidate model of this problem. The encoder sends, in addition to syndrome bits, a portion of the source to the decoder uncoded as doping bits. The decoder uses the sum-product algorithm to simultaneously recover the source symbols and the hidden statistical dependence variables. We also develop novel techniques based on density evolution (DE) to analyze the coding algorithms. We experimentally confirm that our DE analysis closely approximates practical performance. This result allows us to efficiently optimize parameters of the algorithms. In particular, we show that the system performs close to the Slepian-Wolf bound when an appropriate doping rate is selected. We then apply our coding and analysis techniques to a reduced-reference video quality monitoring system and show a bit rate saving of about 75% compared with fixed-length coding.
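For reference, the Slepian-Wolf bound mentioned above is the standard achievable rate region for lossless distributed coding of two correlated sources X and Y:

    % Slepian-Wolf rate region
    \begin{aligned}
    R_X &\ge H(X \mid Y), \\
    R_Y &\ge H(Y \mid X), \\
    R_X + R_Y &\ge H(X, Y).
    \end{aligned}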
Campbell, J R; Carpenter, P; Sneiderman, C; Cohn, S; Chute, C G; Warren, J
1997-01-01
To compare three potential sources of controlled clinical terminology (READ codes version 3.1, SNOMED International, and Unified Medical Language System (UMLS) version 1.6) relative to attributes of completeness, clinical taxonomy, administrative mapping, term definitions and clarity (duplicate coding rate). The authors assembled 1929 source concept records from a variety of clinical information taken from four medical centers across the United States. The source data included medical as well as ample nursing terminology. The source records were coded in each scheme by an investigator and checked by the coding scheme owner. The codings were then scored by an independent panel of clinicians for acceptability. Codes were checked for definitions provided with the scheme. Codes for a random sample of source records were analyzed by an investigator for "parent" and "child" codes within the scheme. Parent and child pairs were scored by an independent panel of medical informatics specialists for clinical acceptability. Administrative and billing code mappings from the published schemes were reviewed for all coded records and analyzed by independent reviewers for accuracy. The investigator for each scheme exhaustively searched a sample of coded records for duplications. SNOMED was judged to be significantly more complete in coding the source material than the other schemes (SNOMED* 70%; READ 57%; UMLS 50%; *p < .00001). SNOMED also had a richer clinical taxonomy judged by the number of acceptable first-degree relatives per coded concept (SNOMED* 4.56, UMLS 3.17; READ 2.14, *p < .005). Only the UMLS provided any definitions; these were found for 49% of records which had a coding assignment. READ and UMLS had better administrative mappings (composite score: READ* 40.6%; UMLS* 36.1%; SNOMED 20.7%, *p < .00001), and SNOMED had substantially more duplications of coding assignments (duplication rate: READ 0%; UMLS 4.2%; SNOMED* 13.9%, *p < .004) associated with a loss of clarity. No major terminology source can lay claim to being the ideal resource for a computer-based patient record. However, based upon this analysis of releases for April 1995, SNOMED International is considerably more complete, has a compositional nature and a richer taxonomy. It suffers from less clarity, resulting from a lack of syntax and evolutionary changes in its coding scheme. READ has greater clarity and better mapping to administrative schemes (ICD-10 and OPCS-4), is rapidly changing and is less complete. UMLS is a rich lexical resource, with mappings to many source vocabularies. It provides definitions for many of its terms. However, due to the varying granularities and purposes of its source schemes, it has limitations for representation of clinical concepts within a computer-based patient record.
DOE Office of Scientific and Technical Information (OSTI.GOV)
ECONOMY,KATHLEEN M.; HELTON,JON CRAIG; VAUGHN,PALMER
1999-10-01
The Waste Isolation Pilot Plant (WIPP), which is located in southeastern New Mexico, is being developed for the geologic disposal of transuranic (TRU) waste by the U.S. Department of Energy (DOE). Waste disposal will take place in panels excavated in a bedded salt formation approximately 2000 ft (610 m) below the land surface. The BRAGFLO computer program, which solves a system of nonlinear partial differential equations for two-phase flow, was used to investigate brine and gas flow patterns in the vicinity of the repository for the 1996 WIPP performance assessment (PA). The present study examines the implications of modeling assumptions used in conjunction with BRAGFLO in the 1996 WIPP PA that affect brine and gas flow patterns involving two waste regions in the repository (i.e., a single waste panel and the remaining nine waste panels), a disturbed rock zone (DRZ) that lies just above and below these two regions, and a borehole that penetrates the single waste panel and a brine pocket below this panel. The two waste regions are separated by a panel closure. The following insights were obtained from this study. First, the impediment to flow between the two waste regions provided by the panel closure model is reduced due to the permeable and areally extensive nature of the DRZ adopted in the 1996 WIPP PA, which results in the DRZ becoming an effective pathway for gas and brine movement around the panel closures and thus between the two waste regions. Brine and gas flow between the two waste regions via the DRZ causes pressures between the two to equilibrate rapidly, with the result that processes in the intruded waste panel are not isolated from the rest of the repository. Second, the connection between intruded and unintruded waste panels provided by the DRZ increases the time required for repository pressures to equilibrate with the overlying and/or underlying units subsequent to a drilling intrusion. Third, the large and areally extensive DRZ void volume is a significant source of brine to the repository, which is consumed in the corrosion of iron and thus contributes to increased repository pressures. Fourth, the DRZ itself lowers repository pressures by providing storage for gas and access to additional gas storage in areas of the repository. Fifth, given the pathway that the DRZ provides for gas and brine to flow around the panel closures, isolation of the waste panels by the panel closures was not essential to compliance with the U.S. Environmental Protection Agency's regulations in the 1996 WIPP PA.
The Particle Accelerator Simulation Code PyORBIT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gorlov, Timofey V; Holmes, Jeffrey A; Cousineau, Sarah M
2015-01-01
The particle accelerator simulation code PyORBIT is presented. The structure, implementation, history, parallel and simulation capabilities, and future development of the code are discussed. The PyORBIT code is a new implementation and extension of algorithms of the original ORBIT code that was developed for the Spallation Neutron Source accelerator at the Oak Ridge National Laboratory. The PyORBIT code has a two-level structure. The upper level uses the Python programming language to control the flow of intensive calculations performed by the lower level code implemented in the C++ language. The parallel capabilities are based on MPI communications. PyORBIT is an open source code accessible to the public through the Google Open Source Projects Hosting service.
BigMouth: a multi-institutional dental data repository.
Walji, Muhammad F; Kalenderian, Elsbeth; Stark, Paul C; White, Joel M; Kookal, Krishna K; Phan, Dat; Tran, Duong; Bernstam, Elmer V; Ramoni, Rachel
2014-01-01
Few oral health databases are available for research and the advancement of evidence-based dentistry. In this work we developed a centralized data repository derived from electronic health records (EHRs) at four dental schools participating in the Consortium of Oral Health Research and Informatics. A multi-stakeholder committee developed a data governance framework that encouraged data sharing while allowing control of contributed data. We adopted the i2b2 data warehousing platform and mapped data from each institution to a common reference terminology. We realized that dental EHRs urgently need to adopt common terminologies. While all used the same treatment code set, only three of the four sites used a common diagnostic terminology, and there were wide discrepancies in how medical and dental histories were documented. BigMouth was successfully launched in August 2012 with data on 1.1 million patients, and made available to users at the contributing institutions.
Coupling the Mixed Potential and Radiolysis Models for Used Fuel Degradation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Buck, Edgar C.; Jerden, James L.; Ebert, William L.
The primary purpose of this report is to describe the strategy for coupling three process level models to produce an integrated Used Fuel Degradation Model (FDM). The FDM, which is based on fundamental chemical and physical principles, provides direct calculation of radionuclide source terms for use in repository performance assessments. The G-value for H2O2 production (Gcond) to be used in the Mixed Potential Model (MPM) (H2O2 is the only radiolytic product presently included but others will be added as appropriate) needs to account for intermediate spur reactions. The effects of these intermediate reactions on [H2O2] are accounted for in the Radiolysis Model (RM). This report details methods for applying RM calculations that encompass the effects of these fast interactions on [H2O2] as the solution composition evolves during successive MPM iterations and then represent the steady-state [H2O2] in terms of an “effective instantaneous or conditional” generation value (Gcond). It is anticipated that the value of Gcond will change slowly as the reaction progresses through several iterations of the MPM as changes in the nature of the fuel surface occur. The Gcond values will be calculated with the RM either after several iterations or when concentrations of key reactants reach threshold values determined from previous sensitivity runs. Sensitivity runs with the RM indicate significant changes in G-value can occur over narrow composition ranges. The objective of the mixed potential model (MPM) is to calculate the used fuel degradation rates for a wide range of disposal environments to provide the source term radionuclide release rates for generic repository concepts. The fuel degradation rate is calculated for chemical and oxidative dissolution mechanisms using mixed potential theory to account for all relevant redox reactions at the fuel surface, including those involving oxidants produced by solution radiolysis and provided by the radiolysis model (RM). The RM calculates the concentration of species generated at any specific time and location from the surface of the fuel. Several options being considered for coupling the RM and MPM are described in the report. Different options have advantages and disadvantages based on the extent of coding that would be required and the ease of use of the final product.
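As a rough illustration of the coupling strategy outlined above, the sketch below advances a hypothetical fuel-degradation state with stand-in functions mpm_step and radiolysis_gcond (neither is the actual MPM or RM code), refreshing the conditional G-value only after a fixed number of iterations or when the H2O2 concentration drifts past a threshold; all names and threshold values are assumptions made for illustration.

    # Hypothetical illustration of the MPM/RM coupling loop described above.
    # mpm_step and radiolysis_gcond are stand-ins, not the actual codes.
    def couple_fdm(mpm_step, radiolysis_gcond, state, n_steps=1000,
                   recalc_every=50, h2o2_threshold=1e-6):
        """Advance the degradation state, refreshing the conditional
        G-value (Gcond) for H2O2 only when needed."""
        gcond = radiolysis_gcond(state)          # initial conditional G-value
        last_h2o2 = state["h2o2"]
        for it in range(n_steps):
            state = mpm_step(state, gcond)       # one mixed-potential iteration
            drifted = abs(state["h2o2"] - last_h2o2) > h2o2_threshold
            if drifted or (it + 1) % recalc_every == 0:
                gcond = radiolysis_gcond(state)  # re-run the radiolysis model
                last_h2o2 = state["h2o2"]
        return state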
Bauer, C R K D; Ganslandt, T; Baum, B; Christoph, J; Engel, I; Löbe, M; Mate, S; Stäubert, S; Drepper, J; Prokosch, H-U; Winter, A; Sax, U
2016-01-01
In recent years, research data warehouses moved increasingly into the focus of interest of medical research. Nevertheless, there are only a few center-independent infrastructure solutions available. They aim to provide a consolidated view on medical data from various sources such as clinical trials, electronic health records, epidemiological registries or longitudinal cohorts. The i2b2 framework is a well-established solution for such repositories, but it lacks support for importing and integrating clinical data and metadata. The goal of this project was to develop a platform for easy integration and administration of data from heterogeneous sources, to provide capabilities for linking them to medical terminologies and to allow for transforming and mapping of data streams for user-specific views. A suite of three tools has been developed: the i2b2 Wizard for simplifying administration of i2b2, the IDRT Import and Mapping Tool for loading clinical data from various formats like CSV, SQL, CDISC ODM or biobanks and the IDRT i2b2 Web Client Plugin for advanced export options. The Import and Mapping Tool also includes an ontology editor for rearranging and mapping patient data and structures as well as annotating clinical data with medical terminologies, primarily those used in Germany (ICD-10-GM, OPS, ICD-O, etc.). With the three tools functional, new i2b2-based research projects can be created, populated and customized to researchers' needs in a few hours. Amalgamating data and metadata from different databases can be managed easily. With regard to data privacy, a pseudonymization service can be plugged in. Using common ontologies and reference terminologies rather than project-specific ones leads to a consistent understanding of the data semantics. i2b2's promise is to enable clinical researchers to devise and test new hypotheses even without deep knowledge of statistical programming. The approach presented here has been tested in a number of scenarios with millions of observations and tens of thousands of patients. Initially mostly observant, trained researchers were able to construct new analyses on their own. Early feedback indicates that timely and extensive access to their "own" data is appreciated most, but it is also lowering the barrier for other tasks, for instance checking data quality and completeness (missing data, wrong coding).
2009-09-01
nuclear industry for conducting performance assessment calculations. The analytical FORTRAN code for the DNAPL source function, REMChlor, was...project. The first was to apply existing deterministic codes, such as T2VOC and UTCHEM, to the DNAPL source zone to simulate the remediation processes...but describe the spatial variability of source zones unlike one-dimensional flow and transport codes that assume homogeneity. The Lagrangian models
Chen, Ming; Henry, Nathan; Almsaeed, Abdullah; Zhou, Xiao; Wegrzyn, Jill; Ficklin, Stephen
2017-01-01
Abstract Tripal is an open source software package for developing biological databases with a focus on genetic and genomic data. It consists of a set of core modules that deliver essential functions for loading and displaying data records and associated attributes including organisms, sequence features and genetic markers. Beyond the core modules, community members are encouraged to contribute extension modules to build on the Tripal core and to customize Tripal for individual community needs. To expand the utility of the Tripal software system, particularly for RNASeq data, we developed two new extension modules. Tripal Elasticsearch enables fast, scalable searching of the entire content of a Tripal site as well as the construction of customized advanced searches of specific data types. We demonstrate the use of this module for searching assembled transcripts by functional annotation. A second module, Tripal Analysis Expression, houses and displays records from gene expression assays such as RNA sequencing. This includes biological source materials (biomaterials), gene expression values and protocols used to generate the data. In the case of an RNASeq experiment, this would reflect the individual organisms and tissues used to produce sequencing libraries, the normalized gene expression values derived from the RNASeq data analysis and a description of the software or code used to generate the expression values. The module will load data from common flat file formats including standard NCBI Biosample XML. Data loading, display options and other configurations can be controlled by authorized users in the Drupal administrative backend. Both modules are open source, include usage documentation, and can be found in the Tripal organization’s GitHub repository. Database URL: Tripal Elasticsearch module: https://github.com/tripal/tripal_elasticsearch Tripal Analysis Expression module: https://github.com/tripal/tripal_analysis_expression PMID:29220446
Phase II Evaluation of Clinical Coding Schemes
Campbell, James R.; Carpenter, Paul; Sneiderman, Charles; Cohn, Simon; Chute, Christopher G.; Warren, Judith
1997-01-01
Abstract Objective: To compare three potential sources of controlled clinical terminology (READ codes version 3.1, SNOMED International, and Unified Medical Language System (UMLS) version 1.6) relative to attributes of completeness, clinical taxonomy, administrative mapping, term definitions and clarity (duplicate coding rate). Methods: The authors assembled 1929 source concept records from a variety of clinical information taken from four medical centers across the United States. The source data included medical as well as ample nursing terminology. The source records were coded in each scheme by an investigator and checked by the coding scheme owner. The codings were then scored by an independent panel of clinicians for acceptability. Codes were checked for definitions provided with the scheme. Codes for a random sample of source records were analyzed by an investigator for “parent” and “child” codes within the scheme. Parent and child pairs were scored by an independent panel of medical informatics specialists for clinical acceptability. Administrative and billing code mapping from the published scheme were reviewed for all coded records and analyzed by independent reviewers for accuracy. The investigator for each scheme exhaustively searched a sample of coded records for duplications. Results: SNOMED was judged to be significantly more complete in coding the source material than the other schemes (SNOMED* 70%; READ 57%; UMLS 50%; *p < .00001). SNOMED also had a richer clinical taxonomy judged by the number of acceptable first-degree relatives per coded concept (SNOMED* 4.56; UMLS 3.17; READ 2.14, *p < .005). Only the UMLS provided any definitions; these were found for 49% of records which had a coding assignment. READ and UMLS had better administrative mappings (composite score: READ* 40.6%; UMLS* 36.1%; SNOMED 20.7%, *p < .00001), and SNOMED had substantially more duplications of coding assignments (duplication rate: READ 0%; UMLS 4.2%; SNOMED* 13.9%, *p < .004) associated with a loss of clarity. Conclusion: No major terminology source can lay claim to being the ideal resource for a computer-based patient record. However, based upon this analysis of releases for April 1995, SNOMED International is considerably more complete, has a compositional nature and a richer taxonomy. It suffers from less clarity, resulting from a lack of syntax and evolutionary changes in its coding scheme. READ has greater clarity and better mapping to administrative schemes (ICD-10 and OPCS-4), is rapidly changing and is less complete. UMLS is a rich lexical resource, with mappings to many source vocabularies. It provides definitions for many of its terms. However, due to the varying granularities and purposes of its source schemes, it has limitations for representation of clinical concepts within a computer-based patient record. PMID:9147343
NASA Technical Reports Server (NTRS)
Barry, Matthew R.; Osborne, Richard N.
2005-01-01
The RoseDoclet computer program extends the capability of Java doclet software to automatically synthesize Unified Modeling Language (UML) content from Java language source code. [Doclets are Java-language programs that use the doclet application programming interface (API) to specify the content and format of the output of Javadoc. Javadoc is a program, originally designed to generate API documentation from Java source code, now also useful as an extensible engine for processing Java source code.] RoseDoclet takes advantage of Javadoc comments and tags already in the source code to produce a UML model of that code. RoseDoclet applies the doclet API to create a doclet passed to Javadoc. The Javadoc engine applies the doclet to the source code, emitting the output format specified by the doclet. RoseDoclet emits a Rose model file and populates it with fully documented packages, classes, methods, variables, and class diagrams identified in the source code. The way in which UML models are generated can be controlled by use of new Javadoc comment tags that RoseDoclet provides. The advantage of using RoseDoclet is that Javadoc documentation becomes leveraged for two purposes: documenting the as-built API and keeping the design documentation up to date.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Alexandrov, Boian S.; Lliev, Filip L.; Stanev, Valentin G.
This code is a toy (short) version of CODE-2016-83. From a general perspective, the code represents an unsupervised adaptive machine learning algorithm that allows efficient and high performance de-mixing and feature extraction of a multitude of non-negative signals mixed and recorded by a network of uncorrelated sensor arrays. The code identifies the number of the mixed original signals and their locations. Further, the code also allows deciphering of signals that have been delayed with respect to the mixing process in each sensor. This code is highly customizable and it can be efficiently used for fast macro-analyses of data. The code is applicable to a plethora of distinct problems: chemical decomposition, pressure transient decomposition, unknown sources/signal allocation, EM signal decomposition. An additional procedure for allocation of the unknown sources is incorporated in the code.
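The de-mixing described above can be loosely illustrated with an off-the-shelf non-negative matrix factorization; the sketch below uses scikit-learn's NMF on synthetic sensor mixtures and, unlike the actual code, assumes the number of sources is known and ignores delays.

    # Rough analogue of non-negative signal de-mixing using scikit-learn's NMF.
    # The actual code also estimates the number of sources and their delays;
    # here the number of sources is assumed known and delays are ignored.
    import numpy as np
    from sklearn.decomposition import NMF

    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 500)
    sources = np.vstack([np.abs(np.sin(40 * t)),      # two non-negative sources
                         np.exp(-5 * t)])
    mixing = rng.uniform(0.1, 1.0, size=(6, 2))       # six uncorrelated sensors
    observed = mixing @ sources                        # sensors x samples

    model = NMF(n_components=2, init="nndsvda", max_iter=1000)
    est_mixing = model.fit_transform(observed)         # estimated mixing matrix
    est_sources = model.components_                    # estimated source signals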
Joint Source-Channel Decoding of Variable-Length Codes with Soft Information: A Survey
NASA Astrophysics Data System (ADS)
Guillemot, Christine; Siohan, Pierre
2005-12-01
Multimedia transmission over time-varying wireless channels presents a number of challenges beyond existing capabilities conceived so far for third-generation networks. Efficient quality-of-service (QoS) provisioning for multimedia on these channels may in particular require a loosening and a rethinking of the layer separation principle. In that context, joint source-channel decoding (JSCD) strategies have gained attention as viable alternatives to separate decoding of source and channel codes. A statistical framework based on hidden Markov models (HMM) capturing dependencies between the source and channel coding components sets the foundation for optimal design of techniques of joint decoding of source and channel codes. The problem has been largely addressed in the research community, by considering both fixed-length codes (FLC) and variable-length source codes (VLC) widely used in compression standards. Joint source-channel decoding of VLC raises specific difficulties due to the fact that the segmentation of the received bitstream into source symbols is random. This paper makes a survey of recent theoretical and practical advances in the area of JSCD with soft information of VLC-encoded sources. It first describes the main paths followed for designing efficient estimators for VLC-encoded sources, the key component of the JSCD iterative structure. It then presents the main issues involved in the application of the turbo principle to JSCD of VLC-encoded sources as well as the main approaches to source-controlled channel decoding. This survey terminates by performance illustrations with real image and video decoding systems.
Attempt to model laboratory-scale diffusion and retardation data.
Hölttä, P; Siitari-Kauppi, M; Hakanen, M; Tukiainen, V
2001-02-01
Different approaches for measuring the interaction between radionuclides and rock matrix are needed to test the compatibility of experimental retardation parameters and transport models used in assessing the safety of the underground repositories for the spent nuclear fuel. In this work, the retardation of sodium, calcium and strontium was studied on mica gneiss, unaltered, moderately altered and strongly altered tonalite using the dynamic fracture column method. In-diffusion of calcium into rock cubes was determined to predict retardation in columns. In-diffusion of calcium into moderately and strongly altered tonalite was interpreted using the numerical code FTRANS. The code was able to interpret in-diffusion of weakly sorbing calcium into the saturated porous matrix. Elution curves of calcium for the moderately and strongly altered tonalite fracture columns were explained adequately using the FTRANS code and parameters obtained from in-diffusion calculations. In this paper, mass distribution ratio values of sodium, calcium and strontium for intact rock are compared to values previously obtained for crushed rock from batch and crushed rock column experiments. Kd values obtained from fracture column experiments were one order of magnitude lower than Kd values from batch experiments.
LingoBee--Crowd-Sourced Mobile Language Learning in the Cloud
ERIC Educational Resources Information Center
Petersen, Sobah Abbas; Procter-Legg, Emma; Cacchione, Annamaria
2013-01-01
This paper describes three case studies, where language learners were invited to use "LingoBee" as a means of supporting their language learning. LingoBee is a mobile app that provides user-generated language content in a cloud-based shared repository. Assuming that today's students are mobile savvy and "Digital Natives" able…
LingoBee: Engaging Mobile Language Learners through Crowd-Sourcing
ERIC Educational Resources Information Center
Petersen, Sobah Abbas; Procter-Legg, Emma; Cacchione, Annamaria
2014-01-01
This paper describes three case studies, where language learners were invited to use "LingoBee" as a means of supporting their language learning. LingoBee is a mobile app that provides user-generated language content in a cloud-based shared repository. Assuming that today's students are mobile savvy and "Digital Natives" able…
A Culture Model as Mediator and Repository Source for Innovation
ERIC Educational Resources Information Center
Mohammadisadr, Mohammad; Siadat, Seyed Ali; Azizollah, Arbabisarjou; Ebrahim, Ebrahimitabas
2012-01-01
As innovation has become one of the most important competitive advantages, academic practitioners' interest in the matter has increased. But the question of why many organizations fail in their path to becoming innovative still remains unanswered. In this, among many factors influencing innovation capacity of an organization;…
USDA-ARS?s Scientific Manuscript database
The elemental content of a soybean seed is determined by both genetic and environmental factors and is an important component of its nutritional value. The elemental content is stable, making the samples stored in germplasm repositories an intriguing source of experimental material. To test the ef...
Community based research for an urban recreation application of benefits-based management
William T. Borrie; Joseph W. Roggenbuck
1995-01-01
Benefits-based management is an approach to park and recreation management that focuses on the positive outcomes of engaging in recreational experiences. Because one class of possible benefits accrues to the community, a philosophical framework is discussed suggesting that communities are themselves the primary sources, generators, and repositories of knowledge....
National History Day in Arizona 2003 Theme Supplement: Rights and Responsibilities.
ERIC Educational Resources Information Center
Goen, Wendi, Comp.; Devine, Laurie, Comp.
Arizona's archives, libraries, and museums contain a wealth of source material that can be applied to local, regional, and national topics pertaining to the 2003 National History Day theme, rights and responsibilities. Repositories from around the state share ideas and resources that are available to National History Day students. So that…
Code of Federal Regulations, 2013 CFR
2013-01-01
... nuclear material, facility and operator licenses. (a) If the Director, Office of Nuclear Reactor... repository operations area under parts 60 or 63 of this chapter, the Director, Office of Nuclear Reactor Regulation, Director, Office of New Reactors, Director, Office of Nuclear Material Safety and Safeguards, or...
Code of Federal Regulations, 2014 CFR
2014-01-01
... nuclear material, facility and operator licenses. (a) If the Director, Office of Nuclear Reactor... repository operations area under parts 60 or 63 of this chapter, the Director, Office of Nuclear Reactor Regulation, Director, Office of New Reactors, Director, Office of Nuclear Material Safety and Safeguards, or...
NASA Astrophysics Data System (ADS)
Ismail, A. E.; Xiong, Y.; Nowak, E. J.; Brush, L. H.
2009-12-01
The Waste Isolation Pilot Plant (WIPP) is a U.S. Department of Energy (DOE) repository in southeast New Mexico for defense-related transuranic (TRU) waste. Every five years, the DOE is required to submit an application to the Environmental Protection Agency (EPA) demonstrating the WIPP’s continuing compliance with the applicable EPA regulations governing the repository. Part of this recertification effort involves a performance assessment—a probabilistic evaluation of the repository performance with respect to regulatory limits on the amount of releases from the repository to the accessible environment. One of the models used as part of the performance assessment process is a geochemistry model, which predicts solubilities of the radionuclides in the brines that may enter the repository in the different scenarios considered by the performance assessment. The dissolved actinide source term comprises actinide solubilities, which are input parameters for modeling the transport of radionuclides as a result of brine flow through and from the repository. During a performance assessment, the solubilities are modeled as the product of a “base” solubility determined from calculations based on the chemical conditions expected in the repository, and an uncertainty factor that describes the potential deviations of the model from expected behavior. We will focus here on a discussion of the uncertainties. To compute a cumulative distribution function (CDF) for the uncertainties, we compare published, experimentally measured solubility data to predictions made using the established WIPP geochemistry model. The differences between the solubilities observed for a given experiment and the calculated solubilities from the model are used to form the overall CDF, which is then sampled as part of the performance assessment. We will discuss the methodology used to update the CDF’s for the +III actinides, obtained from data for Nd, Am, and Cm, and the +IV actinides, obtained from data for Th, and present results for the calculations of the updated CDF’s. We compare the CDF’s to the distributions computed for the previous recertification, and discuss the potential impact of the changes on the geochemistry model. This research is funded by WIPP programs administered by the U.S. Department of Energy. Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
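The uncertainty-factor construction described above can be sketched as an empirical CDF of deviations between measured and model-calculated solubilities; the numbers below are invented for illustration and do not represent WIPP data.

    # Sketch of the uncertainty CDF described above: differences between
    # measured and model-calculated log10 solubilities are pooled into an
    # empirical CDF that can then be sampled during a performance assessment.
    # The values below are illustrative only.
    import numpy as np

    log10_measured   = np.array([-6.1, -5.8, -6.4, -5.9, -6.3, -6.0])
    log10_calculated = np.array([-6.0, -6.0, -6.2, -6.1, -6.1, -6.2])
    deviations = np.sort(log10_measured - log10_calculated)

    def empirical_cdf(x):
        """Fraction of observed deviations less than or equal to x."""
        return np.searchsorted(deviations, x, side="right") / deviations.size

    # Inverse-CDF sampling of the deviations (linear interpolation).
    samples = np.quantile(deviations, np.random.default_rng(1).uniform(size=1000))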
Open-Source Development of the Petascale Reactive Flow and Transport Code PFLOTRAN
NASA Astrophysics Data System (ADS)
Hammond, G. E.; Andre, B.; Bisht, G.; Johnson, T.; Karra, S.; Lichtner, P. C.; Mills, R. T.
2013-12-01
Open-source software development has become increasingly popular in recent years. Open-source encourages collaborative and transparent software development and promotes unlimited free redistribution of source code to the public. Open-source development is good for science as it reveals implementation details that are critical to scientific reproducibility, but generally excluded from journal publications. In addition, research funds that would have been spent on licensing fees can be redirected to code development that benefits more scientists. In 2006, the developers of PFLOTRAN open-sourced their code under the U.S. Department of Energy SciDAC-II program. Since that time, the code has gained popularity among code developers and users from around the world seeking to employ PFLOTRAN to simulate thermal, hydraulic, mechanical and biogeochemical processes in the Earth's surface/subsurface environment. PFLOTRAN is a massively-parallel subsurface reactive multiphase flow and transport simulator designed from the ground up to run efficiently on computing platforms ranging from the laptop to leadership-class supercomputers, all from a single code base. The code employs domain decomposition for parallelism and is founded upon the well-established and open-source parallel PETSc and HDF5 frameworks. PFLOTRAN leverages modern Fortran (i.e. Fortran 2003-2008) in its extensible object-oriented design. The use of this progressive, yet domain-friendly programming language has greatly facilitated collaboration in the code's software development. Over the past year, PFLOTRAN's top-level data structures were refactored as Fortran classes (i.e. extendible derived types) to improve the flexibility of the code, ease the addition of new process models, and enable coupling to external simulators. For instance, PFLOTRAN has been coupled to the parallel electrical resistivity tomography code E4D to enable hydrogeophysical inversion while the same code base can be used as a third-party library to provide hydrologic flow, energy transport, and biogeochemical capability to the community land model, CLM, part of the open-source community earth system model (CESM) for climate. In this presentation, the advantages and disadvantages of open source software development in support of geoscience research at government laboratories, universities, and the private sector are discussed. Since the code is open-source (i.e. it's transparent and readily available to competitors), the PFLOTRAN team's development strategy within a competitive research environment is presented. Finally, the developers discuss their approach to object-oriented programming and the leveraging of modern Fortran in support of collaborative geoscience research as the Fortran standard evolves among compiler vendors.
Multidimensional incremental parsing for universal source coding.
Bae, Soo Hyun; Juang, Biing-Hwang
2008-10-01
A multidimensional incremental parsing algorithm (MDIP) for multidimensional discrete sources, as a generalization of the Lempel-Ziv coding algorithm, is investigated. It consists of three essential component schemes, maximum decimation matching, hierarchical structure of multidimensional source coding, and dictionary augmentation. As a counterpart of the longest match search in the Lempel-Ziv algorithm, two classes of maximum decimation matching are studied. Also, an underlying behavior of the dictionary augmentation scheme for estimating the source statistics is examined. For an m-dimensional source, m augmentative patches are appended into the dictionary at each coding epoch, thus requiring the transmission of a substantial amount of information to the decoder. The property of the hierarchical structure of the source coding algorithm resolves this issue by successively incorporating lower dimensional coding procedures in the scheme. In regard to universal lossy source coders, we propose two distortion functions, the local average distortion and the local minimax distortion with a set of threshold levels for each source symbol. For performance evaluation, we implemented three image compression algorithms based upon the MDIP; one is lossless and the others are lossy. The lossless image compression algorithm does not perform better than the Lempel-Ziv-Welch coding, but experimentally shows efficiency in capturing the source structure. The two lossy image compression algorithms are implemented using the two distortion functions, respectively. The algorithm based on the local average distortion is efficient at minimizing the signal distortion, but the images by the one with the local minimax distortion have a good perceptual fidelity among other compression algorithms. Our insights inspire future research on feature extraction of multidimensional discrete sources.
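For readers unfamiliar with incremental parsing, the sketch below shows the one-dimensional LZ78-style parse that MDIP generalizes; it is only the baseline scheme, not the multidimensional algorithm, decimation matching, or distortion functions described in the abstract.

    # One-dimensional LZ78-style incremental parsing, shown only as the
    # baseline that MDIP generalizes to multidimensional sources.
    def lz78_parse(symbols):
        """Return a list of (dictionary_index, symbol) phrases."""
        dictionary = {(): 0}          # phrase -> index; the empty phrase is 0
        phrases, current = [], ()
        for s in symbols:
            candidate = current + (s,)
            if candidate in dictionary:
                current = candidate   # keep extending the match
            else:
                phrases.append((dictionary[current], s))
                dictionary[candidate] = len(dictionary)
                current = ()
        if current:                   # emit any partial phrase at the end
            phrases.append((dictionary[current[:-1]], current[-1]))
        return phrases

    print(lz78_parse("abababcabc"))   # e.g. [(0,'a'), (0,'b'), (1,'b'), ...]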
An Efficient Variable Length Coding Scheme for an IID Source
NASA Technical Reports Server (NTRS)
Cheung, K. -M.
1995-01-01
A scheme is examined for using two alternating Huffman codes to encode a discrete independent and identically distributed source with a dominant symbol. This combined strategy, or alternating runlength Huffman (ARH) coding, was found to be more efficient than ordinary coding in certain circumstances.
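The flavor of combining run lengths with Huffman coding for a source that has a dominant symbol can be sketched as below; this simplified example Huffman-codes the run lengths of the dominant symbol and the remaining symbols separately, and is not the exact ARH construction from the abstract.

    # Simplified illustration of run-length plus Huffman coding for an IID
    # source with a dominant symbol; a sketch of the idea, not the ARH scheme.
    import heapq
    from collections import Counter
    from itertools import groupby

    def huffman_code(symbols):
        """Map each symbol to a binary string using a Huffman tree."""
        freq = Counter(symbols)
        if len(freq) == 1:                      # degenerate single-symbol case
            return {next(iter(freq)): "0"}
        heap = [[w, i, [s, ""]] for i, (s, w) in enumerate(freq.items())]
        heapq.heapify(heap)
        next_id = len(heap)
        while len(heap) > 1:
            lo, hi = heapq.heappop(heap), heapq.heappop(heap)
            for pair in lo[2:]:
                pair[1] = "0" + pair[1]
            for pair in hi[2:]:
                pair[1] = "1" + pair[1]
            heapq.heappush(heap, [lo[0] + hi[0], next_id] + lo[2:] + hi[2:])
            next_id += 1
        return {s: code for s, code in heap[0][2:]}

    data = "aaaabaaaaaacaaab" * 10              # 'a' is the dominant symbol
    runs = [(sym, len(list(grp))) for sym, grp in groupby(data)]
    run_lengths = [n for sym, n in runs if sym == "a"]
    others = [sym for sym, n in runs if sym != "a"]
    length_code, symbol_code = huffman_code(run_lengths), huffman_code(others)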
Source Code Plagiarism--A Student Perspective
ERIC Educational Resources Information Center
Joy, M.; Cosma, G.; Yau, J. Y.-K.; Sinclair, J.
2011-01-01
This paper considers the problem of source code plagiarism by students within the computing disciplines and reports the results of a survey of students in Computing departments in 18 institutions in the U.K. This survey was designed to investigate how well students understand the concept of source code plagiarism and to discover what, if any,…
Sajjad, Muhammad; Mehmood, Irfan; Baik, Sung Wook
2017-01-01
Medical image collections contain a wealth of information which can assist radiologists and medical experts in diagnosis and disease detection for making well-informed decisions. However, this objective can only be realized if efficient access is provided to semantically relevant cases from the ever-growing medical image repositories. In this paper, we present an efficient method for representing medical images by incorporating visual saliency and deep features obtained from a fine-tuned convolutional neural network (CNN) pre-trained on natural images. Saliency detector is employed to automatically identify regions of interest like tumors, fractures, and calcified spots in images prior to feature extraction. Neuronal activation features termed as neural codes from different CNN layers are comprehensively studied to identify most appropriate features for representing radiographs. This study revealed that neural codes from the last fully connected layer of the fine-tuned CNN are found to be the most suitable for representing medical images. The neural codes extracted from the entire image and salient part of the image are fused to obtain the saliency-injected neural codes (SiNC) descriptor which is used for indexing and retrieval. Finally, locality sensitive hashing techniques are applied on the SiNC descriptor to acquire short binary codes for allowing efficient retrieval in large scale image collections. Comprehensive experimental evaluations on the radiology images dataset reveal that the proposed framework achieves high retrieval accuracy and efficiency for scalable image retrieval applications and compares favorably with existing approaches. PMID:28771497
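A generic sketch of the indexing idea, not the SiNC pipeline itself: two feature vectors (whole image and salient region) are fused and then reduced to short binary codes with random-projection hashing, one common locality sensitive hashing scheme; random stand-ins replace the fine-tuned CNN activations used in the paper.

    # Generic sketch: fuse whole-image and salient-region feature vectors,
    # then hash them to short binary codes with random projections.
    import numpy as np

    rng = np.random.default_rng(42)

    def fuse(whole_image_feat, salient_region_feat):
        """Concatenate and L2-normalize the two feature vectors."""
        fused = np.concatenate([whole_image_feat, salient_region_feat])
        return fused / (np.linalg.norm(fused) + 1e-12)

    def lsh_hash(features, n_bits=32, planes=None):
        """Sign of random projections -> one binary code per feature vector."""
        if planes is None:
            planes = rng.standard_normal((features.shape[1], n_bits))
        return (features @ planes > 0).astype(np.uint8), planes

    # Illustrative 4096-d "neural codes" for 100 images (random stand-ins).
    whole = rng.standard_normal((100, 4096))
    salient = rng.standard_normal((100, 4096))
    descriptors = np.vstack([fuse(w, s) for w, s in zip(whole, salient)])
    codes, planes = lsh_hash(descriptors, n_bits=32)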
Recent advances in coding theory for near error-free communications
NASA Technical Reports Server (NTRS)
Cheung, K.-M.; Deutsch, L. J.; Dolinar, S. J.; Mceliece, R. J.; Pollara, F.; Shahshahani, M.; Swanson, L.
1991-01-01
Channel and source coding theories are discussed. The following subject areas are covered: large constraint length convolutional codes (the Galileo code); decoder design (the big Viterbi decoder); Voyager's and Galileo's data compression scheme; current research in data compression for images; neural networks for soft decoding; neural networks for source decoding; finite-state codes; and fractals for data compression.
Hybrid concatenated codes and iterative decoding
NASA Technical Reports Server (NTRS)
Divsalar, Dariush (Inventor); Pollara, Fabrizio (Inventor)
2000-01-01
Several improved turbo code apparatuses and methods. The invention encompasses several classes: (1) A data source is applied to two or more encoders with an interleaver between the source and each of the second and subsequent encoders. Each encoder outputs a code element which may be transmitted or stored. A parallel decoder provides the ability to decode the code elements to derive the original source information d without use of a received data signal corresponding to d. The output may be coupled to a multilevel trellis-coded modulator (TCM). (2) A data source d is applied to two or more encoders with an interleaver between the source and each of the second and subsequent encoders. Each of the encoders outputs a code element. In addition, the original data source d is output from the encoder. All of the output elements are coupled to a TCM. (3) At least two data sources are applied to two or more encoders with an interleaver between each source and each of the second and subsequent encoders. The output may be coupled to a TCM. (4) At least two data sources are applied to two or more encoders with at least two interleavers between each source and each of the second and subsequent encoders. (5) At least one data source is applied to one or more serially linked encoders through at least one interleaver. The output may be coupled to a TCM. The invention includes a novel way of terminating a turbo coder.
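Class (1) above, a data source encoded twice with an interleaver in front of the second encoder, can be sketched with toy non-recursive parity encoders as below; real turbo codes use recursive convolutional encoders, iterative decoding, and optional TCM, none of which is attempted here.

    # Toy illustration of parallel concatenation: the same data block feeds
    # two simple parity encoders, the second through a random interleaver.
    import random

    def conv_parity(bits):
        """Parity stream p[i] = d[i] XOR d[i-1] XOR d[i-2], zero initial state."""
        state = [0, 0]
        out = []
        for b in bits:
            out.append(b ^ state[0] ^ state[1])
            state = [b] + state[:-1]
        return out

    def turbo_like_encode(bits, seed=7):
        order = list(range(len(bits)))
        random.Random(seed).shuffle(order)            # interleaver permutation
        interleaved = [bits[i] for i in order]
        return {"systematic": bits,                   # original data d
                "parity1": conv_parity(bits),         # encoder 1 output
                "parity2": conv_parity(interleaved)}  # encoder 2 output

    print(turbo_like_encode([1, 0, 1, 1, 0, 0, 1, 0]))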
Recent Developments in the Code RITRACKS (Relativistic Ion Tracks)
NASA Technical Reports Server (NTRS)
Plante, Ianik; Ponomarev, Artem L.; Blattnig, Steve R.
2018-01-01
The code RITRACKS (Relativistic Ion Tracks) was developed to simulate detailed stochastic radiation track structures of ions of different types and energies. Many new capabilities were added to the code during the recent years. Several options were added to specify the times at which the tracks appear in the irradiated volume, allowing the simulation of dose-rate effects. The code has been used to simulate energy deposition in several targets: spherical, ellipsoidal and cylindrical. More recently, density changes as well as a spherical shell were implemented for spherical targets, in order to simulate energy deposition in walled tissue equivalent proportional counters. RITRACKS is used as a part of the new program BDSTracks (Biological Damage by Stochastic Tracks) to simulate several types of chromosome aberrations in various irradiation conditions. The simulation of damage to various DNA structures (linear and chromatin fiber) by direct and indirect effects has been improved and is ongoing. Many improvements were also made to the graphic user interface (GUI), including the addition of several labels allowing changes of units. A new GUI has been added to display the electron ejection vectors. The parallel calculation capabilities, notably the pre- and post-simulation processing on Windows and Linux machines have been reviewed to make them more portable between different systems. The calculation part is currently maintained in an Atlassian Stash® repository for code tracking and possibly future collaboration.
Gschwind, Michael K
2013-07-23
Mechanisms for aggressively optimizing computer code are provided. With these mechanisms, a compiler determines an optimization to apply to a portion of source code and determines if the optimization as applied to the portion of source code will result in unsafe optimized code that introduces a new source of exceptions being generated by the optimized code. In response to a determination that the optimization is an unsafe optimization, the compiler generates an aggressively compiled code version, in which the unsafe optimization is applied, and a conservatively compiled code version in which the unsafe optimization is not applied. The compiler stores both versions and provides them for execution. Mechanisms are provided for switching between these versions during execution in the event of a failure of the aggressively compiled code version. Moreover, predictive mechanisms are provided for predicting whether such a failure is likely.
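The runtime-switching idea can be illustrated at a high level as below; the patent's mechanism operates on compiled machine code and compiler-generated versions, whereas this sketch simply falls back to a conservative Python function when an "aggressive" one raises an exception.

    # High-level analogue of the dual-version scheme: try the aggressively
    # optimized routine first, fall back to the conservative version when the
    # aggressive one raises a new exception. Illustration only.
    def run_with_fallback(aggressive, conservative, *args):
        try:
            return aggressive(*args)
        except (ArithmeticError, ValueError):    # the "new source of exceptions"
            return conservative(*args)

    def mean_aggressive(xs):
        return sum(xs) / len(xs)                 # assumes xs is non-empty

    def mean_conservative(xs):
        return sum(xs) / len(xs) if xs else 0.0  # guards the edge case

    print(run_with_fallback(mean_aggressive, mean_conservative, []))  # 0.0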
Torres, Leticia; Hu, E.; Tiersch, Terrence R.
2017-01-01
Cryopreservation in aquatic species in general has been constrained to research activities for more than 60 years. Although the need for application and commercialisation pathways has become clear, the lack of comprehensive quality assurance and quality control programs has impeded the progress of the field, delaying the establishment of germplasm repositories and commercial-scale applications. In this review we focus on the opportunities for standardisation in the practices involved in the four main stages of the cryopreservation process: (1) source, housing and conditioning of fish; (2) sample collection and preparation; (3) freezing and cryogenic storage of samples; and (4) egg collection and use of thawed sperm samples. In addition, we introduce some key factors that would assist the transition to commercial-scale, high-throughput application. PMID:26739583
Reference commercial high-level waste glass and canister definition
NASA Astrophysics Data System (ADS)
Slate, S. C.; Ross, W. A.; Partain, W. L.
1981-09-01
Technical data and performance characteristics of a high level waste glass and canister intended for use in the design of a complete waste encapsulation package suitable for disposal in a geologic repository are presented. The borosilicate glass contained in the stainless steel canister represents the probable type of high level waste product that is produced in a commercial nuclear-fuel reprocessing plant. Development history is summarized for high level liquid waste compositions, waste glass composition and characteristics, and canister design. The decay histories of the fission products and actinides (plus daughters) calculated by the ORIGEN-II code are presented.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zehtabian, M; Zaker, N; Sina, S
2015-06-15
Purpose: Different versions of the MCNP code are widely used for dosimetry purposes. The purpose of this study is to compare different versions of the MCNP codes in dosimetric evaluation of different brachytherapy sources. Methods: The TG-43 parameters such as dose rate constant, radial dose function, and anisotropy function of different brachytherapy sources, i.e. Pd-103, I-125, Ir-192, and Cs-137, were calculated in a water phantom. The results obtained by three versions of Monte Carlo codes (MCNP4C, MCNPX, MCNP5) were compared for low and high energy brachytherapy sources. Then the cross section library of the MCNP4C code was changed to ENDF/B-VI release 8, which is used in the MCNP5 and MCNPX codes. Finally, the TG-43 parameters obtained using the MCNP4C-revised code were compared with the other codes. Results: The results of these investigations indicate that for high energy sources, the differences in TG-43 parameters between the codes are less than 1% for Ir-192 and less than 0.5% for Cs-137. However, for low energy sources like I-125 and Pd-103, large discrepancies are observed in the g(r) values obtained by MCNP4C and the two other codes. The differences between g(r) values calculated using MCNP4C and MCNP5 at the distance of 6 cm were found to be about 17% and 28% for I-125 and Pd-103 respectively. The results obtained with MCNP4C-revised and MCNPX were similar. However, the maximum difference between the results obtained with the MCNP5 and MCNP4C-revised codes was 2% at 6 cm. Conclusion: The results indicate that using the MCNP4C code for dosimetry of low energy brachytherapy sources can cause large errors in the results. Therefore it is recommended not to use this code for low energy sources, unless its cross section library is changed. Since the results obtained with MCNP4C-revised and MCNPX were similar, it is concluded that the difference between MCNP4C and MCNPX is their cross section libraries.
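As background for the TG-43 comparison above, the radial dose function g(r) normalizes the transverse-axis dose rate by its value at the reference distance of 1 cm, with a geometry-function correction; the sketch below uses the point-source geometry function 1/r^2 and invented dose-rate values in place of real Monte Carlo tallies.

    # Sketch of the TG-43 radial dose function g(r) computed from tabulated
    # transverse-axis dose rates with the point-source geometry function
    # G(r) = 1/r^2. The dose-rate values below are made up for illustration.
    import numpy as np

    r = np.array([0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])     # distance (cm)
    dose_rate = np.array([4.2, 1.0, 0.23, 0.095, 0.050, 0.029, 0.018])

    r0 = 1.0                                              # reference distance (cm)
    i0 = np.where(r == r0)[0][0]
    geometry = 1.0 / r**2                                 # point-source G(r, theta0)
    g_r = (dose_rate / dose_rate[i0]) * (geometry[i0] / geometry)

    for ri, gi in zip(r, g_r):
        print(f"r = {ri:.1f} cm  g(r) = {gi:.3f}")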
Looking for Skeletons in the Data Centre `Cupboard': How Repository Certification Can Help
NASA Astrophysics Data System (ADS)
Sorvari, S.; Glaves, H.
2017-12-01
There has been a national geoscience repository at the British Geological Survey (or one of its previous incarnations) almost since its inception in 1835. This longevity has resulted in vast amounts of analogue material and, more recently, digital data, some of which has been collected by our scientists but much more has been acquired either through various legislative obligations or donated from various sources. However, the role and operation of the UK National Geoscience Data Centre (NGDC) in the 21st Century is very different to that of the past, with new systems and procedures dealing with predominantly digital data. A web-based ingestion portal allows users to submit their data directly to the NGDC while online services provide discovery and access to data and derived products. Increasingly we are also required to implement an array of standards e.g. ISO, OGC, W3C, best practices e.g. FAIR and legislation e.g. EU INSPIRE Directive; whilst at the same time needing to justify our very existence to our funding agency and hosting organisation. External pressures to demonstrate that we can be recognised as a trusted repository by researchers, various funding agencies, publishers and other related entities have forced us to look at how we function, and to benchmark our operations against those of other organisations and current relevant standards such as those laid down by different repository certification processes. Following an assessment of the various options, the WDS/DSA certification process was selected as the most appropriate route for accreditation of NGDC as a trustworthy repository. It provided a suitable framework for reviewing the current systems, procedures and best practices. Undertaking this process allowed us to identify where the NGDC already has robust systems in place and where there were gaps and deficiencies in current practices. The WDS/DSA assessment process also helped to reinforce best practice throughout the NGDC and demonstrated that many of the recognised and required procedures and standards for recognition as a trusted repository were already in place, even if they were not always followed!
Process Model Improvement for Source Code Plagiarism Detection in Student Programming Assignments
ERIC Educational Resources Information Center
Kermek, Dragutin; Novak, Matija
2016-01-01
In programming courses there are various ways in which students attempt to cheat. The most commonly used method is copying source code from other students and making minimal changes in it, like renaming variable names. Several tools like Sherlock, JPlag and Moss have been devised to detect source code plagiarism. However, for larger student…
Detecting people of interest from internet data sources
NASA Astrophysics Data System (ADS)
Cardillo, Raymond A.; Salerno, John J.
2006-04-01
In previous papers, we have documented success in determining the key people of interest from a large corpus of real-world evidence. Our recent efforts focus on exploring additional domains and data sources. Internet data sources such as email, web pages, and news feeds make it easier to gather a large corpus of documents for various domains, but detecting people of interest in these sources introduces new challenges. Analyzing these massive sources magnifies entity resolution problems, and demands a storage management strategy that supports efficient algorithmic analysis and visualization techniques. This paper discusses the techniques we used in order to analyze the ENRON email repository, which are also applicable to analyzing web pages returned from our "Buddy" meta-search engine.
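One common, generic approach to ranking key people in an email corpus is to build a sender-recipient graph and score nodes by centrality, as sketched below with networkx; this is an illustration only, not the algorithms or storage strategy used in the paper, and the email data shown are stand-ins.

    # Minimal sketch: build a directed sender -> recipient graph from email
    # metadata and rank people by a centrality measure (here PageRank).
    import networkx as nx

    emails = [                            # (sender, recipients) stand-ins
        ("alice", ["bob", "carol"]),
        ("bob", ["alice"]),
        ("carol", ["alice", "dave"]),
        ("dave", ["alice", "bob", "carol"]),
    ]

    g = nx.DiGraph()
    for sender, recipients in emails:
        for recipient in recipients:
            w = g.get_edge_data(sender, recipient, {"weight": 0})["weight"]
            g.add_edge(sender, recipient, weight=w + 1)

    ranking = sorted(nx.pagerank(g, weight="weight").items(),
                     key=lambda kv: -kv[1])
    print(ranking)                        # most central people first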
Fowler, G E; Baker, D M; Lee, M J; Brown, S R
2017-11-01
The internet is becoming an increasingly popular resource to support patient decision-making outside of the clinical encounter. The quality of online health information is variable and largely unregulated. The aim of this study was to assess the quality of online resources to support patient decision-making for full-thickness rectal prolapse surgery. This systematic review was registered on the PROSPERO database (CRD42017058319). Searches were performed on Google and specialist decision aid repositories using a pre-defined search strategy. Sources were analysed according to three measures: (1) their readability using the Flesch-Kincaid Reading Ease score, (2) DISCERN score and (3) International Patient Decision Aids Standards (IPDAS) minimum standards criteria score (IPDASi, v4.0). Overall, 95 sources were from Google and the specialist decision aid repositories. There were 53 duplicates removed, and 18 sources did not meet the pre-defined eligibility criteria, leaving 24 sources included in the full-text analysis. The mean Flesch-Kincaid Reading Ease score was higher than recommended for patient education materials (48.8 ± 15.6, range 25.2-85.3). Overall quality of sources supporting patient decision-making for full-thickness rectal prolapse surgery was poor (median DISCERN score 1/5 ± 1.18, range 1-5). No sources met minimum decision-making standards (median IPDASi score 5/12 ± 2.01, range 1-8). Currently, easily accessible online health information to support patient decision-making for rectal surgery is of poor quality, difficult to read and does not support shared decision-making. It is recommended that professional bodies and medical professionals seek to develop decision aids to support decision-making for full-thickness rectal prolapse surgery.
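For reference, the Flesch-Kincaid Reading Ease score used above is 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words); the sketch below estimates syllables with a crude vowel-group heuristic, so its scores will differ somewhat from dedicated readability tools.

    # Sketch of the Flesch-Kincaid Reading Ease calculation. Syllables are
    # approximated by counting vowel groups, a rough heuristic.
    import re

    def count_syllables(word):
        groups = re.findall(r"[aeiouy]+", word.lower())
        return max(1, len(groups))

    def flesch_reading_ease(text):
        sentences = max(1, len(re.findall(r"[.!?]+", text)))
        words = re.findall(r"[A-Za-z]+", text)
        syllables = sum(count_syllables(w) for w in words)
        return (206.835
                - 1.015 * (len(words) / sentences)
                - 84.6 * (syllables / len(words)))

    sample = ("Rectal prolapse surgery repairs the bowel. "
              "Your surgeon will explain the risks and benefits.")
    print(round(flesch_reading_ease(sample), 1))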
Finding Resolution for the Responsible Transparency of Economic Models in Health and Medicine.
Padula, William V; McQueen, Robert Brett; Pronovost, Peter J
2017-11-01
The Second Panel on Cost-Effectiveness in Health and Medicine's recommendations for conduct, methodological practices, and reporting of cost-effectiveness analyses leave a number of questions unanswered with respect to the implementation of a transparent, open source code interface for economic models. The possibility of making economic model source code openly available could be positive and progressive for the field; however, several unintended consequences of this system should first be considered before complete implementation of this model. First, there is the concern regarding intellectual property rights that modelers have to their analyses. Second, the open source code could make analyses more accessible to inexperienced modelers, leading to inaccurate or misinterpreted results. We propose several resolutions to these concerns. The field should establish a licensing system of open source code such that the model originators maintain control of the code use and grant permissions to other investigators who wish to use it. The field should also be more forthcoming towards the teaching of cost-effectiveness analysis in medical and health services education so that providers and other professionals are familiar with economic modeling and able to conduct analyses with open source code. These types of unintended consequences need to be fully considered before the field is prepared to move forward into an era of model transparency with open source code.
40 CFR 51.50 - What definitions apply to this subpart?
Code of Federal Regulations, 2010 CFR
2010-07-01
... accuracy description (MAD) codes means a set of six codes used to define the accuracy of latitude/longitude data for point sources. The six codes and their definitions are: (1) Coordinate Data Source Code: The... physical piece of or a closely related set of equipment. The EPA's reporting format for a given inventory...
The Astrophysics Source Code Library by the numbers
NASA Astrophysics Data System (ADS)
Allen, Alice; Teuben, Peter; Berriman, G. Bruce; DuPrie, Kimberly; Mink, Jessica; Nemiroff, Robert; Ryan, PW; Schmidt, Judy; Shamir, Lior; Shortridge, Keith; Wallin, John; Warmels, Rein
2018-01-01
The Astrophysics Source Code Library (ASCL, ascl.net) was founded in 1999 by Robert Nemiroff and John Wallin. ASCL editors seek both new and old peer-reviewed papers that describe methods or experiments that involve the development or use of source code, and add entries for the found codes to the library. Software authors can submit their codes to the ASCL as well. This ensures a comprehensive listing covering a significant number of the astrophysics source codes used in peer-reviewed studies. The ASCL is indexed by both NASA’s Astrophysics Data System (ADS) and Web of Science, making software used in research more discoverable. This presentation covers the growth in the ASCL’s number of entries, the number of citations to its entries, and in which journals those citations appear. It also discusses what changes have been made to the ASCL recently, and what its plans are for the future.
Astrophysics Source Code Library: Incite to Cite!
NASA Astrophysics Data System (ADS)
DuPrie, K.; Allen, A.; Berriman, B.; Hanisch, R. J.; Mink, J.; Nemiroff, R. J.; Shamir, L.; Shortridge, K.; Taylor, M. B.; Teuben, P.; Wallen, J. F.
2014-05-01
The Astrophysics Source Code Library (ASCL, http://ascl.net/) is an on-line registry of over 700 source codes that are of interest to astrophysicists, with more being added regularly. The ASCL actively seeks out codes as well as accepting submissions from the code authors, and all entries are citable and indexed by ADS. All codes have been used to generate results published in or submitted to a refereed journal and are available either via a download site or from an identified source. In addition to being the largest directory of scientist-written astrophysics programs available, the ASCL is also an active participant in the reproducible research movement with presentations at various conferences, numerous blog posts and a journal article. This poster provides a description of the ASCL and the changes that we are starting to see in the astrophysics community as a result of the work we are doing.
Astrophysics Source Code Library
NASA Astrophysics Data System (ADS)
Allen, A.; DuPrie, K.; Berriman, B.; Hanisch, R. J.; Mink, J.; Teuben, P. J.
2013-10-01
The Astrophysics Source Code Library (ASCL), founded in 1999, is a free on-line registry for source codes of interest to astronomers and astrophysicists. The library is housed on the discussion forum for Astronomy Picture of the Day (APOD) and can be accessed at http://ascl.net. The ASCL has a comprehensive listing that covers a significant number of the astrophysics source codes used to generate results published in or submitted to refereed journals and continues to grow. The ASCL currently has entries for over 500 codes; its records are citable and are indexed by ADS. The editors of the ASCL and members of its Advisory Committee were on hand at a demonstration table in the ADASS poster room to present the ASCL, accept code submissions, show how the ASCL is starting to be used by the astrophysics community, and take questions on and suggestions for improving the resource.
Generating code adapted for interlinking legacy scalar code and extended vector code
Gschwind, Michael K
2013-06-04
Mechanisms for intermixing code are provided. Source code is received for compilation using an extended Application Binary Interface (ABI) that extends a legacy ABI and uses a different register configuration than the legacy ABI. First compiled code is generated based on the source code, the first compiled code comprising code for accommodating the difference in register configurations used by the extended ABI and the legacy ABI. The first compiled code and second compiled code are intermixed to generate intermixed code, the second compiled code being compiled code that uses the legacy ABI. The intermixed code comprises at least one call instruction that is one of a call from the first compiled code to the second compiled code or a call from the second compiled code to the first compiled code. The code for accommodating the difference in register configurations is associated with the at least one call instruction.
The Function Biomedical Informatics Research Network Data Repository
Keator, David B.; van Erp, Theo G.M.; Turner, Jessica A.; Glover, Gary H.; Mueller, Bryon A.; Liu, Thomas T.; Voyvodic, James T.; Rasmussen, Jerod; Calhoun, Vince D.; Lee, Hyo Jong; Toga, Arthur W.; McEwen, Sarah; Ford, Judith M.; Mathalon, Daniel H.; Diaz, Michele; O’Leary, Daniel S.; Bockholt, H. Jeremy; Gadde, Syam; Preda, Adrian; Wible, Cynthia G.; Stern, Hal S.; Belger, Aysenil; McCarthy, Gregory; Ozyurt, Burak; Potkin, Steven G.
2015-01-01
The Function Biomedical Informatics Research Network (FBIRN) developed methods and tools for conducting multi-scanner functional magnetic resonance imaging (fMRI) studies. Method and tool development were based on two major goals: 1) to assess the major sources of variation in fMRI studies conducted across scanners, including instrumentation, acquisition protocols, challenge tasks, and analysis methods, and 2) to provide a distributed network infrastructure and an associated federated database to host and query large, multi-site, fMRI and clinical datasets. In the process of achieving these goals the FBIRN test bed generated several multi-scanner brain imaging data sets to be shared with the wider scientific community via the BIRN Data Repository (BDR). The FBIRN Phase 1 dataset consists of a traveling subject study of 5 healthy subjects, each scanned on 10 different 1.5 to 4 Tesla scanners. The FBIRN Phase 2 and Phase 3 datasets consist of subjects with schizophrenia or schizoaffective disorder along with healthy comparison subjects scanned at multiple sites. In this paper, we provide concise descriptions of FBIRN’s multi-scanner brain imaging data sets and details about the BIRN Data Repository instance of the Human Imaging Database (HID) used to publicly share the data. PMID:26364863
HepSEQ: International Public Health Repository for Hepatitis B
Gnaneshan, Saravanamuttu; Ijaz, Samreen; Moran, Joanne; Ramsay, Mary; Green, Jonathan
2007-01-01
HepSEQ is a repository for an extensive library of public health and molecular data relating to hepatitis B virus (HBV) infection collected from international sources. It is hosted by the Centre for Infections, Health Protection Agency (HPA), England, United Kingdom. This repository has been developed as a web-enabled, quality-controlled database to act as a tool for surveillance, HBV case management and for research. The web front-end for the database system can be accessed from . The format of the database system allows for comprehensive molecular, clinical and epidemiological data to be deposited into a functional database, to search and manipulate the stored data and to extract and visualize the information on epidemiological, virological, clinical, nucleotide sequence and mutational aspects of HBV infection through web front-end. Specific tools, built into the database, can be utilized to analyse deposited data and provide information on HBV genotype, identify mutations with known clinical significance (e.g. vaccine escape, precore and antiviral-resistant mutations) and carry out sequence homology searches against other deposited strains. Further mechanisms are also in place to allow specific tailored searches of the database to be undertaken. PMID:17130143
Optimal power allocation and joint source-channel coding for wireless DS-CDMA visual sensor networks
NASA Astrophysics Data System (ADS)
Pandremmenou, Katerina; Kondi, Lisimachos P.; Parsopoulos, Konstantinos E.
2011-01-01
In this paper, we propose a scheme for the optimal allocation of power, source coding rate, and channel coding rate for each of the nodes of a wireless Direct Sequence Code Division Multiple Access (DS-CDMA) visual sensor network. The optimization is quality-driven, i.e., the received quality of the video that is transmitted by the nodes is optimized. The scheme takes into account the fact that the sensor nodes may be imaging scenes with varying levels of motion. Nodes that image low-motion scenes will require a lower source coding rate, so they will be able to allocate a greater portion of the total available bit rate to channel coding. Stronger channel coding means that such nodes can transmit at lower power, which both increases battery life and reduces interference to other nodes. Two optimization criteria are considered: one that minimizes the average video distortion of the nodes and one that minimizes the maximum distortion among the nodes. The transmission powers are allowed to take continuous values, whereas the source and channel coding rates can assume only discrete values. Thus, the resulting optimization problem lies in the field of mixed-integer optimization tasks and is solved using Particle Swarm Optimization. Our experimental results show the importance of considering the characteristics of the video sequences when determining the transmission power, source coding rate and channel coding rate for the nodes of the visual sensor network.
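The paper's exact rate-distortion model is not reproduced here; the sketch below (in Python, with an entirely hypothetical distortion function, rate sets and motion levels) only illustrates the kind of mixed-integer particle swarm search described above, treating per-node power as continuous and the source/channel coding rates as indices into discrete sets that are rounded during evaluation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_NODES = 4
POWER_MAX = 0.5                              # W, continuous decision variable per node
SOURCE_RATES = np.array([0.25, 0.5, 0.75])   # hypothetical discrete source-coding rates
CHANNEL_RATES = np.array([1/3, 1/2, 2/3])    # hypothetical discrete channel-coding rates
MOTION = np.array([0.2, 0.9, 0.5, 0.7])      # hypothetical per-node motion level

def distortion(power, src_idx, chn_idx):
    """Hypothetical stand-in for each node's end-to-end video distortion:
    high-motion nodes need more source rate; weak channel codes and low
    power raise the residual error term."""
    src = SOURCE_RATES[src_idx]
    chn = CHANNEL_RATES[chn_idx]
    ber = np.exp(-8.0 * power * (1.0 - chn))     # toy bit-error model
    return MOTION / src + 50.0 * ber             # per-node distortion

def decode(x):
    """Split a particle position into power (continuous) and rate indices (discrete)."""
    power = np.clip(x[:N_NODES], 1e-3, POWER_MAX)
    src_idx = np.clip(np.rint(x[N_NODES:2*N_NODES]), 0, len(SOURCE_RATES) - 1).astype(int)
    chn_idx = np.clip(np.rint(x[2*N_NODES:]), 0, len(CHANNEL_RATES) - 1).astype(int)
    return power, src_idx, chn_idx

def objective(x, minimax=False):
    d = distortion(*decode(x))
    return d.max() if minimax else d.mean()

def pso(n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, minimax=False):
    dim = 3 * N_NODES
    pos = rng.uniform(0, 2, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([objective(p, minimax) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        vals = np.array([objective(p, minimax) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return decode(gbest), pbest_val.min()

(best_power, best_src, best_chn), best_d = pso(minimax=False)
print("power:", best_power.round(3), "src idx:", best_src, "chn idx:", best_chn, "D:", round(best_d, 3))
```

Switching minimax=True changes the objective from the average to the maximum per-node distortion, mirroring the two criteria considered in the paper.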
An open real-time tele-stethoscopy system
2012-01-01
Background Acute respiratory infections are the leading cause of childhood mortality. The lack of physicians in rural areas of developing countries makes their correct diagnosis and treatment difficult. The staff of rural health facilities (health-care technicians) may not be qualified to distinguish respiratory diseases by auscultation. For this reason, the goal of this project is the development of a tele-stethoscopy system that allows a physician to receive real-time cardio-respiratory sounds from a remote auscultation, as well as video images showing where the technician is placing the stethoscope on the patient’s body. Methods A real-time wireless stethoscopy system was designed. The initial requirements were: 1) the system must send audio and video synchronously over IP networks, not requiring an Internet connection; 2) it must preserve the quality of cardiorespiratory sounds and allow the binaural pieces and the chestpiece of standard stethoscopes to be adapted to it; and 3) cardiorespiratory sounds should be recordable at both ends of the communication. In order to verify the diagnostic capacity of the system, a clinical validation with eight specialists has been designed. In a preliminary test, twelve patients were auscultated by all the physicians using the tele-stethoscopy system, versus a local auscultation using a traditional stethoscope. The system must allow cardiac sounds (systolic and diastolic murmurs, gallop sound, arrhythmias) and respiratory sounds (rhonchi, rales and crepitations, wheeze, diminished and bronchial breath sounds, pleural friction rub) to be heard. Results The design, development and initial validation of the real-time wireless tele-stethoscopy system are described in detail. The system was conceived from scratch as open-source and low-cost, and designed in such a way that many universities and small local companies in developing countries may manufacture it. Only free open-source software has been used in order to minimize manufacturing costs and to seek alliances to support its improvement and adaptation. The microcontroller firmware code, the computer software code and the PCB schematics are available for free download from a Subversion repository hosted on SourceForge. Conclusions It has been shown that real-time tele-stethoscopy, together with a videoconference system that allows a remote specialist to oversee the auscultation, may be a very helpful tool in rural areas of developing countries. PMID:22917062
Creativity and Mobile Language Learning Using LingoBee
ERIC Educational Resources Information Center
Petersen, Sobah Abbas; Procter-Legg, Emma; Cacchione, Annamaria
2013-01-01
In this paper, the authors explore the ideas of mobility and creativity through the use of LingoBee, a mobile app for situated language learning. LingoBee is based on ideas from crowd-sourcing and social networking to support language learners. Learners are able to create their own content and share it with other learners through a repository. The…
Robert E. Means
2011-01-01
Lower treeline limber pine woodlands have received little attention in peer-reviewed literature and in management strategies. These ecologically distinct systems are thought to be seed repositories between discontinuous populations in the northern and central Rocky Mountains, serving as seed sources for bird dispersal between distinct mountain ranges. Their position on...
Anthropogenic sources of carbon from landfill or waste leachate can promote reductive dissolution of in situ arsenic (As) and enhance the mobility of As in groundwater. Groundwater from residential-supply wells in a fractured crystalline-rock aquifer adjacent to a Superfund site ...
The PubChem Bioassay database is a non-curated public repository with bioactivity data from 64 sources, including: ChEMBL, BindingDb, DrugBank, Tox21, NIH Molecular Libraries Screening Program, and various academic, government, and industrial contributors. However, this data is d...
Siegel, M.D.; Anderholm, S.
1994-01-01
The Culebra Dolomite Member of the Rustler Formation, a thin (10 m) fractured dolomite aquifer, lies approximately 450 m above the repository horizon of the Waste Isolation Pilot Plant (WIPP) in southeastern New Mexico, USA. Salinities of water in the Culebra range roughly from 10,000 to 200,000 mg/L within the WIPP site. A proposed model for the post-Pleistocene hydrochemical evolution of the Culebra tentatively identifies the major sources and sinks for many of the groundwater solutes. Reaction-path simulations with the PHRQPITZ code suggest that the Culebra dolomite is a partial chemical equilibrium system whose composition is controlled by an irreversible process (dissolution of evaporites) and equilibrium with gypsum and calcite. Net geochemical reactions along postulated modern flow paths, calculated with the NETPATH code, include dissolution of halite, carbonate and evaporite salts, and ion exchange. R-mode principal component analysis revealed correlations among the concentrations of Si, Mg, pH, Li, and B that are consistent with several clay-water reactions. The results of the geochemical calculations and mineralogical data are consistent with the following hydrochemical model: (1) solutes are added to the Culebra by dissolution of evaporite minerals; (2) the solubilities of gypsum and calcite increase as the salinity increases; these minerals dissolve as chemical equilibrium is maintained between them and the groundwater; (3) equilibrium is not maintained between the waters and dolomite; sufficient Mg is added to the waters by dissolution of accessory carnallite or polyhalite such that the degree of dolomite supersaturation increases with ionic strength; and (4) clays within the fractures and rock matrix exert some control on the distribution of Li, B, Mg, and Si via sorption, ion exchange, and dissolution. © 1994.
NASA Astrophysics Data System (ADS)
Vines, Aleksander; Hamre, Torill; Lygre, Kjetil
2014-05-01
The GreenSeas project (Development of global plankton data base and model system for eco-climate early warning) aims to advance the knowledge and predictive capacities of how marine ecosystems will respond to global change. A main task has been to set up a data delivery and monitoring core service following the open and free data access policy implemented in the Global Monitoring for the Environment and Security (GMES) programme. The aim is to ensure open and free access to historical plankton data, new data (EO products and in situ measurements), model data (including estimates of simulation error) and biological, environmental and climatic indicators to a range of stakeholders, such as scientists, policy makers and environmental managers. To this end, we have developed a geo-spatial database of both historical and new in situ physical, biological and chemical parameters for the Southern Ocean, Atlantic, Nordic Seas and the Arctic, and organized related satellite-derived quantities and model forecasts in a joint geo-spatial repository. For easy access to these data, we have implemented a web-based GIS (Geographical Information System) where observed, derived and forecast parameters can be searched, displayed, compared and exported. Model forecasts can also be uploaded dynamically to the system, to allow modelers to quickly compare their results with available in situ and satellite observations. The web-based GIS is built on free and open-source technologies: Thredds Data Server, ncWMS, GeoServer, OpenLayers, PostGIS, Liferay, Apache Tomcat, PRTree, NetCDF-Java, json-simple, Geotoolkit, Highcharts, GeoExt, MapFish, FileSaver, jQuery, jstree and qUnit. We also wanted to use open standards to communicate between the different services, and we use WMS, WFS, netCDF, GML, OPeNDAP, JSON and SLD. The main advantage we got from using FOSS was that we did not have to reinvent the wheel, but could build on existing code and functionality for free. Most of the software did not have to be open source for this, but in some cases we had to make minor modifications so that the different technologies would work together, and we could extract the parts of the code needed for a specific task. One example was reusing part of the code from ncWMS and Thredds to help our main application both read netCDF files and present them in the browser. This presentation will focus on both the difficulties we had with, and the advantages we gained from, developing this tool with FOSS.
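As an aside, the kind of request such a portal's OpenLayers client sends to ncWMS or GeoServer is an ordinary OGC WMS GetMap call, which can also be issued directly from Python. The endpoint, layer name, bounding box and time below are placeholders, not the GreenSeas project's actual services.

```python
import requests

# Hypothetical ncWMS/GeoServer endpoint and layer; real GreenSeas URLs are not reproduced here.
WMS_URL = "https://example.org/ncWMS/wms"

params = {
    "SERVICE": "WMS",
    "VERSION": "1.3.0",
    "REQUEST": "GetMap",
    "LAYERS": "chlorophyll_a",          # hypothetical layer name
    "STYLES": "boxfill/rainbow",
    "CRS": "CRS:84",
    "BBOX": "-30,50,20,80",             # lon/lat box over the Nordic Seas (illustrative)
    "WIDTH": "512",
    "HEIGHT": "512",
    "FORMAT": "image/png",
    "TIME": "2010-06-15T00:00:00Z",
}

resp = requests.get(WMS_URL, params=params, timeout=30)
resp.raise_for_status()
with open("chl_map.png", "wb") as fh:
    fh.write(resp.content)              # rendered map tile, ready for a web client or report
```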
Data compression for satellite images
NASA Technical Reports Server (NTRS)
Chen, P. H.; Wintz, P. A.
1976-01-01
An efficient data compression system is presented for satellite pictures and two grey level pictures derived from satellite pictures. The compression techniques take advantage of the correlation between adjacent picture elements. Several source coding methods are investigated. Double delta coding is presented and shown to be the most efficient. Both the predictive differential quantizing technique and double delta coding can be significantly improved by applying a background skipping technique. An extension code is constructed. This code requires very little storage space and operates efficiently. Simulation results are presented for various coding schemes and source codes.
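The report's exact code tables and background-skipping rules are not given in this abstract; the sketch below shows one plausible reading of double delta coding, namely encoding a scan line by its second-order differences so that smoothly varying image rows collapse into small, highly compressible residuals.

```python
import numpy as np

def double_delta_encode(row):
    """Encode a row of pixel values as second-order differences.
    The first sample and the first difference are kept so the row can be reconstructed."""
    row = np.asarray(row, dtype=np.int64)
    d1 = np.diff(row)                     # first differences
    d2 = np.diff(d1)                      # differences of differences ("double delta")
    return row[0], (d1[0] if len(d1) else 0), d2

def double_delta_decode(first, first_delta, d2):
    d1 = np.concatenate(([first_delta], first_delta + np.cumsum(d2)))
    return np.concatenate(([first], first + np.cumsum(d1))).astype(np.int64)

row = np.array([100, 102, 105, 109, 114, 120, 127])
enc = double_delta_encode(row)
assert np.array_equal(double_delta_decode(*enc), row)
print(enc[2])   # smooth rows leave only small residuals: [1 1 1 1 1]
```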
DASMiner: discovering and integrating data from DAS sources
2009-01-01
Background DAS is a widely adopted protocol for providing syntactic interoperability among biological databases. The popularity of DAS is due to a simplified and elegant mechanism for data exchange in which sources expose RESTful interfaces for data access. As a growing number of DAS services are available for molecular biology resources, there is an incentive to explore this protocol in order to advance data discovery and integration among these resources. Results We developed DASMiner, a Matlab toolkit for querying DAS data sources that enables creation of integrated biological models using the information available in DAS-compliant repositories. DASMiner is composed of a browser application and an API that work together to facilitate gathering of data from different DAS sources, which can be used for creating enriched datasets from multiple sources. The browser is used to formulate queries and navigate data contained in DAS sources. Users can execute queries against these sources in an intuitive fashion, without the need to know the specific DAS syntax for the particular source. Using the source's metadata provided by the DAS Registry, the browser's layout adapts to expose only the set of commands and coordinate systems supported by the specific source. For this reason, the browser can interrogate any DAS source, independently of the type of data being served. The API component of DASMiner may be used for programmatic access to DAS sources from Matlab programs. Once the desired data is found during navigation, the query is exported in the format of an API call to be used within any Matlab application. We illustrate the use of DASMiner by creating integrative models of histone modification maps and protein-protein interaction networks. These enriched datasets were built by retrieving and integrating distributed genomic and proteomic DAS sources using the API. Conclusion Support for the DAS protocol allows hundreds of molecular biology databases to be treated as a federated, online collection of resources. DASMiner enables full exploration of these resources, and can be used to deploy applications and create integrated views of biological systems using the information deposited in DAS repositories. PMID:19919683
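DASMiner itself is a Matlab toolkit, but the underlying DAS exchange is plain HTTP plus XML, so the raw interaction can be sketched in a few lines of Python. The server, source name and segment below are placeholders; real sources and the commands they support are discovered through the DAS Registry, as described above.

```python
import requests
import xml.etree.ElementTree as ET

# Placeholder DAS server and data-source name (real sources are listed in the DAS Registry)
BASE = "https://example.org/das"
SOURCE = "example_features"

# DAS 1.5-style 'features' command over a chromosomal segment (placeholder coordinates)
url = f"{BASE}/{SOURCE}/features"
resp = requests.get(url, params={"segment": "1:1000000,1005000"}, timeout=30)
resp.raise_for_status()

# The response is an XML document containing FEATURE elements with TYPE children
root = ET.fromstring(resp.content)
for feature in root.iter("FEATURE"):
    ftype = feature.find("TYPE")
    print(feature.get("id"), ftype.text if ftype is not None else "?")
```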
Distributed Joint Source-Channel Coding in Wireless Sensor Networks
Zhu, Xuqi; Liu, Yu; Zhang, Lin
2009-01-01
Considering that sensors in wireless sensor networks are energy-limited and that wireless channel conditions vary, there is an urgent need for a low-complexity coding method with a high compression ratio and noise-resistant features. This paper reviews the progress made in distributed joint source-channel coding, which can address this issue. The main existing deployments, from theory to practice, of distributed joint source-channel coding over independent channels, multiple access channels and broadcast channels are introduced, respectively. We also present a practical scheme for compressing multiple correlated sources over independent channels. The simulation results demonstrate the desired efficiency. PMID:22408560
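For context (this is textbook background, not a result of the paper), the distributed-coding limits that such schemes try to approach for two correlated sources X_1 and X_2 are given by the Slepian-Wolf region

\[
R_1 \ge H(X_1 \mid X_2), \qquad
R_2 \ge H(X_2 \mid X_1), \qquad
R_1 + R_2 \ge H(X_1, X_2),
\]

and a distributed joint source-channel code additionally has to absorb the channel noise within the same rate budget.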
NASA Technical Reports Server (NTRS)
Clark, Kenneth; Watney, Garth; Murray, Alexander; Benowitz, Edward
2007-01-01
A computer program translates Unified Modeling Language (UML) representations of state charts into source code in the C, C++, and Python computing languages. ("State charts" here signifies graphical descriptions of the states and state transitions of a spacecraft or other complex system.) The UML representations constituting the input to this program are generated by using a UML-compliant graphical design program to draw the state charts. The generated source code is consistent with the "quantum programming" approach, which is so named because it involves discrete states and state transitions that have features in common with states and state transitions in quantum mechanics. Quantum programming enables efficient implementation of state charts, suitable for real-time embedded flight software. In addition to source code, the autocoder program generates a graphical-user-interface (GUI) program that, in turn, generates a display of state transitions in response to events triggered by the user. The GUI program is wrapped around, and can be used to exercise the state-chart behavior of, the generated source code. Once the expected state-chart behavior is confirmed, the generated source code can be augmented with a software interface to the rest of the software with which it is required to interact.
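Python is one of the autocoder's target languages; the fragment below is not its actual output, only a hypothetical illustration of the general shape of table-driven state-chart code, with made-up states and events for a simple tracking mode chart.

```python
from enum import Enum, auto

class State(Enum):          # hypothetical states for a simple spacecraft mode chart
    IDLE = auto()
    ACQUIRING = auto()
    TRACKING = auto()
    FAULT = auto()

class Event(Enum):          # hypothetical events
    START = auto()
    LOCK = auto()
    LOSS = auto()
    ERROR = auto()
    RESET = auto()

# Transition table: (current state, event) -> next state
TRANSITIONS = {
    (State.IDLE, Event.START): State.ACQUIRING,
    (State.ACQUIRING, Event.LOCK): State.TRACKING,
    (State.TRACKING, Event.LOSS): State.ACQUIRING,
    (State.ACQUIRING, Event.ERROR): State.FAULT,
    (State.TRACKING, Event.ERROR): State.FAULT,
    (State.FAULT, Event.RESET): State.IDLE,
}

class StateMachine:
    """Dispatch events against the transition table; entry/exit actions could be
    attached per state, as generated or hand-written implementations typically do."""
    def __init__(self):
        self.state = State.IDLE

    def dispatch(self, event):
        nxt = TRANSITIONS.get((self.state, event))
        if nxt is not None:
            print(f"{self.state.name} --{event.name}--> {nxt.name}")
            self.state = nxt
        return self.state

sm = StateMachine()
for ev in [Event.START, Event.LOCK, Event.LOSS, Event.ERROR, Event.RESET]:
    sm.dispatch(ev)
```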
Practices in source code sharing in astrophysics
NASA Astrophysics Data System (ADS)
Shamir, Lior; Wallin, John F.; Allen, Alice; Berriman, Bruce; Teuben, Peter; Nemiroff, Robert J.; Mink, Jessica; Hanisch, Robert J.; DuPrie, Kimberly
2013-02-01
While software and algorithms have become increasingly important in astronomy, the majority of authors who publish computational astronomy research do not share the source code they develop, making it difficult to replicate and reuse the work. In this paper we discuss the importance of sharing scientific source code with the entire astrophysics community, and propose that journals require authors to make their code publicly available when a paper is published. That is, we suggest that a paper that involves a computer program not be accepted for publication unless the source code becomes publicly available. The adoption of such a policy by editors, editorial boards, and reviewers will improve the ability to replicate scientific results, and will also make computational astronomy methods more available to other researchers who wish to apply them to their data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Palmer, M.E.
1997-12-05
This V and V Report includes analysis of two revisions of the DMS [data management system] System Requirements Specification (SRS) and the Preliminary System Design Document (PSDD); the source code for the DMS Communication Module (DMSCOM) messages; the source code for selected DMS screens; and the code for the BWAS Simulator. BDM Federal analysts used a series of matrices to: compare the requirements in the System Requirements Specification (SRS) to the specifications found in the System Design Document (SDD), to ensure the design supports the business functions; compare the discrete parts of the SDD with each other, to ensure that the design is consistent and cohesive; compare the source code of the DMS Communication Module with the specifications, to ensure that the resultant messages will support the design; compare the source code of selected screens to the specifications, to ensure that the resultant system screens will support the design; and compare the source code of the BWAS simulator with the requirements to interface with DMS messages and data transfers relating to BWAS operations.
Open-source tools for data mining.
Zupan, Blaz; Demsar, Janez
2008-03-01
With a growing volume of biomedical databases and repositories, the need to develop a set of tools to address their analysis and support knowledge discovery is becoming acute. The data mining community has developed a substantial set of techniques for computational treatment of these data. In this article, we discuss the evolution of open-source toolboxes that data mining researchers and enthusiasts have developed over the span of a few decades and review several currently available open-source data mining suites. The approaches we review are diverse in data mining methods and user interfaces and also demonstrate that the field and its tools are ready to be fully exploited in biomedical research.
FunGene: the functional gene pipeline and repository.
Fish, Jordan A; Chai, Benli; Wang, Qiong; Sun, Yanni; Brown, C Titus; Tiedje, James M; Cole, James R
2013-01-01
Ribosomal RNA genes have become the standard molecular markers for microbial community analysis for good reasons, including universal occurrence in cellular organisms, availability of large databases, and ease of rRNA gene region amplification and analysis. As markers, however, rRNA genes have some significant limitations. The rRNA genes are often present in multiple copies, unlike most protein-coding genes. The slow rate of change in rRNA genes means that multiple species sometimes share identical 16S rRNA gene sequences, while many more species share identical sequences in the short 16S rRNA regions commonly analyzed. In addition, the genes involved in many important processes are not distributed in a phylogenetically coherent manner, potentially due to gene loss or horizontal gene transfer. While rRNA genes remain the most commonly used markers, key genes in ecologically important pathways, e.g., those involved in carbon and nitrogen cycling, can provide important insights into community composition and function not obtainable through rRNA analysis. However, working with ecofunctional gene data requires some tools beyond those required for rRNA analysis. To address this, our Functional Gene Pipeline and Repository (FunGene; http://fungene.cme.msu.edu/) offers databases of many common ecofunctional genes and proteins, as well as integrated tools that allow researchers to browse these collections and choose subsets for further analysis, build phylogenetic trees, test primers and probes for coverage, and download aligned sequences. Additional FunGene tools are specialized to process coding gene amplicon data. For example, FrameBot produces frameshift-corrected protein and DNA sequences from raw reads while finding the most closely related protein reference sequence. These tools can help provide better insight into microbial communities by directly studying key genes involved in important ecological processes.
WholeCellSimDB: a hybrid relational/HDF database for whole-cell model predictions.
Karr, Jonathan R; Phillips, Nolan C; Covert, Markus W
2014-01-01
Mechanistic 'whole-cell' models are needed to develop a complete understanding of cell physiology. However, extracting biological insights from whole-cell models requires running and analyzing large numbers of simulations. We developed WholeCellSimDB, a database for organizing whole-cell simulations. WholeCellSimDB was designed to enable researchers to search simulation metadata to identify simulations for further analysis, and quickly slice and aggregate simulation results data. In addition, WholeCellSimDB enables users to share simulations with the broader research community. The database uses a hybrid relational/hierarchical data format architecture to efficiently store and retrieve both simulation setup metadata and results data. WholeCellSimDB provides a graphical Web-based interface to search, browse, plot and export simulations; a JavaScript Object Notation (JSON) Web service to retrieve data for Web-based visualizations; a command-line interface to deposit simulations; and a Python API to retrieve data for advanced analysis. Overall, we believe WholeCellSimDB will help researchers use whole-cell models to advance basic biological science and bioengineering. Database URL: http://www.wholecellsimdb.org. Source code repository: http://github.com/CovertLab/WholeCellSimDB. © The Author(s) 2014. Published by Oxford University Press.
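WholeCellSimDB's actual schema and Python API are documented in its repository; the sketch below merely illustrates the hybrid relational/HDF idea with standard libraries: searchable metadata rows live in SQLite, bulk results live in HDF5 groups keyed by simulation id, and a query touches the relational side first and slices the HDF5 side second. Table, group and file names are invented for the example.

```python
import sqlite3
import numpy as np
import h5py

# --- relational side: searchable simulation metadata (illustrative schema) ---
db = sqlite3.connect("simulations.sqlite")
db.execute("""CREATE TABLE IF NOT EXISTS simulation
              (id INTEGER PRIMARY KEY, organism TEXT, batch TEXT, length_s REAL)""")
sim_id = db.execute(
    "INSERT INTO simulation (organism, batch, length_s) VALUES (?, ?, ?)",
    ("M. genitalium", "batch-01", 3600.0)).lastrowid
db.commit()

# --- hierarchical side: bulk results data, one HDF5 group per simulation ---
t = np.arange(0, 3600.0, 1.0)
growth = 1.0 + 1e-4 * t                           # toy state variable
with h5py.File("results.h5", "a") as h5:
    grp = h5.require_group(f"simulation_{sim_id}/states")
    grp.create_dataset("time", data=t, compression="gzip")
    grp.create_dataset("growth", data=growth, compression="gzip")

# --- a query then goes: relational search first, HDF5 slice second ---
row = db.execute("SELECT id FROM simulation WHERE organism LIKE 'M.%'").fetchone()
with h5py.File("results.h5", "r") as h5:
    g = h5[f"simulation_{row[0]}/states/growth"][:60]    # first minute only
print(g.mean())
```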
Edmands, William M B; Barupal, Dinesh K; Scalbert, Augustin
2015-03-01
MetMSLine represents a complete collection of functions in the R programming language as an accessible GUI for biomarker discovery in large-scale liquid-chromatography high-resolution mass spectral datasets from acquisition through to final metabolite identification forming a backend to output from any peak-picking software such as XCMS. MetMSLine automatically creates subdirectories, data tables and relevant figures at the following steps: (i) signal smoothing, normalization, filtration and noise transformation (PreProc.QC.LSC.R); (ii) PCA and automatic outlier removal (Auto.PCA.R); (iii) automatic regression, biomarker selection, hierarchical clustering and cluster ion/artefact identification (Auto.MV.Regress.R); (iv) Biomarker-MS/MS fragmentation spectra matching and fragment/neutral loss annotation (Auto.MS.MS.match.R) and (v) semi-targeted metabolite identification based on a list of theoretical masses obtained from public databases (DBAnnotate.R). All source code and suggested parameters are available in an un-encapsulated layout on http://wmbedmands.github.io/MetMSLine/. Readme files and a synthetic dataset of both X-variables (simulated LC-MS data), Y-variables (simulated continuous variables) and metabolite theoretical masses are also available on our GitHub repository. © The Author 2014. Published by Oxford University Press.
Edmands, William M. B.; Barupal, Dinesh K.; Scalbert, Augustin
2015-01-01
Summary: MetMSLine represents a complete collection of functions in the R programming language as an accessible GUI for biomarker discovery in large-scale liquid-chromatography high-resolution mass spectral datasets from acquisition through to final metabolite identification forming a backend to output from any peak-picking software such as XCMS. MetMSLine automatically creates subdirectories, data tables and relevant figures at the following steps: (i) signal smoothing, normalization, filtration and noise transformation (PreProc.QC.LSC.R); (ii) PCA and automatic outlier removal (Auto.PCA.R); (iii) automatic regression, biomarker selection, hierarchical clustering and cluster ion/artefact identification (Auto.MV.Regress.R); (iv) Biomarker—MS/MS fragmentation spectra matching and fragment/neutral loss annotation (Auto.MS.MS.match.R) and (v) semi-targeted metabolite identification based on a list of theoretical masses obtained from public databases (DBAnnotate.R). Availability and implementation: All source code and suggested parameters are available in an un-encapsulated layout on http://wmbedmands.github.io/MetMSLine/. Readme files and a synthetic dataset of both X-variables (simulated LC–MS data), Y-variables (simulated continuous variables) and metabolite theoretical masses are also available on our GitHub repository. Contact: ScalbertA@iarc.fr PMID:25348215
UltraPse: A Universal and Extensible Software Platform for Representing Biological Sequences.
Du, Pu-Feng; Zhao, Wei; Miao, Yang-Yang; Wei, Le-Yi; Wang, Likun
2017-11-14
With the avalanche of biological sequences in public databases, one of the most challenging problems in computational biology is to predict their biological functions and cellular attributes. Most of the existing prediction algorithms can only handle fixed-length numerical vectors. Therefore, it is important to be able to represent biological sequences with various lengths using fixed-length numerical vectors. Although several algorithms, as well as software implementations, have been developed to address this problem, these existing programs can only provide a fixed number of representation modes. Every time a new sequence representation mode is developed, a new program will be needed. In this paper, we propose the UltraPse as a universal software platform for this problem. The function of the UltraPse is not only to generate various existing sequence representation modes, but also to simplify all future programming works in developing novel representation modes. The extensibility of UltraPse is particularly enhanced. It allows the users to define their own representation mode, their own physicochemical properties, or even their own types of biological sequences. Moreover, UltraPse is also the fastest software of its kind. The source code package, as well as the executables for both Linux and Windows platforms, can be downloaded from the GitHub repository.
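UltraPse's own representation modes (and its user-defined property and sequence-type plugins) are richer than this, but the core task it addresses can be illustrated with a minimal Python sketch: mapping a sequence of arbitrary length onto a fixed-length numerical vector, here via normalized k-mer frequencies.

```python
from itertools import product
import numpy as np

def kmer_vector(seq, k=2, alphabet="ACGT"):
    """Map a sequence of arbitrary length to a fixed-length vector of
    normalized k-mer frequencies (len(alphabet)**k dimensions)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    vec = np.zeros(len(kmers))
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:              # skip k-mers containing ambiguous symbols
            vec[j] += 1
    total = vec.sum()
    return vec / total if total else vec

v1 = kmer_vector("ACGTACGTGGCA")
v2 = kmer_vector("ACGT" * 50)          # different length, same 16-dimensional output
print(v1.shape, v2.shape)              # (16,) (16,)
```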
Palacios-Flores, Kim; García-Sotelo, Jair; Castillo, Alejandra; Uribe, Carina; Aguilar, Luis; Morales, Lucía; Gómez-Romero, Laura; Reyes, José; Garciarubio, Alejandro; Boege, Margareta; Dávila, Guillermo
2018-01-01
We present a conceptually simple, sensitive, precise, and essentially nonstatistical solution for the analysis of genome variation in haploid organisms. The generation of a Perfect Match Genomic Landscape (PMGL), which computes intergenome identity with single nucleotide resolution, reveals signatures of variation wherever a query genome differs from a reference genome. Such signatures encode the precise location of different types of variants, including single nucleotide variants, deletions, insertions, and amplifications, effectively introducing the concept of a general signature of variation. The precise nature of variants is then resolved through the generation of targeted alignments between specific sets of sequence reads and known regions of the reference genome. Thus, the perfect match logic decouples the identification of the location of variants from the characterization of their nature, providing a unified framework for the detection of genome variation. We assessed the performance of the PMGL strategy via simulation experiments. We determined the variation profiles of natural genomes and of a synthetic chromosome, both in the context of haploid yeast strains. Our approach uncovered variants that have previously escaped detection. Moreover, our strategy is ideally suited for further refining high-quality reference genomes. The source codes for the automated PMGL pipeline have been deposited in a public repository. PMID:29367403
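The PMGL pipeline itself is available from the public repository mentioned above; the toy Python sketch below only illustrates the core idea of a perfect-match landscape: count, at each reference position, the read k-mers that match the reference exactly, so that dips in the landscape become signatures of variation whose precise nature is resolved later by targeted alignment.

```python
from collections import defaultdict
import numpy as np

def perfect_match_landscape(reference, reads, k=6):
    """Per-base count of exact k-mer matches between reads and the reference.
    Positions whose coverage drops toward zero are signatures of variation."""
    ref_positions = defaultdict(list)
    for i in range(len(reference) - k + 1):
        ref_positions[reference[i:i + k]].append(i)

    landscape = np.zeros(len(reference), dtype=int)
    for read in reads:
        for j in range(len(read) - k + 1):
            for i in ref_positions.get(read[j:j + k], ()):
                landscape[i:i + k] += 1
    return landscape

reference = "ACGTTGCAAGGCTTACGGATCCGTA"
# Simulated reads from a query genome carrying one substitution (G->T) near the middle
reads = ["ACGTTGCAAGGCTTAC", "GCAAGGCTTACGTATC", "CTTACGTATCCGTA"]
print(perfect_match_landscape(reference, reads, k=6))
```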
MONTRA: An agile architecture for data publishing and discovery.
Bastião Silva, Luís; Trifan, Alina; Luís Oliveira, José
2018-07-01
Data catalogues are a common form of capturing and presenting information about a specific kind of entity (e.g. products, services, professionals, datasets, etc.). However, the construction of a web-based catalogue for a particular scenario normally implies the development of a specific and dedicated solution. In this paper, we present MONTRA, a rapid-application development framework designed to facilitate the integration and discovery of heterogeneous objects, which may be characterized by distinct data structures. MONTRA was developed following a plugin-based architecture to allow dynamic composition of services over represented datasets. The core of MONTRA's functionalities resides in a flexible data skeleton used to characterize data entities, and from which a fully-fledged web data catalogue is automatically generated, ensuring access control and data privacy. MONTRA is being successfully used by several European projects to collect and manage biomedical databases. In this paper, we describe three of these application scenarios. This work was motivated by the plethora of geographically scattered biomedical repositories, and by the role they can play altogether for the understanding of diseases and of the real-world effectiveness of treatments. Using metadata to expose datasets' characteristics, MONTRA greatly simplifies the task of building data catalogues. The source code is publicly available at https://github.com/bioinformatics-ua/montra. Copyright © 2018 Elsevier B.V. All rights reserved.
Cerqueira, Gustavo C; Arnaud, Martha B; Inglis, Diane O; Skrzypek, Marek S; Binkley, Gail; Simison, Matt; Miyasato, Stuart R; Binkley, Jonathan; Orvis, Joshua; Shah, Prachi; Wymore, Farrell; Sherlock, Gavin; Wortman, Jennifer R
2014-01-01
The Aspergillus Genome Database (AspGD; http://www.aspgd.org) is a freely available web-based resource that was designed for Aspergillus researchers and is also a valuable source of information for the entire fungal research community. In addition to being a repository and central point of access to genome, transcriptome and polymorphism data, AspGD hosts a comprehensive comparative genomics toolbox that facilitates the exploration of precomputed orthologs among the 20 currently available Aspergillus genomes. AspGD curators perform gene product annotation based on review of the literature for four key Aspergillus species: Aspergillus nidulans, Aspergillus oryzae, Aspergillus fumigatus and Aspergillus niger. We have iteratively improved the structural annotation of Aspergillus genomes through the analysis of publicly available transcription data, mostly expressed sequenced tags, as described in a previous NAR Database article (Arnaud et al. 2012). In this update, we report substantive structural annotation improvements for A. nidulans, A. oryzae and A. fumigatus genomes based on recently available RNA-Seq data. Over 26 000 loci were updated across these species; although those primarily comprise the addition and extension of untranslated regions (UTRs), the new analysis also enabled over 1000 modifications affecting the coding sequence of genes in each target genome.
xiSPEC: web-based visualization, analysis and sharing of proteomics data.
Kolbowski, Lars; Combe, Colin; Rappsilber, Juri
2018-05-08
We present xiSPEC, a standard compliant, next-generation web-based spectrum viewer for visualizing, analyzing and sharing mass spectrometry data. Peptide-spectrum matches from standard proteomics and cross-linking experiments are supported. xiSPEC is to date the only browser-based tool supporting the standardized file formats mzML and mzIdentML defined by the proteomics standards initiative. Users can either upload data directly or select files from the PRIDE data repository as input. xiSPEC allows users to save and share their datasets publicly or password protected for providing access to collaborators or readers and reviewers of manuscripts. The identification table features advanced interaction controls and spectra are presented in three interconnected views: (i) annotated mass spectrum, (ii) peptide sequence fragmentation key and (iii) quality control error plots of matched fragments. Highlighting or selecting data points in any view is represented in all other views. Views are interactive scalable vector graphic elements, which can be exported, e.g. for use in publication. xiSPEC allows for re-annotation of spectra for easy hypothesis testing by modifying input data. xiSPEC is freely accessible at http://spectrumviewer.org and the source code is openly available on https://github.com/Rappsilber-Laboratory/xiSPEC.
NASA Astrophysics Data System (ADS)
Tudose, Alexandru; Terstyansky, Gabor; Kacsuk, Peter; Winter, Stephen
Grid Application Repositories vary greatly in terms of access interface, security system, implementation technology, communication protocols and repository model. This diversity has become a significant limitation in terms of interoperability and inter-repository access. This paper presents the Grid Application Meta-Repository System (GAMRS) as a solution that offers better options for the management of Grid applications. GAMRS proposes a generic repository architecture, which allows any Grid Application Repository (GAR) to be connected to the system independent of their underlying technology. It also presents applications in a uniform manner and makes applications from all connected repositories visible to web search engines, OGSI/WSRF Grid Services and other OAI (Open Archive Initiative)-compliant repositories. GAMRS can also function as a repository in its own right and can store applications under a new repository model. With the help of this model, applications can be presented as embedded in virtual machines (VM) and therefore they can be run in their native environments and can easily be deployed on virtualized infrastructures allowing interoperability with new generation technologies such as cloud computing, application-on-demand, automatic service/application deployments and automatic VM generation.
Xu, Guoai; Li, Qi; Guo, Yanhui; Zhang, Miao
2017-01-01
Authorship attribution is the task of identifying the most likely author of a given sample among a set of known candidate authors. It can be applied not only to discover the original author of plain text, such as novels, blogs, emails and posts, but also to identify source code programmers. Authorship attribution of source code is required in diverse applications, ranging from malicious code tracking to solving authorship disputes and software plagiarism detection. This paper aims to propose a new method to identify the programmer of Java source code samples with higher accuracy. To this end, it first introduces a back propagation (BP) neural network based on particle swarm optimization (PSO) into authorship attribution of source code. It begins by computing a set of defined feature metrics, including lexical and layout metrics and structure and syntax metrics, 19 dimensions in total. These metrics are then input to the neural network for supervised learning, the weights of which are output by the PSO and BP hybrid algorithm. The effectiveness of the proposed method is evaluated on a collected dataset with 3,022 Java files belonging to 40 authors. Experimental results show that the proposed method achieves 91.060% accuracy. A comparison with previous work on authorship attribution of source code for the Java language illustrates that the proposed method outperforms the others overall, also with an acceptable overhead. PMID:29095934
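The paper's 19 feature metrics and its PSO-trained BP network are not reproduced here; the sketch below only illustrates the general workflow with a handful of invented lexical/layout metrics and scikit-learn's gradient-trained MLP standing in for the PSO+BP hybrid, on a toy corpus.

```python
import re
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

def style_features(java_source):
    """A few illustrative lexical/layout metrics (the paper defines 19)."""
    lines = java_source.splitlines() or [""]
    tokens = re.findall(r"\w+", java_source)
    return np.array([
        np.mean([len(l) for l in lines]),                                  # mean line length
        sum(1 for l in lines if not l.strip()) / len(lines),               # blank-line ratio
        java_source.count("\t") / max(len(java_source), 1),                # tab density
        sum(1 for l in lines if l.strip().startswith("//")) / len(lines),  # comment-line ratio
        len(set(tokens)) / max(len(tokens), 1),                            # vocabulary richness
        java_source.count("{") / len(lines),                               # braces per line
    ])

# Toy corpus of (snippet, author) pairs; a real study would use whole files
corpus = [
    ("public class A {\n\tint x;\n\t// counter\n\tvoid f() { x++; }\n}", "alice"),
    ("public class B {\n\tString s;\n\t// name\n\tvoid g() { s = \"\"; }\n}", "alice"),
    ("public class C {\n    int y;\n    void h()\n    {\n        y--;\n    }\n}", "bob"),
    ("public class D {\n    long z;\n    void k()\n    {\n        z = 0;\n    }\n}", "bob"),
]
X = np.array([style_features(src) for src, _ in corpus])
y = [author for _, author in corpus]

# Plain gradient-trained MLP as a stand-in for the PSO+BP hybrid described in the paper
clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0))
clf.fit(X, y)
print(clf.predict([style_features("public class E {\n\tvoid m() { /* todo */ }\n}")]))
```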
The mathematical theory of signal processing and compression-designs
NASA Astrophysics Data System (ADS)
Feria, Erlan H.
2006-05-01
The mathematical theory of signal processing, named processor coding, will be shown to arise inherently as the computational-time dual of Shannon's mathematical theory of communication, which is also known as source coding. Source coding is concerned with compressing signal source memory space, while processor coding deals with compressing signal processor computational time. Their combination is named compression-designs, referred to as Conde for short. A compelling and pedagogically appealing diagram will be discussed, highlighting Conde's remarkably successful application to real-world knowledge-aided (KA) airborne moving target indicator (AMTI) radar.
Accurate Modeling of Ionospheric Electromagnetic Fields Generated by a Low-Altitude VLF Transmitter
2007-08-31
[List-of-figures fragment from the report: low- and high-altitude fields produced by 10-kHz and 20-kHz sources, computed using the FD and TD codes; the agreement between the two codes is described as excellent, validating the new FD code.]
48 CFR 227.7207 - Contractor data repositories.
Code of Federal Regulations, 2012 CFR
2012-10-01
... Computer Software and Computer Software Documentation 227.7207 Contractor data repositories. Follow 227.7108 when it is in the Government's interests to have a data repository include computer software or to have a separate computer software repository. Contractual instruments establishing the repository...
48 CFR 227.7207 - Contractor data repositories.
Code of Federal Regulations, 2011 CFR
2011-10-01
... Computer Software and Computer Software Documentation 227.7207 Contractor data repositories. Follow 227.7108 when it is in the Government's interests to have a data repository include computer software or to have a separate computer software repository. Contractual instruments establishing the repository...
48 CFR 227.7207 - Contractor data repositories.
Code of Federal Regulations, 2014 CFR
2014-10-01
... Computer Software and Computer Software Documentation 227.7207 Contractor data repositories. Follow 227.7108 when it is in the Government's interests to have a data repository include computer software or to have a separate computer software repository. Contractual instruments establishing the repository...
48 CFR 227.7207 - Contractor data repositories.
Code of Federal Regulations, 2013 CFR
2013-10-01
... Computer Software and Computer Software Documentation 227.7207 Contractor data repositories. Follow 227.7108 when it is in the Government's interests to have a data repository include computer software or to have a separate computer software repository. Contractual instruments establishing the repository...
Xu, Daoquan; Wang, Yinghui; Zhang, Ruijie; Guo, Jing; Zhang, Wei; Yu, Kefu
2016-05-01
The distribution and speciation of several heavy metals, i.e., As, Cd, Cr, Cu, Hg, Pb, and Zn, in surface sediments from the karst aquatic environment of the Lijiang River, Southwest China, were studied comparatively. The mean contents of Cd, Cu, Hg, Pb, and Zn were 1.72, 38.07, 0.18, 51.54, and 142.16 mg/kg, respectively, which were about 1.5-6 times higher than their corresponding regional sediment background values. Metal speciation obtained by the optimized BCR protocol highlighted the bioavailable threats of Cd, Cu, and Zn, which were highly associated with the exchangeable fraction (the labile phase). Hierarchical cluster analysis indicated that, in the sediments, As and Cr were mainly derived from natural and industrial sources, whereas fertilizer application might lead to the elevated level of Cd. In addition, Cu, Hg, Pb, and Zn were related to traffic activities. The effects-based sediment quality guidelines (SQGs) showed that Hg, Pb, and Zn could pose occasional adverse effects on sediment-dwelling organisms. However, based on the potential ecological risk assessment (PER) and risk assessment code (RAC), Cd was the most outstanding pollutant and posed the highest ecological hazard and bioavailable risk among the selected metals. Moreover, the metal partitioning between water and sediments was quantified through the calculation of the pseudo-partitioning coefficient (K_P), and the results implied that the sediments in this karst aquatic environment cannot be used as stable repositories for the metal pollutants.
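For reference (standard definitions, not quoted from the paper), the risk assessment code is usually taken as the share of a metal held in the exchangeable/acid-soluble fraction of the sequential extraction, and the pseudo-partitioning coefficient relates sediment to dissolved concentrations:

\[
\mathrm{RAC} = 100\,\% \times \frac{C_{\mathrm{exchangeable}}}{C_{\mathrm{total}}},
\qquad
K_P = \frac{C_{\mathrm{sediment}}\ [\mathrm{mg\,kg^{-1}}]}{C_{\mathrm{water}}\ [\mathrm{mg\,L^{-1}}]}.
\]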
NASA Astrophysics Data System (ADS)
McAllister, M.; Gochis, D.; Dugger, A. L.; Karsten, L. R.; McCreight, J. L.; Pan, L.; Rafieeinasab, A.; Read, L. K.; Sampson, K. M.; Yu, W.
2017-12-01
The community WRF-Hydro modeling system is publicly available and provides researchers and operational forecasters with a flexible and extensible capability for multi-scale, multi-physics hydrologic modeling that can be run independently of, or fully interactively with, the WRF atmospheric model. The core WRF-Hydro physics model contains very high-resolution descriptions of terrestrial hydrologic process representations such as land-atmosphere exchanges of energy and moisture, snowpack evolution, infiltration, terrain routing, channel routing, basic reservoir representation and hydrologic data assimilation. Complementing the core physics components of WRF-Hydro is an ecosystem of pre- and post-processing tools that facilitate the preparation of terrain and meteorological input data, an open-source hydrologic model evaluation toolset (Rwrfhydro), hydrologic data assimilation capabilities with DART and advanced model visualization capabilities. The National Center for Atmospheric Research (NCAR), through collaborative support from the National Science Foundation and other funding partners, provides community support for the entire WRF-Hydro system through a variety of mechanisms. This presentation summarizes the enhanced user support capabilities that are being developed for the community WRF-Hydro modeling system. These products and services include a new website, open-source code repositories, documentation and user guides, test cases, online training materials, live hands-on training sessions, an email list serve, and individual user support via email through a new help desk ticketing system. The WRF-Hydro modeling system and supporting tools, which now include re-gridding scripts and model calibration, have recently been updated to Version 4 and are merging toward the capabilities of the National Water Model.
iAnn: an event sharing platform for the life sciences.
Jimenez, Rafael C; Albar, Juan P; Bhak, Jong; Blatter, Marie-Claude; Blicher, Thomas; Brazas, Michelle D; Brooksbank, Cath; Budd, Aidan; De Las Rivas, Javier; Dreyer, Jacqueline; van Driel, Marc A; Dunn, Michael J; Fernandes, Pedro L; van Gelder, Celia W G; Hermjakob, Henning; Ioannidis, Vassilios; Judge, David P; Kahlem, Pascal; Korpelainen, Eija; Kraus, Hans-Joachim; Loveland, Jane; Mayer, Christine; McDowall, Jennifer; Moran, Federico; Mulder, Nicola; Nyronen, Tommi; Rother, Kristian; Salazar, Gustavo A; Schneider, Reinhard; Via, Allegra; Villaveces, Jose M; Yu, Ping; Schneider, Maria V; Attwood, Teresa K; Corpas, Manuel
2013-08-01
We present iAnn, an open source community-driven platform for dissemination of life science events, such as courses, conferences and workshops. iAnn allows automatic visualisation and integration of customised event reports. A central repository lies at the core of the platform: curators add submitted events, and these are subsequently accessed via web services. Thus, once an iAnn widget is incorporated into a website, it permanently shows timely relevant information as if it were native to the remote site. At the same time, announcements submitted to the repository are automatically disseminated to all portals that query the system. To facilitate the visualization of announcements, iAnn provides powerful filtering options and views, integrated in Google Maps and Google Calendar. All iAnn widgets are freely available. Availability: http://iann.pro/iannviewer. Contact: manuel.corpas@tgac.ac.uk.
NATIONAL GEOSCIENCE DATA REPOSITORY SYSTEM PHASE III: IMPLEMENTATION AND OPERATION ON THE REPOSITORY
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marcus Milling
2001-10-01
The NGDRS has attained 72% of its targeted goal for cores and cuttings transfers; over 12 million linear feet of cores and cuttings, in addition to large numbers of paleontological samples, are now available for public use. Additionally, large-scale transfers of seismic data have been evaluated, but based on the recommendation of the NGDRS steering committee, cores have been given priority because of the vast scale of the seismic data problem relative to the available funding. The rapidly changing industry conditions have required that the primary core and cuttings preservation strategy evolve as well. Additionally, the NGDRS clearinghouse is evaluating the viability of transferring seismic data covering the western shelf of the Florida Gulf Coast. AGI remained actively involved in assisting the National Research Council with background materials and presentations for their panel convened to study the data preservation issue. A final report of the panel is expected in early 2002. GeoTrek has been ported to Linux and MySQL, ensuring a purely open-source version of the software. This effort is key in ensuring long-term viability of the software so that it can continue basic operation regardless of specific funding levels. Work has commenced on a major revision of GeoTrek, using the open-source MapServer project and its related MapScript language. This effort will address a number of key technology issues that appear to be rising for 2002, including the discontinuation of the use of Java in future Microsoft operating systems. Discussions have been held regarding establishing potential new public data repositories, with hope for final determination in 2002.
NATIONAL GEOSCIENCE DATA REPOSITORY SYSTEM PHASE III: IMPLEMENTATION AND OPERATION OF THE REPOSITORY
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marcus Milling
2003-04-01
The NGDRS has facilitated the transfer of 85% of the cores, cuttings, and other data identified as available for transfer to the public sector. Over 12 million linear feet of cores and cuttings, in addition to large numbers of paleontological samples, are now available for public use. To date, with industry contributions for program operations and data transfers, the NGDRS project has realized a 6.5 to 1 return on investment to Department of Energy funds. Large-scale transfers of seismic data have been evaluated, but based on the recommendation of the NGDRS steering committee, cores have been given priority because of the vast scale of the seismic data problem relative to the available funding. The rapidly changing industry conditions have required that the primary core and cuttings preservation strategy evolve as well. Additionally, the NGDRS clearinghouse is evaluating the viability of transferring seismic data covering the western shelf of the Florida Gulf Coast. AGI remains actively involved in working to realize the vision of the National Research Council's report on geoscience data preservation. GeoTrek has been ported to Linux and MySQL, ensuring a purely open-source version of the software. This effort is key in ensuring long-term viability of the software so that it can continue basic operation regardless of specific funding levels. Work has commenced on a major revision of GeoTrek, using the open-source MapServer project and its related MapScript language. This effort will address a number of key technology issues that appear to be rising for 2002, including the discontinuation of the use of Java in future Microsoft operating systems. Discussions have been held regarding establishing potential new public data repositories, with hope for final determination in 2002.
NATIONAL GEOSCIENCE DATA REPOSITORY SYSTEM PHASE III: IMPLEMENTATION AND OPERATION OF THE REPOSITORY
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marcus Milling
2002-10-01
The NGDRS has facilitated the transfer of 85% of the cores, cuttings, and other data identified as available for transfer to the public sector. Over 12 million linear feet of cores and cuttings, in addition to large numbers of paleontological samples, are now available for public use. To date, with industry contributions for program operations and data transfers, the NGDRS project has realized a 6.5 to 1 return on investment to Department of Energy funds. Large-scale transfers of seismic data have been evaluated, but based on the recommendation of the NGDRS steering committee, cores have been given priority because of the vast scale of the seismic data problem relative to the available funding. The rapidly changing industry conditions have required that the primary core and cuttings preservation strategy evolve as well. Additionally, the NGDRS clearinghouse is evaluating the viability of transferring seismic data covering the western shelf of the Florida Gulf Coast. AGI remains actively involved in working to realize the vision of the National Research Council's report on geoscience data preservation. GeoTrek has been ported to Linux and MySQL, ensuring a purely open-source version of the software. This effort is key in ensuring long-term viability of the software so that it can continue basic operation regardless of specific funding levels. Work has commenced on a major revision of GeoTrek, using the open-source MapServer project and its related MapScript language. This effort will address a number of key technology issues that appear to be rising for 2002, including the discontinuation of the use of Java in future Microsoft operating systems. Discussions have been held regarding establishing potential new public data repositories, with hope for final determination in 2002.
Sáez, Carlos; Zurriaga, Oscar; Pérez-Panadés, Jordi; Melchor, Inma; Robles, Montserrat; García-Gómez, Juan M
2016-11-01
To assess the variability in data distributions among data sources and over time through a case study of a large multisite repository as a systematic approach to data quality (DQ). Novel probabilistic DQ control methods based on information theory and geometry are applied to the Public Health Mortality Registry of the Region of Valencia, Spain, with 512 143 entries from 2000 to 2012, disaggregated into 24 health departments. The methods provide DQ metrics and exploratory visualizations for (1) assessing the variability among multiple sources and (2) monitoring and exploring changes with time. The methods are suited to big data and multitype, multivariate, and multimodal data. The repository was partitioned into 2 probabilistically separated temporal subgroups following a change in the Spanish National Death Certificate in 2009. Isolated temporal anomalies were noticed, due to localized increases in missing data, along with outlying and clustered health departments due to differences in populations or in practices. Changes in protocols, differences in populations, biased practices, or other systematic DQ problems affected data variability. Even if semantic and integration aspects are addressed in data sharing infrastructures, probabilistic variability may still be present. Solutions include fixing or excluding data and analyzing different sites or time periods separately. A systematic approach to assessing temporal and multisite variability is proposed. Multisite and temporal variability in data distributions affects DQ, hindering data reuse, and an assessment of such variability should be a part of systematic DQ procedures. © The Author 2016. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
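The authors' exact metrics are not reproduced here; as an illustration of the information-theoretic building blocks such methods rest on, the Python fragment below computes a Jensen-Shannon distance between the distributions of a coded variable in two hypothetical health departments (the counts are invented, not registry data).

```python
import numpy as np

def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base 2): a bounded [0, 1]
    metric between two discrete probability distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))

# Hypothetical count distributions of a coded variable in two health departments
dept_a = [120, 340, 80, 15, 445]
dept_b = [95, 310, 150, 40, 405]
print(round(jensen_shannon_distance(dept_a, dept_b), 4))
```

Computing this distance pairwise across all sites (or time windows) yields the kind of dissimilarity structure that can then be embedded and visualized to spot outlying or clustered departments.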
DOE Office of Scientific and Technical Information (OSTI.GOV)
J. Bauman; S. Burian; M. Deo
The Utah Heavy Oil Program (UHOP) was established in June 2006 to provide multidisciplinary research support to federal and state constituents for addressing the wide-ranging issues surrounding the creation of an industry for unconventional oil production in the United States. Additionally, UHOP was to serve as an on-going source of unbiased information to the nation surrounding technical, economic, legal and environmental aspects of developing heavy oil, oil sands, and oil shale resources. UHOP fulfilled its role by completing three tasks. First, in response to the Energy Policy Act of 2005 Section 369(p), UHOP published an update report to the 1987 technical and economic assessment of domestic heavy oil resources that was prepared by the Interstate Oil and Gas Compact Commission. The UHOP report, entitled 'A Technical, Economic, and Legal Assessment of North American Heavy Oil, Oil Sands, and Oil Shale Resources', was published in electronic and hard copy form in October 2007. Second, UHOP developed a comprehensive, publicly accessible online repository of unconventional oil resources in North America based on the DSpace software platform. An interactive map was also developed as a source of geospatial information and as a means to interact with the repository from a geospatial setting. All documents uploaded to the repository are fully searchable by author, title, and keywords. Third, UHOP sponsored five research projects related to unconventional fuels development. Two projects looked at issues associated with oil shale production, including oil shale pyrolysis kinetics, resource heterogeneity, and reservoir simulation. One project evaluated in situ production from Utah oil sands. Another project focused on water availability and produced water treatments. The last project considered commercial oil shale leasing from a policy, environmental, and economic perspective.
Traits and types of health data repositories.
Wade, Ted D
2014-01-01
We review traits of reusable clinical data and offer a typology of clinical repositories with a range of known examples. Sources of clinical data suitable for research can be classified into types reflecting the data's institutional origin, original purpose, level of integration and governance. Primary data nearly always come from research studies and electronic medical records. Registries collect data on focused populations primarily to track outcomes, often using observational research methods. Warehouses are institutional information utilities repackaging clinical care data. Collections organize data from more organizations than a data warehouse, and from more original data sources than a registry. Therefore, even if they are heavily curated, their level of internal integration, and thus ease of use, can be lower than that of the other types. Federations are like collections except that physical control over data is distributed among donor organizations. Federations sometimes federate, giving a second level of organization. While the size, in number of patients, varies widely within each type of data source, populations over 10 K are relatively numerous, and much larger populations can be seen in warehouses and federations. One imagined ideal structure for research progress has been called an "Information Commons". It would have longitudinal, multi-leveled (environmental through molecular) data on a large population of identified, consenting individuals. These are qualities whose achievement would require long-term commitment on the part of many data donors, including a willingness to make their data public.
Analysis of CERN computing infrastructure and monitoring data
NASA Astrophysics Data System (ADS)
Nieke, C.; Lassnig, M.; Menichetti, L.; Motesnitsalis, E.; Duellmann, D.
2015-12-01
Optimizing a computing infrastructure on the scale of the LHC requires a quantitative understanding of a complex network of many different resources and services. For this purpose the CERN IT department and the LHC experiments are collecting a large multitude of logs and performance probes, which are already successfully used for short-term analysis (e.g. operational dashboards) within each group. The IT analytics working group has been created with the goal of bringing together data sources from different services and on different abstraction levels, and of implementing a suitable infrastructure for mid- to long-term statistical analysis. It further provides a forum for joint optimization across single-service boundaries and for the exchange of analysis methods and tools. To simplify access to the collected data, we implemented an automated repository for cleaned and aggregated data sources based on the Hadoop ecosystem. This contribution describes some of the challenges encountered, such as dealing with heterogeneous data formats and selecting an efficient storage format for MapReduce and external access, and will describe the repository user interface. Using this infrastructure we were able to quantitatively analyze the relationship between the CPU/wall fraction, the latency/throughput constraints of network and disk, and the effective job throughput. In this contribution we will first describe the design of the shared analysis infrastructure and then present a summary of first analysis results from the combined data sources.
Providing Multi-Page Data Extraction Services with XWRAPComposer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Ling; Zhang, Jianjun; Han, Wei
2008-04-30
Dynamic Web data sources – sometimes known collectively as the Deep Web – increase the utility of the Web by providing intuitive access to data repositories anywhere that Web access is available. Deep Web services provide access to real-time information, like entertainment event listings, or present a Web interface to large databases or other data repositories. Recent studies suggest that the size and growth rate of the dynamic Web greatly exceed those of the static Web, yet dynamic content is often ignored by existing search engine indexers owing to the technical challenges that arise when attempting to search the Deep Web. To address these challenges, we present DYNABOT, a service-centric crawler for discovering and clustering Deep Web sources offering dynamic content. DYNABOT has three unique characteristics. First, DYNABOT utilizes a service class model of the Web implemented through the construction of service class descriptions (SCDs). Second, DYNABOT employs a modular, self-tuning system architecture for focused crawling of the Deep Web using service class descriptions. Third, DYNABOT incorporates methods and algorithms for efficient probing of the Deep Web and for discovering and clustering Deep Web sources and services through SCD-based service matching analysis. Our experimental results demonstrate the effectiveness of the service class discovery, probing, and matching algorithms and suggest techniques for efficiently managing service discovery in the face of the immense scale of the Deep Web.
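The paper's SCD format is not reproduced in the abstract, so the following toy sketch only illustrates the idea of SCD-based service matching: score a probed Deep Web form against candidate service class descriptions by parameter overlap. The class names, parameters, and scoring weights are invented.

```python
from dataclasses import dataclass, field

@dataclass
class ServiceClassDescription:
    """Toy stand-in for an SCD: a service class named by the parameters it expects."""
    name: str
    required_params: set = field(default_factory=set)
    optional_params: set = field(default_factory=set)

def match_score(probed_params: set, scd: ServiceClassDescription) -> float:
    """Score how well a probed form's parameters match a service class (0..1)."""
    if not scd.required_params:
        return 0.0
    covered = len(probed_params & scd.required_params) / len(scd.required_params)
    bonus = len(probed_params & scd.optional_params) / (len(scd.optional_params) or 1)
    return 0.8 * covered + 0.2 * bonus

scds = [
    ServiceClassDescription("event_listing", {"city", "date"}, {"category"}),
    ServiceClassDescription("gene_lookup", {"gene_symbol"}, {"organism"}),
]

# Parameters discovered by probing a hypothetical Deep Web form
probed = {"city", "date", "category"}
best = max(scds, key=lambda s: match_score(probed, s))
print(best.name, round(match_score(probed, best), 2))  # event_listing 1.0
```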
NASA Astrophysics Data System (ADS)
Wünderlich, D.; Mochalskyy, S.; Montellano, I. M.; Revel, A.
2018-05-01
Particle-in-cell (PIC) codes have been used since the early 1960s for calculating self-consistently the motion of charged particles in plasmas, taking into account external electric and magnetic fields as well as the fields created by the particles themselves. Because of the very small time steps required (on the order of the inverse plasma frequency) and the fine mesh, the computational requirements can be very high, and they increase drastically with increasing plasma density and size of the calculation domain. Thus, usually small computational domains and/or reduced dimensionality are used. In recent years the available central processing unit (CPU) power has increased strongly. Together with massive parallelization of the codes, it is now possible to describe in 3D the extraction of charged particles from a plasma, using calculation domains with an edge length of several centimeters, consisting of one extraction aperture, the plasma in the direct vicinity of the aperture, and a part of the extraction system. Large negative hydrogen or deuterium ion sources are essential parts of the neutral beam injection (NBI) system in future fusion devices like the international fusion experiment ITER and the demonstration reactor (DEMO). For ITER NBI, RF-driven sources with a source area of 0.9 × 1.9 m² and 1280 extraction apertures will be used. The extraction of negative ions is accompanied by the co-extraction of electrons, which are deflected onto an electron dump. Typically, the maximum extracted negative ion current is limited by the amount and the temporal instability of the co-extracted electrons, especially for operation in deuterium. Different PIC codes are available for the extraction region of large driven negative ion sources for fusion. Additionally, some effort is ongoing to develop codes that describe in a simplified manner (coarser mesh or reduced dimensionality) the plasma of the whole ion source. The presentation first gives a brief overview of the current status of the ion source development for ITER NBI and of the PIC method. Different PIC codes for the extraction region are introduced, as well as their coupling to codes describing the whole source (PIC codes or fluid codes). Different physical and numerical aspects of applying PIC codes to negative hydrogen ion sources for fusion are presented and discussed, as well as selected code results. The main focus of future calculations will be the meniscus formation and the identification of measures for reducing the co-extracted electrons, in particular for deuterium operation. Recent results of the 3D PIC code ONIX (calculation domain: one extraction aperture and its vicinity) for the ITER prototype source (1/8 size of the ITER NBI source) are presented.
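The abstract mentions the small time steps (of the order of the inverse plasma frequency) and the particle push that underlie PIC codes. The following minimal sketch, not taken from ONIX or any of the codes discussed, shows a standard Boris particle push with a time step tied to the plasma frequency; the density, field values, and step count are arbitrary illustration values.

```python
import numpy as np

# Constants (SI)
qe, me, eps0 = -1.602e-19, 9.109e-31, 8.854e-12

def plasma_frequency(n_e):
    """Electron plasma frequency for density n_e [m^-3]."""
    return np.sqrt(n_e * qe**2 / (eps0 * me))

def boris_push(v, E, B, q, m, dt):
    """One Boris step: half acceleration in E, rotation in B, half acceleration in E."""
    qmdt2 = q * dt / (2.0 * m)
    v_minus = v + qmdt2 * E
    t = qmdt2 * B
    s = 2.0 * t / (1.0 + np.dot(t, t))
    v_prime = v_minus + np.cross(v_minus, t)
    v_plus = v_minus + np.cross(v_prime, s)
    return v_plus + qmdt2 * E

# Illustrative numbers only (not source parameters): n_e ~ 1e17 m^-3
wp = plasma_frequency(1e17)
dt = 0.1 / wp                       # time step well below the inverse plasma frequency
v = np.array([1e5, 0.0, 0.0])       # m/s
E = np.array([0.0, 0.0, 1e3])       # V/m
B = np.array([0.0, 0.0, 0.01])      # T
for _ in range(10):
    v = boris_push(v, E, B, qe, me, dt)
print(wp, np.linalg.norm(v))
```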
40 CFR 124.33 - Information repository.
Code of Federal Regulations, 2010 CFR
2010-07-01
... 40 Protection of Environment 21 2010-07-01 Information repository. 124.33 Section... FOR DECISIONMAKING Specific Procedures Applicable to RCRA Permits § 124.33 Information repository. (a... basis, for an information repository. When assessing the need for an information repository, the...
10 CFR 60.130 - General considerations.
Code of Federal Regulations, 2010 CFR
2010-01-01
... REPOSITORIES Technical Criteria Design Criteria for the Geologic Repository Operations Area § 60.130 General... for a high-level radioactive waste repository at a geologic repository operations area, and an... geologic repository operations area, must include the principal design criteria for a proposed facility...
Raising orphans from a metadata morass: A researcher's guide to re-use of public 'omics data.
Bhandary, Priyanka; Seetharam, Arun S; Arendsee, Zebulun W; Hur, Manhoi; Wurtele, Eve Syrkin
2018-02-01
More than 15 petabases of raw RNA-seq data are now accessible through public repositories. Acquisition of other 'omics data types is expanding, though most lack a centralized archival repository. Data reuse provides a tremendous opportunity to extract new knowledge from existing experiments, and offers a unique opportunity for robust, multi-'omics analyses by merging metadata (information about experimental design, biological samples, protocols) and data from multiple experiments. We illustrate how predictive research can be accelerated by meta-analysis with a study of orphan (species-specific) genes. Computational predictions are critical for inferring orphan function because their coding sequences provide very few clues. The metadata in public databases are often confusing; a test case with Zea mays mRNA-seq data reveals a high proportion of missing, misleading or incomplete metadata. This metadata morass significantly diminishes the insight that can be extracted from these data. We provide tips for data submitters and users, including specific recommendations to improve metadata quality through greater use of controlled vocabularies and through metadata reviews. Finally, we advocate for a unified, straightforward metadata submission and retrieval system. Copyright © 2017 Elsevier B.V. All rights reserved.
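One of the paper's recommendations, checking submitted metadata against a controlled vocabulary, can be illustrated with a short audit script. The field names, vocabulary terms, and example record below are invented and only sketch the kind of check a submitter or reviewer might run.

```python
# Minimal sketch of a metadata audit against a controlled vocabulary.
# Field names and vocabulary terms are invented for illustration only.
CONTROLLED_VOCAB = {
    "tissue": {"leaf", "root", "seed", "whole plant"},
    "treatment": {"control", "drought", "cold", "pathogen"},
}
REQUIRED_FIELDS = {"organism", "tissue", "treatment", "replicate"}

def audit_record(record: dict) -> list:
    """Return a list of human-readable problems found in one metadata record."""
    problems = []
    for f in REQUIRED_FIELDS - record.keys():
        problems.append(f"missing field: {f}")
    for f, allowed in CONTROLLED_VOCAB.items():
        value = record.get(f)
        if value is not None and value.lower() not in allowed:
            problems.append(f"uncontrolled term in '{f}': {value!r}")
    return problems

sample = {"organism": "Zea mays", "tissue": "Leaf", "treatment": "severe drought"}
print(audit_record(sample))
# ['missing field: replicate', "uncontrolled term in 'treatment': 'severe drought'"]
```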
Peer-review Platform for Astronomy Education Activities
NASA Astrophysics Data System (ADS)
Heenatigala, Thilina; Russo, Pedro; Gomez, Edward; Strubbe, Linda
2015-08-01
Astronomy educators and teachers worldwide commonly request and search for high-quality astronomy activities to do with their students. Hundreds of astronomy education activities exist, as well as many resource repositories for finding them. However, the quality of such resources is highly variable, as they are often not updated regularly or lack content review. Since its launch in 2013, astroEDU has been addressing these issues and more by following a peer-review process. Each activity submitted is reviewed by an educator and a professional astronomer, balancing both the scientific and the educational value of the content. Moreover, the majority of the reviewers are invited from IAU commissions related to the field of the activity, in an effort to get IAU members actively involved in the project. The website code, activities and layout design are open access in order to make them accessible and adoptable for educators around the world. Furthermore, the platform harnesses the OAD volunteer database to develop existing astronomy education activities into the astroEDU activity format. Published activities are also pushed to partner repositories, and each activity is registered for a DOI, allowing authors to cite their work. To further test the activities and improve the platform, the astroEDU editorial team organises workshops.
Poskas, Povilas; Grigaliuniene, Dalia; Narkuniene, Asta; Kilda, Raimondas; Justinavicius, Darius
2016-11-01
There are two RBMK-1500 type graphite-moderated reactors at the Ignalina nuclear power plant in Lithuania, and they are now being decommissioned. The graphite cannot be disposed of in a near-surface repository because of its large amounts of ¹⁴C. Therefore, disposal of the graphite in a geological repository is a reasonable solution. This study presents an evaluation of the ¹⁴C transfer by the groundwater pathway into the geosphere from the irradiated graphite in a generic geological repository in crystalline rocks, and demonstrates the role of the different components of the engineered barrier system by performing local sensitivity analysis. The speciation of the released ¹⁴C into organic and inorganic compounds, as well as the most recent information on the ¹⁴C source term, was taken into account. Two alternatives were considered in the analysis: disposal of graphite in containers with encapsulant and without it. It was evaluated that the maximal fractional flux of inorganic ¹⁴C into the geosphere can vary from 10⁻¹¹ y⁻¹ (for non-encapsulated graphite) to 10⁻¹² y⁻¹ (for encapsulated graphite), while that of organic ¹⁴C was about 10⁻³ y⁻¹ of its inventory. Such a difference demonstrates that investigations of the ¹⁴C inventory and of the chemical form in which it is released are especially important. The parameter with the highest influence on the maximal flux into the geosphere was the sorption coefficient in the backfill for inorganic ¹⁴C transfer, and the backfill hydraulic conductivity for organic ¹⁴C transfer. Copyright © 2016 Elsevier B.V. All rights reserved.
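The local sensitivity analysis mentioned above can be illustrated generically: perturb one parameter at a time and report the normalized change in the output flux. The toy flux function and parameter names below are our own stand-ins, not the authors' repository model.

```python
# Generic one-at-a-time local sensitivity sketch (not the authors' model).
# flux_model and its parameters are a toy stand-in for a barrier-transport model.
def flux_model(p):
    """Toy fractional flux: decreases with sorption (Kd), increases with conductivity."""
    return 1e-6 * p["hydraulic_conductivity"] / (1.0 + 50.0 * p["sorption_kd"])

base = {"hydraulic_conductivity": 1e-3, "sorption_kd": 0.1}

def relative_sensitivity(model, params, name, delta=0.01):
    """S = (dF/F) / (dp/p), estimated by a central finite difference."""
    up, down = dict(params), dict(params)
    up[name] *= (1 + delta)
    down[name] *= (1 - delta)
    f0 = model(params)
    return (model(up) - model(down)) / (2 * delta * f0)

for name in base:
    print(name, round(relative_sensitivity(flux_model, base, name), 3))
# hydraulic_conductivity -> 1.0, sorption_kd -> about -0.83 for this toy model
```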
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wessel, Silvia; Harvey, David
2013-06-28
The durability of PEM fuel cells is a primary requirement for large-scale commercialization of these power systems in transportation and stationary market applications that target operational lifetimes of 5,000 hours and 40,000 hours by 2015, respectively. Key degradation modes contributing to fuel cell lifetime limitations have been largely associated with the platinum-based cathode catalyst layer. Furthermore, as fuel cells are driven to low-cost materials and lower catalyst loadings in order to meet the cost targets for commercialization, catalyst durability has become even more important. While over the past few years significant progress has been made in identifying the underlying causes of fuel cell degradation and the key parameters that greatly influence the degradation rates, many gaps with respect to knowledge of the driving mechanisms still exist; in particular, the acceleration of the mechanisms due to different structural compositions and under different fuel cell conditions remains an area that is not well understood. The focus of this project was to address catalyst durability by using a dual-path approach that coupled an extensive range of experimental analysis and testing with a multi-scale modeling approach. With this, the major technical areas/issues of catalyst and catalyst layer performance and durability that were addressed are:
1. Catalyst and catalyst layer degradation mechanisms (Pt dissolution, agglomeration, Pt loss, e.g. Pt in the membrane, carbon oxidation and/or corrosion).
a. Driving force for the different degradation mechanisms.
b. Relationships between MEA performance, catalyst and catalyst layer degradation and operational conditions, catalyst layer composition, and structure.
2. Materials properties
a. Changes in catalyst, catalyst layer, and MEA materials properties due to degradation.
3. Catalyst performance
a. Relationships between catalyst structural changes and performance.
b. Stability of the three-phase boundary and its effect on performance/catalyst degradation.
The key accomplishments of this project are:
• The development of a molecular-dynamics based description of the carbon-supported Pt and ionomer system
• The development of a composition-based, 1D-statistical Unit Cell Performance model
• A modified and improved multi-pathway ORR model
• An extension of the existing micro-structural catalyst model to transient operation
• The coupling of a Pt dissolution model to the modified ORR pathway model
• The development of a semi-empirical carbon corrosion model
• The integration and release of an open-source forward-predictive MEA performance and degradation model
• Completion of correlations of BOT (beginning of test) and EOT (end of test) performance loss breakdown with cathode catalyst layer composition, morphology, material properties, and operational conditions
• Catalyst layer durability windows and design curves
• A design flow path of interactions from materials properties and catalyst layer effective properties to performance loss breakdown for virgin and degraded catalyst layers
In order to ensure the best possible user experience, we will perform a staged release of the software leading up to the webinar scheduled in October 2013.
The release schedule will be as follows (please note that the manual will be released with the beta release, as direct support is provided in Stage 1):
• Stage 0 - Internal Ballard Release
o Cross-check of compilation and installation to ensure machine independence
o Implement code on a portable virtual machine to allow for non-UNIX use (pending)
• Stage 1 - Alpha Release
o The model code will be made available via a Git, SourceForge, or other repository (under discussion at Ballard) for download and installation by a small pre-selected group of users
o Users will be given three weeks to install, apply, and evaluate features of the code, providing feedback on issues or software bugs that require correction prior to the beta release
• Stage 2 - Beta Release
o The model code repository is opened to the general public on a beta release concept, with a mechanism for bug tracking and feedback from a large user group
o Code will be tracked and patched for any discovered bugs or relevant feedback from the user community; upon the completion of three months without a major bug submission, the code will be moved to a full version release
• Stage 3 - Full Version Release
o Code is versioned to revision 1.0 and that version is frozen in development/patching
Wang, R; Li, X A
2001-02-01
The dose parameters for the beta-particle-emitting 90Sr/90Y source for intravascular brachytherapy (IVBT) have been calculated by different investigators. At larger distances from the source, noticeable differences are seen in the parameters calculated using different Monte Carlo codes. The purpose of this work is to quantify as well as to understand these differences. We have compared a series of calculations using the EGS4, EGSnrc, and MCNP Monte Carlo codes. Data calculated and compared include the depth-dose curve for a broad parallel beam of electrons, and radial dose distributions for point electron sources (monoenergetic or polyenergetic) and for a real 90Sr/90Y source. For the 90Sr/90Y source, the doses at the reference position (2 mm radial distance) calculated by the three codes agree within 2%. However, the differences between the doses calculated by the three codes can exceed 20% in the radial distance range of interest in IVBT. The difference increases with radial distance from the source, and reaches 30% at the tail of the dose curve. These differences may be partially attributed to the different multiple scattering theories and Monte Carlo models for electron transport adopted in the three codes. Doses calculated by the EGSnrc code are more accurate than those calculated by EGS4. The two calculations agree within 5% for radial distances <6 mm.
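Comparisons like these are typically quantified by interpolating the dose curves onto a common radial grid and reporting relative differences. The sketch below uses invented dose data purely to illustrate that bookkeeping, not the published EGS4/EGSnrc/MCNP results.

```python
import numpy as np

# Hypothetical radial dose data (arbitrary units) from two codes on different grids.
r_a = np.linspace(1.0, 10.0, 19)             # mm
dose_a = np.exp(-r_a / 2.5) / r_a**2
r_b = np.linspace(1.0, 10.0, 37)
dose_b = 1.05 * np.exp(-r_b / 2.4) / r_b**2  # slightly different transport model

# Interpolate code B onto code A's radial grid and report percent differences.
dose_b_on_a = np.interp(r_a, r_b, dose_b)
pct_diff = 100.0 * (dose_b_on_a - dose_a) / dose_a

for r, d in zip(r_a, pct_diff):
    print(f"r = {r:4.1f} mm  diff = {d:6.1f} %")
```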
48 CFR 227.7108 - Contractor data repositories.
Code of Federal Regulations, 2010 CFR
2010-10-01
... Technical Data 227.7108 Contractor data repositories. (a) Contractor data repositories may be established... procedures for protecting technical data delivered to or stored at the repository from unauthorized release... disclosure of technical data from the repository to third parties consistent with the Government's rights in...
Kim, Daehee; Kim, Dongwan; An, Sunshin
2016-07-09
Code dissemination in wireless sensor networks (WSNs) is a procedure for distributing a new code image over the air in order to update programs. Because WSNs are mostly deployed in unattended and hostile environments, secure code dissemination ensuring authenticity and integrity is essential. Recent work on dynamic packet size control in WSNs enhances the energy efficiency of code dissemination by dynamically changing the packet size on the basis of link quality. However, the authentication tokens attached by the base station become useless at the next hop, where the packet size can vary according to that hop's link quality. In this paper, we propose three source authentication schemes for code dissemination supporting dynamic packet size. Compared to traditional source authentication schemes such as μTESLA and digital signatures, our schemes provide secure source authentication in an environment where the packet size changes at each hop, with smaller energy consumption.
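The abstract does not detail the three proposed schemes, so the sketch below only illustrates the underlying problem and one generic hop-by-hop remedy: each hop re-fragments the code image to its own packet size and re-authenticates the fragments with a key shared with the next hop. The keys, packet sizes, and MAC truncation are illustrative choices, not the paper's protocol.

```python
# Toy hop-by-hop authentication sketch (not the schemes proposed in the paper).
import hmac, hashlib

def fragment(image: bytes, size: int):
    return [image[i:i + size] for i in range(0, len(image), size)]

def tag(key: bytes, seq: int, chunk: bytes) -> bytes:
    return hmac.new(key, seq.to_bytes(4, "big") + chunk, hashlib.sha256).digest()[:8]

def send_hop(image: bytes, packet_size: int, key: bytes):
    """Fragment for this hop's link quality and attach a per-fragment MAC."""
    return [(i, c, tag(key, i, c)) for i, c in enumerate(fragment(image, packet_size))]

def receive_hop(packets, key: bytes) -> bytes:
    """Verify every fragment, then reassemble the code image."""
    chunks = []
    for seq, chunk, mac in packets:
        if not hmac.compare_digest(mac, tag(key, seq, chunk)):
            raise ValueError(f"authentication failed at fragment {seq}")
        chunks.append(chunk)
    return b"".join(chunks)

image = bytes(range(256)) * 4           # stand-in for a new code image
k1, k2 = b"hop1-key", b"hop2-key"       # pairwise keys (toy values)

# Hop 1 uses 64-byte packets; hop 2 re-fragments to 32 bytes for a weaker link.
hop1 = send_hop(image, 64, k1)
reassembled = receive_hop(hop1, k1)
hop2 = send_hop(reassembled, 32, k2)
assert receive_hop(hop2, k2) == image
```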
DOE Office of Scientific and Technical Information (OSTI.GOV)
Santos-Villalobos, Hector J; Gregor, Jens; Bingham, Philip R
2014-01-01
At present, neutron sources cannot be fabricated small and powerful enough to achieve high-resolution radiography while maintaining an adequate flux. One solution is to employ computational imaging techniques such as a Magnified Coded Source Imaging (CSI) system. A coded mask is placed between the neutron source and the object. The system resolution is increased by reducing the size of the mask holes, and the flux is increased by increasing the size of the coded mask and/or the number of holes. One limitation of such a system is that the resolution of current state-of-the-art scintillator-based detectors caps at around 50 µm. To overcome this challenge, the coded mask and object are magnified by making the distance from the coded mask to the object much smaller than the distance from the object to the detector. In previous work, we have shown via synthetic experiments that our least squares method outperforms other methods in image quality and reconstruction precision because of the modeling of the CSI system components. However, the validation experiments were limited to simplistic neutron sources. In this work, we aim to model the flux distribution of a real neutron source and incorporate such a model in our least squares computational system. We provide a full description of the methodology used to characterize the neutron source and validate the method with synthetic experiments.
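Assuming standard point-projection geometry, in which each mask opening acts approximately as a point source, the magnification and the detector-limited resolution referred back to the object plane follow from the two distances mentioned in the abstract. The distances used below are illustrative; only the 50 µm detector limit is taken from the abstract.

```python
# Point-projection magnification sketch, assuming standard pinhole geometry in which
# each mask opening acts as a point source (the distances below are illustrative only).
def geometric_magnification(mask_to_object_mm: float, object_to_detector_mm: float) -> float:
    """M = (z1 + z2) / z1 for a point source at the mask opening."""
    z1, z2 = mask_to_object_mm, object_to_detector_mm
    return (z1 + z2) / z1

def effective_resolution_um(detector_resolution_um: float, magnification: float) -> float:
    """Detector-limited resolution referred back to the object plane."""
    return detector_resolution_um / magnification

M = geometric_magnification(mask_to_object_mm=10.0, object_to_detector_mm=990.0)
print(M)                                   # 100x magnification
print(effective_resolution_um(50.0, M))    # a 50 um detector limit maps to 0.5 um
```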
A Guide to Axial-Flow Turbine Off-Design Computer Program AXOD2
NASA Technical Reports Server (NTRS)
Chen, Shu-Cheng S.
2014-01-01
A User's Guide for the axial-flow turbine off-design computer program AXOD2 is presented in this paper. This User's Guide supplements the original User's Manual of AXOD. Three notable contributions of AXOD2 relative to its predecessor AXOD, both in the content of the Guide and in the functionality of the code, are described and discussed at length. These are: 1) a rational representation of the mathematical principles applied, with concise descriptions of the formulas implemented in the actual coding and a discussion of their physical implications; 2) the creation and documentation of an Addendum Listing of input namelist parameters unique to AXOD2 that differ from, or are in addition to, the original input namelists given in the Manual of AXOD, together with a discussion of their usage; and 3) the institution of proper stoppages of code execution in AXOD2, with termination and error messages. These measures safeguard the integrity of the code execution, so that a failure mode encountered during a case study does not plunge the execution into an indefinite loop or cause the program to abort uncontrolled. Details are discussed and illustrated in this paper. Moreover, the computer program has since been reconstructed substantially. Standard FORTRAN language was instituted, and the code was formatted in double precision (REAL*8). As a result, the code is now suited for use in a local desktop computer environment, is portable to any operating system, and can be executed by any compiler equivalent to a Fortran 90/95 compiler. AXOD2 will be available through the NASA Glenn Research Center (GRC) Software Repository.
Streamlined Genome Sequence Compression using Distributed Source Coding
Wang, Shuang; Jiang, Xiaoqian; Chen, Feng; Cui, Lijuan; Cheng, Samuel
2014-01-01
We aim to develop a streamlined genome sequence compression algorithm to support alternative miniaturized sequencing devices, which have limited communication, storage, and computation power. Existing techniques that require a heavy client (encoder side) cannot be applied. To tackle this challenge, we carefully examined distributed source coding theory and developed a customized reference-based genome compression protocol to meet the low-complexity needs of the client side. Based on the variation between source and reference, our protocol adaptively picks either syndrome coding or hash coding to compress subsequences of varying code length. Our experimental results showed promising performance of the proposed method compared with the state-of-the-art algorithm (GRS). PMID:25520552
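The adaptive per-block choice can be caricatured as follows. True syndrome coding (distributed source coding) is replaced here by explicit difference coding, so the sketch only captures the control flow of switching between a cheap hash path and a fallback path; the block size, hash length, and substitution-only assumption are ours, not the paper's protocol.

```python
# Toy illustration of an adaptive per-block choice between a short hash and a
# difference record. Assumes source and reference have equal length (substitutions only).
import hashlib

BLOCK = 64           # bases per block (illustrative)
HASH_BYTES = 4       # truncated hash sent when a block matches the reference

def encode(source: str, reference: str):
    packets = []
    for i in range(0, len(source), BLOCK):
        s, r = source[i:i + BLOCK], reference[i:i + BLOCK]
        if s == r:
            # cheap path: a short hash the decoder can verify against its reference copy
            packets.append(("hash", hashlib.sha256(s.encode()).digest()[:HASH_BYTES]))
        else:
            # fallback path: send the positions and bases that differ from the reference
            packets.append(("diff", [(j, c) for j, (c, rc) in enumerate(zip(s, r)) if c != rc]))
    return packets

def decode(packets, reference: str) -> str:
    out = []
    for i, (kind, payload) in enumerate(packets):
        r = reference[i * BLOCK:(i + 1) * BLOCK]
        if kind == "hash":
            assert hashlib.sha256(r.encode()).digest()[:HASH_BYTES] == payload
            out.append(r)
        else:
            block = dict(enumerate(r))
            block.update(dict(payload))
            out.append("".join(block[j] for j in sorted(block)))
    return "".join(out)

ref = "ACGT" * 40
src = ref[:100] + "T" + ref[101:]          # one substitution relative to the reference
assert decode(encode(src, ref), ref) == src
```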