open source databases: Topics by Science.gov

Sample records for open source databases

Comet: an open-source MS/MS sequence database search tool.

PubMed

Eng, Jimmy K; Jahan, Tahmina A; Hoopmann, Michael R

2013-01-01

Proteomics research routinely involves identifying peptides and proteins via MS/MS sequence database search. Thus the database search engine is an integral tool in many proteomics research groups. Here, we introduce the Comet search engine to the existing landscape of commercial and open-source database search tools. Comet is open source, freely available, and based on one of the original sequence database search tools that has been widely used for many years. © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Comprehensive Routing Security Development and Deployment for the Internet

DTIC Science & Technology

2015-02-01

feature enhancement and bug fixes. • MySQL : MySQL is a widely used and popular open source database package. It was chosen for database support in the...RPSTIR depends on several other open source packages. • MySQL : MySQL is used for the the local RPKI database cache. • OpenSSL: OpenSSL is used for...cryptographic libraries for X.509 certificates. • ODBC mySql Connector: ODBC (Open Database Connectivity) is a standard programming interface (API) for
The Longhorn Array Database (LAD): An Open-Source, MIAME compliant implementation of the Stanford Microarray Database (SMD)

PubMed Central

Killion, Patrick J; Sherlock, Gavin; Iyer, Vishwanath R

2003-01-01

Background The power of microarray analysis can be realized only if data is systematically archived and linked to biological annotations as well as analysis algorithms. Description The Longhorn Array Database (LAD) is a MIAME compliant microarray database that operates on PostgreSQL and Linux. It is a fully open source version of the Stanford Microarray Database (SMD), one of the largest microarray databases. LAD is available at Conclusions Our development of LAD provides a simple, free, open, reliable and proven solution for storage and analysis of two-color microarray data. PMID:12930545
MyMolDB: a micromolecular database solution with open source and free components.

PubMed

Xia, Bing; Tai, Zheng-Fu; Gu, Yu-Cheng; Li, Bang-Jing; Ding, Li-Sheng; Zhou, Yan

2011-10-01

To manage chemical structures in small laboratories is one of the important daily tasks. Few solutions are available on the internet, and most of them are closed source applications. The open-source applications typically have limited capability and basic cheminformatics functionalities. In this article, we describe an open-source solution to manage chemicals in research groups based on open source and free components. It has a user-friendly interface with the functions of chemical handling and intensive searching. MyMolDB is a micromolecular database solution that supports exact, substructure, similarity, and combined searching. This solution is mainly implemented using scripting language Python with a web-based interface for compound management and searching. Almost all the searches are in essence done with pure SQL on the database by using the high performance of the database engine. Thus, impressive searching speed has been archived in large data sets for no external Central Processing Unit (CPU) consuming languages were involved in the key procedure of the searching. MyMolDB is an open-source software and can be modified and/or redistributed under GNU General Public License version 3 published by the Free Software Foundation (Free Software Foundation Inc. The GNU General Public License, Version 3, 2007. Available at: http://www.gnu.org/licenses/gpl.html). The software itself can be found at http://code.google.com/p/mymoldb/. Copyright © 2011 Wiley Periodicals, Inc.
Preliminary geologic map of the Piru 7.5' quadrangle, southern California: a digital database

USGS Publications Warehouse

Yerkes, R.F.; Campbell, Russell H.

1995-01-01

This Open-File report is a digital geologic map database. This pamphlet serves to introduce and describe the digital data. There is no paper map included in the Open-File report. This digital map database is compiled from previously published sources combined with some new mapping and modifications in nomenclature. The geologic map database delineates map units that are identified by general age and lithology following the stratigraphic nomenclature of the U. S. Geological Survey. For detailed descriptions of the units, their stratigraphic relations and sources of geologic mapping consult Yerkes and Campbell (1995). More specific information about the units may be available in the original sources.
The open-source movement: an introduction for forestry professionals

Treesearch

Patrick Proctor; Paul C. Van Deusen; Linda S. Heath; Jeffrey H. Gove

2005-01-01

In recent years, the open-source movement has yielded a generous and powerful suite of software and utilities that rivals those developed by many commercial software companies. Open-source programs are available for many scientific needs: operating systems, databases, statistical analysis, Geographic Information System applications, and object-oriented programming....
A Web-based open-source database for the distribution of hyperspectral signatures

NASA Astrophysics Data System (ADS)

Ferwerda, J. G.; Jones, S. D.; Du, Pei-Jun

2006-10-01

With the coming of age of field spectroscopy as a non-destructive means to collect information on the physiology of vegetation, there is a need for storage of signatures, and, more importantly, their metadata. Without the proper organisation of metadata, the signatures itself become limited. In order to facilitate re-distribution of data, a database for the storage & distribution of hyperspectral signatures and their metadata was designed. The database was built using open-source software, and can be used by the hyperspectral community to share their data. Data is uploaded through a simple web-based interface. The database recognizes major file-formats by ASD, GER and International Spectronics. The database source code is available for download through the hyperspectral.info web domain, and we happily invite suggestion for additions & modification for the database to be submitted through the online forums on the same website.
Evaluation of an open source tool for indexing and searching enterprise radiology and pathology reports

NASA Astrophysics Data System (ADS)

Kim, Woojin; Boonn, William

2010-03-01

Data mining of existing radiology and pathology reports within an enterprise health system can be used for clinical decision support, research, education, as well as operational analyses. In our health system, the database of radiology and pathology reports exceeds 13 million entries combined. We are building a web-based tool to allow search and data analysis of these combined databases using freely available and open source tools. This presentation will compare performance of an open source full-text indexing tool to MySQL's full-text indexing and searching and describe implementation procedures to incorporate these capabilities into a radiology-pathology search engine.
Open source hardware and software platform for robotics and artificial intelligence applications

NASA Astrophysics Data System (ADS)

Liang, S. Ng; Tan, K. O.; Lai Clement, T. H.; Ng, S. K.; Mohammed, A. H. Ali; Mailah, Musa; Azhar Yussof, Wan; Hamedon, Zamzuri; Yussof, Zulkifli

2016-02-01

Recent developments in open source hardware and software platforms (Android, Arduino, Linux, OpenCV etc.) have enabled rapid development of previously expensive and sophisticated system within a lower budget and flatter learning curves for developers. Using these platform, we designed and developed a Java-based 3D robotic simulation system, with graph database, which is integrated in online and offline modes with an Android-Arduino based rubbish picking remote control car. The combination of the open source hardware and software system created a flexible and expandable platform for further developments in the future, both in the software and hardware areas, in particular in combination with graph database for artificial intelligence, as well as more sophisticated hardware, such as legged or humanoid robots.
Database Organisation in a Web-Enabled Free and Open-Source Software (foss) Environment for Spatio-Temporal Landslide Modelling

NASA Astrophysics Data System (ADS)

Das, I.; Oberai, K.; Sarathi Roy, P.

2012-07-01

Landslides exhibit themselves in different mass movement processes and are considered among the most complex natural hazards occurring on the earth surface. Making landslide database available online via WWW (World Wide Web) promotes the spreading and reaching out of the landslide information to all the stakeholders. The aim of this research is to present a comprehensive database for generating landslide hazard scenario with the help of available historic records of landslides and geo-environmental factors and make them available over the Web using geospatial Free & Open Source Software (FOSS). FOSS reduces the cost of the project drastically as proprietary software's are very costly. Landslide data generated for the period 1982 to 2009 were compiled along the national highway road corridor in Indian Himalayas. All the geo-environmental datasets along with the landslide susceptibility map were served through WEBGIS client interface. Open source University of Minnesota (UMN) mapserver was used as GIS server software for developing web enabled landslide geospatial database. PHP/Mapscript server-side application serve as a front-end application and PostgreSQL with PostGIS extension serve as a backend application for the web enabled landslide spatio-temporal databases. This dynamic virtual visualization process through a web platform brings an insight into the understanding of the landslides and the resulting damage closer to the affected people and user community. The landslide susceptibility dataset is also made available as an Open Geospatial Consortium (OGC) Web Feature Service (WFS) which can be accessed through any OGC compliant open source or proprietary GIS Software.
Datacube Services in Action, Using Open Source and Open Standards

NASA Astrophysics Data System (ADS)

Baumann, P.; Misev, D.

2016-12-01

Array Databases comprise novel, promising technology for massive spatio-temporal datacubes, extending the SQL paradigm of "any query, anytime" to n-D arrays. On server side, such queries can be optimized, parallelized, and distributed based on partitioned array storage. The rasdaman ("raster data manager") system, which has pioneered Array Databases, is available in open source on www.rasdaman.org. Its declarative query language extends SQL with array operators which are optimized and parallelized on server side. The rasdaman engine, which is part of OSGeo Live, is mature and in operational use databases individually holding dozens of Terabytes. Further, the rasdaman concepts have strongly impacted international Big Data standards in the field, including the forthcoming MDA ("Multi-Dimensional Array") extension to ISO SQL, the OGC Web Coverage Service (WCS) and Web Coverage Processing Service (WCPS) standards, and the forthcoming INSPIRE WCS/WCPS; in both OGC and INSPIRE, OGC is WCS Core Reference Implementation. In our talk we present concepts, architecture, operational services, and standardization impact of open-source rasdaman, as well as experiences made.
Looking toward the Future: A Case Study of Open Source Software in the Humanities

ERIC Educational Resources Information Center

Quamen, Harvey

2006-01-01

In this article Harvey Quamen examines how the philosophy of open source software might be of particular benefit to humanities scholars in the near future--particularly for academic journals with limited financial resources. To this end he provides a case study in which he describes his use of open source technology (MySQL database software and…
"XANSONS for COD": a new small BOINC project in crystallography

NASA Astrophysics Data System (ADS)

Neverov, Vladislav S.; Khrapov, Nikolay P.

2018-04-01

"XANSONS for COD" (http://xansons4cod.com) is a new BOINC project aimed at creating the open-access database of simulated x-ray and neutron powder diffraction patterns for nanocrystalline phase of materials from the collection of the Crystallography Open Database (COD). The project uses original open-source software XaNSoNS to simulate diffraction patterns on CPU and GPU. This paper describes the scientific problem this project solves, the project's internal structure, its operation principles and organization of the final database.
The Role of Free/Libre and Open Source Software in Learning Health Systems.

PubMed

Paton, C; Karopka, T

2017-08-01

Objective: To give an overview of the role of Free/Libre and Open Source Software (FLOSS) in the context of secondary use of patient data to enable Learning Health Systems (LHSs). Methods: We conducted an environmental scan of the academic and grey literature utilising the MedFLOSS database of open source systems in healthcare to inform a discussion of the role of open source in developing LHSs that reuse patient data for research and quality improvement. Results: A wide range of FLOSS is identified that contributes to the information technology (IT) infrastructure of LHSs including operating systems, databases, frameworks, interoperability software, and mobile and web apps. The recent literature around the development and use of key clinical data management tools is also reviewed. Conclusions: FLOSS already plays a critical role in modern health IT infrastructure for the collection, storage, and analysis of patient data. The nature of FLOSS systems to be collaborative, modular, and modifiable may make open source approaches appropriate for building the digital infrastructure for a LHS. Georg Thieme Verlag KG Stuttgart.
Managing Digital Archives Using Open Source Software Tools

NASA Astrophysics Data System (ADS)

Barve, S.; Dongare, S.

2007-10-01

This paper describes the use of open source software tools such as MySQL and PHP for creating database-backed websites. Such websites offer many advantages over ones built from static HTML pages. This paper will discuss how OSS tools are used and their benefits, and after the successful implementation of these tools how the library took the initiative in implementing an institutional repository using DSpace open source software.
ZeBase: an open-source relational database for zebrafish laboratories.

PubMed

Hensley, Monica R; Hassenplug, Eric; McPhail, Rodney; Leung, Yuk Fai

2012-03-01

Abstract ZeBase is an open-source relational database for zebrafish inventory. It is designed for the recording of genetic, breeding, and survival information of fish lines maintained in a single- or multi-laboratory environment. Users can easily access ZeBase through standard web-browsers anywhere on a network. Convenient search and reporting functions are available to facilitate routine inventory work; such functions can also be automated by simple scripting. Optional barcode generation and scanning are also built-in for easy access to the information related to any fish. Further information of the database and an example implementation can be found at http://zebase.bio.purdue.edu.
KaBOB: ontology-based semantic integration of biomedical databases.

PubMed

Livingston, Kevin M; Bada, Michael; Baumgartner, William A; Hunter, Lawrence E

2015-04-23

The ability to query many independent biological databases using a common ontology-based semantic model would facilitate deeper integration and more effective utilization of these diverse and rapidly growing resources. Despite ongoing work moving toward shared data formats and linked identifiers, significant problems persist in semantic data integration in order to establish shared identity and shared meaning across heterogeneous biomedical data sources. We present five processes for semantic data integration that, when applied collectively, solve seven key problems. These processes include making explicit the differences between biomedical concepts and database records, aggregating sets of identifiers denoting the same biomedical concepts across data sources, and using declaratively represented forward-chaining rules to take information that is variably represented in source databases and integrating it into a consistent biomedical representation. We demonstrate these processes and solutions by presenting KaBOB (the Knowledge Base Of Biomedicine), a knowledge base of semantically integrated data from 18 prominent biomedical databases using common representations grounded in Open Biomedical Ontologies. An instance of KaBOB with data about humans and seven major model organisms can be built using on the order of 500 million RDF triples. All source code for building KaBOB is available under an open-source license. KaBOB is an integrated knowledge base of biomedical data representationally based in prominent, actively maintained Open Biomedical Ontologies, thus enabling queries of the underlying data in terms of biomedical concepts (e.g., genes and gene products, interactions and processes) rather than features of source-specific data schemas or file formats. KaBOB resolves many of the issues that routinely plague biomedical researchers intending to work with data from multiple data sources and provides a platform for ongoing data integration and development and for formal reasoning over a wealth of integrated biomedical data.
Integration of a clinical trial database with a PACS

NASA Astrophysics Data System (ADS)

van Herk, M.

2014-03-01

Many clinical trials use Electronic Case Report Forms (ECRF), e.g., from OpenClinica. Trial data is augmented if DICOM scans, dose cubes, etc. from the Picture Archiving and Communication System (PACS) are included for data mining. Unfortunately, there is as yet no structured way to collect DICOM objects in trial databases. In this paper, we obtain a tight integration of ECRF and PACS using open source software. Methods: DICOM identifiers for selected images/series/studies are stored in associated ECRF events (e.g., baseline) as follows: 1) JavaScript added to OpenClinica communicates using HTML with a gateway server inside the hospitals firewall; 2) On this gateway, an open source DICOM server runs scripts to query and select the data, returning anonymized identifiers; 3) The scripts then collects, anonymizes, zips and transmits selected data to a central trial server; 4) Here data is stored in a DICOM archive which allows authorized ECRF users to view and download the anonymous images associated with each event. Results: All integration scripts are open source. The PACS administrator configures the anonymization script and decides to use the gateway in passive (receiving) mode or in an active mode going out to the PACS to gather data. Our ECRF centric approach supports automatic data mining by iterating over the cases in the ECRF database, providing the identifiers to load images and the clinical data to correlate with image analysis results. Conclusions: Using open source software and web technology, a tight integration has been achieved between PACS and ECRF.
Open source database of images DEIMOS: extension for large-scale subjective image quality assessment

NASA Astrophysics Data System (ADS)

Vítek, Stanislav

2014-09-01

DEIMOS (Database of Images: Open Source) is an open-source database of images and video sequences for testing, verification and comparison of various image and/or video processing techniques such as compression, reconstruction and enhancement. This paper deals with extension of the database allowing performing large-scale web-based subjective image quality assessment. Extension implements both administrative and client interface. The proposed system is aimed mainly at mobile communication devices, taking into account advantages of HTML5 technology; it means that participants don't need to install any application and assessment could be performed using web browser. The assessment campaign administrator can select images from the large database and then apply rules defined by various test procedure recommendations. The standard test procedures may be fully customized and saved as a template. Alternatively the administrator can define a custom test, using images from the pool and other components, such as evaluating forms and ongoing questionnaires. Image sequence is delivered to the online client, e.g. smartphone or tablet, as a fully automated assessment sequence or viewer can decide on timing of the assessment if required. Environmental data and viewing conditions (e.g. illumination, vibrations, GPS coordinates, etc.), may be collected and subsequently analyzed.
Rapid development of entity-based data models for bioinformatics with persistence object-oriented design and structured interfaces.

PubMed

Ezra Tsur, Elishai

2017-01-01

Databases are imperative for research in bioinformatics and computational biology. Current challenges in database design include data heterogeneity and context-dependent interconnections between data entities. These challenges drove the development of unified data interfaces and specialized databases. The curation of specialized databases is an ever-growing challenge due to the introduction of new data sources and the emergence of new relational connections between established datasets. Here, an open-source framework for the curation of specialized databases is proposed. The framework supports user-designed models of data encapsulation, objects persistency and structured interfaces to local and external data sources such as MalaCards, Biomodels and the National Centre for Biotechnology Information (NCBI) databases. The proposed framework was implemented using Java as the development environment, EclipseLink as the data persistency agent and Apache Derby as the database manager. Syntactic analysis was based on J3D, jsoup, Apache Commons and w3c.dom open libraries. Finally, a construction of a specialized database for aneurysms associated vascular diseases is demonstrated. This database contains 3-dimensional geometries of aneurysms, patient's clinical information, articles, biological models, related diseases and our recently published model of aneurysms' risk of rapture. Framework is available in: http://nbel-lab.com.

Development of a data entry auditing protocol and quality assurance for a tissue bank database.

PubMed

Khushi, Matloob; Carpenter, Jane E; Balleine, Rosemary L; Clarke, Christine L

2012-03-01

Human transcription error is an acknowledged risk when extracting information from paper records for entry into a database. For a tissue bank, it is critical that accurate data are provided to researchers with approved access to tissue bank material. The challenges of tissue bank data collection include manual extraction of data from complex medical reports that are accessed from a number of sources and that differ in style and layout. As a quality assurance measure, the Breast Cancer Tissue Bank (http:\\\\www.abctb.org.au) has implemented an auditing protocol and in order to efficiently execute the process, has developed an open source database plug-in tool (eAuditor) to assist in auditing of data held in our tissue bank database. Using eAuditor, we have identified that human entry errors range from 0.01% when entering donor's clinical follow-up details, to 0.53% when entering pathological details, highlighting the importance of an audit protocol tool such as eAuditor in a tissue bank database. eAuditor was developed and tested on the Caisis open source clinical-research database; however, it can be integrated in other databases where similar functionality is required.
ACToR Chemical Structure processing using Open Source ChemInformatics Libraries (FutureToxII)

EPA Science Inventory

ACToR (Aggregated Computational Toxicology Resource) is a centralized database repository developed by the National Center for Computational Toxicology (NCCT) at the U.S. Environmental Protection Agency (EPA). Free and open source tools were used to compile toxicity data from ove...
SQL is Dead; Long-live SQL: Relational Database Technology in Science Contexts

NASA Astrophysics Data System (ADS)

Howe, B.; Halperin, D.

2014-12-01

Relational databases are often perceived as a poor fit in science contexts: Rigid schemas, poor support for complex analytics, unpredictable performance, significant maintenance and tuning requirements --- these idiosyncrasies often make databases unattractive in science contexts characterized by heterogeneous data sources, complex analysis tasks, rapidly changing requirements, and limited IT budgets. In this talk, I'll argue that although the value proposition of typical relational database systems are weak in science, the core ideas that power relational databases have become incredibly prolific in open source science software, and are emerging as a universal abstraction for both big data and small data. In addition, I'll talk about two open source systems we are building to "jailbreak" the core technology of relational databases and adapt them for use in science. The first is SQLShare, a Database-as-a-Service system supporting collaborative data analysis and exchange by reducing database use to an Upload-Query-Share workflow with no installation, schema design, or configuration required. The second is Myria, a service that supports much larger scale data, complex analytics, and supports multiple back end systems. Finally, I'll describe some of the ways our collaborators in oceanography, astronomy, biology, fisheries science, and more are using these systems to replace script-based workflows for reasons of performance, flexibility, and convenience.
JBioWH: an open-source Java framework for bioinformatics data integration

PubMed Central

Vera, Roberto; Perez-Riverol, Yasset; Perez, Sonia; Ligeti, Balázs; Kertész-Farkas, Attila; Pongor, Sándor

2013-01-01

The Java BioWareHouse (JBioWH) project is an open-source platform-independent programming framework that allows a user to build his/her own integrated database from the most popular data sources. JBioWH can be used for intensive querying of multiple data sources and the creation of streamlined task-specific data sets on local PCs. JBioWH is based on a MySQL relational database scheme and includes JAVA API parser functions for retrieving data from 20 public databases (e.g. NCBI, KEGG, etc.). It also includes a client desktop application for (non-programmer) users to query data. In addition, JBioWH can be tailored for use in specific circumstances, including the handling of massive queries for high-throughput analyses or CPU intensive calculations. The framework is provided with complete documentation and application examples and it can be downloaded from the Project Web site at http://code.google.com/p/jbiowh. A MySQL server is available for demonstration purposes at hydrax.icgeb.trieste.it:3307. Database URL: http://code.google.com/p/jbiowh PMID:23846595
JBioWH: an open-source Java framework for bioinformatics data integration.

PubMed

Vera, Roberto; Perez-Riverol, Yasset; Perez, Sonia; Ligeti, Balázs; Kertész-Farkas, Attila; Pongor, Sándor

2013-01-01

The Java BioWareHouse (JBioWH) project is an open-source platform-independent programming framework that allows a user to build his/her own integrated database from the most popular data sources. JBioWH can be used for intensive querying of multiple data sources and the creation of streamlined task-specific data sets on local PCs. JBioWH is based on a MySQL relational database scheme and includes JAVA API parser functions for retrieving data from 20 public databases (e.g. NCBI, KEGG, etc.). It also includes a client desktop application for (non-programmer) users to query data. In addition, JBioWH can be tailored for use in specific circumstances, including the handling of massive queries for high-throughput analyses or CPU intensive calculations. The framework is provided with complete documentation and application examples and it can be downloaded from the Project Web site at http://code.google.com/p/jbiowh. A MySQL server is available for demonstration purposes at hydrax.icgeb.trieste.it:3307. Database URL: http://code.google.com/p/jbiowh.
An open-source, mobile-friendly search engine for public medical knowledge.

PubMed

Samwald, Matthias; Hanbury, Allan

2014-01-01

The World Wide Web has become an important source of information for medical practitioners. To complement the capabilities of currently available web search engines we developed FindMeEvidence, an open-source, mobile-friendly medical search engine. In a preliminary evaluation, the quality of results from FindMeEvidence proved to be competitive with those from TRIP Database, an established, closed-source search engine for evidence-based medicine.
OrChem - An open source chemistry search engine for Oracle(R).

PubMed

Rijnbeek, Mark; Steinbeck, Christoph

2009-10-22

Registration, indexing and searching of chemical structures in relational databases is one of the core areas of cheminformatics. However, little detail has been published on the inner workings of search engines and their development has been mostly closed-source. We decided to develop an open source chemistry extension for Oracle, the de facto database platform in the commercial world. Here we present OrChem, an extension for the Oracle 11G database that adds registration and indexing of chemical structures to support fast substructure and similarity searching. The cheminformatics functionality is provided by the Chemistry Development Kit. OrChem provides similarity searching with response times in the order of seconds for databases with millions of compounds, depending on a given similarity cut-off. For substructure searching, it can make use of multiple processor cores on today's powerful database servers to provide fast response times in equally large data sets. OrChem is free software and can be redistributed and/or modified under the terms of the GNU Lesser General Public License as published by the Free Software Foundation. All software is available via http://orchem.sourceforge.net.
DEVELOPMENT AND APPLICATION OF THE DORIAN (DOSE-RESPONSE INFORMATION ANALYSIS) SYSTEM

EPA Science Inventory
- Migration of ArrayTrack from the proprietary Oracle database to open source Postgres database.
- Making the public version of the ebKB available with provisions for soliciting input from collaborators and outside users.
- Continued development ...
- Virtualization of open-source secure web services to support data exchange in a pediatric critical care research network
  
  PubMed Central
  
  Sward, Katherine A; Newth, Christopher JL; Khemani, Robinder G; Cryer, Martin E; Thelen, Julie L; Enriquez, Rene; Shaoyu, Su; Pollack, Murray M; Harrison, Rick E; Meert, Kathleen L; Berg, Robert A; Wessel, David L; Shanley, Thomas P; Dalton, Heidi; Carcillo, Joseph; Jenkins, Tammara L; Dean, J Michael
  
  2015-01-01
  
  Objectives To examine the feasibility of deploying a virtual web service for sharing data within a research network, and to evaluate the impact on data consistency and quality. Material and Methods Virtual machines (VMs) encapsulated an open-source, semantically and syntactically interoperable secure web service infrastructure along with a shadow database. The VMs were deployed to 8 Collaborative Pediatric Critical Care Research Network Clinical Centers. Results Virtual web services could be deployed in hours. The interoperability of the web services reduced format misalignment from 56% to 1% and demonstrated that 99% of the data consistently transferred using the data dictionary and 1% needed human curation. Conclusions Use of virtualized open-source secure web service technology could enable direct electronic abstraction of data from hospital databases for research purposes. PMID:25796596
- OpenTrials: towards a collaborative open database of all available information on all clinical trials.
  
  PubMed
  
  Goldacre, Ben; Gray, Jonathan
  
  2016-04-08
  
  OpenTrials is a collaborative and open database for all available structured data and documents on all clinical trials, threaded together by individual trial. With a versatile and expandable data schema, it is initially designed to host and match the following documents and data for each trial: registry entries; links, abstracts, or texts of academic journal papers; portions of regulatory documents describing individual trials; structured data on methods and results extracted by systematic reviewers or other researchers; clinical study reports; and additional documents such as blank consent forms, blank case report forms, and protocols. The intention is to create an open, freely re-usable index of all such information and to increase discoverability, facilitate research, identify inconsistent data, enable audits on the availability and completeness of this information, support advocacy for better data and drive up standards around open data in evidence-based medicine. The project has phase I funding. This will allow us to create a practical data schema and populate the database initially through web-scraping, basic record linkage techniques, crowd-sourced curation around selected drug areas, and import of existing sources of structured and documents. It will also allow us to create user-friendly web interfaces onto the data and conduct user engagement workshops to optimise the database and interface designs. Where other projects have set out to manually and perfectly curate a narrow range of information on a smaller number of trials, we aim to use a broader range of techniques and attempt to match a very large quantity of information on all trials. We are currently seeking feedback and additional sources of structured data.
- Clinical records anonymisation and text extraction (CRATE): an open-source software system.
  
  PubMed
  
  Cardinal, Rudolf N
  
  2017-04-26
  
  Electronic medical records contain information of value for research, but contain identifiable and often highly sensitive confidential information. Patient-identifiable information cannot in general be shared outside clinical care teams without explicit consent, but anonymisation/de-identification allows research uses of clinical data without explicit consent. This article presents CRATE (Clinical Records Anonymisation and Text Extraction), an open-source software system with separable functions: (1) it anonymises or de-identifies arbitrary relational databases, with sensitivity and precision similar to previous comparable systems; (2) it uses public secure cryptographic methods to map patient identifiers to research identifiers (pseudonyms); (3) it connects relational databases to external tools for natural language processing; (4) it provides a web front end for research and administrative functions; and (5) it supports a specific model through which patients may consent to be contacted about research. Creation and management of a research database from sensitive clinical records with secure pseudonym generation, full-text indexing, and a consent-to-contact process is possible and practical using entirely free and open-source software.
- Analyzing Historical Primary Source Open Educational Resources: A Blended Pedagogical Approach
  
  ERIC Educational Resources Information Center
  
  Oliver, Kevin M.; Purichia, Heather R.
  
  2018-01-01
  
  This qualitative case study addresses the need for pedagogical approaches to working with open educational resources (OER). Drawing on a mix of historical thinking heuristics and case analysis approaches, a blended pedagogical strategy and primary source database were designed to build student understanding of historical records with transfer of…
- Ibmdbpy-spatial : An Open-source implementation of in-database geospatial analytics in Python
  
  NASA Astrophysics Data System (ADS)
  
  Roy, Avipsa; Fouché, Edouard; Rodriguez Morales, Rafael; Moehler, Gregor
  
  2017-04-01
  
  As the amount of spatial data acquired from several geodetic sources has grown over the years and as data infrastructure has become more powerful, the need for adoption of in-database analytic technology within geosciences has grown rapidly. In-database analytics on spatial data stored in a traditional enterprise data warehouse enables much faster retrieval and analysis for making better predictions about risks and opportunities, identifying trends and spot anomalies. Although there are a number of open-source spatial analysis libraries like geopandas and shapely available today, most of them have been restricted to manipulation and analysis of geometric objects with a dependency on GEOS and similar libraries. We present an open-source software package, written in Python, to fill the gap between spatial analysis and in-database analytics. Ibmdbpy-spatial provides a geospatial extension to the ibmdbpy package, implemented in 2015. It provides an interface for spatial data manipulation and access to in-database algorithms in IBM dashDB, a data warehouse platform with a spatial extender that runs as a service on IBM's cloud platform called Bluemix. Working in-database reduces the network overload, as the complete data need not be replicated into the user's local system altogether and only a subset of the entire dataset can be fetched into memory in a single instance. Ibmdbpy-spatial accelerates Python analytics by seamlessly pushing operations written in Python into the underlying database for execution using the dashDB spatial extender, thereby benefiting from in-database performance-enhancing features, such as columnar storage and parallel processing. The package is currently supported on Python versions from 2.7 up to 3.4. The basic architecture of the package consists of three main components - 1) a connection to the dashDB represented by the instance IdaDataBase, which uses a middleware API namely - pypyodbc or jaydebeapi to establish the database connection via ODBC or JDBC respectively, 2) an instance to represent the spatial data stored in the database as a dataframe in Python, called the IdaGeoDataFrame, with a specific geometry attribute which recognises a planar geometry column in dashDB and 3) Python wrappers for spatial functions like within, distance, area, buffer} and more which dashDB currently supports to make the querying process from Python much simpler for the users. The spatial functions translate well-known geopandas-like syntax into SQL queries utilising the database connection to perform spatial operations in-database and can operate on single geometries as well two different geometries from different IdaGeoDataFrames. The in-database queries strictly follow the standards of OpenGIS Implementation Specification for Geographic information - Simple feature access for SQL. The results of the operations obtained can thereby be accessed dynamically via interactive Jupyter notebooks from any system which supports Python, without any additional dependencies and can also be combined with other open source libraries such as matplotlib and folium in-built within Jupyter notebooks for visualization purposes. We built a use case to analyse crime hotspots in New York city to validate our implementation and visualized the results as a choropleth map for each borough.
- Exploring the single-cell RNA-seq analysis landscape with the scRNA-tools database.
  
  PubMed
  
  Zappia, Luke; Phipson, Belinda; Oshlack, Alicia
  
  2018-06-25
  
  As single-cell RNA-sequencing (scRNA-seq) datasets have become more widespread the number of tools designed to analyse these data has dramatically increased. Navigating the vast sea of tools now available is becoming increasingly challenging for researchers. In order to better facilitate selection of appropriate analysis tools we have created the scRNA-tools database (www.scRNA-tools.org) to catalogue and curate analysis tools as they become available. Our database collects a range of information on each scRNA-seq analysis tool and categorises them according to the analysis tasks they perform. Exploration of this database gives insights into the areas of rapid development of analysis methods for scRNA-seq data. We see that many tools perform tasks specific to scRNA-seq analysis, particularly clustering and ordering of cells. We also find that the scRNA-seq community embraces an open-source and open-science approach, with most tools available under open-source licenses and preprints being extensively used as a means to describe methods. The scRNA-tools database provides a valuable resource for researchers embarking on scRNA-seq analysis and records the growth of the field over time.
- LOVD: easy creation of a locus-specific sequence variation database using an "LSDB-in-a-box" approach.
  
  PubMed
  
  Fokkema, Ivo F A C; den Dunnen, Johan T; Taschner, Peter E M
  
  2005-08-01
  
  The completion of the human genome project has initiated, as well as provided the basis for, the collection and study of all sequence variation between individuals. Direct access to up-to-date information on sequence variation is currently provided most efficiently through web-based, gene-centered, locus-specific databases (LSDBs). We have developed the Leiden Open (source) Variation Database (LOVD) software approaching the "LSDB-in-a-Box" idea for the easy creation and maintenance of a fully web-based gene sequence variation database. LOVD is platform-independent and uses PHP and MySQL open source software only. The basic gene-centered and modular design of the database follows the recommendations of the Human Genome Variation Society (HGVS) and focuses on the collection and display of DNA sequence variations. With minimal effort, the LOVD platform is extendable with clinical data. The open set-up should both facilitate and promote functional extension with scripts written by the community. The LOVD software is freely available from the Leiden Muscular Dystrophy pages (www.DMD.nl/LOVD/). To promote the use of LOVD, we currently offer curators the possibility to set up an LSDB on our Leiden server. (c) 2005 Wiley-Liss, Inc.
- Modernizing the MagIC Paleomagnetic and Rock Magnetic Database Technology Stack to Encourage Code Reuse and Reproducible Science
  
  NASA Astrophysics Data System (ADS)
  
  Minnett, R.; Koppers, A. A. P.; Jarboe, N.; Jonestrask, L.; Tauxe, L.; Constable, C.
  
  2016-12-01
  
  The Magnetics Information Consortium (https://earthref.org/MagIC/) develops and maintains a database and web application for supporting the paleo-, geo-, and rock magnetic scientific community. Historically, this objective has been met with an Oracle database and a Perl web application at the San Diego Supercomputer Center (SDSC). The Oracle Enterprise Cluster at SDSC, however, was decommissioned in July of 2016 and the cost for MagIC to continue using Oracle became prohibitive. This provided MagIC with a unique opportunity to reexamine the entire technology stack and data model. MagIC has developed an open-source web application using the Meteor (http://meteor.com) framework and a MongoDB database. The simplicity of the open-source full-stack framework that Meteor provides has improved MagIC's development pace and the increased flexibility of the data schema in MongoDB encouraged the reorganization of the MagIC Data Model. As a result of incorporating actively developed open-source projects into the technology stack, MagIC has benefited from their vibrant software development communities. This has translated into a more modern web application that has significantly improved the user experience for the paleo-, geo-, and rock magnetic scientific community.
- DABAM: an open-source database of X-ray mirrors metrology
  
  DOE Office of Scientific and Technical Information (OSTI.GOV)
  
  Sanchez del Rio, Manuel; Bianchi, Davide; Cocco, Daniele
  
  2016-04-20
  
  An open-source database containing metrology data for X-ray mirrors is presented. It makes available metrology data (mirror heights and slopes profiles) that can be used with simulation tools for calculating the effects of optical surface errors in the performances of an optical instrument, such as a synchrotron beamline. A typical case is the degradation of the intensity profile at the focal position in a beamline due to mirror surface errors. This database for metrology (DABAM) aims to provide to the users of simulation tools the data of real mirrors. The data included in the database are described in this paper,more » with details of how the mirror parameters are stored. An accompanying software is provided to allow simple access and processing of these data, calculate the most usual statistical parameters, and also include the option of creating input files for most used simulation codes. Some optics simulations are presented and discussed to illustrate the real use of the profiles from the database.« less
- DABAM: an open-source database of X-ray mirrors metrology
  
  PubMed Central
  
  Sanchez del Rio, Manuel; Bianchi, Davide; Cocco, Daniele; Glass, Mark; Idir, Mourad; Metz, Jim; Raimondi, Lorenzo; Rebuffi, Luca; Reininger, Ruben; Shi, Xianbo; Siewert, Frank; Spielmann-Jaeggi, Sibylle; Takacs, Peter; Tomasset, Muriel; Tonnessen, Tom; Vivo, Amparo; Yashchuk, Valeriy
  
  2016-01-01
  
  An open-source database containing metrology data for X-ray mirrors is presented. It makes available metrology data (mirror heights and slopes profiles) that can be used with simulation tools for calculating the effects of optical surface errors in the performances of an optical instrument, such as a synchrotron beamline. A typical case is the degradation of the intensity profile at the focal position in a beamline due to mirror surface errors. This database for metrology (DABAM) aims to provide to the users of simulation tools the data of real mirrors. The data included in the database are described in this paper, with details of how the mirror parameters are stored. An accompanying software is provided to allow simple access and processing of these data, calculate the most usual statistical parameters, and also include the option of creating input files for most used simulation codes. Some optics simulations are presented and discussed to illustrate the real use of the profiles from the database. PMID:27140145
- DABAM: An open-source database of X-ray mirrors metrology
  
  DOE PAGES
  
  Sanchez del Rio, Manuel; Bianchi, Davide; Cocco, Daniele; ...
  
  2016-05-01
  
  An open-source database containing metrology data for X-ray mirrors is presented. It makes available metrology data (mirror heights and slopes profiles) that can be used with simulation tools for calculating the effects of optical surface errors in the performances of an optical instrument, such as a synchrotron beamline. A typical case is the degradation of the intensity profile at the focal position in a beamline due to mirror surface errors. This database for metrology (DABAM) aims to provide to the users of simulation tools the data of real mirrors. The data included in the database are described in this paper,more » with details of how the mirror parameters are stored. An accompanying software is provided to allow simple access and processing of these data, calculate the most usual statistical parameters, and also include the option of creating input files for most used simulation codes. In conclusion, some optics simulations are presented and discussed to illustrate the real use of the profiles from the database.« less
- DABAM: an open-source database of X-ray mirrors metrology
  
  DOE Office of Scientific and Technical Information (OSTI.GOV)
  
  Sanchez del Rio, Manuel; Bianchi, Davide; Cocco, Daniele
  
  An open-source database containing metrology data for X-ray mirrors is presented. It makes available metrology data (mirror heights and slopes profiles) that can be used with simulation tools for calculating the effects of optical surface errors in the performances of an optical instrument, such as a synchrotron beamline. A typical case is the degradation of the intensity profile at the focal position in a beamline due to mirror surface errors. This database for metrology (DABAM) aims to provide to the users of simulation tools the data of real mirrors. The data included in the database are described in this paper,more » with details of how the mirror parameters are stored. An accompanying software is provided to allow simple access and processing of these data, calculate the most usual statistical parameters, and also include the option of creating input files for most used simulation codes. Some optics simulations are presented and discussed to illustrate the real use of the profiles from the database.« less

DABAM: An open-source database of X-ray mirrors metrology

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sanchez del Rio, Manuel; Bianchi, Davide; Cocco, Daniele

An open-source database containing metrology data for X-ray mirrors is presented. It makes available metrology data (mirror heights and slopes profiles) that can be used with simulation tools for calculating the effects of optical surface errors in the performances of an optical instrument, such as a synchrotron beamline. A typical case is the degradation of the intensity profile at the focal position in a beamline due to mirror surface errors. This database for metrology (DABAM) aims to provide to the users of simulation tools the data of real mirrors. The data included in the database are described in this paper,more » with details of how the mirror parameters are stored. An accompanying software is provided to allow simple access and processing of these data, calculate the most usual statistical parameters, and also include the option of creating input files for most used simulation codes. In conclusion, some optics simulations are presented and discussed to illustrate the real use of the profiles from the database.« less
OrChem - An open source chemistry search engine for Oracle®

PubMed Central

2009-01-01

Background Registration, indexing and searching of chemical structures in relational databases is one of the core areas of cheminformatics. However, little detail has been published on the inner workings of search engines and their development has been mostly closed-source. We decided to develop an open source chemistry extension for Oracle, the de facto database platform in the commercial world. Results Here we present OrChem, an extension for the Oracle 11G database that adds registration and indexing of chemical structures to support fast substructure and similarity searching. The cheminformatics functionality is provided by the Chemistry Development Kit. OrChem provides similarity searching with response times in the order of seconds for databases with millions of compounds, depending on a given similarity cut-off. For substructure searching, it can make use of multiple processor cores on today's powerful database servers to provide fast response times in equally large data sets. Availability OrChem is free software and can be redistributed and/or modified under the terms of the GNU Lesser General Public License as published by the Free Software Foundation. All software is available via http://orchem.sourceforge.net. PMID:20298521
An open access database for the evaluation of heart sound algorithms.

PubMed

Liu, Chengyu; Springer, David; Li, Qiao; Moody, Benjamin; Juan, Ricardo Abad; Chorro, Francisco J; Castells, Francisco; Roig, José Millet; Silva, Ikaro; Johnson, Alistair E W; Syed, Zeeshan; Schmidt, Samuel E; Papadaniil, Chrysa D; Hadjileontiadis, Leontios; Naseri, Hosein; Moukadem, Ali; Dieterlen, Alain; Brandt, Christian; Tang, Hong; Samieinasab, Maryam; Samieinasab, Mohammad Reza; Sameni, Reza; Mark, Roger G; Clifford, Gari D

2016-12-01

In the past few decades, analysis of heart sound signals (i.e. the phonocardiogram or PCG), especially for automated heart sound segmentation and classification, has been widely studied and has been reported to have the potential value to detect pathology accurately in clinical applications. However, comparative analyses of algorithms in the literature have been hindered by the lack of high-quality, rigorously validated, and standardized open databases of heart sound recordings. This paper describes a public heart sound database, assembled for an international competition, the PhysioNet/Computing in Cardiology (CinC) Challenge 2016. The archive comprises nine different heart sound databases sourced from multiple research groups around the world. It includes 2435 heart sound recordings in total collected from 1297 healthy subjects and patients with a variety of conditions, including heart valve disease and coronary artery disease. The recordings were collected from a variety of clinical or nonclinical (such as in-home visits) environments and equipment. The length of recording varied from several seconds to several minutes. This article reports detailed information about the subjects/patients including demographics (number, age, gender), recordings (number, location, state and time length), associated synchronously recorded signals, sampling frequency and sensor type used. We also provide a brief summary of the commonly used heart sound segmentation and classification methods, including open source code provided concurrently for the Challenge. A description of the PhysioNet/CinC Challenge 2016, including the main aims, the training and test sets, the hand corrected annotations for different heart sound states, the scoring mechanism, and associated open source code are provided. In addition, several potential benefits from the public heart sound database are discussed.
The successes and challenges of open-source biopharmaceutical innovation.

PubMed

Allarakhia, Minna

2014-05-01

Increasingly, open-source-based alliances seek to provide broad access to data, research-based tools, preclinical samples and downstream compounds. The challenge is how to create value from open-source biopharmaceutical innovation. This value creation may occur via transparency and usage of data across the biopharmaceutical value chain as stakeholders move dynamically between open source and open innovation. In this article, several examples are used to trace the evolution of biopharmaceutical open-source initiatives. The article specifically discusses the technological challenges associated with the integration and standardization of big data; the human capacity development challenges associated with skill development around big data usage; and the data-material access challenge associated with data and material access and usage rights, particularly as the boundary between open source and open innovation becomes more fluid. It is the author's opinion that the assessment of when and how value creation will occur, through open-source biopharmaceutical innovation, is paramount. The key is to determine the metrics of value creation and the necessary technological, educational and legal frameworks to support the downstream outcomes of now big data-based open-source initiatives. The continued focus on the early-stage value creation is not advisable. Instead, it would be more advisable to adopt an approach where stakeholders transform open-source initiatives into open-source discovery, crowdsourcing and open product development partnerships on the same platform.
Implementing a Dynamic Database-Driven Course Using LAMP

ERIC Educational Resources Information Center

Laverty, Joseph Packy; Wood, David; Turchek, John

2011-01-01

This paper documents the formulation of a database driven open source architecture web development course. The design of a web-based curriculum faces many challenges: a) relative emphasis of client and server-side technologies, b) choice of a server-side language, and c) the cost and efficient delivery of a dynamic web development, database-driven…
myChEMBL: a virtual machine implementation of open data and cheminformatics tools.

PubMed

Ochoa, Rodrigo; Davies, Mark; Papadatos, George; Atkinson, Francis; Overington, John P

2014-01-15

myChEMBL is a completely open platform, which combines public domain bioactivity data with open source database and cheminformatics technologies. myChEMBL consists of a Linux (Ubuntu) Virtual Machine featuring a PostgreSQL schema with the latest version of the ChEMBL database, as well as the latest RDKit cheminformatics libraries. In addition, a self-contained web interface is available, which can be modified and improved according to user specifications. The VM is available at: ftp://ftp.ebi.ac.uk/pub/databases/chembl/VM/myChEMBL/current. The web interface and web services code is available at: https://github.com/rochoa85/myChEMBL.
Health Information-Seeking Patterns of the General Public and Indications for Disease Surveillance: Register-Based Study Using Lyme Disease.

PubMed

Pesälä, Samuli; Virtanen, Mikko J; Sane, Jussi; Mustonen, Pekka; Kaila, Minna; Helve, Otto

2017-11-06

People using the Internet to find information on health issues, such as specific diseases, usually start their search from a general search engine, for example, Google. Internet searches such as these may yield results and data of questionable quality and reliability. Health Library is a free-of-charge medical portal on the Internet providing medical information for the general public. Physician's Databases, an Internet evidence-based medicine source, provides medical information for health care professionals (HCPs) to support their clinical practice. Both databases are available throughout Finland, but the latter is used only by health professionals and pharmacies. Little is known about how the general public seeks medical information from medical sources on the Internet, how this behavior differs from HCPs' queries, and what causes possible differences in behavior. The aim of our study was to evaluate how the general public's and HCPs' information-seeking trends from Internet medical databases differ seasonally and temporally. In addition, we aimed to evaluate whether the general public's information-seeking trends could be utilized for disease surveillance and whether media coverage could affect these seeking trends. Lyme disease, serving as a well-defined disease model with distinct seasonal variation, was chosen as a case study. Two Internet medical databases, Health Library and Physician's Databases, were used. We compared the general public's article openings on Lyme disease from Health Library to HCPs' article openings on Lyme disease from Physician's Databases seasonally across Finland from 2011 to 2015. Additionally, media publications related to Lyme disease were searched from the largest and most popular media websites in Finland. Both databases, Health Library and Physician's Databases, show visually similar patterns in temporal variations of article openings on Lyme disease in Finland from 2011 to 2015. However, Health Library openings show not only an increasing trend over time but also greater fluctuations, especially during peak opening seasons. Outside these seasons, publications in the media coincide with Health Library article openings only occasionally. Lyme disease-related information-seeking behaviors between the general public and HCPs from Internet medical portals share similar temporal variations, which is consistent with the trend seen in epidemiological data. Therefore, the general public's article openings could be used as a supplementary source of information for disease surveillance. The fluctuations in article openings appeared stronger among the general public, thus, suggesting that different factors such as media coverage, affect the information-seeking behaviors of the public versus professionals. However, media coverage may also have an influence on HCPs. Not every publication was associated with an increase in openings, but the higher the media coverage by some publications, the higher the general public's access to Health Library. ©Samuli Pesälä, Mikko J Virtanen, Jussi Sane, Pekka Mustonen, Minna Kaila, Otto Helve. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 06.11.2017.
Health Information–Seeking Patterns of the General Public and Indications for Disease Surveillance: Register-Based Study Using Lyme Disease

PubMed Central

Virtanen, Mikko J; Sane, Jussi; Mustonen, Pekka; Kaila, Minna; Helve, Otto

2017-01-01

Background People using the Internet to find information on health issues, such as specific diseases, usually start their search from a general search engine, for example, Google. Internet searches such as these may yield results and data of questionable quality and reliability. Health Library is a free-of-charge medical portal on the Internet providing medical information for the general public. Physician’s Databases, an Internet evidence-based medicine source, provides medical information for health care professionals (HCPs) to support their clinical practice. Both databases are available throughout Finland, but the latter is used only by health professionals and pharmacies. Little is known about how the general public seeks medical information from medical sources on the Internet, how this behavior differs from HCPs’ queries, and what causes possible differences in behavior. Objective The aim of our study was to evaluate how the general public’s and HCPs’ information-seeking trends from Internet medical databases differ seasonally and temporally. In addition, we aimed to evaluate whether the general public’s information-seeking trends could be utilized for disease surveillance and whether media coverage could affect these seeking trends. Methods Lyme disease, serving as a well-defined disease model with distinct seasonal variation, was chosen as a case study. Two Internet medical databases, Health Library and Physician’s Databases, were used. We compared the general public’s article openings on Lyme disease from Health Library to HCPs’ article openings on Lyme disease from Physician’s Databases seasonally across Finland from 2011 to 2015. Additionally, media publications related to Lyme disease were searched from the largest and most popular media websites in Finland. Results Both databases, Health Library and Physician’s Databases, show visually similar patterns in temporal variations of article openings on Lyme disease in Finland from 2011 to 2015. However, Health Library openings show not only an increasing trend over time but also greater fluctuations, especially during peak opening seasons. Outside these seasons, publications in the media coincide with Health Library article openings only occasionally. Conclusions Lyme disease–related information-seeking behaviors between the general public and HCPs from Internet medical portals share similar temporal variations, which is consistent with the trend seen in epidemiological data. Therefore, the general public’s article openings could be used as a supplementary source of information for disease surveillance. The fluctuations in article openings appeared stronger among the general public, thus, suggesting that different factors such as media coverage, affect the information-seeking behaviors of the public versus professionals. However, media coverage may also have an influence on HCPs. Not every publication was associated with an increase in openings, but the higher the media coverage by some publications, the higher the general public’s access to Health Library. PMID:29109071
Generation of a Database of Laboratory Laser-Induced Breakdown Spectroscopy (LIBS) Spectra and Associated Analysis Software

NASA Astrophysics Data System (ADS)

Anderson, R. B.; Clegg, S. M.; Graff, T.; Morris, R. V.; Laura, J.

2015-06-01

We describe plans to generate a database of LIBS spectra of planetary analog materials and develop free, open-source software to enable the planetary community to analyze LIBS (and other spectral) data.
Cloud-Based Distributed Control of Unmanned Systems

DTIC Science & Technology

2015-04-01

during mission execution. At best, the data is saved onto hard-drives and is accessible only by the local team. Data history in a form available and...following open source technologies: GeoServer, OpenLayers, PostgreSQL , and PostGIS are chosen to implement the back-end database and server. A brief...geospatial map data. 3. PostgreSQL : An SQL-compliant object-relational database that easily scales to accommodate large amounts of data - upwards to
ACToR Chemical Structure processing using Open Source ...

EPA Pesticide Factsheets

ACToR (Aggregated Computational Toxicology Resource) is a centralized database repository developed by the National Center for Computational Toxicology (NCCT) at the U.S. Environmental Protection Agency (EPA). Free and open source tools were used to compile toxicity data from over 1,950 public sources. ACToR contains chemical structure information and toxicological data for over 558,000 unique chemicals. The database primarily includes data from NCCT research programs, in vivo toxicity data from ToxRef, human exposure data from ExpoCast, high-throughput screening data from ToxCast and high quality chemical structure information from the EPA DSSTox program. The DSSTox database is a chemical structure inventory for the NCCT programs and currently has about 16,000 unique structures. Included are also data from PubChem, ChemSpider, USDA, FDA, NIH and several other public data sources. ACToR has been a resource to various international and national research groups. Most of our recent efforts on ACToR are focused on improving the structural identifiers and Physico-Chemical properties of the chemicals in the database. Organizing this huge collection of data and improving the chemical structure quality of the database has posed some major challenges. Workflows have been developed to process structures, calculate chemical properties and identify relationships between CAS numbers. The Structure processing workflow integrates web services (PubChem and NIH NCI Cactus) to d
The IntAct molecular interaction database in 2012

PubMed Central

Kerrien, Samuel; Aranda, Bruno; Breuza, Lionel; Bridge, Alan; Broackes-Carter, Fiona; Chen, Carol; Duesbury, Margaret; Dumousseau, Marine; Feuermann, Marc; Hinz, Ursula; Jandrasits, Christine; Jimenez, Rafael C.; Khadake, Jyoti; Mahadevan, Usha; Masson, Patrick; Pedruzzi, Ivo; Pfeiffenberger, Eric; Porras, Pablo; Raghunath, Arathi; Roechert, Bernd; Orchard, Sandra; Hermjakob, Henning

2012-01-01

IntAct is an open-source, open data molecular interaction database populated by data either curated from the literature or from direct data depositions. Two levels of curation are now available within the database, with both IMEx-level annotation and less detailed MIMIx-compatible entries currently supported. As from September 2011, IntAct contains approximately 275 000 curated binary interaction evidences from over 5000 publications. The IntAct website has been improved to enhance the search process and in particular the graphical display of the results. New data download formats are also available, which will facilitate the inclusion of IntAct's data in the Semantic Web. IntAct is an active contributor to the IMEx consortium (http://www.imexconsortium.org). IntAct source code and data are freely available at http://www.ebi.ac.uk/intact. PMID:22121220
Data Shaping in the Cultural Simulation Modeler Integrated Behavioral Assessment Capability. Phase I

DTIC Science & Technology

2007-07-01

articles that appeared in global media in the years 1999-2006. The articles were all open source information and were obtained in part through an...agreement between Factiva Dow Jones and the NRL for this project, and in part collected by IndaSea from the Open Source Center database and a variety of...This view implied that a system geared to assist analysts should be open and completely dynamic. It is IndaSea’s perspective that there are advantages
77 FR 21808 - Privacy Act of 1974; System of Records

Federal Register 2010, 2011, 2012, 2013, 2014

2012-04-11

... and open source records and commercial database. EXEMPTIONS CLAIMED FOR THE SYSTEM: The Attorney... notification procedures, the record access procedures, the contesting record procedures, the record source..., confidential sources, and victims of crimes. The offenses and alleged offenses associated with the individuals...
Common characteristics of open source software development and applicability for drug discovery: a systematic review.

PubMed

Ardal, Christine; Alstadsæter, Annette; Røttingen, John-Arne

2011-09-28

Innovation through an open source model has proven to be successful for software development. This success has led many to speculate if open source can be applied to other industries with similar success. We attempt to provide an understanding of open source software development characteristics for researchers, business leaders and government officials who may be interested in utilizing open source innovation in other contexts and with an emphasis on drug discovery. A systematic review was performed by searching relevant, multidisciplinary databases to extract empirical research regarding the common characteristics and barriers of initiating and maintaining an open source software development project. Common characteristics to open source software development pertinent to open source drug discovery were extracted. The characteristics were then grouped into the areas of participant attraction, management of volunteers, control mechanisms, legal framework and physical constraints. Lastly, their applicability to drug discovery was examined. We believe that the open source model is viable for drug discovery, although it is unlikely that it will exactly follow the form used in software development. Hybrids will likely develop that suit the unique characteristics of drug discovery. We suggest potential motivations for organizations to join an open source drug discovery project. We also examine specific differences between software and medicines, specifically how the need for laboratories and physical goods will impact the model as well as the effect of patents.
Virtualization of open-source secure web services to support data exchange in a pediatric critical care research network.

PubMed

Frey, Lewis J; Sward, Katherine A; Newth, Christopher J L; Khemani, Robinder G; Cryer, Martin E; Thelen, Julie L; Enriquez, Rene; Shaoyu, Su; Pollack, Murray M; Harrison, Rick E; Meert, Kathleen L; Berg, Robert A; Wessel, David L; Shanley, Thomas P; Dalton, Heidi; Carcillo, Joseph; Jenkins, Tammara L; Dean, J Michael

2015-11-01

To examine the feasibility of deploying a virtual web service for sharing data within a research network, and to evaluate the impact on data consistency and quality. Virtual machines (VMs) encapsulated an open-source, semantically and syntactically interoperable secure web service infrastructure along with a shadow database. The VMs were deployed to 8 Collaborative Pediatric Critical Care Research Network Clinical Centers. Virtual web services could be deployed in hours. The interoperability of the web services reduced format misalignment from 56% to 1% and demonstrated that 99% of the data consistently transferred using the data dictionary and 1% needed human curation. Use of virtualized open-source secure web service technology could enable direct electronic abstraction of data from hospital databases for research purposes. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Utilization of open source electronic health record around the world: A systematic review.

PubMed

Aminpour, Farzaneh; Sadoughi, Farahnaz; Ahamdi, Maryam

2014-01-01

Many projects on developing Electronic Health Record (EHR) systems have been carried out in many countries. The current study was conducted to review the published data on the utilization of open source EHR systems in different countries all over the world. Using free text and keyword search techniques, six bibliographic databases were searched for related articles. The identified papers were screened and reviewed during a string of stages for the irrelevancy and validity. The findings showed that open source EHRs have been wildly used by source limited regions in all continents, especially in Sub-Saharan Africa and South America. It would create opportunities to improve national healthcare level especially in developing countries with minimal financial resources. Open source technology is a solution to overcome the problems of high-costs and inflexibility associated with the proprietary health information systems.
Database Entity Persistence with Hibernate for the Network Connectivity Analysis Model

DTIC Science & Technology

2014-04-01

time savings in the Java coding development process. Appendices A and B describe address setup procedures for installing the MySQL database...development environment is required: • The open source MySQL Database Management System (DBMS) from Oracle, which is a Java Database Connectivity (JDBC...compliant DBMS • MySQL JDBC Driver library that comes as a plug-in with the Netbeans distribution • The latest Java Development Kit with the latest
Development of an open source laboratory information management system for 2-D gel electrophoresis-based proteomics workflow

PubMed Central

Morisawa, Hiraku; Hirota, Mikako; Toda, Tosifusa

2006-01-01

Background In the post-genome era, most research scientists working in the field of proteomics are confronted with difficulties in management of large volumes of data, which they are required to keep in formats suitable for subsequent data mining. Therefore, a well-developed open source laboratory information management system (LIMS) should be available for their proteomics research studies. Results We developed an open source LIMS appropriately customized for 2-D gel electrophoresis-based proteomics workflow. The main features of its design are compactness, flexibility and connectivity to public databases. It supports the handling of data imported from mass spectrometry software and 2-D gel image analysis software. The LIMS is equipped with the same input interface for 2-D gel information as a clickable map on public 2DPAGE databases. The LIMS allows researchers to follow their own experimental procedures by reviewing the illustrations of 2-D gel maps and well layouts on the digestion plates and MS sample plates. Conclusion Our new open source LIMS is now available as a basic model for proteome informatics, and is accessible for further improvement. We hope that many research scientists working in the field of proteomics will evaluate our LIMS and suggest ways in which it can be improved. PMID:17018156
Database Search Engines: Paradigms, Challenges and Solutions.

PubMed

Verheggen, Kenneth; Martens, Lennart; Berven, Frode S; Barsnes, Harald; Vaudel, Marc

2016-01-01

The first step in identifying proteins from mass spectrometry based shotgun proteomics data is to infer peptides from tandem mass spectra, a task generally achieved using database search engines. In this chapter, the basic principles of database search engines are introduced with a focus on open source software, and the use of database search engines is demonstrated using the freely available SearchGUI interface. This chapter also discusses how to tackle general issues related to sequence database searching and shows how to minimize their impact.

InChI in the wild: an assessment of InChIKey searching in Google

PubMed Central

2013-01-01

While chemical databases can be queried using the InChI string and InChIKey (IK) the latter was designed for open-web searching. It is becoming increasingly effective for this since more sources enhance crawling of their websites by the Googlebot and consequent IK indexing. Searchers who use Google as an adjunct to database access may be less familiar with the advantages of using the IK as explored in this review. As an example, the IK for atorvastatin retrieves ~200 low-redundancy links from a Google search in 0.3 of a second. These include most major databases and a very low false-positive rate. Results encompass less familiar but potentially useful sources and can be extended to isomer capture by using just the skeleton layer of the IK. Google Advanced Search can be used to filter large result sets. Image searching with the IK is also effective and complementary to open-web queries. Results can be particularly useful for less-common structures as exemplified by a major metabolite of atorvastatin giving only three hits. Testing also demonstrated document-to-document and document-to-database joins via structure matching. The necessary generation of an IK from chemical names can be accomplished using open tools and resources for patents, papers, abstracts or other text sources. Active global sharing of local IK-linked information can be accomplished via surfacing in open laboratory notebooks, blogs, Twitter, figshare and other routes. While information-rich chemistry (e.g. approved drugs) can exhibit swamping and redundancy effects, the much smaller IK result sets for link-poor structures become a transformative first-pass option. The IK indexing has therefore turned Google into a de-facto open global chemical information hub by merging links to most significant sources, including over 50 million PubChem and ChemSpider records. The simplicity, specificity and speed of matching make it a useful option for biologists or others less familiar with chemical searching. However, compared to rigorously maintained major databases, users need to be circumspect about the consistency of Google results and provenance of retrieved links. In addition, community engagement may be necessary to ameliorate possible future degradation of utility. PMID:23399051
Common characteristics of open source software development and applicability for drug discovery: a systematic review

PubMed Central

2011-01-01

Background Innovation through an open source model has proven to be successful for software development. This success has led many to speculate if open source can be applied to other industries with similar success. We attempt to provide an understanding of open source software development characteristics for researchers, business leaders and government officials who may be interested in utilizing open source innovation in other contexts and with an emphasis on drug discovery. Methods A systematic review was performed by searching relevant, multidisciplinary databases to extract empirical research regarding the common characteristics and barriers of initiating and maintaining an open source software development project. Results Common characteristics to open source software development pertinent to open source drug discovery were extracted. The characteristics were then grouped into the areas of participant attraction, management of volunteers, control mechanisms, legal framework and physical constraints. Lastly, their applicability to drug discovery was examined. Conclusions We believe that the open source model is viable for drug discovery, although it is unlikely that it will exactly follow the form used in software development. Hybrids will likely develop that suit the unique characteristics of drug discovery. We suggest potential motivations for organizations to join an open source drug discovery project. We also examine specific differences between software and medicines, specifically how the need for laboratories and physical goods will impact the model as well as the effect of patents. PMID:21955914
Open-source tools for data mining.

PubMed

Zupan, Blaz; Demsar, Janez

2008-03-01

With a growing volume of biomedical databases and repositories, the need to develop a set of tools to address their analysis and support knowledge discovery is becoming acute. The data mining community has developed a substantial set of techniques for computational treatment of these data. In this article, we discuss the evolution of open-source toolboxes that data mining researchers and enthusiasts have developed over the span of a few decades and review several currently available open-source data mining suites. The approaches we review are diverse in data mining methods and user interfaces and also demonstrate that the field and its tools are ready to be fully exploited in biomedical research.
Database of historically documented springs and spring flow measurements in Texas

USGS Publications Warehouse

Heitmuller, Franklin T.; Reece, Brian D.

2003-01-01

Springs are naturally occurring features that convey excess ground water to the land surface; they represent a transition from ground water to surface water. Water issues through one opening, multiple openings, or numerous seeps in the rock or soil. The database of this report provides information about springs and spring flow in Texas including spring names, identification numbers, location, and, if available, water source and use. This database does not include every spring in Texas, but is limited to an aggregation of selected digital and hard-copy data of the U.S. Geological Survey (USGS), the Texas Water Development Board (TWDB), and Capitol Environmental Services.
Utilization of open source electronic health record around the world: A systematic review

PubMed Central

Aminpour, Farzaneh; Sadoughi, Farahnaz; Ahamdi, Maryam

2014-01-01

Many projects on developing Electronic Health Record (EHR) systems have been carried out in many countries. The current study was conducted to review the published data on the utilization of open source EHR systems in different countries all over the world. Using free text and keyword search techniques, six bibliographic databases were searched for related articles. The identified papers were screened and reviewed during a string of stages for the irrelevancy and validity. The findings showed that open source EHRs have been wildly used by source limited regions in all continents, especially in Sub-Saharan Africa and South America. It would create opportunities to improve national healthcare level especially in developing countries with minimal financial resources. Open source technology is a solution to overcome the problems of high-costs and inflexibility associated with the proprietary health information systems. PMID:24672566
Implementation of an open adoption research data management system for clinical studies.

PubMed

Müller, Jan; Heiss, Kirsten Ingmar; Oberhoffer, Renate

2017-07-06

Research institutions need to manage multiple studies with individual data sets, processing rules and different permissions. So far, there is no standard technology that provides an easy to use environment to create databases and user interfaces for clinical trials or research studies. Therefore various software solutions are being used-from custom software, explicitly designed for a specific study, to cost intensive commercial Clinical Trial Management Systems (CTMS) up to very basic approaches with self-designed Microsoft ® databases. The technology applied to conduct those studies varies tremendously from study to study, making it difficult to evaluate data across various studies (meta-analysis) and keeping a defined level of quality in database design, data processing, displaying and exporting. Furthermore, the systems being used to collect study data are often operated redundantly to systems used in patient care. As a consequence the data collection in studies is inefficient and data quality may suffer from unsynchronized datasets, non-normalized database scenarios and manually executed data transfers. With OpenCampus Research we implemented an open adoption software (OAS) solution on an open source basis, which provides a standard environment for state-of-the-art research database management at low cost.
Phynx: an open source software solution supporting data management and web-based patient-level data review for drug safety studies in the general practice research database and other health care databases.

PubMed

Egbring, Marco; Kullak-Ublick, Gerd A; Russmann, Stefan

2010-01-01

To develop a software solution that supports management and clinical review of patient data from electronic medical records databases or claims databases for pharmacoepidemiological drug safety studies. We used open source software to build a data management system and an internet application with a Flex client on a Java application server with a MySQL database backend. The application is hosted on Amazon Elastic Compute Cloud. This solution named Phynx supports data management, Web-based display of electronic patient information, and interactive review of patient-level information in the individual clinical context. This system was applied to a dataset from the UK General Practice Research Database (GPRD). Our solution can be setup and customized with limited programming resources, and there is almost no extra cost for software. Access times are short, the displayed information is structured in chronological order and visually attractive, and selected information such as drug exposure can be blinded. External experts can review patient profiles and save evaluations and comments via a common Web browser. Phynx provides a flexible and economical solution for patient-level review of electronic medical information from databases considering the individual clinical context. It can therefore make an important contribution to an efficient validation of outcome assessment in drug safety database studies.
Your Personal Analysis Toolkit - An Open Source Solution

NASA Astrophysics Data System (ADS)

Mitchell, T.

2009-12-01

Open source software is commonly known for its web browsers, word processors and programming languages. However, there is a vast array of open source software focused on geographic information management and geospatial application building in general. As geo-professionals, having easy access to tools for our jobs is crucial. Open source software provides the opportunity to add a tool to your tool belt and carry it with you for your entire career - with no license fees, a supportive community and the opportunity to test, adopt and upgrade at your own pace. OSGeo is a US registered non-profit representing more than a dozen mature geospatial data management applications and programming resources. Tools cover areas such as desktop GIS, web-based mapping frameworks, metadata cataloging, spatial database analysis, image processing and more. Learn about some of these tools as they apply to AGU members, as well as how you can join OSGeo and its members in getting the job done with powerful open source tools. If you haven't heard of OSSIM, MapServer, OpenLayers, PostGIS, GRASS GIS or the many other projects under our umbrella - then you need to hear this talk. Invest in yourself - use open source!
77 FR 40630 - Privacy Act of 1974; System of Records

Federal Register 2010, 2011, 2012, 2013, 2014

2012-07-10

... the United States against terrorist and foreign intelligence threats and to enforce U.S. criminal laws..., informants, sources, bystanders, law enforcement personnel, intelligence personnel, other responders... intelligence operation; individuals who are identified in open source information or commercial databases, or...
Wireless access to a pharmaceutical database: a demonstrator for data driven Wireless Application Protocol (WAP) applications in medical information processing.

PubMed

Schacht Hansen, M; Dørup, J

2001-01-01

The Wireless Application Protocol technology implemented in newer mobile phones has built-in facilities for handling much of the information processing needed in clinical work. To test a practical approach we ported a relational database of the Danish pharmaceutical catalogue to Wireless Application Protocol using open source freeware at all steps. We used Apache 1.3 web software on a Linux server. Data containing the Danish pharmaceutical catalogue were imported from an ASCII file into a MySQL 3.22.32 database using a Practical Extraction and Report Language script for easy update of the database. Data were distributed in 35 interrelated tables. Each pharmaceutical brand name was given its own card with links to general information about the drug, active substances, contraindications etc. Access was available through 1) browsing therapeutic groups and 2) searching for a brand name. The database interface was programmed in the server-side scripting language PHP3. A free, open source Wireless Application Protocol gateway to a pharmaceutical catalogue was established to allow dial-in access independent of commercial Wireless Application Protocol service providers. The application was tested on the Nokia 7110 and Ericsson R320s cellular phones. We have demonstrated that Wireless Application Protocol-based access to a dynamic clinical database can be established using open source freeware. The project opens perspectives for a further integration of Wireless Application Protocol phone functions in clinical information processing: Global System for Mobile communication telephony for bilateral communication, asynchronous unilateral communication via e-mail and Short Message Service, built-in calculator, calendar, personal organizer, phone number catalogue and Dictaphone function via answering machine technology. An independent Wireless Application Protocol gateway may be placed within hospital firewalls, which may be an advantage with respect to security. However, if Wireless Application Protocol phones are to become effective tools for physicians, special attention must be paid to the limitations of the devices. Input tools of Wireless Application Protocol phones should be improved, for instance by increased use of speech control.
Wireless access to a pharmaceutical database: A demonstrator for data driven Wireless Application Protocol applications in medical information processing

PubMed Central

Hansen, Michael Schacht

2001-01-01

Background The Wireless Application Protocol technology implemented in newer mobile phones has built-in facilities for handling much of the information processing needed in clinical work. Objectives To test a practical approach we ported a relational database of the Danish pharmaceutical catalogue to Wireless Application Protocol using open source freeware at all steps. Methods We used Apache 1.3 web software on a Linux server. Data containing the Danish pharmaceutical catalogue were imported from an ASCII file into a MySQL 3.22.32 database using a Practical Extraction and Report Language script for easy update of the database. Data were distributed in 35 interrelated tables. Each pharmaceutical brand name was given its own card with links to general information about the drug, active substances, contraindications etc. Access was available through 1) browsing therapeutic groups and 2) searching for a brand name. The database interface was programmed in the server-side scripting language PHP3. Results A free, open source Wireless Application Protocol gateway to a pharmaceutical catalogue was established to allow dial-in access independent of commercial Wireless Application Protocol service providers. The application was tested on the Nokia 7110 and Ericsson R320s cellular phones. Conclusions We have demonstrated that Wireless Application Protocol-based access to a dynamic clinical database can be established using open source freeware. The project opens perspectives for a further integration of Wireless Application Protocol phone functions in clinical information processing: Global System for Mobile communication telephony for bilateral communication, asynchronous unilateral communication via e-mail and Short Message Service, built-in calculator, calendar, personal organizer, phone number catalogue and Dictaphone function via answering machine technology. An independent Wireless Application Protocol gateway may be placed within hospital firewalls, which may be an advantage with respect to security. However, if Wireless Application Protocol phones are to become effective tools for physicians, special attention must be paid to the limitations of the devices. Input tools of Wireless Application Protocol phones should be improved, for instance by increased use of speech control. PMID:11720946
Filling Terrorism Gaps: VEOs, Evaluating Databases, and Applying Risk Terrain Modeling to Terrorism

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hagan, Ross F.

2016-08-29

This paper aims to address three issues: the lack of literature differentiating terrorism and violent extremist organizations (VEOs), terrorism incident databases, and the applicability of Risk Terrain Modeling (RTM) to terrorism. Current open source literature and publicly available government sources do not differentiate between terrorism and VEOs; furthermore, they fail to define them. Addressing the lack of a comprehensive comparison of existing terrorism data sources, a matrix comparing a dozen terrorism databases is constructed, providing insight toward the array of data available. RTM, a method for spatial risk analysis at a micro level, has some applicability to terrorism research, particularlymore » for studies looking at risk indicators of terrorism. Leveraging attack data from multiple databases, combined with RTM, offers one avenue for closing existing research gaps in terrorism literature.« less
OPERA: A free and open source QSAR tool for predicting physicochemical properties and environmental fate endpoints

EPA Science Inventory

Collecting the chemical structures and data for necessary QSAR modeling is facilitated by available public databases and open data. However, QSAR model performance is dependent on the quality of data and modeling methodology used. This study developed robust QSAR models for physi...
ERMes: Open Source Simplicity for Your E-Resource Management

ERIC Educational Resources Information Center

Doering, William; Chilton, Galadriel

2009-01-01

ERMes, the latest version of electronic resource management system (ERM), is a relational database; content in different tables connects to, and works with, content in other tables. ERMes requires Access 2007 (Windows) or Access 2008 (Mac) to operate as the database utilizes functionality not available in previous versions of Microsoft Access. The…
KID Project: an internet-based digital video atlas of capsule endoscopy for research purposes.

PubMed

Koulaouzidis, Anastasios; Iakovidis, Dimitris K; Yung, Diana E; Rondonotti, Emanuele; Kopylov, Uri; Plevris, John N; Toth, Ervin; Eliakim, Abraham; Wurm Johansson, Gabrielle; Marlicz, Wojciech; Mavrogenis, Georgios; Nemeth, Artur; Thorlacius, Henrik; Tontini, Gian Eugenio

2017-06-01

Capsule endoscopy (CE) has revolutionized small-bowel (SB) investigation. Computational methods can enhance diagnostic yield (DY); however, incorporating machine learning algorithms (MLAs) into CE reading is difficult as large amounts of image annotations are required for training. Current databases lack graphic annotations of pathologies and cannot be used. A novel database, KID, aims to provide a reference for research and development of medical decision support systems (MDSS) for CE. Open-source software was used for the KID database. Clinicians contribute anonymized, annotated CE images and videos. Graphic annotations are supported by an open-access annotation tool (Ratsnake). We detail an experiment based on the KID database, examining differences in SB lesion measurement between human readers and a MLA. The Jaccard Index (JI) was used to evaluate similarity between annotations by the MLA and human readers. The MLA performed best in measuring lymphangiectasias with a JI of 81 ± 6 %. The other lesion types were: angioectasias (JI 64 ± 11 %), aphthae (JI 64 ± 8 %), chylous cysts (JI 70 ± 14 %), polypoid lesions (JI 75 ± 21 %), and ulcers (JI 56 ± 9 %). MLA can perform as well as human readers in the measurement of SB angioectasias in white light (WL). Automated lesion measurement is therefore feasible. KID is currently the only open-source CE database developed specifically to aid development of MDSS. Our experiment demonstrates this potential.
BIRS - Bioterrorism Information Retrieval System.

PubMed

Tewari, Ashish Kumar; Rashi; Wadhwa, Gulshan; Sharma, Sanjeev Kumar; Jain, Chakresh Kumar

2013-01-01

Bioterrorism is the intended use of pathogenic strains of microbes to widen terror in a population. There is a definite need to promote research for development of vaccines, therapeutics and diagnostic methods as a part of preparedness to any bioterror attack in the future. BIRS is an open-access database of collective information on the organisms related to bioterrorism. The architecture of database utilizes the current open-source technology viz PHP ver 5.3.19, MySQL and IIS server under windows platform for database designing. Database stores information on literature, generic- information and unique pathways of about 10 microorganisms involved in bioterrorism. This may serve as a collective repository to accelerate the drug discovery and vaccines designing process against such bioterrorist agents (microbes). The available data has been validated from various online resources and literature mining in order to provide the user with a comprehensive information system. The database is freely available at http://www.bioterrorism.biowaves.org.
BioMart: a data federation framework for large collaborative projects.

PubMed

Zhang, Junjun; Haider, Syed; Baran, Joachim; Cros, Anthony; Guberman, Jonathan M; Hsu, Jack; Liang, Yong; Yao, Long; Kasprzyk, Arek

2011-01-01

BioMart is a freely available, open source, federated database system that provides a unified access to disparate, geographically distributed data sources. It is designed to be data agnostic and platform independent, such that existing databases can easily be incorporated into the BioMart framework. BioMart allows databases hosted on different servers to be presented seamlessly to users, facilitating collaborative projects between different research groups. BioMart contains several levels of query optimization to efficiently manage large data sets and offers a diverse selection of graphical user interfaces and application programming interfaces to ensure that queries can be performed in whatever manner is most convenient for the user. The software has now been adopted by a large number of different biological databases spanning a wide range of data types and providing a rich source of annotation available to bioinformaticians and biologists alike.
M2Lite: An Open-source, Light-weight, Pluggable and Fast Proteome Discoverer MSF to mzIdentML Tool.

PubMed

Aiyetan, Paul; Zhang, Bai; Chen, Lily; Zhang, Zhen; Zhang, Hui

2014-04-28

Proteome Discoverer is one of many tools used for protein database search and peptide to spectrum assignment in mass spectrometry-based proteomics. However, the inadequacy of conversion tools makes it challenging to compare and integrate its results to those of other analytical tools. Here we present M2Lite, an open-source, light-weight, easily pluggable and fast conversion tool. M2Lite converts proteome discoverer derived MSF files to the proteomics community defined standard - the mzIdentML file format. M2Lite's source code is available as open-source at https://bitbucket.org/paiyetan/m2lite/src and its compiled binaries and documentation can be freely downloaded at https://bitbucket.org/paiyetan/m2lite/downloads.
An Investigation of Graduate Student Knowledge and Usage of Open-Access Journals

ERIC Educational Resources Information Center

Beard, Regina M.

2016-01-01

Graduate students lament the need to achieve the proficiency necessary to competently search multiple databases for their research assignments, regularly eschewing these sources in favor of Google Scholar or some other search engine. The author conducted an anonymous survey investigating graduate student knowledge or awareness of the open-access…
Conversion of National Health Insurance Service-National Sample Cohort (NHIS-NSC) Database into Observational Medical Outcomes Partnership-Common Data Model (OMOP-CDM).

PubMed

You, Seng Chan; Lee, Seongwon; Cho, Soo-Yeon; Park, Hojun; Jung, Sungjae; Cho, Jaehyeong; Yoon, Dukyong; Park, Rae Woong

2017-01-01

It is increasingly necessary to generate medical evidence applicable to Asian people compared to those in Western countries. Observational Health Data Sciences a Informatics (OHDSI) is an international collaborative which aims to facilitate generating high-quality evidence via creating and applying open-source data analytic solutions to a large network of health databases across countries. We aimed to incorporate Korean nationwide cohort data into the OHDSI network by converting the national sample cohort into Observational Medical Outcomes Partnership-Common Data Model (OMOP-CDM). The data of 1.13 million subjects was converted to OMOP-CDM, resulting in average 99.1% conversion rate. The ACHILLES, open-source OMOP-CDM-based data profiling tool, was conducted on the converted database to visualize data-driven characterization and access the quality of data. The OMOP-CDM version of National Health Insurance Service-National Sample Cohort (NHIS-NSC) can be a valuable tool for multiple aspects of medical research by incorporation into the OHDSI research network.

An Improved Model for Operational Specification of the Electron Density Structure up to Geosynchronous Heights

DTIC Science & Technology

2010-07-01

http://www.iono.noa.gr/ElectronDensity/EDProfile.php The web service has been developed with the following open source tools: a) PHP , for the... MySQL for the database, which was based on the enhancement of the DIAS database. Below we present some screen shots to demonstrate the functionality
openBIS ELN-LIMS: an open-source database for academic laboratories.

PubMed

Barillari, Caterina; Ottoz, Diana S M; Fuentes-Serna, Juan Mariano; Ramakrishnan, Chandrasekhar; Rinn, Bernd; Rudolf, Fabian

2016-02-15

The open-source platform openBIS (open Biology Information System) offers an Electronic Laboratory Notebook and a Laboratory Information Management System (ELN-LIMS) solution suitable for the academic life science laboratories. openBIS ELN-LIMS allows researchers to efficiently document their work, to describe materials and methods and to collect raw and analyzed data. The system comes with a user-friendly web interface where data can be added, edited, browsed and searched. The openBIS software, a user guide and a demo instance are available at https://openbis-eln-lims.ethz.ch. The demo instance contains some data from our laboratory as an example to demonstrate the possibilities of the ELN-LIMS (Ottoz et al., 2014). For rapid local testing, a VirtualBox image of the ELN-LIMS is also available. © The Author 2015. Published by Oxford University Press.
Status, upgrades, and advances of RTS2: the open source astronomical observatory manager

NASA Astrophysics Data System (ADS)

Kubánek, Petr

2016-07-01

RTS2 is an open source observatory control system. Being developed from early 2000, it continue to receive new features in last two years. RTS2 is a modulat, network-based distributed control system, featuring telescope drivers with advanced tracking and pointing capabilities, fast camera drivers and high level modules for "business logic" of the observatory, connected to a SQL database. Running on all continents of the planet, it accumulated a lot to control parts or full observatory setups.
ms_lims, a simple yet powerful open source laboratory information management system for MS-driven proteomics.

PubMed

Helsens, Kenny; Colaert, Niklaas; Barsnes, Harald; Muth, Thilo; Flikka, Kristian; Staes, An; Timmerman, Evy; Wortelkamp, Steffi; Sickmann, Albert; Vandekerckhove, Joël; Gevaert, Kris; Martens, Lennart

2010-03-01

MS-based proteomics produces large amounts of mass spectra that require processing, identification and possibly quantification before interpretation can be undertaken. High-throughput studies require automation of these various steps, and management of the data in association with the results obtained. We here present ms_lims (http://genesis.UGent.be/ms_lims), a freely available, open-source system based on a central database to automate data management and processing in MS-driven proteomics analyses.
An Open-source Toolbox for Analysing and Processing PhysioNet Databases in MATLAB and Octave.

PubMed

Silva, Ikaro; Moody, George B

The WaveForm DataBase (WFDB) Toolbox for MATLAB/Octave enables integrated access to PhysioNet's software and databases. Using the WFDB Toolbox for MATLAB/Octave, users have access to over 50 physiological databases in PhysioNet. The toolbox provides access over 4 TB of biomedical signals including ECG, EEG, EMG, and PLETH. Additionally, most signals are accompanied by metadata such as medical annotations of clinical events: arrhythmias, sleep stages, seizures, hypotensive episodes, etc. Users of this toolbox should easily be able to reproduce, validate, and compare results published based on PhysioNet's software and databases.
Quality Attribute-Guided Evaluation of NoSQL Databases: A Case Study

DTIC Science & Technology

2015-01-16

evaluations of NoSQL databases specifically, and big data systems in general, that have become apparent during our study. Keywords—NoSQL, distributed...technology, namely that of big data , software systems [1]. At the heart of big data systems are a collection of database technologies that are more...born organizations such as Google and Amazon [3][4], along with those of numerous other big data innovators, have created a variety of open source and
Update on Geothermal Direct-Use Installations in the United States

DOE Data Explorer

Beckers, Koenraad F.; Snyder, Diana M.; Young, Katherine R.

2017-03-02

An updated database of geothermal direct-use systems in the U.S. has been compiled and analyzed, building upon the Oregon Institute of Technology (OIT) Geo-Heat Center direct-use database. Types of direct-use applications examined include hot springs resorts and pools, aquaculture farms, greenhouses, and district heating systems, among others; power-generating facilities and ground-source heat pumps were excluded. Where possible, the current operation status, open and close dates, well data, and other technical data were obtained for each entry. The database contains 545 installations, of which 407 are open, 108 are closed, and 30 have an unknown status. A report is also included which details and analyzes current geothermal direct-use installations and barriers to further implementation.
KID Project: an internet-based digital video atlas of capsule endoscopy for research purposes

PubMed Central

Koulaouzidis, Anastasios; Iakovidis, Dimitris K.; Yung, Diana E.; Rondonotti, Emanuele; Kopylov, Uri; Plevris, John N.; Toth, Ervin; Eliakim, Abraham; Wurm Johansson, Gabrielle; Marlicz, Wojciech; Mavrogenis, Georgios; Nemeth, Artur; Thorlacius, Henrik; Tontini, Gian Eugenio

2017-01-01

Background and aims Capsule endoscopy (CE) has revolutionized small-bowel (SB) investigation. Computational methods can enhance diagnostic yield (DY); however, incorporating machine learning algorithms (MLAs) into CE reading is difficult as large amounts of image annotations are required for training. Current databases lack graphic annotations of pathologies and cannot be used. A novel database, KID, aims to provide a reference for research and development of medical decision support systems (MDSS) for CE. Methods Open-source software was used for the KID database. Clinicians contribute anonymized, annotated CE images and videos. Graphic annotations are supported by an open-access annotation tool (Ratsnake). We detail an experiment based on the KID database, examining differences in SB lesion measurement between human readers and a MLA. The Jaccard Index (JI) was used to evaluate similarity between annotations by the MLA and human readers. Results The MLA performed best in measuring lymphangiectasias with a JI of 81 ± 6 %. The other lesion types were: angioectasias (JI 64 ± 11 %), aphthae (JI 64 ± 8 %), chylous cysts (JI 70 ± 14 %), polypoid lesions (JI 75 ± 21 %), and ulcers (JI 56 ± 9 %). Conclusion MLA can perform as well as human readers in the measurement of SB angioectasias in white light (WL). Automated lesion measurement is therefore feasible. KID is currently the only open-source CE database developed specifically to aid development of MDSS. Our experiment demonstrates this potential. PMID:28580415
The Cardiac Atlas Project--an imaging database for computational modeling and statistical atlases of the heart.

PubMed

Fonseca, Carissa G; Backhaus, Michael; Bluemke, David A; Britten, Randall D; Chung, Jae Do; Cowan, Brett R; Dinov, Ivo D; Finn, J Paul; Hunter, Peter J; Kadish, Alan H; Lee, Daniel C; Lima, Joao A C; Medrano-Gracia, Pau; Shivkumar, Kalyanam; Suinesiaputra, Avan; Tao, Wenchao; Young, Alistair A

2011-08-15

Integrative mathematical and statistical models of cardiac anatomy and physiology can play a vital role in understanding cardiac disease phenotype and planning therapeutic strategies. However, the accuracy and predictive power of such models is dependent upon the breadth and depth of noninvasive imaging datasets. The Cardiac Atlas Project (CAP) has established a large-scale database of cardiac imaging examinations and associated clinical data in order to develop a shareable, web-accessible, structural and functional atlas of the normal and pathological heart for clinical, research and educational purposes. A goal of CAP is to facilitate collaborative statistical analysis of regional heart shape and wall motion and characterize cardiac function among and within population groups. Three main open-source software components were developed: (i) a database with web-interface; (ii) a modeling client for 3D + time visualization and parametric description of shape and motion; and (iii) open data formats for semantic characterization of models and annotations. The database was implemented using a three-tier architecture utilizing MySQL, JBoss and Dcm4chee, in compliance with the DICOM standard to provide compatibility with existing clinical networks and devices. Parts of Dcm4chee were extended to access image specific attributes as search parameters. To date, approximately 3000 de-identified cardiac imaging examinations are available in the database. All software components developed by the CAP are open source and are freely available under the Mozilla Public License Version 1.1 (http://www.mozilla.org/MPL/MPL-1.1.txt). http://www.cardiacatlas.org a.young@auckland.ac.nz Supplementary data are available at Bioinformatics online.
Scale out databases for CERN use cases

NASA Astrophysics Data System (ADS)

Baranowski, Zbigniew; Grzybek, Maciej; Canali, Luca; Lanza Garcia, Daniel; Surdy, Kacper

2015-12-01

Data generation rates are expected to grow very fast for some database workloads going into LHC run 2 and beyond. In particular this is expected for data coming from controls, logging and monitoring systems. Storing, administering and accessing big data sets in a relational database system can quickly become a very hard technical challenge, as the size of the active data set and the number of concurrent users increase. Scale-out database technologies are a rapidly developing set of solutions for deploying and managing very large data warehouses on commodity hardware and with open source software. In this paper we will describe the architecture and tests on database systems based on Hadoop and the Cloudera Impala engine. We will discuss the results of our tests, including tests of data loading and integration with existing data sources and in particular with relational databases. We will report on query performance tests done with various data sets of interest at CERN, notably data from the accelerator log database.
cPath: open source software for collecting, storing, and querying biological pathways.

PubMed

Cerami, Ethan G; Bader, Gary D; Gross, Benjamin E; Sander, Chris

2006-11-13

Biological pathways, including metabolic pathways, protein interaction networks, signal transduction pathways, and gene regulatory networks, are currently represented in over 220 diverse databases. These data are crucial for the study of specific biological processes, including human diseases. Standard exchange formats for pathway information, such as BioPAX, CellML, SBML and PSI-MI, enable convenient collection of this data for biological research, but mechanisms for common storage and communication are required. We have developed cPath, an open source database and web application for collecting, storing, and querying biological pathway data. cPath makes it easy to aggregate custom pathway data sets available in standard exchange formats from multiple databases, present pathway data to biologists via a customizable web interface, and export pathway data via a web service to third-party software, such as Cytoscape, for visualization and analysis. cPath is software only, and does not include new pathway information. Key features include: a built-in identifier mapping service for linking identical interactors and linking to external resources; built-in support for PSI-MI and BioPAX standard pathway exchange formats; a web service interface for searching and retrieving pathway data sets; and thorough documentation. The cPath software is freely available under the LGPL open source license for academic and commercial use. cPath is a robust, scalable, modular, professional-grade software platform for collecting, storing, and querying biological pathways. It can serve as the core data handling component in information systems for pathway visualization, analysis and modeling.
OpenFlyData: an exemplar data web integrating gene expression data on the fruit fly Drosophila melanogaster.

PubMed

Miles, Alistair; Zhao, Jun; Klyne, Graham; White-Cooper, Helen; Shotton, David

2010-10-01

Integrating heterogeneous data across distributed sources is a major requirement for in silico bioinformatics supporting translational research. For example, genome-scale data on patterns of gene expression in the fruit fly Drosophila melanogaster are widely used in functional genomic studies in many organisms to inform candidate gene selection and validate experimental results. However, current data integration solutions tend to be heavy weight, and require significant initial and ongoing investment of effort. Development of a common Web-based data integration infrastructure (a.k.a. data web), using Semantic Web standards, promises to alleviate these difficulties, but little is known about the feasibility, costs, risks or practical means of migrating to such an infrastructure. We describe the development of OpenFlyData, a proof-of-concept system integrating gene expression data on D. melanogaster, combining Semantic Web standards with light-weight approaches to Web programming based on Web 2.0 design patterns. To support researchers designing and validating functional genomic studies, OpenFlyData includes user-facing search applications providing intuitive access to and comparison of gene expression data from FlyAtlas, the BDGP in situ database, and FlyTED, using data from FlyBase to expand and disambiguate gene names. OpenFlyData's services are also openly accessible, and are available for reuse by other bioinformaticians and application developers. Semi-automated methods and tools were developed to support labour- and knowledge-intensive tasks involved in deploying SPARQL services. These include methods for generating ontologies and relational-to-RDF mappings for relational databases, which we illustrate using the FlyBase Chado database schema; and methods for mapping gene identifiers between databases. The advantages of using Semantic Web standards for biomedical data integration are discussed, as are open issues. In particular, although the performance of open source SPARQL implementations is sufficient to query gene expression data directly from user-facing applications such as Web-based data fusions (a.k.a. mashups), we found open SPARQL endpoints to be vulnerable to denial-of-service-type problems, which must be mitigated to ensure reliability of services based on this standard. These results are relevant to data integration activities in translational bioinformatics. The gene expression search applications and SPARQL endpoints developed for OpenFlyData are deployed at http://openflydata.org. FlyUI, a library of JavaScript widgets providing re-usable user-interface components for Drosophila gene expression data, is available at http://flyui.googlecode.com. Software and ontologies to support transformation of data from FlyBase, FlyAtlas, BDGP and FlyTED to RDF are available at http://openflydata.googlecode.com. SPARQLite, an implementation of the SPARQL protocol, is available at http://sparqlite.googlecode.com. All software is provided under the GPL version 3 open source license.
Concierge: Personal Database Software for Managing Digital Research Resources

PubMed Central

Sakai, Hiroyuki; Aoyama, Toshihiro; Yamaji, Kazutsuna; Usui, Shiro

2007-01-01

This article introduces a desktop application, named Concierge, for managing personal digital research resources. Using simple operations, it enables storage of various types of files and indexes them based on content descriptions. A key feature of the software is a high level of extensibility. By installing optional plug-ins, users can customize and extend the usability of the software based on their needs. In this paper, we also introduce a few optional plug-ins: literature management, electronic laboratory notebook, and XooNlps client plug-ins. XooNIps is a content management system developed to share digital research resources among neuroscience communities. It has been adopted as the standard database system in Japanese neuroinformatics projects. Concierge, therefore, offers comprehensive support from management of personal digital research resources to their sharing in open-access neuroinformatics databases such as XooNIps. This interaction between personal and open-access neuroinformatics databases is expected to enhance the dissemination of digital research resources. Concierge is developed as an open source project; Mac OS X and Windows XP versions have been released at the official site (http://concierge.sourceforge.jp). PMID:18974800
Databases and Networking for Development. The Organization of Information in Europe in the Field of Policy and Planning for Developing Countries.

ERIC Educational Resources Information Center

Lindsay, John

This work suggests that better organization of existing sources of information available in Europe and better application of these sources to training can result in improved understanding of how information systems work, and it provides an annotated list of some of these sources. The guide opens with an introduction to public policy and urban…
GMODWeb: a web framework for the generic model organism database

PubMed Central

O'Connor, Brian D; Day, Allen; Cain, Scott; Arnaiz, Olivier; Sperling, Linda; Stein, Lincoln D

2008-01-01

The Generic Model Organism Database (GMOD) initiative provides species-agnostic data models and software tools for representing curated model organism data. Here we describe GMODWeb, a GMOD project designed to speed the development of model organism database (MOD) websites. Sites created with GMODWeb provide integration with other GMOD tools and allow users to browse and search through a variety of data types. GMODWeb was built using the open source Turnkey web framework and is available from . PMID:18570664
A Brief Assessment of LC2IEDM, MIST and Web Services for use in Naval Tactical Data Management

DTIC Science & Technology

2004-07-01

server software, messaging between the client and server, and a database. The MIST database is implemented in an open source DBMS named PostGreSQL ... PostGreSQL had its beginnings at the University of California, Berkley, in 1986 [11]. The development of PostGreSQL has since evolved into a...contact history from the database. DRDC Atlantic TM 2004-148 9 Request Software Request Software Server Side Response from service
Database for Rapid Dereplication of Known Natural Products Using Data from MS and Fast NMR Experiments.

PubMed

Zani, Carlos L; Carroll, Anthony R

2017-06-23

The discovery of novel and/or new bioactive natural products from biota sources is often confounded by the reisolation of known natural products. Dereplication strategies that involve the analysis of NMR and MS spectroscopic data to infer structural features present in purified natural products in combination with database searches of these substructures provide an efficient method to rapidly identify known natural products. Unfortunately this strategy has been hampered by the lack of publically available and comprehensive natural product databases and open source cheminformatics tools. A new platform, DEREP-NP, has been developed to help solve this problem. DEREP-NP uses the open source cheminformatics program DataWarrior to generate a database containing counts of 65 structural fragments present in 229 358 natural product structures derived from plants, animals, and microorganisms, published before 2013 and freely available in the nonproprietary Universal Natural Products Database (UNPD). By counting the number of times one or more of these structural features occurs in an unknown compound, as deduced from the analysis of its NMR ( 1 H, HSQC, and/or HMBC) and/or MS data, matching structures carrying the same numeric combination of searched structural features can be retrieved from the database. Confirmation that the matching structure is the same compound can then be verified through literature comparison of spectroscopic data. This methodology can be applied to both purified natural products and fractions containing a small number of individual compounds that are often generated as screening libraries. The utility of DEREP-NP has been verified through the analysis of spectra derived from compounds (and fractions containing two or three compounds) isolated from plant, marine invertebrate, and fungal sources. DEREP-NP is freely available at https://github.com/clzani/DEREP-NP and will help to streamline the natural product discovery process.
SuML: A Survey Markup Language for Generalized Survey Encoding

PubMed Central

Barclay, MW; Lober, WB; Karras, BT

2002-01-01

There is a need in clinical and research settings for a sophisticated, generalized, web based survey tool that supports complex logic, separation of content and presentation, and computable guidelines. There are many commercial and open source survey packages available that provide simple logic; few provide sophistication beyond “goto” statements; none support the use of guidelines. These tools are driven by databases, static web pages, and structured documents using markup languages such as eXtensible Markup Language (XML). We propose a generalized, guideline aware language and an implementation architecture using open source standards.
BIRS – Bioterrorism Information Retrieval System

PubMed Central

Tewari, Ashish Kumar; Rashi; Wadhwa, Gulshan; Sharma, Sanjeev Kumar; Jain, Chakresh Kumar

2013-01-01

Bioterrorism is the intended use of pathogenic strains of microbes to widen terror in a population. There is a definite need to promote research for development of vaccines, therapeutics and diagnostic methods as a part of preparedness to any bioterror attack in the future. BIRS is an open-access database of collective information on the organisms related to bioterrorism. The architecture of database utilizes the current open-source technology viz PHP ver 5.3.19, MySQL and IIS server under windows platform for database designing. Database stores information on literature, generic- information and unique pathways of about 10 microorganisms involved in bioterrorism. This may serve as a collective repository to accelerate the drug discovery and vaccines designing process against such bioterrorist agents (microbes). The available data has been validated from various online resources and literature mining in order to provide the user with a comprehensive information system. Availability The database is freely available at http://www.bioterrorism.biowaves.org PMID:23390356
The validity of open-source data when assessing jail suicides.

PubMed

Thomas, Amanda L; Scott, Jacqueline; Mellow, Jeff

2018-05-09

The Bureau of Justice Statistics' Deaths in Custody Reporting Program is the primary source for jail suicide research, though the data is restricted from general dissemination. This study is the first to examine whether jail suicide data obtained from publicly available sources can help inform our understanding of this serious public health problem. Of the 304 suicides that were reported through the DCRP in 2009, roughly 56 percent (N = 170) of those suicides were identified through the open-source search protocol. Each of the sources was assessed based on how much information was collected on the incident and the types of variables available. A descriptive analysis was then conducted on the variables that were present in both data sources. The four variables present in each data source were: (1) demographic characteristics of the victim, (2) the location of occurrence within the facility, (3) the location of occurrence by state, and (4) the size of the facility. Findings demonstrate that the prevalence and correlates of jail suicides are extremely similar in both open-source and official data. However, for almost every variable measured, open-source data captured as much information as official data did, if not more. Further, variables not found in official data were identified in the open-source database, thus allowing researchers to have a more nuanced understanding of the situational characteristics of the event. This research provides support for the argument in favor of including open-source data in jail suicide research as it illustrates how open-source data can be used to provide additional information not originally found in official data. In sum, this research is vital in terms of possible suicide prevention, which may be directly linked to being able to manipulate environmental factors.

Tank Information System (tis): a Case Study in Migrating Web Mapping Application from Flex to Dojo for Arcgis Server and then to Open Source

NASA Astrophysics Data System (ADS)

Pulsani, B. R.

2017-11-01

Tank Information System is a web application which provides comprehensive information about minor irrigation tanks of Telangana State. As part of the program, a web mapping application using Flex and ArcGIS server was developed to make the data available to the public. In course of time as Flex be-came outdated, a migration of the client interface to the latest JavaScript based technologies was carried out. Initially, the Flex based application was migrated to ArcGIS JavaScript API using Dojo Toolkit. Both the client applications used published services from ArcGIS server. To check the migration pattern from proprietary to open source, the JavaScript based ArcGIS application was later migrated to OpenLayers and Dojo Toolkit which used published service from GeoServer. The migration pattern noticed in the study especially emphasizes upon the use of Dojo Toolkit and PostgreSQL database for ArcGIS server so that migration to open source could be performed effortlessly. The current ap-plication provides a case in study which could assist organizations in migrating their proprietary based ArcGIS web applications to open source. Furthermore, the study reveals cost benefits of adopting open source against commercial software's.
Noninvasive Fetal ECG: the PhysioNet/Computing in Cardiology Challenge 2013.

PubMed

Silva, Ikaro; Behar, Joachim; Sameni, Reza; Zhu, Tingting; Oster, Julien; Clifford, Gari D; Moody, George B

2013-03-01

The PhysioNet/CinC 2013 Challenge aimed to stimulate rapid development and improvement of software for estimating fetal heart rate (FHR), fetal interbeat intervals (FRR), and fetal QT intervals (FQT), from multichannel recordings made using electrodes placed on the mother's abdomen. For the challenge, five data collections from a variety of sources were used to compile a large standardized database, which was divided into training, open test, and hidden test subsets. Gold-standard fetal QRS and QT interval annotations were developed using a novel crowd-sourcing framework. The challenge organizers used the hidden test subset to evaluate 91 open-source software entries submitted by 53 international teams of participants in three challenge events, estimating FHR, FRR, and FQT using the hidden test subset, which was not available for study by participants. Two additional events required only user-submitted QRS annotations to evaluate FHR and FRR estimation accuracy using the open test subset available to participants. The challenge yielded a total of 91 open-source software entries. The best of these achieved average estimation errors of 187bpm 2 for FHR, 20.9 ms for FRR, and 152.7 ms for FQT. The open data sets, scoring software, and open-source entries are available at PhysioNet for researchers interested on working on these problems.
jSPyDB, an open source database-independent tool for data management

NASA Astrophysics Data System (ADS)

Pierro, Giuseppe Antonio; Cavallari, Francesca; Di Guida, Salvatore; Innocente, Vincenzo

2011-12-01

Nowadays, the number of commercial tools available for accessing Databases, built on Java or .Net, is increasing. However, many of these applications have several drawbacks: usually they are not open-source, they provide interfaces only with a specific kind of database, they are platform-dependent and very CPU and memory consuming. jSPyDB is a free web-based tool written using Python and Javascript. It relies on jQuery and python libraries, and is intended to provide a simple handler to different database technologies inside a local web browser. Such a tool, exploiting fast access libraries such as SQLAlchemy, is easy to install, and to configure. The design of this tool envisages three layers. The front-end client side in the local web browser communicates with a backend server. Only the server is able to connect to the different databases for the purposes of performing data definition and manipulation. The server makes the data available to the client, so that the user can display and handle them safely. Moreover, thanks to jQuery libraries, this tool supports export of data in different formats, such as XML and JSON. Finally, by using a set of pre-defined functions, users are allowed to create their customized views for a better data visualization. In this way, we optimize the performance of database servers by avoiding short connections and concurrent sessions. In addition, security is enforced since we do not provide users the possibility to directly execute any SQL statement.
Soil Carbon Data: long tail recovery

DOE Office of Scientific and Technical Information (OSTI.GOV)

2017-07-25

The software is intended to be part of an open source effort regarding soils data. The software provides customized data ingestion scripts for soil carbon related data sets and scripts for output databases that conform to common templates.
A global, open-source database of flood protection standards

NASA Astrophysics Data System (ADS)

Scussolini, Paolo; Aerts, Jeroen; Jongman, Brenden; Bouwer, Laurens; Winsemius, Hessel; de Moel, Hans; Ward, Philip

2016-04-01

Accurate flood risk estimation is pivotal in that it enables risk-informed policies in disaster risk reduction, as emphasized in the recent Sendai framework for Disaster Risk Reduction. To improve our understanding of flood risk, models are now capable to provide actionable risk information on the (sub)global scale. Still the accuracy of their results is greatly limited by the lack of information on standards of protection to flood that are actually in place; and researchers thus take large assumptions on the extent of protection. With our work we propose a first global, open-source database of FLOod PROtection Standards, FLOPROS, covering a range of spatial scales. FLOPROS is structured in three layers of information, and merges them into one consistent database: 1) the Design layer contains empirical information about the standard of protection presently in place; 2) the Policy layer contains intended protection standards from normative documents; 3) the Model layer uses a validated numerical approach to calculate protection standards for areas not covered in the other layers. The FLOPROS database can be used for more accurate risk assessment exercises across scales. As the database should be continually updated to reflect new interventions, we invite researchers and practitioners to contribute information. Further, we look for partners within the risk community to participate in additional strategies to implement the amount and accuracy of information contained in this first version of FLOPROS.
Simrank: Rapid and sensitive general-purpose k-mer search tool

PubMed Central

2011-01-01

Background Terabyte-scale collections of string-encoded data are expected from consortia efforts such as the Human Microbiome Project http://nihroadmap.nih.gov/hmp. Intra- and inter-project data similarity searches are enabled by rapid k-mer matching strategies. Software applications for sequence database partitioning, guide tree estimation, molecular classification and alignment acceleration have benefited from embedded k-mer searches as sub-routines. However, a rapid, general-purpose, open-source, flexible, stand-alone k-mer tool has not been available. Results Here we present a stand-alone utility, Simrank, which allows users to rapidly identify database strings the most similar to query strings. Performance testing of Simrank and related tools against DNA, RNA, protein and human-languages found Simrank 10X to 928X faster depending on the dataset. Conclusions Simrank provides molecular ecologists with a high-throughput, open source choice for comparing large sequence sets to find similarity. PMID:21524302
Reactome graph database: Efficient access to complex pathway data

PubMed Central

Korninger, Florian; Viteri, Guilherme; Marin-Garcia, Pablo; Ping, Peipei; Wu, Guanming; Stein, Lincoln; D’Eustachio, Peter

2018-01-01

Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types. PMID:29377902
Reactome graph database: Efficient access to complex pathway data.

PubMed

Fabregat, Antonio; Korninger, Florian; Viteri, Guilherme; Sidiropoulos, Konstantinos; Marin-Garcia, Pablo; Ping, Peipei; Wu, Guanming; Stein, Lincoln; D'Eustachio, Peter; Hermjakob, Henning

2018-01-01

Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types.
MetaboLights: An Open-Access Database Repository for Metabolomics Data.

PubMed

Kale, Namrata S; Haug, Kenneth; Conesa, Pablo; Jayseelan, Kalaivani; Moreno, Pablo; Rocca-Serra, Philippe; Nainala, Venkata Chandrasekhar; Spicer, Rachel A; Williams, Mark; Li, Xuefei; Salek, Reza M; Griffin, Julian L; Steinbeck, Christoph

2016-03-24

MetaboLights is the first general purpose, open-access database repository for cross-platform and cross-species metabolomics research at the European Bioinformatics Institute (EMBL-EBI). Based upon the open-source ISA framework, MetaboLights provides Metabolomics Standard Initiative (MSI) compliant metadata and raw experimental data associated with metabolomics experiments. Users can upload their study datasets into the MetaboLights Repository. These studies are then automatically assigned a stable and unique identifier (e.g., MTBLS1) that can be used for publication reference. The MetaboLights Reference Layer associates metabolites with metabolomics studies in the archive and is extensively annotated with data fields such as structural and chemical information, NMR and MS spectra, target species, metabolic pathways, and reactions. The database is manually curated with no specific release schedules. MetaboLights is also recommended by journals for metabolomics data deposition. This unit provides a guide to using MetaboLights, downloading experimental data, and depositing metabolomics datasets using user-friendly submission tools. Copyright © 2016 John Wiley & Sons, Inc.
Quality Analysis of Open Street Map Data

NASA Astrophysics Data System (ADS)

Wang, M.; Li, Q.; Hu, Q.; Zhou, M.

2013-05-01

Crowd sourcing geographic data is an opensource geographic data which is contributed by lots of non-professionals and provided to the public. The typical crowd sourcing geographic data contains GPS track data like OpenStreetMap, collaborative map data like Wikimapia, social websites like Twitter and Facebook, POI signed by Jiepang user and so on. These data will provide canonical geographic information for pubic after treatment. As compared with conventional geographic data collection and update method, the crowd sourcing geographic data from the non-professional has characteristics or advantages of large data volume, high currency, abundance information and low cost and becomes a research hotspot of international geographic information science in the recent years. Large volume crowd sourcing geographic data with high currency provides a new solution for geospatial database updating while it need to solve the quality problem of crowd sourcing geographic data obtained from the non-professionals. In this paper, a quality analysis model for OpenStreetMap crowd sourcing geographic data is proposed. Firstly, a quality analysis framework is designed based on data characteristic analysis of OSM data. Secondly, a quality assessment model for OSM data by three different quality elements: completeness, thematic accuracy and positional accuracy is presented. Finally, take the OSM data of Wuhan for instance, the paper analyses and assesses the quality of OSM data with 2011 version of navigation map for reference. The result shows that the high-level roads and urban traffic network of OSM data has a high positional accuracy and completeness so that these OSM data can be used for updating of urban road network database.
cPath: open source software for collecting, storing, and querying biological pathways

PubMed Central

Cerami, Ethan G; Bader, Gary D; Gross, Benjamin E; Sander, Chris

2006-01-01

Background Biological pathways, including metabolic pathways, protein interaction networks, signal transduction pathways, and gene regulatory networks, are currently represented in over 220 diverse databases. These data are crucial for the study of specific biological processes, including human diseases. Standard exchange formats for pathway information, such as BioPAX, CellML, SBML and PSI-MI, enable convenient collection of this data for biological research, but mechanisms for common storage and communication are required. Results We have developed cPath, an open source database and web application for collecting, storing, and querying biological pathway data. cPath makes it easy to aggregate custom pathway data sets available in standard exchange formats from multiple databases, present pathway data to biologists via a customizable web interface, and export pathway data via a web service to third-party software, such as Cytoscape, for visualization and analysis. cPath is software only, and does not include new pathway information. Key features include: a built-in identifier mapping service for linking identical interactors and linking to external resources; built-in support for PSI-MI and BioPAX standard pathway exchange formats; a web service interface for searching and retrieving pathway data sets; and thorough documentation. The cPath software is freely available under the LGPL open source license for academic and commercial use. Conclusion cPath is a robust, scalable, modular, professional-grade software platform for collecting, storing, and querying biological pathways. It can serve as the core data handling component in information systems for pathway visualization, analysis and modeling. PMID:17101041
Design and Analysis of a Model Reconfigurable Cyber-Exercise Laboratory (RCEL) for Information Assurance Education

DTIC Science & Technology

2004-03-01

with MySQL . This choice was made because MySQL is open source. Any significant database engine such as Oracle or MS- SQL or even MS Access can be used...10 Figure 6. The DoD vs . Commercial Life Cycle...necessarily be interested in SCADA network security 13. MySQL (Database server) – This station represents a typical data server for a web page
Establishment of a Uniform Format for Data Reporting of Structural Material Properties for Reliability Analysis

DTIC Science & Technology

1994-06-30

tip Opening Displacement (CTOD) Fracture Toughness Measurement". 48 The method has found application in the elastic-plastic fracture mechanics ( EPFM ...68 6.1 Proposed Material Property Database Format and Hierarchy .............. 68 6.2 Sample Application of the Material Property Database...the E 49.05 sub-committee. The relevant quality indicators applicable to the present program are: source of data, statistical basis of data
JNDMS Task Authorization 2 Report

DTIC Science & Technology

2013-10-01

uses Barnyard to store alarms from all DREnet Snort sensors in a MySQL database. Barnyard is an open source tool designed to work with Snort to take...Technology ITI Information Technology Infrastructure J2EE Java 2 Enterprise Edition JAR Java Archive. This is an archive file format defined by Java ...standards. JDBC Java Database Connectivity JDW JNDMS Data Warehouse JNDMS Joint Network and Defence Management System JNDMS Joint Network Defence and
Quality Attribute-Guided Evaluation of NoSQL Databases: An Experience Report

DTIC Science & Technology

2014-10-18

detailed technical evaluations of NoSQL databases specifically, and big data systems in general, that have become apparent during our study... big data , software systems [Agarwal 2011]. Internet-born organizations such as Google and Amazon are at the cutting edge of this revolution...Chang 2008], along with those of numerous other big data innovators, have made a variety of open source and commercial data management technologies
U.S. Army Research Laboratory (ARL) multimodal signatures database

NASA Astrophysics Data System (ADS)

Bennett, Kelly

2008-04-01

The U.S. Army Research Laboratory (ARL) Multimodal Signatures Database (MMSDB) is a centralized collection of sensor data of various modalities that are co-located and co-registered. The signatures include ground and air vehicles, personnel, mortar, artillery, small arms gunfire from potential sniper weapons, explosives, and many other high value targets. This data is made available to Department of Defense (DoD) and DoD contractors, Intel agencies, other government agencies (OGA), and academia for use in developing target detection, tracking, and classification algorithms and systems to protect our Soldiers. A platform independent Web interface disseminates the signatures to researchers and engineers within the scientific community. Hierarchical Data Format 5 (HDF5) signature models provide an excellent solution for the sharing of complex multimodal signature data for algorithmic development and database requirements. Many open source tools for viewing and plotting HDF5 signatures are available over the Web. Seamless integration of HDF5 signatures is possible in both proprietary computational environments, such as MATLAB, and Free and Open Source Software (FOSS) computational environments, such as Octave and Python, for performing signal processing, analysis, and algorithm development. Future developments include extending the Web interface into a portal system for accessing ARL algorithms and signatures, High Performance Computing (HPC) resources, and integrating existing database and signature architectures into sensor networking environments.
PAZAR: a framework for collection and dissemination of cis-regulatory sequence annotation

PubMed Central

Portales-Casamar, Elodie; Kirov, Stefan; Lim, Jonathan; Lithwick, Stuart; Swanson, Magdalena I; Ticoll, Amy; Snoddy, Jay; Wasserman, Wyeth W

2007-01-01

PAZAR is an open-access and open-source database of transcription factor and regulatory sequence annotation with associated web interface and programming tools for data submission and extraction. Curated boutique data collections can be maintained and disseminated through the unified schema of the mall-like PAZAR repository. The Pleiades Promoter Project collection of brain-linked regulatory sequences is introduced to demonstrate the depth of annotation possible within PAZAR. PAZAR, located at , is open for business. PMID:17916232
PAZAR: a framework for collection and dissemination of cis-regulatory sequence annotation.

PubMed

Portales-Casamar, Elodie; Kirov, Stefan; Lim, Jonathan; Lithwick, Stuart; Swanson, Magdalena I; Ticoll, Amy; Snoddy, Jay; Wasserman, Wyeth W

2007-01-01

PAZAR is an open-access and open-source database of transcription factor and regulatory sequence annotation with associated web interface and programming tools for data submission and extraction. Curated boutique data collections can be maintained and disseminated through the unified schema of the mall-like PAZAR repository. The Pleiades Promoter Project collection of brain-linked regulatory sequences is introduced to demonstrate the depth of annotation possible within PAZAR. PAZAR, located at http://www.pazar.info, is open for business.
Gramene database: navigating plant comparative genomics resources

USDA-ARS?s Scientific Manuscript database

Gramene (http://www.gramene.org) is an online, open source, curated resource for plant comparative genomics and pathway analysis designed to support researchers working in plant genomics, breeding, evolutionary biology, system biology, and metabolic engineering. It exploits phylogenetic relationship...
Tandem Mass Spectrum Sequencing: An Alternative to Database Search Engines in Shotgun Proteomics.

PubMed

Muth, Thilo; Rapp, Erdmann; Berven, Frode S; Barsnes, Harald; Vaudel, Marc

2016-01-01

Protein identification via database searches has become the gold standard in mass spectrometry based shotgun proteomics. However, as the quality of tandem mass spectra improves, direct mass spectrum sequencing gains interest as a database-independent alternative. In this chapter, the general principle of this so-called de novo sequencing is introduced along with pitfalls and challenges of the technique. The main tools available are presented with a focus on user friendly open source software which can be directly applied in everyday proteomic workflows.

ELISA-BASE: An Integrated Bioinformatics Tool for Analyzing and Tracking ELISA Microarray Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

White, Amanda M.; Collett, James L.; Seurynck-Servoss, Shannon L.

ELISA-BASE is an open-source database for capturing, organizing and analyzing protein enzyme-linked immunosorbent assay (ELISA) microarray data. ELISA-BASE is an extension of the BioArray Soft-ware Environment (BASE) database system, which was developed for DNA microarrays. In order to make BASE suitable for protein microarray experiments, we developed several plugins for importing and analyzing quantitative ELISA microarray data. Most notably, our Protein Microarray Analysis Tool (ProMAT) for processing quantita-tive ELISA data is now available as a plugin to the database.
DeNovoGUI: An Open Source Graphical User Interface for de Novo Sequencing of Tandem Mass Spectra

PubMed Central

2013-01-01

De novo sequencing is a popular technique in proteomics for identifying peptides from tandem mass spectra without having to rely on a protein sequence database. Despite the strong potential of de novo sequencing algorithms, their adoption threshold remains quite high. We here present a user-friendly and lightweight graphical user interface called DeNovoGUI for running parallelized versions of the freely available de novo sequencing software PepNovo+, greatly simplifying the use of de novo sequencing in proteomics. Our platform-independent software is freely available under the permissible Apache2 open source license. Source code, binaries, and additional documentation are available at http://denovogui.googlecode.com. PMID:24295440
DeNovoGUI: an open source graphical user interface for de novo sequencing of tandem mass spectra.

PubMed

Muth, Thilo; Weilnböck, Lisa; Rapp, Erdmann; Huber, Christian G; Martens, Lennart; Vaudel, Marc; Barsnes, Harald

2014-02-07

De novo sequencing is a popular technique in proteomics for identifying peptides from tandem mass spectra without having to rely on a protein sequence database. Despite the strong potential of de novo sequencing algorithms, their adoption threshold remains quite high. We here present a user-friendly and lightweight graphical user interface called DeNovoGUI for running parallelized versions of the freely available de novo sequencing software PepNovo+, greatly simplifying the use of de novo sequencing in proteomics. Our platform-independent software is freely available under the permissible Apache2 open source license. Source code, binaries, and additional documentation are available at http://denovogui.googlecode.com .
Defense Against National Vulnerabilities in Public Data

DTIC Science & Technology

2017-02-28

ingestion of subscription based precision data sources ( Business Intelligence Databases, Monster, others).  Flexible data architecture that allows for... Architecture Objective: Develop a data acquisition architecture that can successfully ingest 1,000,000 records per hour from up to 100 different open...data sources.  Developed and operate a data acquisition architecture comprised of the four following major components:  Robust website
Sources of Free and Open Source Spatial Data for Natural Disasters and Principles for Use in Developing Country Contexts

NASA Astrophysics Data System (ADS)

Taylor, Faith E.; Malamud, Bruce D.; Millington, James D. A.

2016-04-01

Access to reliable spatial and quantitative datasets (e.g., infrastructure maps, historical observations, environmental variables) at regional and site specific scales can be a limiting factor for understanding hazards and risks in developing country settings. Here we present a 'living database' of >75 freely available data sources relevant to hazard and risk in Africa (and more globally). Data sources include national scientific foundations, non-governmental bodies, crowd-sourced efforts, academic projects, special interest groups and others. The database is available at http://tinyurl.com/africa-datasets and is continually being updated, particularly in the context of broader natural hazards research we are doing in the context of Malawi and Kenya. For each data source, we review the spatiotemporal resolution and extent and make our own assessments of reliability and usability of datasets. Although such freely available datasets are sometimes presented as a panacea to improving our understanding of hazards and risk in developing countries, there are both pitfalls and opportunities unique to using this type of freely available data. These include factors such as resolution, homogeneity, uncertainty, access to metadata and training for usage. Based on our experience, use in the field and grey/peer-review literature, we present a suggested set of guidelines for using these free and open source data in developing country contexts.
Developing a GIS for CO2 analysis using lightweight, open source components

NASA Astrophysics Data System (ADS)

Verma, R.; Goodale, C. E.; Hart, A. F.; Kulawik, S. S.; Law, E.; Osterman, G. B.; Braverman, A.; Nguyen, H. M.; Mattmann, C. A.; Crichton, D. J.; Eldering, A.; Castano, R.; Gunson, M. R.

2012-12-01

There are advantages to approaching the realm of geographic information systems (GIS) using lightweight, open source components in place of a more traditional web map service (WMS) solution. Rapid prototyping, schema-less data storage, the flexible interchange of components, and open source community support are just some of the benefits. In our effort to develop an application supporting the geospatial and temporal rendering of remote sensing carbon-dioxide (CO2) data for the CO2 Virtual Science Data Environment project, we have connected heterogeneous open source components together to form a GIS. Utilizing widely popular open source components including the schema-less database MongoDB, Leaflet interactive maps, the HighCharts JavaScript graphing library, and Python Bottle web-services, we have constructed a system for rapidly visualizing CO2 data with reduced up-front development costs. These components can be aggregated together, resulting in a configurable stack capable of replicating features provided by more standard GIS technologies. The approach we have taken is not meant to replace the more established GIS solutions, but to instead offer a rapid way to provide GIS features early in the development of an application and to offer a path towards utilizing more capable GIS technology in the future.
Biological knowledge bases using Wikis: combining the flexibility of Wikis with the structure of databases.

PubMed

Brohée, Sylvain; Barriot, Roland; Moreau, Yves

2010-09-01

In recent years, the number of knowledge bases developed using Wiki technology has exploded. Unfortunately, next to their numerous advantages, classical Wikis present a critical limitation: the invaluable knowledge they gather is represented as free text, which hinders their computational exploitation. This is in sharp contrast with the current practice for biological databases where the data is made available in a structured way. Here, we present WikiOpener an extension for the classical MediaWiki engine that augments Wiki pages by allowing on-the-fly querying and formatting resources external to the Wiki. Those resources may provide data extracted from databases or DAS tracks, or even results returned by local or remote bioinformatics analysis tools. This also implies that structured data can be edited via dedicated forms. Hence, this generic resource combines the structure of biological databases with the flexibility of collaborative Wikis. The source code and its documentation are freely available on the MediaWiki website: http://www.mediawiki.org/wiki/Extension:WikiOpener.
Cloud storage based mobile assessment facility for patients with post-traumatic stress disorder using integrated signal processing algorithm

NASA Astrophysics Data System (ADS)

Balbin, Jessie R.; Pinugu, Jasmine Nadja J.; Basco, Abigail Joy S.; Cabanada, Myla B.; Gonzales, Patrisha Melrose V.; Marasigan, Juan Carlos C.

2017-06-01

The research aims to build a tool in assessing patients for post-traumatic stress disorder or PTSD. The parameters used are heart rate, skin conductivity, and facial gestures. Facial gestures are recorded using OpenFace, an open-source face recognition program that uses facial action units in to track facial movements. Heart rate and skin conductivity is measured through sensors operated using Raspberry Pi. Results are stored in a database for easy and quick access. Databases to be used are uploaded to a cloud platform so that doctors have direct access to the data. This research aims to analyze these parameters and give accurate assessment of the patient.
The MIntAct project—IntAct as a common curation platform for 11 molecular interaction databases

PubMed Central

Orchard, Sandra; Ammari, Mais; Aranda, Bruno; Breuza, Lionel; Briganti, Leonardo; Broackes-Carter, Fiona; Campbell, Nancy H.; Chavali, Gayatri; Chen, Carol; del-Toro, Noemi; Duesbury, Margaret; Dumousseau, Marine; Galeota, Eugenia; Hinz, Ursula; Iannuccelli, Marta; Jagannathan, Sruthi; Jimenez, Rafael; Khadake, Jyoti; Lagreid, Astrid; Licata, Luana; Lovering, Ruth C.; Meldal, Birgit; Melidoni, Anna N.; Milagros, Mila; Peluso, Daniele; Perfetto, Livia; Porras, Pablo; Raghunath, Arathi; Ricard-Blum, Sylvie; Roechert, Bernd; Stutz, Andre; Tognolli, Michael; van Roey, Kim; Cesareni, Gianni; Hermjakob, Henning

2014-01-01

IntAct (freely available at http://www.ebi.ac.uk/intact) is an open-source, open data molecular interaction database populated by data either curated from the literature or from direct data depositions. IntAct has developed a sophisticated web-based curation tool, capable of supporting both IMEx- and MIMIx-level curation. This tool is now utilized by multiple additional curation teams, all of whom annotate data directly into the IntAct database. Members of the IntAct team supply appropriate levels of training, perform quality control on entries and take responsibility for long-term data maintenance. Recently, the MINT and IntAct databases decided to merge their separate efforts to make optimal use of limited developer resources and maximize the curation output. All data manually curated by the MINT curators have been moved into the IntAct database at EMBL-EBI and are merged with the existing IntAct dataset. Both IntAct and MINT are active contributors to the IMEx consortium (http://www.imexconsortium.org). PMID:24234451
National security and national competitiveness: Open source solutions; NASA requirements and capabilities

NASA Technical Reports Server (NTRS)

Cotter, Gladys A.

1993-01-01

Foreign competitors are challenging the world leadership of the U.S. aerospace industry, and increasingly tight budgets everywhere make international cooperation in aerospace science necessary. The NASA STI Program has as part of its mission to support NASA R&D, and to that end has developed a knowledge base of aerospace-related information known as the NASA Aerospace Database. The NASA STI Program is already involved in international cooperation with NATO/AGARD/TIP, CENDI, ICSU/ICSTI, and the U.S. Japan Committee on STI. With the new more open political climate, the perceived dearth of foreign information in the NASA Aerospace Database, and the development of the ESA database and DELURA, the German databases, the NASA STI Program is responding by sponsoring workshops on foreign acquisitions and by increasing its cooperation with international partners and with other U.S. agencies. The STI Program looks to the future of improved database access through networking and a GUI; new media; optical disk, video, and full text; and a Technology Focus Group that will keep the NASA STI Program current with technology.
Drug interaction databases in medical literature: transparency of ownership, funding, classification algorithms, level of documentation, and staff qualifications. A systematic review.

PubMed

Kongsholm, Gertrud Gansmo; Nielsen, Anna Katrine Toft; Damkier, Per

2015-11-01

It is well documented that drug-drug interaction databases (DIDs) differ substantially with respect to classification of drug-drug interactions (DDIs). The aim of this study was to study online available transparency of ownership, funding, information, classifications, staff training, and underlying documentation of the five most commonly used open access English language-based online DIDs and the three most commonly used subscription English language-based online DIDs in the literature. We conducted a systematic literature search to identify the five most commonly used open access and the three most commonly used subscription DIDs in the medical literature. The following parameters were assessed for each of the databases: Ownership, classification of interactions, primary information sources, and staff qualification. We compared the overall proportion of yes/no answers from open access databases and subscription databases by Fisher's exact test-both prior to and after requesting missing information. Among open access DIDs, 20/60 items could be verified from the webpage directly compared to 24/36 for the subscription DIDs (p = 0.0028). Following personal request, these numbers rose to 22/60 and 30/36, respectively (p < 0.0001). For items within the "classification of interaction" domain, proportions were 3/25 versus 11/15 available from the webpage (P = 0.0001) and 3/25 versus 15/15 (p < 0.0001) available upon personal request. Available information on online available transparency of ownership, funding, information, classifications, staff training, and underlying documentation varies substantially among various DIDs. Open access DIDs had a statistically lower score on parameters assessed.
A collection of open source applications for mass spectrometry data mining.

PubMed

Gallardo, Óscar; Ovelleiro, David; Gay, Marina; Carrascal, Montserrat; Abian, Joaquin

2014-10-01

We present several bioinformatics applications for the identification and quantification of phosphoproteome components by MS. These applications include a front-end graphical user interface that combines several Thermo RAW formats to MASCOT™ Generic Format extractors (EasierMgf), two graphical user interfaces for search engines OMSSA and SEQUEST (OmssaGui and SequestGui), and three applications, one for the management of databases in FASTA format (FastaTools), another for the integration of search results from up to three search engines (Integrator), and another one for the visualization of mass spectra and their corresponding database search results (JsonVisor). These applications were developed to solve some of the common problems found in proteomic and phosphoproteomic data analysis and were integrated in the workflow for data processing and feeding on our LymPHOS database. Applications were designed modularly and can be used standalone. These tools are written in Perl and Python programming languages and are supported on Windows platforms. They are all released under an Open Source Software license and can be freely downloaded from our software repository hosted at GoogleCode. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
The Ensembl genome database project.

PubMed

Hubbard, T; Barker, D; Birney, E; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Huminiecki, L; Kasprzyk, A; Lehvaslaiho, H; Lijnzaad, P; Melsopp, C; Mongin, E; Pettett, R; Pocock, M; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Clamp, M

2002-01-01

The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of the human genome sequence, with confirmed gene predictions that have been integrated with external data sources, and is available as either an interactive web site or as flat files. It is also an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements from sequence analysis to data storage and visualisation. The Ensembl site is one of the leading sources of human genome sequence annotation and provided much of the analysis for publication by the international human genome project of the draft genome. The Ensembl system is being installed around the world in both companies and academic sites on machines ranging from supercomputers to laptops.
Methods, Knowledge Support, and Experimental Tools for Modeling

DTIC Science & Technology

2006-10-01

open source software entities: the PostgreSQL relational database management system (http://www.postgres.org), the Apache web server (http...past. The revision control system allows the program to capture disagreements, and allows users to explore the history of such disagreements by
Evaluating Open-Source Full-Text Search Engines for Matching ICD-10 Codes.

PubMed

Jurcău, Daniel-Alexandru; Stoicu-Tivadar, Vasile

2016-01-01

This research presents the results of evaluating multiple free, open-source engines on matching ICD-10 diagnostic codes via full-text searches. The study investigates what it takes to get an accurate match when searching for a specific diagnostic code. For each code the evaluation starts by extracting the words that make up its text and continues with building full-text search queries from the combinations of these words. The queries are then run against all the ICD-10 codes until a match indicates the code in question as a match with the highest relative score. This method identifies the minimum number of words that must be provided in order for the search engines choose the desired entry. The engines analyzed include a popular Java-based full-text search engine, a lightweight engine written in JavaScript which can even execute on the user's browser, and two popular open-source relational database management systems.
For 481 biomedical open access journals, articles are not searchable in the Directory of Open Access Journals nor in conventional biomedical databases.

PubMed

Liljekvist, Mads Svane; Andresen, Kristoffer; Pommergaard, Hans-Christian; Rosenberg, Jacob

2015-01-01

Background. Open access (OA) journals allows access to research papers free of charge to the reader. Traditionally, biomedical researchers use databases like MEDLINE and EMBASE to discover new advances. However, biomedical OA journals might not fulfill such databases' criteria, hindering dissemination. The Directory of Open Access Journals (DOAJ) is a database exclusively listing OA journals. The aim of this study was to investigate DOAJ's coverage of biomedical OA journals compared with the conventional biomedical databases. Methods. Information on all journals listed in four conventional biomedical databases (MEDLINE, PubMed Central, EMBASE and SCOPUS) and DOAJ were gathered. Journals were included if they were (1) actively publishing, (2) full OA, (3) prospectively indexed in one or more database, and (4) of biomedical subject. Impact factor and journal language were also collected. DOAJ was compared with conventional databases regarding the proportion of journals covered, along with their impact factor and publishing language. The proportion of journals with articles indexed by DOAJ was determined. Results. In total, 3,236 biomedical OA journals were included in the study. Of the included journals, 86.7% were listed in DOAJ. Combined, the conventional biomedical databases listed 75.0% of the journals; 18.7% in MEDLINE; 36.5% in PubMed Central; 51.5% in SCOPUS and 50.6% in EMBASE. Of the journals in DOAJ, 88.7% published in English and 20.6% had received impact factor for 2012 compared with 93.5% and 26.0%, respectively, for journals in the conventional biomedical databases. A subset of 51.1% and 48.5% of the journals in DOAJ had articles indexed from 2012 and 2013, respectively. Of journals exclusively listed in DOAJ, one journal had received an impact factor for 2012, and 59.6% of the journals had no content from 2013 indexed in DOAJ. Conclusions. DOAJ is the most complete registry of biomedical OA journals compared with five conventional biomedical databases. However, DOAJ only indexes articles for half of the biomedical journals listed, making it an incomplete source for biomedical research papers in general.
The Role of Citizen Science in Risk Mitigation and Disaster Response: A Case Study of 2015 Nepalese Earthquake Using OpenStreetMap

NASA Astrophysics Data System (ADS)

Rieger, C.; Byrne, J. M.

2015-12-01

Citizen science includes networks of ordinary people acting as sensors, observing and recording information for science. OpenStreetMap is one such sensor network which empowers citizens to collaboratively produce a global picture from free geographic information. The success of this open source software is extended by the development of freely used open databases for the user community. Participating citizens do not require a high level of skill. Final results are processed by professionals following quality assurance protocols before map information is released. OpenStreetMap is not only the cheapest source of timely maps in many cases but also often the only source. This is particularly true in developing countries. Emergency responses to the recent earthquake in Nepal illustrates the value for rapidly updated geographical information. This includes emergency management, damage assessment, post-disaster response, and future risk mitigation. Local disaster conditions (landslides, road closings, bridge failures, etc.) were documented for local aid workers by citizen scientists working remotely. Satellites and drones provide digital imagery of the disaster zone and OpenStreetMap participants shared the data from locations around the globe. For the Nepal earthquake, OpenStreetMap provided a team of volunteers on the ground through their Humanitarian OpenStreetMap Team (HOT) which contribute data to the disaster response through smartphones and laptops. This, combined with global citizen science efforts, provided immediate geographically useful maps to assist aid workers, including the Red Cross and Canadian DART Team, and the Nepalese government. As of August 2014, almost 1.7 million users provided over 2.5 billion edits to the OpenStreetMap map database. Due to the increased usage of smartphones, GPS-enabled devices, and the growing participation in citizen science projects, data gathering is proving an effective way to contribute as a global citizen. This paper aims to describe the significance of citizen participation in the case of the Nepal earthquake using OpenStreetMap to respond to disasters as well as its role in future risk mitigation.
Reengineering Workflow for Curation of DICOM Datasets.

PubMed

Bennett, William; Smith, Kirk; Jarosz, Quasar; Nolan, Tracy; Bosch, Walter

2018-06-15

Reusable, publicly available data is a pillar of open science and rapid advancement of cancer imaging research. Sharing data from completed research studies not only saves research dollars required to collect data, but also helps insure that studies are both replicable and reproducible. The Cancer Imaging Archive (TCIA) is a global shared repository for imaging data related to cancer. Insuring the consistency, scientific utility, and anonymity of data stored in TCIA is of utmost importance. As the rate of submission to TCIA has been increasing, both in volume and complexity of DICOM objects stored, the process of curation of collections has become a bottleneck in acquisition of data. In order to increase the rate of curation of image sets, improve the quality of the curation, and better track the provenance of changes made to submitted DICOM image sets, a custom set of tools was developed, using novel methods for the analysis of DICOM data sets. These tools are written in the programming language perl, use the open-source database PostgreSQL, make use of the perl DICOM routines in the open-source package Posda, and incorporate DICOM diagnostic tools from other open-source packages, such as dicom3tools. These tools are referred to as the "Posda Tools." The Posda Tools are open source and available via git at https://github.com/UAMS-DBMI/PosdaTools . In this paper, we briefly describe the Posda Tools and discuss the novel methods employed by these tools to facilitate rapid analysis of DICOM data, including the following: (1) use a database schema which is more permissive, and differently normalized from traditional DICOM databases; (2) perform integrity checks automatically on a bulk basis; (3) apply revisions to DICOM datasets on an bulk basis, either through a web-based interface or via command line executable perl scripts; (4) all such edits are tracked in a revision tracker and may be rolled back; (5) a UI is provided to inspect the results of such edits, to verify that they are what was intended; (6) identification of DICOM Studies, Series, and SOP instances using "nicknames" which are persistent and have well-defined scope to make expression of reported DICOM errors easier to manage; and (7) rapidly identify potential duplicate DICOM datasets by pixel data is provided; this can be used, e.g., to identify submission subjects which may relate to the same individual, without identifying the individual.
WE-D-9A-06: Open Source Monitor Calibration and Quality Control Software for Enterprise Display Management

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bevins, N; Vanderhoek, M; Lang, S

2014-06-15

Purpose: Medical display monitor calibration and quality control present challenges to medical physicists. The purpose of this work is to demonstrate and share experiences with an open source package that allows for both initial monitor setup and routine performance evaluation. Methods: A software package, pacsDisplay, has been developed over the last decade to aid in the calibration of all monitors within the radiology group in our health system. The software is used to calibrate monitors to follow the DICOM Grayscale Standard Display Function (GSDF) via lookup tables installed on the workstation. Additional functionality facilitates periodic evaluations of both primary andmore » secondary medical monitors to ensure satisfactory performance. This software is installed on all radiology workstations, and can also be run as a stand-alone tool from a USB disk. Recently, a database has been developed to store and centralize the monitor performance data and to provide long-term trends for compliance with internal standards and various accrediting organizations. Results: Implementation and utilization of pacsDisplay has resulted in improved monitor performance across the health system. Monitor testing is now performed at regular intervals and the software is being used across multiple imaging modalities. Monitor performance characteristics such as maximum and minimum luminance, ambient luminance and illuminance, color tracking, and GSDF conformity are loaded into a centralized database for system performance comparisons. Compliance reports for organizations such as MQSA, ACR, and TJC are generated automatically and stored in the same database. Conclusion: An open source software solution has simplified and improved the standardization of displays within our health system. This work serves as an example method for calibrating and testing monitors within an enterprise health system.« less
Database usage and performance for the Fermilab Run II experiments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bonham, D.; Box, D.; Gallas, E.

2004-12-01

The Run II experiments at Fermilab, CDF and D0, have extensive database needs covering many areas of their online and offline operations. Delivering data to users and processing farms worldwide has represented major challenges to both experiments. The range of applications employing databases includes, calibration (conditions), trigger information, run configuration, run quality, luminosity, data management, and others. Oracle is the primary database product being used for these applications at Fermilab and some of its advanced features have been employed, such as table partitioning and replication. There is also experience with open source database products such as MySQL for secondary databasesmore » used, for example, in monitoring. Tools employed for monitoring the operation and diagnosing problems are also described.« less

CVcat: An interactive database on cataclysmic variables

NASA Astrophysics Data System (ADS)

Kube, J.; Gänsicke, B. T.; Euchner, F.; Hoffmann, B.

2003-06-01

CVcat is a database that contains published data on cataclysmic variables and related objects. Unlike in the existing online sources, the users are allowed to add data to the catalogue. The concept of an ``open catalogue'' approach is reviewed together with the experience from one year of public usage of CVcat. New concepts to be included in the upcoming AstroCat framework and the next CVcat implementation are presented. CVcat can be found at http://www.cvcat.org.
Quaternary Geology and Liquefaction Susceptibility, San Francisco, California 1:100,000 Quadrangle: A Digital Database

USGS Publications Warehouse

Knudsen, Keith L.; Noller, Jay S.; Sowers, Janet M.; Lettis, William R.

1997-01-01

This Open-File report is a digital geologic map database. This pamphlet serves to introduce and describe the digital data. There are no paper maps included in the Open-File report. The report does include, however, PostScript plot files containing the images of the geologic map sheets with explanations, as well as the accompanying text describing the geology of the area. For those interested in a paper plot of information contained in the database or in obtaining the PostScript plot files, please see the section entitled 'For Those Who Aren't Familiar With Digital Geologic Map Databases' below. This digital map database, compiled from previously unpublished data, and new mapping by the authors, represents the general distribution of surficial deposits in the San Francisco bay region. Together with the accompanying text file (sf_geo.txt or sf_geo.pdf), it provides current information on Quaternary geology and liquefaction susceptibility of the San Francisco, California, 1:100,000 quadrangle. The database delineates map units that are identified by general age and lithology following the stratigraphic nomenclature of the U.S. Geological Survey. The scale of the source maps limits the spatial resolution (scale) of the database to 1:100,000 or smaller. The content and character of the database, as well as three methods of obtaining the database, are described below.
Neuroimaging Data Sharing on the Neuroinformatics Database Platform

PubMed Central

Book, Gregory A; Stevens, Michael; Assaf, Michal; Glahn, David; Pearlson, Godfrey D

2015-01-01

We describe the Neuroinformatics Database (NiDB), an open-source database platform for archiving, analysis, and sharing of neuroimaging data. Data from the multi-site projects Autism Brain Imaging Data Exchange (ABIDE), Bipolar-Schizophrenia Network on Intermediate Phenotypes parts one and two (B-SNIP1, B-SNIP2), and Monetary Incentive Delay task (MID) are available for download from the public instance of NiDB, with more projects sharing data as it becomes available. As demonstrated by making several large datasets available, NiDB is an extensible platform appropriately suited to archive and distribute shared neuroimaging data. PMID:25888923
The Open Spectral Database: an open platform for sharing and searching spectral data.

PubMed

Chalk, Stuart J

2016-01-01

A number of websites make available spectral data for download (typically as JCAMP-DX text files) and one (ChemSpider) that also allows users to contribute spectral files. As a result, searching and retrieving such spectral data can be time consuming, and difficult to reuse if the data is compressed in the JCAMP-DX file. What is needed is a single resource that allows submission of JCAMP-DX files, export of the raw data in multiple formats, searching based on multiple chemical identifiers, and is open in terms of license and access. To address these issues a new online resource called the Open Spectral Database (OSDB) http://osdb.info/ has been developed and is now available. Built using open source tools, using open code (hosted on GitHub), providing open data, and open to community input about design and functionality, the OSDB is available for anyone to submit spectral data, making it searchable and available to the scientific community. This paper details the concept and coding, internal architecture, export formats, Representational State Transfer (REST) Application Programming Interface and options for submission of data. The OSDB website went live in November 2015. Concurrently, the GitHub repository was made available at https://github.com/stuchalk/OSDB/, and is open for collaborators to join the project, submit issues, and contribute code. The combination of a scripting environment (PHPStorm), a PHP Framework (CakePHP), a relational database (MySQL) and a code repository (GitHub) provides all the capabilities to easily develop REST based websites for ingestion, curation and exposure of open chemical data to the community at all levels. It is hoped this software stack (or equivalent ones in other scripting languages) will be leveraged to make more chemical data available for both humans and computers.
Windows Memory Forensic Data Visualization

DTIC Science & Technology

2014-06-12

clustering characteristics (Bastian, et al, 2009). The software is written in Java and utilizes the OpenGL library for rendering graphical content...Toolkit 2 nd ed. Burlington MA: Syngress. D3noob. (2013, February 8). Using a MYSQL database as a source of data. Message posted to http
SNPversity: A web-based tool for visualizing diversity

USDA-ARS?s Scientific Manuscript database

Background: Many stand-alone desktop software suites exist to visualize single nucleotide polymorphisms (SNP) diversity, but web-based software that can be easily implemented and used for biological databases is absent. SNPversity was created to answer this need by building an open-source visualizat...
Geology of Point Reyes National Seashore and vicinity, California: a digital database

USGS Publications Warehouse

Clark, Jospeh C.; Brabb, Earl E.

1997-01-01

This Open-File report is a digital geologic map database. This pamphlet serves to introduce and describe the digital data. There is no paper map included in the Open-File report. The report does include, however, a PostScript plot file containing an image of the geologic map sheet with explanation, as well as the accompanying text describing the geology of the area. For those interested in a paper plot of information contained in the database or in obtaining the PostScript plot files, please see the section entitled 'For Those Who Aren't Familiar With Digital Geologic Map Databases' below. This digital map database, compiled from previously published and unpublished data and new mapping by the authors, represents the general distribution of surficial deposits and rock units in Point Reyes and surrounding areas. Together with the accompanying text file (pr-geo.txt or pr-geo.ps), it provides current information on the stratigraphy and structural geology of the area covered. The database delineates map units that are identified by general age and lithology following the stratigraphic nomenclature of the U.S. Geological Survey. The scale of the source maps limits the spatial resolution (scale) of the database to 1:48,000 or smaller.
Towards open collaborative health informatics - The Role of free/libre open source principles. Contribution of the IMIA Open Source Health Informatics Working Group.

PubMed

Karopka, T; Schmuhl, H; Marcelo, A; Molin, J Dal; Wright, G

2011-01-01

: To analyze the contribution of Free/Libre Open Source Software in health care (FLOSS-HC) and to give perspectives for future developments. The paper summarizes FLOSS-related trends in health care as anticipated by members of the IMIA Open Source Working Group. Data were obtained through literature review and personal experience and observations of the authors in the last two decades. A status quo is given by a frequency analysis of the database of Medfloss.org, one of the world's largest platforms dedicated to FLOSS-HC. The authors discuss current problems in the field of health care and finally give a prospective roadmap, a projection of the potential influences of FLOSS in health care. FLOSS-HC already exists for more than 2 decades. Several projects have shown that FLOSS may produce highly competitive alternatives to proprietary solutions that are at least equivalent in usability and have a better total cost of ownership ratio. The Medfloss.org database currently lists 221 projects of diverse application types. FLOSS principles hold a great potential for addressing several of the most critical problems in health care IT. The authors argue that an ecosystem perspective is relevant and that FLOSS principles are best suited to create health IT systems that are able to evolve over time as medical knowledge, technologies, insights, workflows etc. continuously change. All these factors that inherently influence the development of health IT systems are changing at an ever growing pace. Traditional models of software engineering are not able to follow these changes and provide up-to-date systems for an acceptable cost/value ratio. To allow FLOSS to positively influence Health IT in the future a "FLOSS-friendly" environment has to be provided. Policy makers should resolve uncertainties in the legal framework that disfavor FLOSS. Certification procedures should be specified in a way that they do not raise additional barriers for FLOSS.
Introduction to an Open Source Internet-Based Testing Program for Medical Student Examinations

PubMed Central

2009-01-01

The author developed a freely available open source internet-based testing program for medical examination. PHP and Java script were used as the programming language and postgreSQL as the database management system on an Apache web server and Linux operating system. The system approach was that a super user inputs the items, each school administrator inputs the examinees' information, and examinees access the system. The examinee's score is displayed immediately after examination with item analysis. The set-up of the system beginning with installation is described. This may help medical professors to easily adopt an internet-based testing system for medical education. PMID:20046457
Introduction to an open source internet-based testing program for medical student examinations.

PubMed

Lee, Yoon-Hwan

2009-12-20

The author developed a freely available open source internet-based testing program for medical examination. PHP and Java script were used as the programming language and postgreSQL as the database management system on an Apache web server and Linux operating system. The system approach was that a super user inputs the items, each school administrator inputs the examinees' information, and examinees access the system. The examinee's score is displayed immediately after examination with item analysis. The set-up of the system beginning with installation is described. This may help medical professors to easily adopt an internet-based testing system for medical education.
For 481 biomedical open access journals, articles are not searchable in the Directory of Open Access Journals nor in conventional biomedical databases

PubMed Central

Andresen, Kristoffer; Pommergaard, Hans-Christian; Rosenberg, Jacob

2015-01-01

Background. Open access (OA) journals allows access to research papers free of charge to the reader. Traditionally, biomedical researchers use databases like MEDLINE and EMBASE to discover new advances. However, biomedical OA journals might not fulfill such databases’ criteria, hindering dissemination. The Directory of Open Access Journals (DOAJ) is a database exclusively listing OA journals. The aim of this study was to investigate DOAJ’s coverage of biomedical OA journals compared with the conventional biomedical databases. Methods. Information on all journals listed in four conventional biomedical databases (MEDLINE, PubMed Central, EMBASE and SCOPUS) and DOAJ were gathered. Journals were included if they were (1) actively publishing, (2) full OA, (3) prospectively indexed in one or more database, and (4) of biomedical subject. Impact factor and journal language were also collected. DOAJ was compared with conventional databases regarding the proportion of journals covered, along with their impact factor and publishing language. The proportion of journals with articles indexed by DOAJ was determined. Results. In total, 3,236 biomedical OA journals were included in the study. Of the included journals, 86.7% were listed in DOAJ. Combined, the conventional biomedical databases listed 75.0% of the journals; 18.7% in MEDLINE; 36.5% in PubMed Central; 51.5% in SCOPUS and 50.6% in EMBASE. Of the journals in DOAJ, 88.7% published in English and 20.6% had received impact factor for 2012 compared with 93.5% and 26.0%, respectively, for journals in the conventional biomedical databases. A subset of 51.1% and 48.5% of the journals in DOAJ had articles indexed from 2012 and 2013, respectively. Of journals exclusively listed in DOAJ, one journal had received an impact factor for 2012, and 59.6% of the journals had no content from 2013 indexed in DOAJ. Conclusions. DOAJ is the most complete registry of biomedical OA journals compared with five conventional biomedical databases. However, DOAJ only indexes articles for half of the biomedical journals listed, making it an incomplete source for biomedical research papers in general. PMID:26038727
CGDSNPdb: a database resource for error-checked and imputed mouse SNPs.

PubMed

Hutchins, Lucie N; Ding, Yueming; Szatkiewicz, Jin P; Von Smith, Randy; Yang, Hyuna; de Villena, Fernando Pardo-Manuel; Churchill, Gary A; Graber, Joel H

2010-07-06

The Center for Genome Dynamics Single Nucleotide Polymorphism Database (CGDSNPdb) is an open-source value-added database with more than nine million mouse single nucleotide polymorphisms (SNPs), drawn from multiple sources, with genotypes assigned to multiple inbred strains of laboratory mice. All SNPs are checked for accuracy and annotated for properties specific to the SNP as well as those implied by changes to overlapping protein-coding genes. CGDSNPdb serves as the primary interface to two unique data sets, the 'imputed genotype resource' in which a Hidden Markov Model was used to assess local haplotypes and the most probable base assignment at several million genomic loci in tens of strains of mice, and the Affymetrix Mouse Diversity Genotyping Array, a high density microarray with over 600,000 SNPs and over 900,000 invariant genomic probes. CGDSNPdb is accessible online through either a web-based query tool or a MySQL public login. Database URL: http://cgd.jax.org/cgdsnpdb/
jqcML: an open-source java API for mass spectrometry quality control data in the qcML format.

PubMed

Bittremieux, Wout; Kelchtermans, Pieter; Valkenborg, Dirk; Martens, Lennart; Laukens, Kris

2014-07-03

The awareness that systematic quality control is an essential factor to enable the growth of proteomics into a mature analytical discipline has increased over the past few years. To this aim, a controlled vocabulary and document structure have recently been proposed by Walzer et al. to store and disseminate quality-control metrics for mass-spectrometry-based proteomics experiments, called qcML. To facilitate the adoption of this standardized quality control routine, we introduce jqcML, a Java application programming interface (API) for the qcML data format. First, jqcML provides a complete object model to represent qcML data. Second, jqcML provides the ability to read, write, and work in a uniform manner with qcML data from different sources, including the XML-based qcML file format and the relational database qcDB. Interaction with the XML-based file format is obtained through the Java Architecture for XML Binding (JAXB), while generic database functionality is obtained by the Java Persistence API (JPA). jqcML is released as open-source software under the permissive Apache 2.0 license and can be downloaded from https://bitbucket.org/proteinspector/jqcml .
Global ISR: Toward a Comprehensive Defense Against Unauthorized Code Execution

DTIC Science & Technology

2010-10-01

implementation using two of the most popular open- source servers: the Apache web server, and the MySQL database server. For Apache, we measure the effect that...utility ab. T o ta l T im e ( s e c ) 0 500 1000 1500 2000 2500 3000 Native Null ISR ISR−MP Fig. 3. The MySQL test-insert bench- mark measures...various SQL operations. The figure draws total execution time as reported by the benchmark utility. Finally, we benchmarked a MySQL database server using
No Permit Needed for Emergency-Related Opening Burning on Tribal Lands

EPA Pesticide Factsheets

This document may be of assistance in applying the New Source Review (NSR) air permitting regulations including the Prevention of Significant Deterioration (PSD) requirements. This document is part of the NSR Policy and Guidance Database. Some documents in the database are a scanned or retyped version of a paper photocopy of the original. Although we have taken considerable effort to quality assure the documents, some may contain typographical errors. Contact the office that issued the document if you need a copy of the original.
Consideration of Fugitives in Open-Air Cattle Operations

EPA Pesticide Factsheets

This document may be of assistance in applying the New Source Review (NSR) air permitting regulations including the Prevention of Significant Deterioration (PSD) requirements. This document is part of the NSR Policy and Guidance Database. Some documents in the database are a scanned or retyped version of a paper photocopy of the original. Although we have taken considerable effort to quality assure the documents, some may contain typographical errors. Contact the office that issued the document if you need a copy of the original.
An open-source framework for stress-testing non-invasive foetal ECG extraction algorithms.

PubMed

Andreotti, Fernando; Behar, Joachim; Zaunseder, Sebastian; Oster, Julien; Clifford, Gari D

2016-05-01

Over the past decades, many studies have been published on the extraction of non-invasive foetal electrocardiogram (NI-FECG) from abdominal recordings. Most of these contributions claim to obtain excellent results in detecting foetal QRS (FQRS) complexes in terms of location. A small subset of authors have investigated the extraction of morphological features from the NI-FECG. However, due to the shortage of available public databases, the large variety of performance measures employed and the lack of open-source reference algorithms, most contributions cannot be meaningfully assessed. This article attempts to address these issues by presenting a standardised methodology for stress testing NI-FECG algorithms, including absolute data, as well as extraction and evaluation routines. To that end, a large database of realistic artificial signals was created, totaling 145.8 h of multichannel data and over one million FQRS complexes. An important characteristic of this dataset is the inclusion of several non-stationary events (e.g. foetal movements, uterine contractions and heart rate fluctuations) that are critical for evaluating extraction routines. To demonstrate our testing methodology, three classes of NI-FECG extraction algorithms were evaluated: blind source separation (BSS), template subtraction (TS) and adaptive methods (AM). Experiments were conducted to benchmark the performance of eight NI-FECG extraction algorithms on the artificial database focusing on: FQRS detection and morphological analysis (foetal QT and T/QRS ratio). The overall median FQRS detection accuracies (i.e. considering all non-stationary events) for the best performing methods in each group were 99.9% for BSS, 97.9% for AM and 96.0% for TS. Both FQRS detections and morphological parameters were shown to heavily depend on the extraction techniques and signal-to-noise ratio. Particularly, it is shown that their evaluation in the source domain, obtained after using a BSS technique, should be avoided. Data, extraction algorithms and evaluation routines were released as part of the fecgsyn toolbox on Physionet under an GNU GPL open-source license. This contribution provides a standard framework for benchmarking and regulatory testing of NI-FECG extraction algorithms.
A New Global Open Source Marine Hydrocarbon Emission Site Database

NASA Astrophysics Data System (ADS)

Onyia, E., Jr.; Wood, W. T.; Barnard, A.; Dada, T.; Qazzaz, M.; Lee, T. R.; Herrera, E.; Sager, W.

2017-12-01

Hydrocarbon emission sites (e.g. seeps) discharge large volumes of fluids and gases into the oceans that are not only important for biogeochemical budgets, but also support abundant chemosynthetic communities. Documenting the locations of modern emissions is a first step towards understanding and monitoring how they affect the global state of the seafloor and oceans. Currently, no global open source (i.e. non-proprietry) detailed maps of emissions sites are available. As a solution, we have created a database that is housed within an Excel spreadsheet and use the latest versions of Earthpoint and Google Earth for position coordinate conversions and data mapping, respectively. To date, approximately 1,000 data points have been collected from referenceable sources across the globe, and we are continualy expanding the dataset. Due to the variety of spatial extents encountered, to identify each site we used two different methods: 1) point (x, y, z) locations for individual sites and; 2) delineation of areas where sites are clustered. Certain well-known areas, such as the Gulf of Mexico and the Mediterranean Sea, have a greater abundance of information; whereas significantly less information is available in other regions due to the absence of emission sites, lack of data, or because the existing data is proprietary. Although the geographical extent of the data is currently restricted to regions where the most data is publicly available, as the database matures, we expect to have more complete coverage of the world's oceans. This database is an information resource that consolidates and organizes the existing literature on hydrocarbons released into the marine environment, thereby providing a comprehensive reference for future work. We expect that the availability of seafloor hydrocarbon emission maps will benefit scientific understanding of hydrocarbon rich areas as well as potentially aiding hydrocarbon exploration and environmental impact assessements.
The database on transgenic luminescent microorganisms as an instrument of studying a microbial component of closed ecosystems

NASA Astrophysics Data System (ADS)

Boyandin, A. N.; Lankin, Y. P.; Kargatova, T. V.; Popova, L. Y.; Pechurkin, N. S.

Luminescent transgenic microorganisms are widely used for study of microbial communities' functioning including closed ones. Bioluminescence is of high sensitive to effects of different environmental factors. Integration of lux-genes into different metabolic ways allows studying many aspects of microorganisms' life permitting to carry out measurements in situ. There is much information about applications of bioluminescent bacteria in different researches. But for effective using these data their summarizing and accumulation in common source is required. Therefore an information system on characteristics of transgenic microorganisms with cloned lux-genes was created. The database and client software related were developed. A database structure includes information on common characteristics of cloned lux-genes, their sources and properties, on regulation of gene expression in bacterial cells, on dependence of bioluminescence manifestation on biotic, abiotic and anthropogenic environmental factors. The database also can store description of changes in bacterial populations depending on environmental changes. The database created allows storing and using bibliographic information and also links to web sites of world collections of microorganisms. Internet publishing software permitting to open access to the database through the Internet is developed.
Pan European Phenological database (PEP725): a single point of access for European data

NASA Astrophysics Data System (ADS)

Templ, Barbara; Koch, Elisabeth; Bolmgren, Kjell; Ungersböck, Markus; Paul, Anita; Scheifinger, Helfried; Rutishauser, This; Busto, Montserrat; Chmielewski, Frank-M.; Hájková, Lenka; Hodzić, Sabina; Kaspar, Frank; Pietragalla, Barbara; Romero-Fresneda, Ramiro; Tolvanen, Anne; Vučetič, Višnja; Zimmermann, Kirsten; Zust, Ana

2018-06-01

The Pan European Phenology (PEP) project is a European infrastructure to promote and facilitate phenological research, education, and environmental monitoring. The main objective is to maintain and develop a Pan European Phenological database (PEP725) with an open, unrestricted data access for science and education. PEP725 is the successor of the database developed through the COST action 725 "Establishing a European phenological data platform for climatological applications" working as a single access point for European-wide plant phenological data. So far, 32 European meteorological services and project partners from across Europe have joined and supplied data collected by volunteers from 1868 to the present for the PEP725 database. Most of the partners actively provide data on a regular basis. The database presently holds almost 12 million records, about 46 growing stages and 265 plant species (including cultivars), and can be accessed via http://www.pep725.eu/ . Users of the PEP725 database have studied a diversity of topics ranging from climate change impact, plant physiological question, phenological modeling, and remote sensing of vegetation to ecosystem productivity.

New additions to the cancer precision medicine toolkit.

PubMed

Mardis, Elaine R

2018-04-13

New computational and database-driven tools are emerging to aid in the interpretation of cancer genomic data as its use becomes more common in clinical evidence-based cancer medicine. Two such open source tools, published recently in Genome Medicine, provide important advances to address the clinical cancer genomics data interpretation bottleneck.
Cellular Consequences of Telomere Shortening in Histologically Normal Breast Tissues

DTIC Science & Technology

2013-09-01

using the open source, JAVA -based image analysis software package ImageJ (http://rsb.info.nih.gov/ij/) and a custom designed plugin (“Telometer...Tabulated data were stored in a MySQL (http://www.mysql.com) database and viewed through Microsoft Access (Microsoft Corp.). Statistical Analysis For
JASPAR 2010: the greatly expanded open-access database of transcription factor binding profiles

PubMed Central

Portales-Casamar, Elodie; Thongjuea, Supat; Kwon, Andrew T.; Arenillas, David; Zhao, Xiaobei; Valen, Eivind; Yusuf, Dimas; Lenhard, Boris; Wasserman, Wyeth W.; Sandelin, Albin

2010-01-01

JASPAR (http://jaspar.genereg.net) is the leading open-access database of matrix profiles describing the DNA-binding patterns of transcription factors (TFs) and other proteins interacting with DNA in a sequence-specific manner. Its fourth major release is the largest expansion of the core database to date: the database now holds 457 non-redundant, curated profiles. The new entries include the first batch of profiles derived from ChIP-seq and ChIP-chip whole-genome binding experiments, and 177 yeast TF binding profiles. The introduction of a yeast division brings the convenience of JASPAR to an active research community. As binding models are refined by newer data, the JASPAR database now uses versioning of matrices: in this release, 12% of the older models were updated to improved versions. Classification of TF families has been improved by adopting a new DNA-binding domain nomenclature. A curated catalog of mammalian TFs is provided, extending the use of the JASPAR profiles to additional TFs belonging to the same structural family. The changes in the database set the system ready for more rapid acquisition of new high-throughput data sources. Additionally, three new special collections provide matrix profile data produced by recent alternative high-throughput approaches. PMID:19906716
FOUNTAIN: A JAVA open-source package to assist large sequencing projects

PubMed Central

Buerstedde, Jean-Marie; Prill, Florian

2001-01-01

Background Better automation, lower cost per reaction and a heightened interest in comparative genomics has led to a dramatic increase in DNA sequencing activities. Although the large sequencing projects of specialized centers are supported by in-house bioinformatics groups, many smaller laboratories face difficulties managing the appropriate processing and storage of their sequencing output. The challenges include documentation of clones, templates and sequencing reactions, and the storage, annotation and analysis of the large number of generated sequences. Results We describe here a new program, named FOUNTAIN, for the management of large sequencing projects . FOUNTAIN uses the JAVA computer language and data storage in a relational database. Starting with a collection of sequencing objects (clones), the program generates and stores information related to the different stages of the sequencing project using a web browser interface for user input. The generated sequences are subsequently imported and annotated based on BLAST searches against the public databases. In addition, simple algorithms to cluster sequences and determine putative polymorphic positions are implemented. Conclusions A simple, but flexible and scalable software package is presented to facilitate data generation and storage for large sequencing projects. Open source and largely platform and database independent, we wish FOUNTAIN to be improved and extended in a community effort. PMID:11591214
An event database for rotational seismology

NASA Astrophysics Data System (ADS)

Salvermoser, Johannes; Hadziioannou, Celine; Hable, Sarah; Chow, Bryant; Krischer, Lion; Wassermann, Joachim; Igel, Heiner

2016-04-01

The ring laser sensor (G-ring) located at Wettzell, Germany, routinely observes earthquake-induced rotational ground motions around a vertical axis since its installation in 2003. Here we present results from a recently installed event database which is the first that will provide ring laser event data in an open access format. Based on the GCMT event catalogue and some search criteria, seismograms from the ring laser and the collocated broadband seismometer are extracted and processed. The ObsPy-based processing scheme generates plots showing waveform fits between rotation rate and transverse acceleration and extracts characteristic wavefield parameters such as peak ground motions, noise levels, Love wave phase velocities and waveform coherence. For each event, these parameters are stored in a text file (json dictionary) which is easily readable and accessible on the website. The database contains >10000 events starting in 2007 (Mw>4.5). It is updated daily and therefore provides recent events at a time lag of max. 24 hours. The user interface allows to filter events for epoch, magnitude, and source area, whereupon the events are displayed on a zoomable world map. We investigate how well the rotational motions are compatible with the expectations from the surface wave magnitude scale. In addition, the website offers some python source code examples for downloading and processing the openly accessible waveforms.
Exploring Chemical Space for Drug Discovery Using the Chemical Universe Database

PubMed Central

2012-01-01

Herein we review our recent efforts in searching for bioactive ligands by enumeration and virtual screening of the unknown chemical space of small molecules. Enumeration from first principles shows that almost all small molecules (>99.9%) have never been synthesized and are still available to be prepared and tested. We discuss open access sources of molecules, the classification and representation of chemical space using molecular quantum numbers (MQN), its exhaustive enumeration in form of the chemical universe generated databases (GDB), and examples of using these databases for prospective drug discovery. MQN-searchable GDB, PubChem, and DrugBank are freely accessible at www.gdb.unibe.ch. PMID:23019491
GEOMAGIA50: An archeointensity database with PHP and MySQL

NASA Astrophysics Data System (ADS)

Korhonen, K.; Donadini, F.; Riisager, P.; Pesonen, L. J.

2008-04-01

The GEOMAGIA50 database stores 3798 archeomagnetic and paleomagnetic intensity determinations dated to the past 50,000 years. It also stores details of the measurement setup for each determination, which are used for ranking the data according to prescribed reliability criteria. The ranking system aims to alleviate the data reliability problem inherent in this kind of data. GEOMAGIA50 is based on two popular open source technologies. The MySQL database management system is used for storing the data, whereas the functionality and user interface are provided by server-side PHP scripts. This technical brief gives a detailed description of GEOMAGIA50 from a technical viewpoint.
A Tony Thomas-Inspired Guide to INSPIRE

DOE Office of Scientific and Technical Information (OSTI.GOV)

O'Connell, Heath B.; /Fermilab

2010-04-01

The SPIRES database was created in the late 1960s to catalogue the high energy physics preprints received by the SLAC Library. In the early 1990s it became the first database on the web and the first website outside of Europe. Although indispensible to the HEP community, its aging software infrastructure is becoming a serious liability. In a joint project involving CERN, DESY, Fermilab and SLAC, a new database, INSPIRE, is being created to replace SPIRES using CERN's modern, open-source Invenio database software. INSPIRE will maintain the content and functionality of SPIRES plus many new features. I describe this evolution frommore » the birth of SPIRES to the current day, noting that the career of Tony Thomas spans this timeline.« less
Global building inventory for earthquake loss estimation and risk management

USGS Publications Warehouse

Jaiswal, Kishor; Wald, David; Porter, Keith

2010-01-01

We develop a global database of building inventories using taxonomy of global building types for use in near-real-time post-earthquake loss estimation and pre-earthquake risk analysis, for the U.S. Geological Survey’s Prompt Assessment of Global Earthquakes for Response (PAGER) program. The database is available for public use, subject to peer review, scrutiny, and open enhancement. On a country-by-country level, it contains estimates of the distribution of building types categorized by material, lateral force resisting system, and occupancy type (residential or nonresidential, urban or rural). The database draws on and harmonizes numerous sources: (1) UN statistics, (2) UN Habitat’s demographic and health survey (DHS) database, (3) national housing censuses, (4) the World Housing Encyclopedia and (5) other literature.
New Zealand's National Landslide Database

NASA Astrophysics Data System (ADS)

Rosser, B.; Dellow, S.; Haubrook, S.; Glassey, P.

2016-12-01

Since 1780, landslides have caused an average of about 3 deaths a year in New Zealand and have cost the economy an average of at least NZ$250M/a (0.1% GDP). To understand the risk posed by landslide hazards to society, a thorough knowledge of where, when and why different types of landslides occur is vital. The main objective for establishing the database was to provide a centralised national-scale, publically available database to collate landslide information that could be used for landslide hazard and risk assessment. Design of a national landslide database for New Zealand required consideration of both existing landslide data stored in a variety of digital formats, and future data, yet to be collected. Pre-existing databases were developed and populated with data reflecting the needs of the landslide or hazard project, and the database structures of the time. Bringing these data into a single unified database required a new structure capable of storing and delivering data at a variety of scales and accuracy and with different attributes. A "unified data model" was developed to enable the database to hold old and new landslide data irrespective of scale and method of capture. The database contains information on landslide locations and where available: 1) the timing of landslides and the events that may have triggered them; 2) the type of landslide movement; 3) the volume and area; 4) the source and debris tail; and 5) the impacts caused by the landslide. Information from a variety of sources including aerial photographs (and other remotely sensed data), field reconnaissance and media accounts has been collated and is presented for each landslide along with metadata describing the data sources and quality. There are currently nearly 19,000 landslide records in the database that include point locations, polygons of landslide source and deposit areas, and linear features. Several large datasets are awaiting upload which will bring the total number of landslides to over 100,000. The geo-spatial database is publicly available via the Internet. Software components, including the underlying database (PostGIS), Web Map Server (GeoServer) and web application use open-source software. The hope is that others will add relevant information to the database as well as download the data contained in it.
Computational toxicology using the OpenTox application programming interface and Bioclipse

PubMed Central

2011-01-01

Background Toxicity is a complex phenomenon involving the potential adverse effect on a range of biological functions. Predicting toxicity involves using a combination of experimental data (endpoints) and computational methods to generate a set of predictive models. Such models rely strongly on being able to integrate information from many sources. The required integration of biological and chemical information sources requires, however, a common language to express our knowledge ontologically, and interoperating services to build reliable predictive toxicology applications. Findings This article describes progress in extending the integrative bio- and cheminformatics platform Bioclipse to interoperate with OpenTox, a semantic web framework which supports open data exchange and toxicology model building. The Bioclipse workbench environment enables functionality from OpenTox web services and easy access to OpenTox resources for evaluating toxicity properties of query molecules. Relevant cases and interfaces based on ten neurotoxins are described to demonstrate the capabilities provided to the user. The integration takes advantage of semantic web technologies, thereby providing an open and simplifying communication standard. Additionally, the use of ontologies ensures proper interoperation and reliable integration of toxicity information from both experimental and computational sources. Conclusions A novel computational toxicity assessment platform was generated from integration of two open science platforms related to toxicology: Bioclipse, that combines a rich scriptable and graphical workbench environment for integration of diverse sets of information sources, and OpenTox, a platform for interoperable toxicology data and computational services. The combination provides improved reliability and operability for handling large data sets by the use of the Open Standards from the OpenTox Application Programming Interface. This enables simultaneous access to a variety of distributed predictive toxicology databases, and algorithm and model resources, taking advantage of the Bioclipse workbench handling the technical layers. PMID:22075173
Spatial Dmbs Architecture for a Free and Open Source Bim

NASA Astrophysics Data System (ADS)

Logothetis, S.; Valari, E.; Karachaliou, E.; Stylianidis, E.

2017-08-01

Recent research on the field of Building Information Modelling (BIM) technology, revealed that except of a few, accessible and free BIM viewers there is a lack of Free & Open Source Software (FOSS) BIM software for the complete BIM process. With this in mind and considering BIM as the technological advancement of Computer-Aided Design (CAD) systems, the current work proposes the use of a FOSS CAD software in order to extend its capabilities and transform it gradually into a FOSS BIM platform. Towards this undertaking, a first approach on developing a spatial Database Management System (DBMS) able to store, organize and manage the overall amount of information within a single application, is presented.
Earthquake forecasting studies using radon time series data in Taiwan

NASA Astrophysics Data System (ADS)

Walia, Vivek; Kumar, Arvind; Fu, Ching-Chou; Lin, Shih-Jung; Chou, Kuang-Wu; Wen, Kuo-Liang; Chen, Cheng-Hong

2017-04-01

For few decades, growing number of studies have shown usefulness of data in the field of seismogeochemistry interpreted as geochemical precursory signals for impending earthquakes and radon is idendified to be as one of the most reliable geochemical precursor. Radon is recognized as short-term precursor and is being monitored in many countries. This study is aimed at developing an effective earthquake forecasting system by inspecting long term radon time series data. The data is obtained from a network of radon monitoring stations eastblished along different faults of Taiwan. The continuous time series radon data for earthquake studies have been recorded and some significant variations associated with strong earthquakes have been observed. The data is also examined to evaluate earthquake precursory signals against environmental factors. An automated real-time database operating system has been developed recently to improve the data processing for earthquake precursory studies. In addition, the study is aimed at the appraisal and filtrations of these environmental parameters, in order to create a real-time database that helps our earthquake precursory study. In recent years, automatic operating real-time database has been developed using R, an open source programming language, to carry out statistical computation on the data. To integrate our data with our working procedure, we use the popular and famous open source web application solution, AMP (Apache, MySQL, and PHP), creating a website that could effectively show and help us manage the real-time database.
OpenElectrophy: An Electrophysiological Data- and Analysis-Sharing Framework

PubMed Central

Garcia, Samuel; Fourcaud-Trocmé, Nicolas

2008-01-01

Progress in experimental tools and design is allowing the acquisition of increasingly large datasets. Storage, manipulation and efficient analyses of such large amounts of data is now a primary issue. We present OpenElectrophy, an electrophysiological data- and analysis-sharing framework developed to fill this niche. It stores all experiment data and meta-data in a single central MySQL database, and provides a graphic user interface to visualize and explore the data, and a library of functions for user analysis scripting in Python. It implements multiple spike-sorting methods, and oscillation detection based on the ridge extraction methods due to Roux et al. (2007). OpenElectrophy is open source and is freely available for download at http://neuralensemble.org/trac/OpenElectrophy. PMID:19521545
Earth science big data at users' fingertips: the EarthServer Science Gateway Mobile

NASA Astrophysics Data System (ADS)

Barbera, Roberto; Bruno, Riccardo; Calanducci, Antonio; Fargetta, Marco; Pappalardo, Marco; Rundo, Francesco

2014-05-01

The EarthServer project (www.earthserver.eu), funded by the European Commission under its Seventh Framework Program, aims at establishing open access and ad-hoc analytics on extreme-size Earth Science data, based on and extending leading-edge Array Database technology. The core idea is to use database query languages as client/server interface to achieve barrier-free "mix & match" access to multi-source, any-size, multi-dimensional space-time data -- in short: "Big Earth Data Analytics" - based on the open standards of the Open Geospatial Consortium Web Coverage Processing Service (OGC WCPS) and the W3C XQuery. EarthServer combines both, thereby achieving a tight data/metadata integration. Further, the rasdaman Array Database System (www.rasdaman.com) is extended with further space-time coverage data types. On server side, highly effective optimizations - such as parallel and distributed query processing - ensure scalability to Exabyte volumes. In this contribution we will report on the EarthServer Science Gateway Mobile, an app for both iOS and Android-based devices that allows users to seamlessly access some of the EarthServer applications using SAML-based federated authentication and fine-grained authorisation mechanisms.
WEB-GIS Decision Support System for CO2 storage

NASA Astrophysics Data System (ADS)

Gaitanaru, Dragos; Leonard, Anghel; Radu Gogu, Constantin; Le Guen, Yvi; Scradeanu, Daniel; Pagnejer, Mihaela

2013-04-01

Environmental decision support systems (DSS) paradigm evolves and changes as more knowledge and technology become available to the environmental community. Geographic Information Systems (GIS) can be used to extract, assess and disseminate some types of information, which are otherwise difficult to access by traditional methods. In the same time, with the help of the Internet and accompanying tools, creating and publishing online interactive maps has become easier and rich with options. The Decision Support System (MDSS) developed for the MUSTANG (A MUltiple Space and Time scale Approach for the quaNtification of deep saline formations for CO2 storaGe) project is a user friendly web based application that uses the GIS capabilities. MDSS can be exploited by the experts for CO2 injection and storage in deep saline aquifers. The main objective of the MDSS is to help the experts to take decisions based large structured types of data and information. In order to achieve this objective the MDSS has a geospatial objected-orientated database structure for a wide variety of data and information. The entire application is based on several principles leading to a series of capabilities and specific characteristics: (i) Open-Source - the entire platform (MDSS) is based on open-source technologies - (1) database engine, (2) application server, (3) geospatial server, (4) user interfaces, (5) add-ons, etc. (ii) Multiple database connections - MDSS is capable to connect to different databases that are located on different server machines. (iii)Desktop user experience - MDSS architecture and design follows the structure of a desktop software. (iv)Communication - the server side and the desktop are bound together by series functions that allows the user to upload, use, modify and download data within the application. The architecture of the system involves one database and a modular application composed by: (1) a visualization module, (2) an analysis module, (3) a guidelines module, and (4) a risk assessment module. The Database component is build by using the PostgreSQL and PostGIS open source technology. The visualization module allows the user to view data of CO2 injection sites in different ways: (1) geospatial visualization, (2) table view, (3) 3D visualization. The analysis module will allow the user to perform certain analysis like Injectivity, Containment and Capacity analysis. The Risk Assessment module focus on the site risk matrix approach. The Guidelines module contains the methodologies of CO2 injection and storage into deep saline aquifers guidelines.
The eNanoMapper database for nanomaterial safety information

PubMed Central

Chomenidis, Charalampos; Doganis, Philip; Fadeel, Bengt; Grafström, Roland; Hardy, Barry; Hastings, Janna; Hegi, Markus; Jeliazkov, Vedrin; Kochev, Nikolay; Kohonen, Pekka; Munteanu, Cristian R; Sarimveis, Haralambos; Smeets, Bart; Sopasakis, Pantelis; Tsiliki, Georgia; Vorgrimmler, David; Willighagen, Egon

2015-01-01

Summary Background: The NanoSafety Cluster, a cluster of projects funded by the European Commision, identified the need for a computational infrastructure for toxicological data management of engineered nanomaterials (ENMs). Ontologies, open standards, and interoperable designs were envisioned to empower a harmonized approach to European research in nanotechnology. This setting provides a number of opportunities and challenges in the representation of nanomaterials data and the integration of ENM information originating from diverse systems. Within this cluster, eNanoMapper works towards supporting the collaborative safety assessment for ENMs by creating a modular and extensible infrastructure for data sharing, data analysis, and building computational toxicology models for ENMs. Results: The eNanoMapper database solution builds on the previous experience of the consortium partners in supporting diverse data through flexible data storage, open source components and web services. We have recently described the design of the eNanoMapper prototype database along with a summary of challenges in the representation of ENM data and an extensive review of existing nano-related data models, databases, and nanomaterials-related entries in chemical and toxicogenomic databases. This paper continues with a focus on the database functionality exposed through its application programming interface (API), and its use in visualisation and modelling. Considering the preferred community practice of using spreadsheet templates, we developed a configurable spreadsheet parser facilitating user friendly data preparation and data upload. We further present a web application able to retrieve the experimental data via the API and analyze it with multiple data preprocessing and machine learning algorithms. Conclusion: We demonstrate how the eNanoMapper database is used to import and publish online ENM and assay data from several data sources, how the “representational state transfer” (REST) API enables building user friendly interfaces and graphical summaries of the data, and how these resources facilitate the modelling of reproducible quantitative structure–activity relationships for nanomaterials (NanoQSAR). PMID:26425413
The eNanoMapper database for nanomaterial safety information.

PubMed

Jeliazkova, Nina; Chomenidis, Charalampos; Doganis, Philip; Fadeel, Bengt; Grafström, Roland; Hardy, Barry; Hastings, Janna; Hegi, Markus; Jeliazkov, Vedrin; Kochev, Nikolay; Kohonen, Pekka; Munteanu, Cristian R; Sarimveis, Haralambos; Smeets, Bart; Sopasakis, Pantelis; Tsiliki, Georgia; Vorgrimmler, David; Willighagen, Egon

2015-01-01

The NanoSafety Cluster, a cluster of projects funded by the European Commision, identified the need for a computational infrastructure for toxicological data management of engineered nanomaterials (ENMs). Ontologies, open standards, and interoperable designs were envisioned to empower a harmonized approach to European research in nanotechnology. This setting provides a number of opportunities and challenges in the representation of nanomaterials data and the integration of ENM information originating from diverse systems. Within this cluster, eNanoMapper works towards supporting the collaborative safety assessment for ENMs by creating a modular and extensible infrastructure for data sharing, data analysis, and building computational toxicology models for ENMs. The eNanoMapper database solution builds on the previous experience of the consortium partners in supporting diverse data through flexible data storage, open source components and web services. We have recently described the design of the eNanoMapper prototype database along with a summary of challenges in the representation of ENM data and an extensive review of existing nano-related data models, databases, and nanomaterials-related entries in chemical and toxicogenomic databases. This paper continues with a focus on the database functionality exposed through its application programming interface (API), and its use in visualisation and modelling. Considering the preferred community practice of using spreadsheet templates, we developed a configurable spreadsheet parser facilitating user friendly data preparation and data upload. We further present a web application able to retrieve the experimental data via the API and analyze it with multiple data preprocessing and machine learning algorithms. We demonstrate how the eNanoMapper database is used to import and publish online ENM and assay data from several data sources, how the "representational state transfer" (REST) API enables building user friendly interfaces and graphical summaries of the data, and how these resources facilitate the modelling of reproducible quantitative structure-activity relationships for nanomaterials (NanoQSAR).
VarioML framework for comprehensive variation data representation and exchange.

PubMed

Byrne, Myles; Fokkema, Ivo Fac; Lancaster, Owen; Adamusiak, Tomasz; Ahonen-Bishopp, Anni; Atlan, David; Béroud, Christophe; Cornell, Michael; Dalgleish, Raymond; Devereau, Andrew; Patrinos, George P; Swertz, Morris A; Taschner, Peter Em; Thorisson, Gudmundur A; Vihinen, Mauno; Brookes, Anthony J; Muilu, Juha

2012-10-03

Sharing of data about variation and the associated phenotypes is a critical need, yet variant information can be arbitrarily complex, making a single standard vocabulary elusive and re-formatting difficult. Complex standards have proven too time-consuming to implement. The GEN2PHEN project addressed these difficulties by developing a comprehensive data model for capturing biomedical observations, Observ-OM, and building the VarioML format around it. VarioML pairs a simplified open specification for describing variants, with a toolkit for adapting the specification into one's own research workflow. Straightforward variant data can be captured, federated, and exchanged with no overhead; more complex data can be described, without loss of compatibility. The open specification enables push-button submission to gene variant databases (LSDBs) e.g., the Leiden Open Variation Database, using the Cafe Variome data publishing service, while VarioML bidirectionally transforms data between XML and web-application code formats, opening up new possibilities for open source web applications building on shared data. A Java implementation toolkit makes VarioML easily integrated into biomedical applications. VarioML is designed primarily for LSDB data submission and transfer scenarios, but can also be used as a standard variation data format for JSON and XML document databases and user interface components. VarioML is a set of tools and practices improving the availability, quality, and comprehensibility of human variation information. It enables researchers, diagnostic laboratories, and clinics to share that information with ease, clarity, and without ambiguity.
VarioML framework for comprehensive variation data representation and exchange

PubMed Central

2012-01-01

Background Sharing of data about variation and the associated phenotypes is a critical need, yet variant information can be arbitrarily complex, making a single standard vocabulary elusive and re-formatting difficult. Complex standards have proven too time-consuming to implement. Results The GEN2PHEN project addressed these difficulties by developing a comprehensive data model for capturing biomedical observations, Observ-OM, and building the VarioML format around it. VarioML pairs a simplified open specification for describing variants, with a toolkit for adapting the specification into one's own research workflow. Straightforward variant data can be captured, federated, and exchanged with no overhead; more complex data can be described, without loss of compatibility. The open specification enables push-button submission to gene variant databases (LSDBs) e.g., the Leiden Open Variation Database, using the Cafe Variome data publishing service, while VarioML bidirectionally transforms data between XML and web-application code formats, opening up new possibilities for open source web applications building on shared data. A Java implementation toolkit makes VarioML easily integrated into biomedical applications. VarioML is designed primarily for LSDB data submission and transfer scenarios, but can also be used as a standard variation data format for JSON and XML document databases and user interface components. Conclusions VarioML is a set of tools and practices improving the availability, quality, and comprehensibility of human variation information. It enables researchers, diagnostic laboratories, and clinics to share that information with ease, clarity, and without ambiguity. PMID:23031277

Teaching Undergraduate Software Engineering Using Open Source Development Tools

DTIC Science & Technology

2012-01-01

ware. Some example appliances are: a LAMP stack, Redmine, MySQL database, Moodle, Tom- cat on Apache, and Bugzilla. Some of the important features...Ada, C, C++, PHP , Py- thon, etc., and also supports a wide range of SDKs such as Google’s Android SDK and the Google Web Toolkit SDK. Additionally
Free/Libre open source software in health care: a review.

PubMed

Karopka, Thomas; Schmuhl, Holger; Demski, Hans

2014-01-01

To assess the current state of the art and the contribution of Free/Libre Open Source Software in health care (FLOSS-HC). The review is based on a narrative review of the scientific literature as well as sources in the context of FLOSS-HC available through the Internet. All relevant available sources have been integrated into the MedFLOSS database and are freely available to the community. The literature review reveals that publications about FLOSS-HC are scarce. The largest part of information about FLOSS-HC is available on dedicated websites and not in the academic literature. There are currently FLOSS alternatives available for nearly every specialty in health care. Maturity and quality varies considerably and there is little information available on the percentage of systems that are actually used in health care delivery. The global impact of FLOSS-HC is still very limited and no figures on the penetration and usage of FLOSS-HC are available. However, there has been a considerable growth in the last 5 to 10 years. While there where only few systems available a decade ago, in the meantime many systems got available (e.g., more than 300 in the MedFLOSS database). While FLOSS concepts play an important role in most IT related sectors (e.g., telecommunications, embedded devices) the healthcare industry is lagging behind this trend.
Free/Libre Open Source Software in Health Care: A Review

PubMed Central

Schmuhl, Holger; Demski, Hans

2014-01-01

Objectives To assess the current state of the art and the contribution of Free/Libre Open Source Software in health care (FLOSS-HC). Methods The review is based on a narrative review of the scientific literature as well as sources in the context of FLOSS-HC available through the Internet. All relevant available sources have been integrated into the MedFLOSS database and are freely available to the community. Results The literature review reveals that publications about FLOSS-HC are scarce. The largest part of information about FLOSS-HC is available on dedicated websites and not in the academic literature. There are currently FLOSS alternatives available for nearly every specialty in health care. Maturity and quality varies considerably and there is little information available on the percentage of systems that are actually used in health care delivery. Conclusions The global impact of FLOSS-HC is still very limited and no figures on the penetration and usage of FLOSS-HC are available. However, there has been a considerable growth in the last 5 to 10 years. While there where only few systems available a decade ago, in the meantime many systems got available (e.g., more than 300 in the MedFLOSS database). While FLOSS concepts play an important role in most IT related sectors (e.g., telecommunications, embedded devices) the healthcare industry is lagging behind this trend. PMID:24627814
2MASS Catalog Server Kit Version 2.1

NASA Astrophysics Data System (ADS)

Yamauchi, C.

2013-10-01

The 2MASS Catalog Server Kit is open source software for use in easily constructing a high performance search server for important astronomical catalogs. This software utilizes the open source RDBMS PostgreSQL, therefore, any users can setup the database on their local computers by following step-by-step installation guide. The kit provides highly optimized stored functions for positional searchs similar to SDSS SkyServer. Together with these, the powerful SQL environment of PostgreSQL will meet various user's demands. We released 2MASS Catalog Server Kit version 2.1 in 2012 May, which supports the latest WISE All-Sky catalog (563,921,584 rows) and 9 major all-sky catalogs. Local databases are often indispensable for observatories with unstable or narrow-band networks or severe use, such as retrieving large numbers of records within a small period of time. This software is the best for such purposes, and increasing supported catalogs and improvements of version 2.1 can cover a wider range of applications including advanced calibration system, scientific studies using complicated SQL queries, etc. Official page: http://www.ir.isas.jaxa.jp/~cyamauch/2masskit/
Integrating personalized medical test contents with XML and XSL-FO.

PubMed

Toddenroth, Dennis; Dugas, Martin; Frankewitsch, Thomas

2011-03-01

In 2004 the adoption of a modular curriculum at the medical faculty in Muenster led to the introduction of centralized examinations based on multiple-choice questions (MCQs). We report on how organizational challenges of realizing faculty-wide personalized tests were addressed by implementation of a specialized software module to automatically generate test sheets from individual test registrations and MCQ contents. Key steps of the presented method for preparing personalized test sheets are (1) the compilation of relevant item contents and graphical media from a relational database with database queries, (2) the creation of Extensible Markup Language (XML) intermediates, and (3) the transformation into paginated documents. The software module by use of an open source print formatter consistently produced high-quality test sheets, while the blending of vectorized textual contents and pixel graphics resulted in efficient output file sizes. Concomitantly the module permitted an individual randomization of item sequences to prevent illicit collusion. The automatic generation of personalized MCQ test sheets is feasible using freely available open source software libraries, and can be efficiently deployed on a faculty-wide scale.
NoSQL data model for semi-automatic integration of ethnomedicinal plant data from multiple sources.

PubMed

Ningthoujam, Sanjoy Singh; Choudhury, Manabendra Dutta; Potsangbam, Kumar Singh; Chetia, Pankaj; Nahar, Lutfun; Sarker, Satyajit D; Basar, Norazah; Das Talukdar, Anupam

2014-01-01

Sharing traditional knowledge with the scientific community could refine scientific approaches to phytochemical investigation and conservation of ethnomedicinal plants. As such, integration of traditional knowledge with scientific data using a single platform for sharing is greatly needed. However, ethnomedicinal data are available in heterogeneous formats, which depend on cultural aspects, survey methodology and focus of the study. Phytochemical and bioassay data are also available from many open sources in various standards and customised formats. To design a flexible data model that could integrate both primary and curated ethnomedicinal plant data from multiple sources. The current model is based on MongoDB, one of the Not only Structured Query Language (NoSQL) databases. Although it does not contain schema, modifications were made so that the model could incorporate both standard and customised ethnomedicinal plant data format from different sources. The model presented can integrate both primary and secondary data related to ethnomedicinal plants. Accommodation of disparate data was accomplished by a feature of this database that supported a different set of fields for each document. It also allowed storage of similar data having different properties. The model presented is scalable to a highly complex level with continuing maturation of the database, and is applicable for storing, retrieving and sharing ethnomedicinal plant data. It can also serve as a flexible alternative to a relational and normalised database. Copyright © 2014 John Wiley & Sons, Ltd.
WImpiBLAST: web interface for mpiBLAST to help biologists perform large-scale annotation using high performance computing.

PubMed

Sharma, Parichit; Mantri, Shrikant S

2014-01-01

The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next generation sequencing technologies it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are available in the public domain, researchers are reluctant to use them due to lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. Here, we describe the WImpiBLAST web interface features and architecture, explain design decisions, describe workflows and provide a detailed analysis.
WImpiBLAST: Web Interface for mpiBLAST to Help Biologists Perform Large-Scale Annotation Using High Performance Computing

PubMed Central

Sharma, Parichit; Mantri, Shrikant S.

2014-01-01

The function of a newly sequenced gene can be discovered by determining its sequence homology with known proteins. BLAST is the most extensively used sequence analysis program for sequence similarity search in large databases of sequences. With the advent of next generation sequencing technologies it has now become possible to study genes and their expression at a genome-wide scale through RNA-seq and metagenome sequencing experiments. Functional annotation of all the genes is done by sequence similarity search against multiple protein databases. This annotation task is computationally very intensive and can take days to obtain complete results. The program mpiBLAST, an open-source parallelization of BLAST that achieves superlinear speedup, can be used to accelerate large-scale annotation by using supercomputers and high performance computing (HPC) clusters. Although many parallel bioinformatics applications using the Message Passing Interface (MPI) are available in the public domain, researchers are reluctant to use them due to lack of expertise in the Linux command line and relevant programming experience. With these limitations, it becomes difficult for biologists to use mpiBLAST for accelerating annotation. No web interface is available in the open-source domain for mpiBLAST. We have developed WImpiBLAST, a user-friendly open-source web interface for parallel BLAST searches. It is implemented in Struts 1.3 using a Java backbone and runs atop the open-source Apache Tomcat Server. WImpiBLAST supports script creation and job submission features and also provides a robust job management interface for system administrators. It combines script creation and modification features with job monitoring and management through the Torque resource manager on a Linux-based HPC cluster. Use case information highlights the acceleration of annotation analysis achieved by using WImpiBLAST. Here, we describe the WImpiBLAST web interface features and architecture, explain design decisions, describe workflows and provide a detailed analysis. PMID:24979410
Building a multi-scaled geospatial temporal ecology database from disparate data sources: fostering open science and data reuse.

PubMed

Soranno, Patricia A; Bissell, Edward G; Cheruvelil, Kendra S; Christel, Samuel T; Collins, Sarah M; Fergus, C Emi; Filstrup, Christopher T; Lapierre, Jean-Francois; Lottig, Noah R; Oliver, Samantha K; Scott, Caren E; Smith, Nicole J; Stopyak, Scott; Yuan, Shuai; Bremigan, Mary Tate; Downing, John A; Gries, Corinna; Henry, Emily N; Skaff, Nick K; Stanley, Emily H; Stow, Craig A; Tan, Pang-Ning; Wagner, Tyler; Webster, Katherine E

2015-01-01

Although there are considerable site-based data for individual or groups of ecosystems, these datasets are widely scattered, have different data formats and conventions, and often have limited accessibility. At the broader scale, national datasets exist for a large number of geospatial features of land, water, and air that are needed to fully understand variation among these ecosystems. However, such datasets originate from different sources and have different spatial and temporal resolutions. By taking an open-science perspective and by combining site-based ecosystem datasets and national geospatial datasets, science gains the ability to ask important research questions related to grand environmental challenges that operate at broad scales. Documentation of such complicated database integration efforts, through peer-reviewed papers, is recommended to foster reproducibility and future use of the integrated database. Here, we describe the major steps, challenges, and considerations in building an integrated database of lake ecosystems, called LAGOS (LAke multi-scaled GeOSpatial and temporal database), that was developed at the sub-continental study extent of 17 US states (1,800,000 km(2)). LAGOS includes two modules: LAGOSGEO, with geospatial data on every lake with surface area larger than 4 ha in the study extent (~50,000 lakes), including climate, atmospheric deposition, land use/cover, hydrology, geology, and topography measured across a range of spatial and temporal extents; and LAGOSLIMNO, with lake water quality data compiled from ~100 individual datasets for a subset of lakes in the study extent (~10,000 lakes). Procedures for the integration of datasets included: creating a flexible database design; authoring and integrating metadata; documenting data provenance; quantifying spatial measures of geographic data; quality-controlling integrated and derived data; and extensively documenting the database. Our procedures make a large, complex, and integrated database reproducible and extensible, allowing users to ask new research questions with the existing database or through the addition of new data. The largest challenge of this task was the heterogeneity of the data, formats, and metadata. Many steps of data integration need manual input from experts in diverse fields, requiring close collaboration.
Building a multi-scaled geospatial temporal ecology database from disparate data sources: Fostering open science through data reuse

USGS Publications Warehouse

Soranno, Patricia A.; Bissell, E.G.; Cheruvelil, Kendra S.; Christel, Samuel T.; Collins, Sarah M.; Fergus, C. Emi; Filstrup, Christopher T.; Lapierre, Jean-Francois; Lotting, Noah R.; Oliver, Samantha K.; Scott, Caren E.; Smith, Nicole J.; Stopyak, Scott; Yuan, Shuai; Bremigan, Mary Tate; Downing, John A.; Gries, Corinna; Henry, Emily N.; Skaff, Nick K.; Stanley, Emily H.; Stow, Craig A.; Tan, Pang-Ning; Wagner, Tyler; Webster, Katherine E.

2015-01-01

Although there are considerable site-based data for individual or groups of ecosystems, these datasets are widely scattered, have different data formats and conventions, and often have limited accessibility. At the broader scale, national datasets exist for a large number of geospatial features of land, water, and air that are needed to fully understand variation among these ecosystems. However, such datasets originate from different sources and have different spatial and temporal resolutions. By taking an open-science perspective and by combining site-based ecosystem datasets and national geospatial datasets, science gains the ability to ask important research questions related to grand environmental challenges that operate at broad scales. Documentation of such complicated database integration efforts, through peer-reviewed papers, is recommended to foster reproducibility and future use of the integrated database. Here, we describe the major steps, challenges, and considerations in building an integrated database of lake ecosystems, called LAGOS (LAke multi-scaled GeOSpatial and temporal database), that was developed at the sub-continental study extent of 17 US states (1,800,000 km2). LAGOS includes two modules: LAGOSGEO, with geospatial data on every lake with surface area larger than 4 ha in the study extent (~50,000 lakes), including climate, atmospheric deposition, land use/cover, hydrology, geology, and topography measured across a range of spatial and temporal extents; and LAGOSLIMNO, with lake water quality data compiled from ~100 individual datasets for a subset of lakes in the study extent (~10,000 lakes). Procedures for the integration of datasets included: creating a flexible database design; authoring and integrating metadata; documenting data provenance; quantifying spatial measures of geographic data; quality-controlling integrated and derived data; and extensively documenting the database. Our procedures make a large, complex, and integrated database reproducible and extensible, allowing users to ask new research questions with the existing database or through the addition of new data. The largest challenge of this task was the heterogeneity of the data, formats, and metadata. Many steps of data integration need manual input from experts in diverse fields, requiring close collaboration.
ACToR: Aggregated Computational Toxicology Resource (T) ...

EPA Pesticide Factsheets

The EPA Aggregated Computational Toxicology Resource (ACToR) is a set of databases compiling information on chemicals in the environment from a large number of public and in-house EPA sources. ACToR has 3 main goals: (1) The serve as a repository of public toxicology information on chemicals of interest to the EPA, and in particular to be a central source for the testing data on all chemicals regulated by all EPA programs; (2) To be a source of in vivo training data sets for building in vitro to in vivo computational models; (3) To serve as a central source of chemical structure and identity information for the ToxCastTM and Tox21 programs. There are 4 main databases, all linked through a common set of chemical information and a common structure linking chemicals to assay data: the public ACToR system (available at http://actor.epa.gov), the ToxMiner database holding ToxCast and Tox21 data, along with results form statistical analyses on these data; the Tox21 chemical repository which is managing the ordering and sample tracking process for the larger Tox21 project; and the public version of ToxRefDB. The public ACToR system contains information on ~500K compounds with toxicology, exposure and chemical property information from >400 public sources. The web site is visited by ~1,000 unique users per month and generates ~1,000 page requests per day on average. The databases are built on open source technology, which has allowed us to export them to a number of col
What Can OpenEI Do For You?

DOE Office of Scientific and Technical Information (OSTI.GOV)

None

2010-12-10

Open Energy Information (OpenEI) is an open source web platform—similar to the one used by Wikipedia—developed by the US Department of Energy (DOE) and the National Renewable Energy Laboratory (NREL) to make the large amounts of energy-related data and information more easily searched, accessed, and used both by people and automated machine processes. Built utilizing the standards and practices of the Linked Open Data community, the OpenEI platform is much more robust and powerful than typical web sites and databases. As an open platform, all users can search, edit, add, and access data in OpenEI for free. The user communitymore » contributes the content and ensures its accuracy and relevance; as the community expands, so does the content's comprehensiveness and quality. The data are structured and tagged with descriptors to enable cross-linking among related data sets, advanced search functionality, and consistent, usable formatting. Data input protocols and quality standards help ensure the content is structured and described properly and derived from a credible source. Although DOE/NREL is developing OpenEI and seeding it with initial data, it is designed to become a true community model with millions of users, a large core of active contributors, and numerous sponsors.« less
What Can OpenEI Do For You?

ScienceCinema

None

2018-02-06

Open Energy Information (OpenEI) is an open source web platformâsimilar to the one used by Wikipediaâdeveloped by the US Department of Energy (DOE) and the National Renewable Energy Laboratory (NREL) to make the large amounts of energy-related data and information more easily searched, accessed, and used both by people and automated machine processes. Built utilizing the standards and practices of the Linked Open Data community, the OpenEI platform is much more robust and powerful than typical web sites and databases. As an open platform, all users can search, edit, add, and access data in OpenEI for free. The user community contributes the content and ensures its accuracy and relevance; as the community expands, so does the content's comprehensiveness and quality. The data are structured and tagged with descriptors to enable cross-linking among related data sets, advanced search functionality, and consistent, usable formatting. Data input protocols and quality standards help ensure the content is structured and described properly and derived from a credible source. Although DOE/NREL is developing OpenEI and seeding it with initial data, it is designed to become a true community model with millions of users, a large core of active contributors, and numerous sponsors.
LSST communications middleware implementation

NASA Astrophysics Data System (ADS)

Mills, Dave; Schumacher, German; Lotz, Paul

2016-07-01

The LSST communications middleware is based on a set of software abstractions; which provide standard interfaces for common communications services. The observatory requires communication between diverse subsystems, implemented by different contractors, and comprehensive archiving of subsystem status data. The Service Abstraction Layer (SAL) is implemented using open source packages that implement open standards of DDS (Data Distribution Service1) for data communication, and SQL (Standard Query Language) for database access. For every subsystem, abstractions for each of the Telemetry datastreams, along with Command/Response and Events, have been agreed with the appropriate component vendor (such as Dome, TMA, Hexapod), and captured in ICD's (Interface Control Documents).The OpenSplice (Prismtech) Community Edition of DDS provides an LGPL licensed distribution which may be freely redistributed. The availability of the full source code provides assurances that the project will be able to maintain it over the full 10 year survey, independent of the fortunes of the original providers.
Canto: an online tool for community literature curation.

PubMed

Rutherford, Kim M; Harris, Midori A; Lock, Antonia; Oliver, Stephen G; Wood, Valerie

2014-06-15

Detailed curation of published molecular data is essential for any model organism database. Community curation enables researchers to contribute data from their papers directly to databases, supplementing the activity of professional curators and improving coverage of a growing body of literature. We have developed Canto, a web-based tool that provides an intuitive curation interface for both curators and researchers, to support community curation in the fission yeast database, PomBase. Canto supports curation using OBO ontologies, and can be easily configured for use with any species. Canto code and documentation are available under an Open Source license from http://curation.pombase.org/. Canto is a component of the Generic Model Organism Database (GMOD) project (http://www.gmod.org/). © The Author 2014. Published by Oxford University Press.
A global building inventory for earthquake loss estimation and risk management

USGS Publications Warehouse

Jaiswal, K.; Wald, D.; Porter, K.

2010-01-01

We develop a global database of building inventories using taxonomy of global building types for use in near-real-time post-earthquake loss estimation and pre-earthquake risk analysis, for the U.S. Geological Survey's Prompt Assessment of Global Earthquakes for Response (PAGER) program. The database is available for public use, subject to peer review, scrutiny, and open enhancement. On a country-by-country level, it contains estimates of the distribution of building types categorized by material, lateral force resisting system, and occupancy type (residential or nonresidential, urban or rural). The database draws on and harmonizes numerous sources: (1) UN statistics, (2) UN Habitat's demographic and health survey (DHS) database, (3) national housing censuses, (4) the World Housing Encyclopedia and (5) other literature. ?? 2010, Earthquake Engineering Research Institute.
Embracing the Open-Source Movement for the Management of Spatial Data: A Case Study of African Trypanosomiasis in Kenya

PubMed Central

Langley, Shaun A.; Messina, Joseph P.

2011-01-01

The past decade has seen an explosion in the availability of spatial data not only for researchers, but the public alike. As the quantity of data increases, the ability to effectively navigate and understand the data becomes more challenging. Here we detail a conceptual model for a spatially explicit database management system that addresses the issues raised with the growing data management problem. We demonstrate utility with a case study in disease ecology: to develop a multi-scale predictive model of African Trypanosomiasis in Kenya. International collaborations and varying technical expertise necessitate a modular open-source software solution. Finally, we address three recurring problems with data management: scalability, reliability, and security. PMID:21686072
Embracing the Open-Source Movement for the Management of Spatial Data: A Case Study of African Trypanosomiasis in Kenya.

PubMed

Langley, Shaun A; Messina, Joseph P

2011-01-01

The past decade has seen an explosion in the availability of spatial data not only for researchers, but the public alike. As the quantity of data increases, the ability to effectively navigate and understand the data becomes more challenging. Here we detail a conceptual model for a spatially explicit database management system that addresses the issues raised with the growing data management problem. We demonstrate utility with a case study in disease ecology: to develop a multi-scale predictive model of African Trypanosomiasis in Kenya. International collaborations and varying technical expertise necessitate a modular open-source software solution. Finally, we address three recurring problems with data management: scalability, reliability, and security.
BioWarehouse: a bioinformatics database warehouse toolkit

PubMed Central

Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David WJ; Tenenbaum, Jessica D; Karp, Peter D

2006-01-01

Background This article addresses the problem of interoperation of heterogeneous bioinformatics databases. Results We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. Conclusion BioWarehouse embodies significant progress on the database integration problem for bioinformatics. PMID:16556315
BioWarehouse: a bioinformatics database warehouse toolkit.

PubMed

Lee, Thomas J; Pouliot, Yannick; Wagner, Valerie; Gupta, Priyanka; Stringer-Calvert, David W J; Tenenbaum, Jessica D; Karp, Peter D

2006-03-23

This article addresses the problem of interoperation of heterogeneous bioinformatics databases. We introduce BioWarehouse, an open source toolkit for constructing bioinformatics database warehouses using the MySQL and Oracle relational database managers. BioWarehouse integrates its component databases into a common representational framework within a single database management system, thus enabling multi-database queries using the Structured Query Language (SQL) but also facilitating a variety of database integration tasks such as comparative analysis and data mining. BioWarehouse currently supports the integration of a pathway-centric set of databases including ENZYME, KEGG, and BioCyc, and in addition the UniProt, GenBank, NCBI Taxonomy, and CMR databases, and the Gene Ontology. Loader tools, written in the C and JAVA languages, parse and load these databases into a relational database schema. The loaders also apply a degree of semantic normalization to their respective source data, decreasing semantic heterogeneity. The schema supports the following bioinformatics datatypes: chemical compounds, biochemical reactions, metabolic pathways, proteins, genes, nucleic acid sequences, features on protein and nucleic-acid sequences, organisms, organism taxonomies, and controlled vocabularies. As an application example, we applied BioWarehouse to determine the fraction of biochemically characterized enzyme activities for which no sequences exist in the public sequence databases. The answer is that no sequence exists for 36% of enzyme activities for which EC numbers have been assigned. These gaps in sequence data significantly limit the accuracy of genome annotation and metabolic pathway prediction, and are a barrier for metabolic engineering. Complex queries of this type provide examples of the value of the data warehousing approach to bioinformatics research. BioWarehouse embodies significant progress on the database integration problem for bioinformatics.

Supporting Building Portfolio Investment and Policy Decision Making through an Integrated Building Utility Data Platform

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aziz, Azizan; Lasternas, Bertrand; Alschuler, Elena

The American Recovery and Reinvestment Act stimulus funding of 2009 for smart grid projects resulted in the tripling of smart meters deployment. In 2012, the Green Button initiative provided utility customers with access to their real-time1 energy usage. The availability of finely granular data provides an enormous potential for energy data analytics and energy benchmarking. The sheer volume of time-series utility data from a large number of buildings also poses challenges in data collection, quality control, and database management for rigorous and meaningful analyses. In this paper, we will describe a building portfolio-level data analytics tool for operational optimization, businessmore » investment and policy assessment using 15-minute to monthly intervals utility data. The analytics tool is developed on top of the U.S. Department of Energy’s Standard Energy Efficiency Data (SEED) platform, an open source software application that manages energy performance data of large groups of buildings. To support the significantly large volume of granular interval data, we integrated a parallel time-series database to the existing relational database. The time-series database improves on the current utility data input, focusing on real-time data collection, storage, analytics and data quality control. The fully integrated data platform supports APIs for utility apps development by third party software developers. These apps will provide actionable intelligence for building owners and facilities managers. Unlike a commercial system, this platform is an open source platform funded by the U.S. Government, accessible to the public, researchers and other developers, to support initiatives in reducing building energy consumption.« less
Construction of a nasopharyngeal carcinoma 2D/MS repository with Open Source XML database--Xindice.

PubMed

Li, Feng; Li, Maoyu; Xiao, Zhiqiang; Zhang, Pengfei; Li, Jianling; Chen, Zhuchu

2006-01-11

Many proteomics initiatives require integration of all information with uniformcriteria from collection of samples and data display to publication of experimental results. The integration and exchanging of these data of different formats and structure imposes a great challenge to us. The XML technology presents a promise in handling this task due to its simplicity and flexibility. Nasopharyngeal carcinoma (NPC) is one of the most common cancers in southern China and Southeast Asia, which has marked geographic and racial differences in incidence. Although there are some cancer proteome databases now, there is still no NPC proteome database. The raw NPC proteome experiment data were captured into one XML document with Human Proteome Markup Language (HUP-ML) editor and imported into native XML database Xindice. The 2D/MS repository of NPC proteome was constructed with Apache, PHP and Xindice to provide access to the database via Internet. On our website, two methods, keyword query and click query, were provided at the same time to access the entries of the NPC proteome database. Our 2D/MS repository can be used to share the raw NPC proteomics data that are generated from gel-based proteomics experiments. The database, as well as the PHP source codes for constructing users' own proteome repository, can be accessed at http://www.xyproteomics.org/.
The Future of ECHO: Evaluating Open Source Possibilities

NASA Astrophysics Data System (ADS)

Pilone, D.; Gilman, J.; Baynes, K.; Mitchell, A. E.

2012-12-01

NASA's Earth Observing System ClearingHOuse (ECHO) is a format agnostic metadata repository supporting over 3000 collections and 100M science granules. ECHO exposes FTP and RESTful Data Ingest APIs in addition to both SOAP and RESTful search and order capabilities. Built on top of ECHO is a human facing search and order web application named Reverb. ECHO processes hundreds of orders, tens of thousands of searches, and 1-2M ingest actions each week. As ECHO's holdings, metadata format support, and visibility have increased, the ECHO team has received requests by non-NASA entities for copies of ECHO that can be run locally against their data holdings. ESDIS and the ECHO Team have begun investigations into various deployment and Open Sourcing models that can balance the real constraints faced by the ECHO project with the benefits of providing ECHO capabilities to a broader set of users and providers. This talk will discuss several release and Open Source models being investigated by the ECHO team along with the impacts those models are expected to have on the project. We discuss: - Addressing complex deployment or setup issues for potential users - Models of vetting code contributions - Balancing external (public) user requests versus our primary partners - Preparing project code for public release, including navigating licensing issues related to leveraged libraries - Dealing with non-free project dependencies such as commercial databases - Dealing with sensitive aspects of project code such as database passwords, authentication approaches, security through obscurity, etc. - Ongoing support for the released code including increased testing demands, bug fixes, security fixes, and new features.
A Chado case study: an ontology-based modular schema for representing genome-associated biological information.

PubMed

Mungall, Christopher J; Emmert, David B

2007-07-01

A few years ago, FlyBase undertook to design a new database schema to store Drosophila data. It would fully integrate genomic sequence and annotation data with bibliographic, genetic, phenotypic and molecular data from the literature representing a distillation of the first 100 years of research on this major animal model system. In developing this new integrated schema, FlyBase also made a commitment to ensure that its design was generic, extensible and available as open source, so that it could be employed as the core schema of any model organism data repository, thereby avoiding redundant software development and potentially increasing interoperability. Our question was whether we could create a relational database schema that would be successfully reused. Chado is a relational database schema now being used to manage biological knowledge for a wide variety of organisms, from human to pathogens, especially the classes of information that directly or indirectly can be associated with genome sequences or the primary RNA and protein products encoded by a genome. Biological databases that conform to this schema can interoperate with one another, and with application software from the Generic Model Organism Database (GMOD) toolkit. Chado is distinctive because its design is driven by ontologies. The use of ontologies (or controlled vocabularies) is ubiquitous across the schema, as they are used as a means of typing entities. The Chado schema is partitioned into integrated subschemas (modules), each encapsulating a different biological domain, and each described using representations in appropriate ontologies. To illustrate this methodology, we describe here the Chado modules used for describing genomic sequences. GMOD is a collaboration of several model organism database groups, including FlyBase, to develop a set of open-source software for managing model organism data. The Chado schema is freely distributed under the terms of the Artistic License (http://www.opensource.org/licenses/artistic-license.php) from GMOD (www.gmod.org).
Making geospatial data in ASF archive readily accessible

NASA Astrophysics Data System (ADS)

Gens, R.; Hogenson, K.; Wolf, V. G.; Drew, L.; Stern, T.; Stoner, M.; Shapran, M.

2015-12-01

The way geospatial data is searched, managed, processed and used has changed significantly in recent years. A data archive such as the one at the Alaska Satellite Facility (ASF), one of NASA's twelve interlinked Distributed Active Archive Centers (DAACs), used to be searched solely via user interfaces that were specifically developed for its particular archive and data sets. ASF then moved to using an application programming interface (API) that defined a set of routines, protocols, and tools for distributing the geospatial information stored in the database in real time. This provided a more flexible access to the geospatial data. Yet, it was up to user to develop the tools to get a more tailored access to the data they needed. We present two new approaches for serving data to users. In response to the recent Nepal earthquake we developed a data feed for distributing ESA's Sentinel data. Users can subscribe to the data feed and are provided with the relevant metadata the moment a new data set is available for download. The second approach was an Open Geospatial Consortium (OGC) web feature service (WFS). The WFS hosts the metadata along with a direct link from which the data can be downloaded. It uses the open-source GeoServer software (Youngblood and Iacovella, 2013) and provides an interface to include the geospatial information in the archive directly into the user's geographic information system (GIS) as an additional data layer. Both services are run on top of a geospatial PostGIS database, an open-source geographic extension for the PostgreSQL object-relational database (Marquez, 2015). Marquez, A., 2015. PostGIS essentials. Packt Publishing, 198 p. Youngblood, B. and Iacovella, S., 2013. GeoServer Beginner's Guide, Packt Publishing, 350 p.
pvsR: An Open Source Interface to Big Data on the American Political Sphere.

PubMed

Matter, Ulrich; Stutzer, Alois

2015-01-01

Digital data from the political sphere is abundant, omnipresent, and more and more directly accessible through the Internet. Project Vote Smart (PVS) is a prominent example of this big public data and covers various aspects of U.S. politics in astonishing detail. Despite the vast potential of PVS' data for political science, economics, and sociology, it is hardly used in empirical research. The systematic compilation of semi-structured data can be complicated and time consuming as the data format is not designed for conventional scientific research. This paper presents a new tool that makes the data easily accessible to a broad scientific community. We provide the software called pvsR as an add-on to the R programming environment for statistical computing. This open source interface (OSI) serves as a direct link between a statistical analysis and the large PVS database. The free and open code is expected to substantially reduce the cost of research with PVS' new big public data in a vast variety of possible applications. We discuss its advantages vis-à-vis traditional methods of data generation as well as already existing interfaces. The validity of the library is documented based on an illustration involving female representation in local politics. In addition, pvsR facilitates the replication of research with PVS data at low costs, including the pre-processing of data. Similar OSIs are recommended for other big public databases.
The EarthServer Federation: State, Role, and Contribution to GEOSS

NASA Astrophysics Data System (ADS)

Merticariu, Vlad; Baumann, Peter

2016-04-01

The intercontinental EarthServer initiative has established a European datacube platform with proven scalability: known databases exceed 100 TB, and single queries have been split across more than 1,000 cloud nodes. Its service interface being rigorously based on the OGC "Big Geo Data" standards, Web Coverage Service (WCS) and Web Coverage Processing Service (WCPS), a series of clients can dock into the services, ranging from open-source OpenLayers and QGIS over open-source NASA WorldWind to proprietary ESRI ArcGIS. Datacube fusion in a "mix and match" style is supported by the platform technolgy, the rasdaman Array Database System, which transparently federates queries so that users simply approach any node of the federation to access any data item, internally optimized for minimal data transfer. Notably, rasdaman is part of GEOSS GCI. NASA is contributing its Web WorldWind virtual globe for user-friendly data extraction, navigation, and analysis. Integrated datacube / metadata queries are contributed by CITE. Current federation members include ESA (managed by MEEO sr.l.), Plymouth Marine Laboratory (PML), the European Centre for Medium-Range Weather Forecast (ECMWF), Australia's National Computational Infrastructure, and Jacobs University (adding in Planetary Science). Further data centers have expressed interest in joining. We present the EarthServer approach, discuss its underlying technology, and illustrate the contribution this datacube platform can make to GEOSS.
PubChem BioAssay: 2017 update

PubMed Central

Wang, Yanli; Bryant, Stephen H.; Cheng, Tiejun; Wang, Jiyao; Gindulyte, Asta; Shoemaker, Benjamin A.; Thiessen, Paul A.; He, Siqian; Zhang, Jian

2017-01-01

PubChem's BioAssay database (https://pubchem.ncbi.nlm.nih.gov) has served as a public repository for small-molecule and RNAi screening data since 2004 providing open access of its data content to the community. PubChem accepts data submission from worldwide researchers at academia, industry and government agencies. PubChem also collaborates with other chemical biology database stakeholders with data exchange. With over a decade's development effort, it becomes an important information resource supporting drug discovery and chemical biology research. To facilitate data discovery, PubChem is integrated with all other databases at NCBI. In this work, we provide an update for the PubChem BioAssay database describing several recent development including added sources of research data, redesigned BioAssay record page, new BioAssay classification browser and new features in the Upload system facilitating data sharing. PMID:27899599
PylotDB - A Database Management, Graphing, and Analysis Tool Written in Python

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barnette, Daniel W.

2012-01-04

PylotDB, written completely in Python, provides a user interface (UI) with which to interact with, analyze, graph data from, and manage open source databases such as MySQL. The UI mitigates the user having to know in-depth knowledge of the database application programming interface (API). PylotDB allows the user to generate various kinds of plots from user-selected data; generate statistical information on text as well as numerical fields; backup and restore databases; compare database tables across different databases as well as across different servers; extract information from any field to create new fields; generate, edit, and delete databases, tables, and fields;more » generate or read into a table CSV data; and similar operations. Since much of the database information is brought under control of the Python computer language, PylotDB is not intended for huge databases for which MySQL and Oracle, for example, are better suited. PylotDB is better suited for smaller databases that might be typically needed in a small research group situation. PylotDB can also be used as a learning tool for database applications in general.« less
Human grasping database for activities of daily living with depth, color and kinematic data streams.

PubMed

Saudabayev, Artur; Rysbek, Zhanibek; Khassenova, Raykhan; Varol, Huseyin Atakan

2018-05-29

This paper presents a grasping database collected from multiple human subjects for activities of daily living in unstructured environments. The main strength of this database is the use of three different sensing modalities: color images from a head-mounted action camera, distance data from a depth sensor on the dominant arm and upper body kinematic data acquired from an inertial motion capture suit. 3826 grasps were identified in the data collected during 9-hours of experiments. The grasps were grouped according to a hierarchical taxonomy into 35 different grasp types. The database contains information related to each grasp and associated sensor data acquired from the three sensor modalities. We also provide our data annotation software written in Matlab as an open-source tool. The size of the database is 172 GB. We believe this database can be used as a stepping stone to develop big data and machine learning techniques for grasping and manipulation with potential applications in rehabilitation robotics and intelligent automation.
Crystallography Open Database – an open-access collection of crystal structures

PubMed Central

Gražulis, Saulius; Chateigner, Daniel; Downs, Robert T.; Yokochi, A. F. T.; Quirós, Miguel; Lutterotti, Luca; Manakova, Elena; Butkus, Justas; Moeck, Peter; Le Bail, Armel

2009-01-01

The Crystallography Open Database (COD), which is a project that aims to gather all available inorganic, metal–organic and small organic molecule structural data in one database, is described. The database adopts an open-access model. The COD currently contains ∼80 000 entries in crystallographic information file format, with nearly full coverage of the International Union of Crystallography publications, and is growing in size and quality. PMID:22477773
An Open-source Meteorological Operational System and its Installation in Portuguese- speaking Countries

NASA Astrophysics Data System (ADS)

Almeida, W. G.; Ferreira, A. L.; Mendes, M. V.; Ribeiro, A.; Yoksas, T.

2007-05-01

CPTEC, a division of Brazil’s INPE, has been using several open-source software packages for a variety of tasks in its Data Division. Among these tools are ones traditionally used in research and educational communities such as GrADs (Grid Analysis and Display System from the Center for Ocean-Land-Atmosphere Studies (COLA)), the Local Data Manager (LDM) and GEMPAK (from Unidata), andl operational tools such the Automatic File Distributor (AFD) that are popular among National Meteorological Services. In addition, some tools developed locally at CPTEC are also being made available as open-source packages. One package is being used to manage the data from Automatic Weather Stations that INPE operates. This system uses only open- source tools such as MySQL database, PERL scripts and Java programs for web access, and Unidata’s Internet Data Distribution (IDD) system and AFD for data delivery. All of these packages are get bundled into a low-cost and easy to install and package called the Meteorological Data Operational System. Recently, in a cooperation with the SICLIMAD project, this system has been modified for use by Portuguese- speaking countries in Africa to manage data from many Automatic Weather Stations that are being installed in these countries under SICLIMAD sponsorship. In this presentation we describe the tools included-in and and architecture-of the Meteorological Data Operational System.
MARVIN: a medical research application framework based on open source software.

PubMed

Rudolph, Tobias; Puls, Marc; Anderegg, Christoph; Ebert, Lars; Broehan, Martina; Rudin, Adrian; Kowal, Jens

2008-08-01

This paper describes the open source framework MARVIN for rapid application development in the field of biomedical and clinical research. MARVIN applications consist of modules that can be plugged together in order to provide the functionality required for a specific experimental scenario. Application modules work on a common patient database that is used to store and organize medical data as well as derived data. MARVIN provides a flexible input/output system with support for many file formats including DICOM, various 2D image formats and surface mesh data. Furthermore, it implements an advanced visualization system and interfaces to a wide range of 3D tracking hardware. Since it uses only highly portable libraries, MARVIN applications run on Unix/Linux, Mac OS X and Microsoft Windows.
Digital geologic map and GIS database of Venezuela

USGS Publications Warehouse

Garrity, Christopher P.; Hackley, Paul C.; Urbani, Franco

2006-01-01

The digital geologic map and GIS database of Venezuela captures GIS compatible geologic and hydrologic data from the 'Geologic Shaded Relief Map of Venezuela,' which was released online as U.S. Geological Survey Open-File Report 2005-1038. Digital datasets and corresponding metadata files are stored in ESRI geodatabase format; accessible via ArcGIS 9.X. Feature classes in the geodatabase include geologic unit polygons, open water polygons, coincident geologic unit linework (contacts, faults, etc.) and non-coincident geologic unit linework (folds, drainage networks, etc.). Geologic unit polygon data were attributed for age, name, and lithologic type following the Lexico Estratigrafico de Venezuela. All digital datasets were captured from source data at 1:750,000. Although users may view and analyze data at varying scales, the authors make no guarantee as to the accuracy of the data at scales larger than 1:750,000.
New extension software modules to enhance searching and display of transcriptome data in Tripal databases

PubMed Central

Chen, Ming; Henry, Nathan; Almsaeed, Abdullah; Zhou, Xiao; Wegrzyn, Jill; Ficklin, Stephen

2017-01-01

Abstract Tripal is an open source software package for developing biological databases with a focus on genetic and genomic data. It consists of a set of core modules that deliver essential functions for loading and displaying data records and associated attributes including organisms, sequence features and genetic markers. Beyond the core modules, community members are encouraged to contribute extension modules to build on the Tripal core and to customize Tripal for individual community needs. To expand the utility of the Tripal software system, particularly for RNASeq data, we developed two new extension modules. Tripal Elasticsearch enables fast, scalable searching of the entire content of a Tripal site as well as the construction of customized advanced searches of specific data types. We demonstrate the use of this module for searching assembled transcripts by functional annotation. A second module, Tripal Analysis Expression, houses and displays records from gene expression assays such as RNA sequencing. This includes biological source materials (biomaterials), gene expression values and protocols used to generate the data. In the case of an RNASeq experiment, this would reflect the individual organisms and tissues used to produce sequencing libraries, the normalized gene expression values derived from the RNASeq data analysis and a description of the software or code used to generate the expression values. The module will load data from common flat file formats including standard NCBI Biosample XML. Data loading, display options and other configurations can be controlled by authorized users in the Drupal administrative backend. Both modules are open source, include usage documentation, and can be found in the Tripal organization’s GitHub repository. Database URL: Tripal Elasticsearch module: https://github.com/tripal/tripal_elasticsearch Tripal Analysis Expression module: https://github.com/tripal/tripal_analysis_expression PMID:29220446
MIRO and IRbase: IT Tools for the Epidemiological Monitoring of Insecticide Resistance in Mosquito Disease Vectors

PubMed Central

Dialynas, Emmanuel; Topalis, Pantelis; Vontas, John; Louis, Christos

2009-01-01

Background Monitoring of insect vector populations with respect to their susceptibility to one or more insecticides is a crucial element of the strategies used for the control of arthropod-borne diseases. This management task can nowadays be achieved more efficiently when assisted by IT (Information Technology) tools, ranging from modern integrated databases to GIS (Geographic Information System). Here we describe an application ontology that we developed de novo, and a specially designed database that, based on this ontology, can be used for the purpose of controlling mosquitoes and, thus, the diseases that they transmit. Methodology/Principal Findings The ontology, named MIRO for Mosquito Insecticide Resistance Ontology, developed using the OBO-Edit software, describes all pertinent aspects of insecticide resistance, including specific methodology and mode of action. MIRO, then, forms the basis for the design and development of a dedicated database, IRbase, constructed using open source software, which can be used to retrieve data on mosquito populations in a temporally and spatially separate way, as well as to map the output using a Google Earth interface. The dependency of the database on the MIRO allows for a rational and efficient hierarchical search possibility. Conclusions/Significance The fact that the MIRO complies with the rules set forward by the OBO (Open Biomedical Ontologies) Foundry introduces cross-referencing with other biomedical ontologies and, thus, both MIRO and IRbase are suitable as parts of future comprehensive surveillance tools and decision support systems that will be used for the control of vector-borne diseases. MIRO is downloadable from and IRbase is accessible at VectorBase, the NIAID-sponsored open access database for arthropod vectors of disease. PMID:19547750
Mobile service for open data visualization on geo-based images

NASA Astrophysics Data System (ADS)

Lee, Kiwon; Kim, Kwangseob; Kang, Sanggoo

2015-12-01

Since the early 2010s, governments in most countries have adopted and promoted open data policy and open data platform. Korea are in the same situation, and government and public organizations have operated the public-accessible open data portal systems since 2011. The number of open data and data type have been increasing every year. These trends are more expandable or extensible on mobile environments. The purpose of this study is to design and implement a mobile application service to visualize various typed or formatted public open data with geo-based images on the mobile web. Open data cover downloadable data sets or open-accessible data application programming interface API. Geo-based images mean multi-sensor satellite imageries which are referred in geo-coordinates and matched with digital map sets. System components for mobile service are fully based on open sources and open development environments without any commercialized tools: PostgreSQL for database management system, OTB for remote sensing image processing, GDAL for data conversion, GeoServer for application server, OpenLayers for mobile web mapping, R for data analysis and D3.js for web-based data graphic processing. Mobile application in client side was implemented by using HTML5 for cross browser and cross platform. The result shows many advantageous points such as linking open data and geo-based data, integrating open data and open source, and demonstrating mobile applications with open data. It is expected that this approach is cost effective and process efficient implementation strategy for intelligent earth observing data.
Numerical simulation of the SOFIA flowfield

NASA Technical Reports Server (NTRS)

Klotz, Stephen P.

1994-01-01

This report provides a concise summary of the contribution of computational fluid dynamics (CFD) to the SOFIA (Stratospheric Observatory for Infrared Astronomy) project at NASA Ames and presents results obtained from closed- and open-cavity SOFIA simulations. The aircraft platform is a Boeing 747SP and these are the first SOFIA simulations run with the aircraft empennage included in the geometry database. In the open-cavity run the telescope is mounted behind the wings. Results suggest that the cavity markedly influences the mean pressure distribution on empennage surfaces and that 110-140 decibel (db) sound pressure levels are typical in the cavity and on the horizontal and vertical stabilizers. A strong source of sound was found to exist on the rim of the open telescope cavity. The presence of this source suggests that additional design work needs to be performed in order to minimize the sound emanating from that location. A fluid dynamic analysis of the engine plumes is also contained in this report. The analysis was part of an effort to quantify the degradation of telescope performance resulting from the proximity of the port engine exhaust plumes to the open telescope bay.
Searching Across the International Space Station Databases

NASA Technical Reports Server (NTRS)

Maluf, David A.; McDermott, William J.; Smith, Ernest E.; Bell, David G.; Gurram, Mohana

2007-01-01

Data access in the enterprise generally requires us to combine data from different sources and different formats. It is advantageous thus to focus on the intersection of the knowledge across sources and domains; keeping irrelevant knowledge around only serves to make the integration more unwieldy and more complicated than necessary. A context search over multiple domain is proposed in this paper to use context sensitive queries to support disciplined manipulation of domain knowledge resources. The objective of a context search is to provide the capability for interrogating many domain knowledge resources, which are largely semantically disjoint. The search supports formally the tasks of selecting, combining, extending, specializing, and modifying components from a diverse set of domains. This paper demonstrates a new paradigm in composition of information for enterprise applications. In particular, it discusses an approach to achieving data integration across multiple sources, in a manner that does not require heavy investment in database and middleware maintenance. This lean approach to integration leads to cost-effectiveness and scalability of data integration with an underlying schemaless object-relational database management system. This highly scalable, information on demand system framework, called NX-Search, which is an implementation of an information system built on NETMARK. NETMARK is a flexible, high-throughput open database integration framework for managing, storing, and searching unstructured or semi-structured arbitrary XML and HTML used widely at the National Aeronautics Space Administration (NASA) and industry.
OOSTethys - Open Source Software for the Global Earth Observing Systems of Systems

NASA Astrophysics Data System (ADS)

Bridger, E.; Bermudez, L. E.; Maskey, M.; Rueda, C.; Babin, B. L.; Blair, R.

2009-12-01

An open source software project is much more than just picking the right license, hosting modular code and providing effective documentation. Success in advancing in an open collaborative way requires that the process match the expected code functionality to the developer's personal expertise and organizational needs as well as having an enthusiastic and responsive core lead group. We will present the lessons learned fromOOSTethys , which is a community of software developers and marine scientists who develop open source tools, in multiple languages, to integrate ocean observing systems into an Integrated Ocean Observing System (IOOS). OOSTethys' goal is to dramatically reduce the time it takes to install, adopt and update standards-compliant web services. OOSTethys has developed servers, clients and a registry. Open source PERL, PYTHON, JAVA and ASP tool kits and reference implementations are helping the marine community publish near real-time observation data in interoperable standard formats. In some cases publishing an OpenGeospatial Consortium (OGC), Sensor Observation Service (SOS) from NetCDF files or a database or even CSV text files could take only minutes depending on the skills of the developer. OOSTethys is also developing an OGC standard registry, Catalog Service for Web (CSW). This open source CSW registry was implemented to easily register and discover SOSs using ISO 19139 service metadata. A web interface layer over the CSW registry simplifies the registration process by harvesting metadata describing the observations and sensors from the “GetCapabilities” response of SOS. OPENIOOS is the web client, developed in PERL to visualize the sensors in the SOS services. While the number of OOSTethys software developers is small, currently about 10 around the world, the number of OOSTethys toolkit implementers is larger and growing and the ease of use has played a large role in spreading the use of interoperable standards compliant web services widely in the marine community.

Toxicological database of soil and derived products (BDT).

PubMed

Uricchio, Vito Felice

2008-01-01

The Toxicological database of soil and derived products is a project firstly proposed by the Regional Environmental Authority of Apulia. Such a project aims to provide comprehensive and updated information on the regional environmental characteristics, on the pollution state of the regional soil, on the main pollutants and on the reclaim techniques to be used in case of both non-point (agricultural activities) and point (industrial activities) sources of pollution. The project's focus is on the soil pollution because of the fundamental role played by the soil in supporting the biological cycle. Furthermore, the reasons for the project are related both to the reduction of human health risks due to toxic substances ingestion (these substances are present in some ring of the eating chain), and to the recognition of the importance of the groundwater quality safety (primary source of fresh water in many Mediterranean Regions). The essential requirements of a data entry are the following: speed and simplicity of the data entry; reliability and stability of the database structures; speed, easiness and pliability of the queries. Free consultation of the database represents one of the most remarkable advantages coming from the use of an "open" system.
Biopython: freely available Python tools for computational molecular biology and bioinformatics.

PubMed

Cock, Peter J A; Antao, Tiago; Chang, Jeffrey T; Chapman, Brad A; Cox, Cymon J; Dalke, Andrew; Friedberg, Iddo; Hamelryck, Thomas; Kauff, Frank; Wilczynski, Bartek; de Hoon, Michiel J L

2009-06-01

The Biopython project is a mature open source international collaboration of volunteer developers, providing Python libraries for a wide range of bioinformatics problems. Biopython includes modules for reading and writing different sequence file formats and multiple sequence alignments, dealing with 3D macro molecular structures, interacting with common tools such as BLAST, ClustalW and EMBOSS, accessing key online databases, as well as providing numerical methods for statistical learning. Biopython is freely available, with documentation and source code at (www.biopython.org) under the Biopython license.
The new open Flexible Emission Inventory for Greece and the Greater Athens Area (FEI-GREGAA): Account of pollutant sources and their importance from 2006 to 2012

NASA Astrophysics Data System (ADS)

Fameli, Kyriaki-Maria; Assimakopoulos, Vasiliki D.

2016-07-01

Photochemical and particulate pollution problems persist in Athens as they do in various European cities, despite measures taken. Although, for many cities, organized and updated pollutant emissions databases exist, as well as infrastructure for the support of policy implementation, this is not the case for Greece and Athens. So far abstract efforts to create inventories from temporal and spatial annual low resolution data have not lead to the creation of a useful database. The objective of this study was to construct an emission inventory in order to examine the emission trends in Greece and the Greater Athens Area for the period 2006-2012 on a spatial scale of 6 × 6 km2 and 2 × 2 km2, respectively and on a temporal scale of 1 h. Emissions were calculated from stationary combustion sources, transportation (road, navigation and aviation), agriculture and industry obtained from official national and European sources. Moreover, new emission factors were calculated for road transport and aviation. The final database named F.E.I. - GREGAA (Flexible Emission Inventory for GREece and the GAA) is open-structured so as to receive data updates, new pollutants, various emission scenarios and/or different emission factors and be transformed for any grid spacing. Its main purpose is to be used in applications with photochemical models to contribute to the investigation on the type of sources and activities that lead to the configuration of air quality. Results showed a decreasing trend in CO, NOx and VOCs-NMVOCs emissions and an increasing trend from 2011 onwards in PM10 emissions. Road transport and small combustion contribute most to CO emissions, road transport and navigation to NOx and small combustion and industries to PM10. The onset of the economic crisis can be seen from the reduction of emissions from industry and the increase of biomass burning for heating purposes.
A Dashboard for the Italian Computing in ALICE

NASA Astrophysics Data System (ADS)

Elia, D.; Vino, G.; Bagnasco, S.; Crescente, A.; Donvito, G.; Franco, A.; Lusso, S.; Mura, D.; Piano, S.; Platania, G.; ALICE Collaboration

2017-10-01

A dashboard devoted to the computing in the Italian sites for the ALICE experiment at the LHC has been deployed. A combination of different complementary monitoring tools is typically used in most of the Tier-2 sites: this makes somewhat difficult to figure out at a glance the status of the site and to compare information extracted from different sources for debugging purposes. To overcome these limitations a dedicated ALICE dashboard has been designed and implemented in each of the ALICE Tier-2 sites in Italy: in particular, it provides a single, interactive and easily customizable graphical interface where heterogeneous data are presented. The dashboard is based on two main ingredients: an open source time-series database and a dashboard builder tool for visualizing time-series metrics. Various sensors, able to collect data from the multiple data sources, have been also written. A first version of a national computing dashboard has been implemented using a specific instance of the builder to gather data from all the local databases.
Open Clients for Distributed Databases

NASA Astrophysics Data System (ADS)

Chayes, D. N.; Arko, R. A.

2001-12-01

We are actively developing a collection of open source example clients that demonstrate use of our "back end" data management infrastructure. The data management system is reported elsewhere at this meeting (Arko and Chayes: A Scaleable Database Infrastructure). In addition to their primary goal of being examples for others to build upon, some of these clients may have limited utility in them selves. More information about the clients and the data infrastructure is available on line at http://data.ldeo.columbia.edu. The available examples to be demonstrated include several web-based clients including those developed for the Community Review System of the Digital Library for Earth System Education, a real-time watch standers log book, an offline interface to use log book entries, a simple client to search on multibeam metadata and others are Internet enabled and generally web-based front ends that support searches against one or more relational databases using industry standard SQL queries. In addition to the web based clients, simple SQL searches from within Excel and similar applications will be demonstrated. By defining, documenting and publishing a clear interface to the fully searchable databases, it becomes relatively easy to construct client interfaces that are optimized for specific applications in comparison to building a monolithic data and user interface system.
Online molecular image repository and analysis system: A multicenter collaborative open-source infrastructure for molecular imaging research and application.

PubMed

Rahman, Mahabubur; Watabe, Hiroshi

2018-05-01

Molecular imaging serves as an important tool for researchers and clinicians to visualize and investigate complex biochemical phenomena using specialized instruments; these instruments are either used individually or in combination with targeted imaging agents to obtain images related to specific diseases with high sensitivity, specificity, and signal-to-noise ratios. However, molecular imaging, which is a multidisciplinary research field, faces several challenges, including the integration of imaging informatics with bioinformatics and medical informatics, requirement of reliable and robust image analysis algorithms, effective quality control of imaging facilities, and those related to individualized disease mapping, data sharing, software architecture, and knowledge management. As a cost-effective and open-source approach to address these challenges related to molecular imaging, we develop a flexible, transparent, and secure infrastructure, named MIRA, which stands for Molecular Imaging Repository and Analysis, primarily using the Python programming language, and a MySQL relational database system deployed on a Linux server. MIRA is designed with a centralized image archiving infrastructure and information database so that a multicenter collaborative informatics platform can be built. The capability of dealing with metadata, image file format normalization, and storing and viewing different types of documents and multimedia files make MIRA considerably flexible. With features like logging, auditing, commenting, sharing, and searching, MIRA is useful as an Electronic Laboratory Notebook for effective knowledge management. In addition, the centralized approach for MIRA facilitates on-the-fly access to all its features remotely through any web browser. Furthermore, the open-source approach provides the opportunity for sustainable continued development. MIRA offers an infrastructure that can be used as cross-boundary collaborative MI research platform for the rapid achievement in cancer diagnosis and therapeutics. Copyright © 2018 Elsevier Ltd. All rights reserved.
SIG Contribution in the Making of Geotechnical Maps in Urban Areas

NASA Astrophysics Data System (ADS)

Monteiro, António; Pais, Luís Andrade; Rodrigues, Carlos; Carvalho, Paulo

2017-10-01

The use of Geographic Information Systems (GIS) has spread to several science areas, from oceanography to geotechnics. Its application in the urban mapping was intensified in the last century, which allowed a great development, due to the use of geographic database, new analysis tools and, more recently, free open source software. Geotechnical cartography struggle with a permanent and large environment re-organization in urban area, due to new building construction, trenching and the drilling of sampling wells and holes. This creates an extra important and largest volume of data at any pre-existence geological map. The main problem results on the fact that the natural environment is covered with buildings and communications system. The purpose of this work is to create a viable geographic information base for geotechnical mapping through a free GIS computer program and open source, with non-traditional cartographic sources, giving preference to open platforms. QGIS was used as software and “Google Maps”, “Bing Maps” and “OpenStreetMap” were applied as cartographic sources using the “OpenLayers plugin” module. Finally, we also pretend to identify and delimit the degree of granite’s change and fracturing areas using a “Streetview” platform. This model has cartographic input which are a geological map study area, open cartographic web archives and the use of “Streetview” platform. The output has several layouts, such as topography intersection (roads, borders, etc.), with geological map and the bordering area of Guarda Urban Zone. The use of this platform types decrease the collect data time and, sometimes, a careful observation of pictures that were taken during excavations may reveal important details for geological mapping in the study area.
Establishment of an international database for genetic variants in esophageal cancer.

PubMed

Vihinen, Mauno

2016-10-01

The establishment of a database has been suggested in order to collect, organize, and distribute genetic information about esophageal cancer. The World Organization for Specialized Studies on Diseases of the Esophagus and the Human Variome Project will be in charge of a central database of information about esophageal cancer-related variations from publications, databases, and laboratories; in addition to genetic details, clinical parameters will also be included. The aim will be to get all the central players in research, clinical, and commercial laboratories to contribute. The database will follow established recommendations and guidelines. The database will require a team of dedicated curators with different backgrounds. Numerous layers of systematics will be applied to facilitate computational analyses. The data items will be extensively integrated with other information sources. The database will be distributed as open access to ensure exchange of the data with other databases. Variations will be reported in relation to reference sequences on three levels--DNA, RNA, and protein-whenever applicable. In the first phase, the database will concentrate on genetic variations including both somatic and germline variations for susceptibility genes. Additional types of information can be integrated at a later stage. © 2016 New York Academy of Sciences.
Release of the gPhoton Database of GALEX Photon Events

NASA Astrophysics Data System (ADS)

Fleming, Scott W.; Million, Chase; Shiao, Bernie; Tucker, Michael; Loyd, R. O. Parke

2016-01-01

The GALEX spacecraft surveyed much of the sky in two ultraviolet bands between 2003 and 2013 with non-integrating microchannel plate detectors. The Mikulski Archive for Space Telescopes (MAST) has made more than one trillion photon events observed by the spacecraft available, stored as a 130 TB database, along with an open-source, python-based software package to query this database and create calibrated lightcurves or images from these data at user-defined spatial and temporal scales. In particular, MAST users can now conduct photometry at the intra-visit level (timescales of seconds and minutes). The software, along with the fully populated database, was officially released in Aug. 2015, and improvements to both software functionality and data calibration are ongoing. We summarize the current calibration status of the gPhoton software, along with examples of early science enabled by gPhoton that include stellar flares, AGN, white dwarfs, exoplanet hosts, novae, and nearby galaxies.
A scalable database model for multiparametric time series: a volcano observatory case study

NASA Astrophysics Data System (ADS)

Montalto, Placido; Aliotta, Marco; Cassisi, Carmelo; Prestifilippo, Michele; Cannata, Andrea

2014-05-01

The variables collected by a sensor network constitute a heterogeneous data source that needs to be properly organized in order to be used in research and geophysical monitoring. With the time series term we refer to a set of observations of a given phenomenon acquired sequentially in time. When the time intervals are equally spaced one speaks of period or sampling frequency. Our work describes in detail a possible methodology for storage and management of time series using a specific data structure. We designed a framework, hereinafter called TSDSystem (Time Series Database System), in order to acquire time series from different data sources and standardize them within a relational database. The operation of standardization provides the ability to perform operations, such as query and visualization, of many measures synchronizing them using a common time scale. The proposed architecture follows a multiple layer paradigm (Loaders layer, Database layer and Business Logic layer). Each layer is specialized in performing particular operations for the reorganization and archiving of data from different sources such as ASCII, Excel, ODBC (Open DataBase Connectivity), file accessible from the Internet (web pages, XML). In particular, the loader layer performs a security check of the working status of each running software through an heartbeat system, in order to automate the discovery of acquisition issues and other warning conditions. Although our system has to manage huge amounts of data, performance is guaranteed by using a smart partitioning table strategy, that keeps balanced the percentage of data stored in each database table. TSDSystem also contains modules for the visualization of acquired data, that provide the possibility to query different time series on a specified time range, or follow the realtime signal acquisition, according to a data access policy from the users.
A multidisciplinary database for geophysical time series management

NASA Astrophysics Data System (ADS)

Montalto, P.; Aliotta, M.; Cassisi, C.; Prestifilippo, M.; Cannata, A.

2013-12-01

The variables collected by a sensor network constitute a heterogeneous data source that needs to be properly organized in order to be used in research and geophysical monitoring. With the time series term we refer to a set of observations of a given phenomenon acquired sequentially in time. When the time intervals are equally spaced one speaks of period or sampling frequency. Our work describes in detail a possible methodology for storage and management of time series using a specific data structure. We designed a framework, hereinafter called TSDSystem (Time Series Database System), in order to acquire time series from different data sources and standardize them within a relational database. The operation of standardization provides the ability to perform operations, such as query and visualization, of many measures synchronizing them using a common time scale. The proposed architecture follows a multiple layer paradigm (Loaders layer, Database layer and Business Logic layer). Each layer is specialized in performing particular operations for the reorganization and archiving of data from different sources such as ASCII, Excel, ODBC (Open DataBase Connectivity), file accessible from the Internet (web pages, XML). In particular, the loader layer performs a security check of the working status of each running software through an heartbeat system, in order to automate the discovery of acquisition issues and other warning conditions. Although our system has to manage huge amounts of data, performance is guaranteed by using a smart partitioning table strategy, that keeps balanced the percentage of data stored in each database table. TSDSystem also contains modules for the visualization of acquired data, that provide the possibility to query different time series on a specified time range, or follow the realtime signal acquisition, according to a data access policy from the users.
Kinota: An Open-Source NoSQL implementation of OGC SensorThings for large-scale high-resolution real-time environmental monitoring

NASA Astrophysics Data System (ADS)

Miles, B.; Chepudira, K.; LaBar, W.

2017-12-01

The Open Geospatial Consortium (OGC) SensorThings API (STA) specification, ratified in 2016, is a next-generation open standard for enabling real-time communication of sensor data. Building on over a decade of OGC Sensor Web Enablement (SWE) Standards, STA offers a rich data model that can represent a range of sensor and phenomena types (e.g. fixed sensors sensing fixed phenomena, fixed sensors sensing moving phenomena, mobile sensors sensing fixed phenomena, and mobile sensors sensing moving phenomena) and is data agnostic. Additionally, and in contrast to previous SWE standards, STA is developer-friendly, as is evident from its convenient JSON serialization, and expressive OData-based query language (with support for geospatial queries); with its Message Queue Telemetry Transport (MQTT), STA is also well-suited to efficient real-time data publishing and discovery. All these attributes make STA potentially useful for use in environmental monitoring sensor networks. Here we present Kinota(TM), an Open-Source NoSQL implementation of OGC SensorThings for large-scale high-resolution real-time environmental monitoring. Kinota, which roughly stands for Knowledge from Internet of Things Analyses, relies on Cassandra its underlying data store, which is a horizontally scalable, fault-tolerant open-source database that is often used to store time-series data for Big Data applications (though integration with other NoSQL or rational databases is possible). With this foundation, Kinota can scale to store data from an arbitrary number of sensors collecting data every 500 milliseconds. Additionally, Kinota architecture is very modular allowing for customization by adopters who can choose to replace parts of the existing implementation when desirable. The architecture is also highly portable providing the flexibility to choose between cloud providers like azure, amazon, google etc. The scalable, flexible and cloud friendly architecture of Kinota makes it ideal for use in next-generation large-scale and high-resolution real-time environmental monitoring networks used in domains such as hydrology, geomorphology, and geophysics, as well as management applications such as flood early warning, and regulatory enforcement.
The rate of growth in scientific publication and the decline in coverage provided by Science Citation Index.

PubMed

Larsen, Peder Olesen; von Ins, Markus

2010-09-01

The growth rate of scientific publication has been studied from 1907 to 2007 using available data from a number of literature databases, including Science Citation Index (SCI) and Social Sciences Citation Index (SSCI). Traditional scientific publishing, that is publication in peer-reviewed journals, is still increasing although there are big differences between fields. There are no indications that the growth rate has decreased in the last 50 years. At the same time publication using new channels, for example conference proceedings, open archives and home pages, is growing fast. The growth rate for SCI up to 2007 is smaller than for comparable databases. This means that SCI was covering a decreasing part of the traditional scientific literature. There are also clear indications that the coverage by SCI is especially low in some of the scientific areas with the highest growth rate, including computer science and engineering sciences. The role of conference proceedings, open access archives and publications published on the net is increasing, especially in scientific fields with high growth rates, but this has only partially been reflected in the databases. The new publication channels challenge the use of the big databases in measurements of scientific productivity or output and of the growth rate of science. Because of the declining coverage and this challenge it is problematic that SCI has been used and is used as the dominant source for science indicators based on publication and citation numbers. The limited data available for social sciences show that the growth rate in SSCI was remarkably low and indicate that the coverage by SSCI was declining over time. National Science Indicators from Thomson Reuters is based solely on SCI, SSCI and Arts and Humanities Citation Index (AHCI). Therefore the declining coverage of the citation databases problematizes the use of this source.
Performance of an open-source heart sound segmentation algorithm on eight independent databases.

PubMed

Liu, Chengyu; Springer, David; Clifford, Gari D

2017-08-01

Heart sound segmentation is a prerequisite step for the automatic analysis of heart sound signals, facilitating the subsequent identification and classification of pathological events. Recently, hidden Markov model-based algorithms have received increased interest due to their robustness in processing noisy recordings. In this study we aim to evaluate the performance of the recently published logistic regression based hidden semi-Markov model (HSMM) heart sound segmentation method, by using a wider variety of independently acquired data of varying quality. Firstly, we constructed a systematic evaluation scheme based on a new collection of heart sound databases, which we assembled for the PhysioNet/CinC Challenge 2016. This collection includes a total of more than 120 000 s of heart sounds recorded from 1297 subjects (including both healthy subjects and cardiovascular patients) and comprises eight independent heart sound databases sourced from multiple independent research groups around the world. Then, the HSMM-based segmentation method was evaluated using the assembled eight databases. The common evaluation metrics of sensitivity, specificity, accuracy, as well as the [Formula: see text] measure were used. In addition, the effect of varying the tolerance window for determining a correct segmentation was evaluated. The results confirm the high accuracy of the HSMM-based algorithm on a separate test dataset comprised of 102 306 heart sounds. An average [Formula: see text] score of 98.5% for segmenting S1 and systole intervals and 97.2% for segmenting S2 and diastole intervals were observed. The [Formula: see text] score was shown to increases with an increases in the tolerance window size, as expected. The high segmentation accuracy of the HSMM-based algorithm on a large database confirmed the algorithm's effectiveness. The described evaluation framework, combined with the largest collection of open access heart sound data, provides essential resources for evaluators who need to test their algorithms with realistic data and share reproducible results.
The rate of growth in scientific publication and the decline in coverage provided by Science Citation Index

PubMed Central

von Ins, Markus

2010-01-01

The growth rate of scientific publication has been studied from 1907 to 2007 using available data from a number of literature databases, including Science Citation Index (SCI) and Social Sciences Citation Index (SSCI). Traditional scientific publishing, that is publication in peer-reviewed journals, is still increasing although there are big differences between fields. There are no indications that the growth rate has decreased in the last 50 years. At the same time publication using new channels, for example conference proceedings, open archives and home pages, is growing fast. The growth rate for SCI up to 2007 is smaller than for comparable databases. This means that SCI was covering a decreasing part of the traditional scientific literature. There are also clear indications that the coverage by SCI is especially low in some of the scientific areas with the highest growth rate, including computer science and engineering sciences. The role of conference proceedings, open access archives and publications published on the net is increasing, especially in scientific fields with high growth rates, but this has only partially been reflected in the databases. The new publication channels challenge the use of the big databases in measurements of scientific productivity or output and of the growth rate of science. Because of the declining coverage and this challenge it is problematic that SCI has been used and is used as the dominant source for science indicators based on publication and citation numbers. The limited data available for social sciences show that the growth rate in SSCI was remarkably low and indicate that the coverage by SSCI was declining over time. National Science Indicators from Thomson Reuters is based solely on SCI, SSCI and Arts and Humanities Citation Index (AHCI). Therefore the declining coverage of the citation databases problematizes the use of this source. PMID:20700371
US Army Research Laboratory Joint Interagency Field Experimentation 15-2 Final Report

DTIC Science & Technology

2015-12-01

February 2015, at Alameda Island, California. Advanced text analytics capabilities were demonstrated in a logically coherent workflow pipeline that... text processing capabilities allowed the targeted use of a persistent imagery sensor for rapid detection of mission- critical events. The creation of...a very large text database from open source data provides a relevant and unclassified foundation for continued development of text -processing
Enabling a systems biology knowledgebase with gaggle and firegoose

DOE Office of Scientific and Technical Information (OSTI.GOV)

Baliga, Nitin S.

The overall goal of this project was to extend the existing Gaggle and Firegoose systems to develop an open-source technology that runs over the web and links desktop applications with many databases and software applications. This technology would enable researchers to incorporate workflows for data analysis that can be executed from this interface to other online applications. The four specific aims were to (1) provide one-click mapping of genes, proteins, and complexes across databases and species; (2) enable multiple simultaneous workflows; (3) expand sophisticated data analysis for online resources; and enhance open-source development of the Gaggle-Firegoose infrastructure. Gaggle is anmore » open-source Java software system that integrates existing bioinformatics programs and data sources into a user-friendly, extensible environment to allow interactive exploration, visualization, and analysis of systems biology data. Firegoose is an extension to the Mozilla Firefox web browser that enables data transfer between websites and desktop tools including Gaggle. In the last phase of this funding period, we have made substantial progress on development and application of the Gaggle integration framework. We implemented the workspace to the Network Portal. Users can capture data from Firegoose and save them to the workspace. Users can create workflows to start multiple software components programmatically and pass data between them. Results of analysis can be saved to the cloud so that they can be easily restored on any machine. We also developed the Gaggle Chrome Goose, a plugin for the Google Chrome browser in tandem with an opencpu server in the Amazon EC2 cloud. This allows users to interactively perform data analysis on a single web page using the R packages deployed on the opencpu server. The cloud-based framework facilitates collaboration between researchers from multiple organizations. We have made a number of enhancements to the cmonkey2 application to enable and improve the integration within different environments, and we have created a new tools pipeline for generating EGRIN2 models in a largely automated way.« less
BioMart Central Portal: an open database network for the biological community

PubMed Central

Guberman, Jonathan M.; Ai, J.; Arnaiz, O.; Baran, Joachim; Blake, Andrew; Baldock, Richard; Chelala, Claude; Croft, David; Cros, Anthony; Cutts, Rosalind J.; Di Génova, A.; Forbes, Simon; Fujisawa, T.; Gadaleta, E.; Goodstein, D. M.; Gundem, Gunes; Haggarty, Bernard; Haider, Syed; Hall, Matthew; Harris, Todd; Haw, Robin; Hu, S.; Hubbard, Simon; Hsu, Jack; Iyer, Vivek; Jones, Philip; Katayama, Toshiaki; Kinsella, R.; Kong, Lei; Lawson, Daniel; Liang, Yong; Lopez-Bigas, Nuria; Luo, J.; Lush, Michael; Mason, Jeremy; Moreews, Francois; Ndegwa, Nelson; Oakley, Darren; Perez-Llamas, Christian; Primig, Michael; Rivkin, Elena; Rosanoff, S.; Shepherd, Rebecca; Simon, Reinhard; Skarnes, B.; Smedley, Damian; Sperling, Linda; Spooner, William; Stevenson, Peter; Stone, Kevin; Teague, J.; Wang, Jun; Wang, Jianxin; Whitty, Brett; Wong, D. T.; Wong-Erasmus, Marie; Yao, L.; Youens-Clark, Ken; Yung, Christina; Zhang, Junjun; Kasprzyk, Arek

2011-01-01

BioMart Central Portal is a first of its kind, community-driven effort to provide unified access to dozens of biological databases spanning genomics, proteomics, model organisms, cancer data, ontology information and more. Anybody can contribute an independently maintained resource to the Central Portal, allowing it to be exposed to and shared with the research community, and linking it with the other resources in the portal. Users can take advantage of the common interface to quickly utilize different sources without learning a new system for each. The system also simplifies cross-database searches that might otherwise require several complicated steps. Several integrated tools streamline common tasks, such as converting between ID formats and retrieving sequences. The combination of a wide variety of databases, an easy-to-use interface, robust programmatic access and the array of tools make Central Portal a one-stop shop for biological data querying. Here, we describe the structure of Central Portal and show example queries to demonstrate its capabilities. Database URL: http://central.biomart.org. PMID:21930507
pvsR: An Open Source Interface to Big Data on the American Political Sphere

PubMed Central

2015-01-01

Digital data from the political sphere is abundant, omnipresent, and more and more directly accessible through the Internet. Project Vote Smart (PVS) is a prominent example of this big public data and covers various aspects of U.S. politics in astonishing detail. Despite the vast potential of PVS’ data for political science, economics, and sociology, it is hardly used in empirical research. The systematic compilation of semi-structured data can be complicated and time consuming as the data format is not designed for conventional scientific research. This paper presents a new tool that makes the data easily accessible to a broad scientific community. We provide the software called pvsR as an add-on to the R programming environment for statistical computing. This open source interface (OSI) serves as a direct link between a statistical analysis and the large PVS database. The free and open code is expected to substantially reduce the cost of research with PVS’ new big public data in a vast variety of possible applications. We discuss its advantages vis-à-vis traditional methods of data generation as well as already existing interfaces. The validity of the library is documented based on an illustration involving female representation in local politics. In addition, pvsR facilitates the replication of research with PVS data at low costs, including the pre-processing of data. Similar OSIs are recommended for other big public databases. PMID:26132154
Age-specific MRI templates for pediatric neuroimaging

PubMed Central

Sanchez, Carmen E.; Richards, John E.; Almli, C. Robert

2012-01-01

This study created a database of pediatric age-specific MRI brain templates for normalization and segmentation. Participants included children from 4.5 through 19.5 years, totaling 823 scans from 494 subjects. Open-source processing programs (FSL, SPM, ANTS) constructed head, brain and segmentation templates in 6 month intervals. The tissue classification (WM, GM, CSF) showed changes over age similar to previous reports. A volumetric analysis of age-related changes in WM and GM based on these templates showed expected increase/decrease pattern in GM and an increase in WM over the sampled ages. This database is available for use for neuroimaging studies (blindedforreview). PMID:22799759

TOPSAN: a dynamic web database for structural genomics.

PubMed

Ellrott, Kyle; Zmasek, Christian M; Weekes, Dana; Sri Krishna, S; Bakolitsa, Constantina; Godzik, Adam; Wooley, John

2011-01-01

The Open Protein Structure Annotation Network (TOPSAN) is a web-based collaboration platform for exploring and annotating structures determined by structural genomics efforts. Characterization of those structures presents a challenge since the majority of the proteins themselves have not yet been characterized. Responding to this challenge, the TOPSAN platform facilitates collaborative annotation and investigation via a user-friendly web-based interface pre-populated with automatically generated information. Semantic web technologies expand and enrich TOPSAN's content through links to larger sets of related databases, and thus, enable data integration from disparate sources and data mining via conventional query languages. TOPSAN can be found at http://www.topsan.org.
A new phase-correlation-based iris matching for degraded images.

PubMed

Krichen, Emine; Garcia-Salicetti, Sonia; Dorizzi, Bernadette

2009-08-01

In this paper, we present a new phase-correlation-based iris matching approach in order to deal with degradations in iris images due to unconstrained acquisition procedures. Our matching system is a fusion of global and local Gabor phase-correlation schemes. The main originality of our local approach is that we do not only consider the correlation peak amplitudes but also their locations in different regions of the images. Results on several degraded databases, namely, the CASIA-BIOSECURE and Iris Challenge Evaluation 2005 databases, show the improvement of our method compared to two available reference systems, Masek and Open Source for Iris (OSRIS), in verification mode.
Integer sequence discovery from small graphs

PubMed Central

Hoppe, Travis; Petrone, Anna

2015-01-01

We have exhaustively enumerated all simple, connected graphs of a finite order and have computed a selection of invariants over this set. Integer sequences were constructed from these invariants and checked against the Online Encyclopedia of Integer Sequences (OEIS). 141 new sequences were added and six sequences were extended. From the graph database, we were able to programmatically suggest relationships among the invariants. It will be shown that we can readily visualize any sequence of graphs with a given criteria. The code has been released as an open-source framework for further analysis and the database was constructed to be extensible to invariants not considered in this work. PMID:27034526
Integrated cluster management at Manchester

NASA Astrophysics Data System (ADS)

McNab, Andrew; Forti, Alessandra

2012-12-01

We describe an integrated management system using third-party, open source components used in operating a large Tier-2 site for particle physics. This system tracks individual assets and records their attributes such as MAC and IP addresses; derives DNS and DHCP configurations from this database; creates each host's installation and re-configuration scripts; monitors the services on each host according to the records of what should be running; and cross references tickets with asset records and per-asset monitoring pages. In addition, scripts which detect problems and automatically remove hosts record these new states in the database which are available to operators immediately through the same interface as tickets and monitoring.
MannDB: A microbial annotation database for protein characterization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhou, C; Lam, M; Smith, J

2006-05-19

MannDB was created to meet a need for rapid, comprehensive automated protein sequence analyses to support selection of proteins suitable as targets for driving the development of reagents for pathogen or protein toxin detection. Because a large number of open-source tools were needed, it was necessary to produce a software system to scale the computations for whole-proteome analysis. Thus, we built a fully automated system for executing software tools and for storage, integration, and display of automated protein sequence analysis and annotation data. MannDB is a relational database that organizes data resulting from fully automated, high-throughput protein-sequence analyses using open-sourcemore » tools. Types of analyses provided include predictions of cleavage, chemical properties, classification, features, functional assignment, post-translational modifications, motifs, antigenicity, and secondary structure. Proteomes (lists of hypothetical and known proteins) are downloaded and parsed from Genbank and then inserted into MannDB, and annotations from SwissProt are downloaded when identifiers are found in the Genbank entry or when identical sequences are identified. Currently 36 open-source tools are run against MannDB protein sequences either on local systems or by means of batch submission to external servers. In addition, BLAST against protein entries in MvirDB, our database of microbial virulence factors, is performed. A web client browser enables viewing of computational results and downloaded annotations, and a query tool enables structured and free-text search capabilities. When available, links to external databases, including MvirDB, are provided. MannDB contains whole-proteome analyses for at least one representative organism from each category of biological threat organism listed by APHIS, CDC, HHS, NIAID, USDA, USFDA, and WHO. MannDB comprises a large number of genomes and comprehensive protein sequence analyses representing organisms listed as high-priority agents on the websites of several governmental organizations concerned with bio-terrorism. MannDB provides the user with a BLAST interface for comparison of native and non-native sequences and a query tool for conveniently selecting proteins of interest. In addition, the user has access to a web-based browser that compiles comprehensive and extensive reports.« less
The Colima Volcano WebGIS: system acquisition, application and database development in an open-source environment

NASA Astrophysics Data System (ADS)

Manea, M.; Norini, G.; Capra, L.; Manea, V. C.

2009-04-01

The Colima Volcano is currently the most active Mexican volcano. After the 1913 plinian activity the volcano presented several eruptive phases that lasted few years, but since 1991 its activity became more persistent with vulcanian eruptions, lava and dome extrusions. During the last 15 years the volcano suffered several eruptive episodes as in 1991, 1994, 1998-1999, 2001-2003, 2004 and 2005 with the emplacement of pyroclastic flows. During rain seasons lahars are frequent affecting several infrastructures such as bridges and electric towers. Researchers from different institutions (Mexico, USA, Germany, Italy, and Spain) are currently working on several aspects of the volcano, from remote sensing, field data of old and recent deposits, structural framework, monitoring (rain, seismicity, deformation and visual observations) and laboratory experiments (analogue models and numerical simulations). Each investigation is focused to explain a single process, but it is fundamental to visualize the global status of the volcano in order to understand its behavior and to mitigate future hazards. The Colima Volcano WebGIS represents an initiative aimed to collect and store on a systematic basis all the data obtained so far for the volcano and to continuously update the database with new information. The Colima Volcano WebGIS is hosted on the Computational Geodynamics Laboratory web server and it is based entirely on Open Source software. The web pages, written in php/html will extract information from a mysql relational database, which will host the information needed for the MapBender application. There will be two types of intended users: 1) researchers working on the Colima Volcano, interested in this project and collaborating in common projects will be provided with open access to the database and will be able to introduce their own data, results, interpretation or recommendations; 2) general users, interested in accessing information on Colima Volcano will be provided with restricted access and will be able to visualize maps, images, diagrams, and current activity. The website can be visited at: http://www.geociencias.unam.mx/colima
Functional Interaction Network Construction and Analysis for Disease Discovery.

PubMed

Wu, Guanming; Haw, Robin

2017-01-01

Network-based approaches project seemingly unrelated genes or proteins onto a large-scale network context, therefore providing a holistic visualization and analysis platform for genomic data generated from high-throughput experiments, reducing the dimensionality of data via using network modules and increasing the statistic analysis power. Based on the Reactome database, the most popular and comprehensive open-source biological pathway knowledgebase, we have developed a highly reliable protein functional interaction network covering around 60 % of total human genes and an app called ReactomeFIViz for Cytoscape, the most popular biological network visualization and analysis platform. In this chapter, we describe the detailed procedures on how this functional interaction network is constructed by integrating multiple external data sources, extracting functional interactions from human curated pathway databases, building a machine learning classifier called a Naïve Bayesian Classifier, predicting interactions based on the trained Naïve Bayesian Classifier, and finally constructing the functional interaction database. We also provide an example on how to use ReactomeFIViz for performing network-based data analysis for a list of genes.
Software Application Profile: Opal and Mica: open-source software solutions for epidemiological data management, harmonization and dissemination

PubMed Central

Doiron, Dany; Marcon, Yannick; Fortier, Isabel; Burton, Paul; Ferretti, Vincent

2017-01-01

Abstract Motivation Improving the dissemination of information on existing epidemiological studies and facilitating the interoperability of study databases are essential to maximizing the use of resources and accelerating improvements in health. To address this, Maelstrom Research proposes Opal and Mica, two inter-operable open-source software packages providing out-of-the-box solutions for epidemiological data management, harmonization and dissemination. Implementation Opal and Mica are two standalone but inter-operable web applications written in Java, JavaScript and PHP. They provide web services and modern user interfaces to access them. General features Opal allows users to import, manage, annotate and harmonize study data. Mica is used to build searchable web portals disseminating study and variable metadata. When used conjointly, Mica users can securely query and retrieve summary statistics on geographically dispersed Opal servers in real-time. Integration with the DataSHIELD approach allows conducting more complex federated analyses involving statistical models. Availability Opal and Mica are open-source and freely available at [www.obiba.org] under a General Public License (GPL) version 3, and the metadata models and taxonomies that accompany them are available under a Creative Commons licence. PMID:29025122
Open source libraries and frameworks for mass spectrometry based proteomics: A developer's perspective☆

PubMed Central

Perez-Riverol, Yasset; Wang, Rui; Hermjakob, Henning; Müller, Markus; Vesada, Vladimir; Vizcaíno, Juan Antonio

2014-01-01

Data processing, management and visualization are central and critical components of a state of the art high-throughput mass spectrometry (MS)-based proteomics experiment, and are often some of the most time-consuming steps, especially for labs without much bioinformatics support. The growing interest in the field of proteomics has triggered an increase in the development of new software libraries, including freely available and open-source software. From database search analysis to post-processing of the identification results, even though the objectives of these libraries and packages can vary significantly, they usually share a number of features. Common use cases include the handling of protein and peptide sequences, the parsing of results from various proteomics search engines output files, and the visualization of MS-related information (including mass spectra and chromatograms). In this review, we provide an overview of the existing software libraries, open-source frameworks and also, we give information on some of the freely available applications which make use of them. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan. PMID:23467006
Open source libraries and frameworks for mass spectrometry based proteomics: a developer's perspective.

PubMed

Perez-Riverol, Yasset; Wang, Rui; Hermjakob, Henning; Müller, Markus; Vesada, Vladimir; Vizcaíno, Juan Antonio

2014-01-01

Data processing, management and visualization are central and critical components of a state of the art high-throughput mass spectrometry (MS)-based proteomics experiment, and are often some of the most time-consuming steps, especially for labs without much bioinformatics support. The growing interest in the field of proteomics has triggered an increase in the development of new software libraries, including freely available and open-source software. From database search analysis to post-processing of the identification results, even though the objectives of these libraries and packages can vary significantly, they usually share a number of features. Common use cases include the handling of protein and peptide sequences, the parsing of results from various proteomics search engines output files, and the visualization of MS-related information (including mass spectra and chromatograms). In this review, we provide an overview of the existing software libraries, open-source frameworks and also, we give information on some of the freely available applications which make use of them. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan. Copyright © 2013 Elsevier B.V. All rights reserved.
Extracting and connecting chemical structures from text sources using chemicalize.org.

PubMed

Southan, Christopher; Stracz, Andras

2013-04-23

Exploring bioactive chemistry requires navigating between structures and data from a variety of text-based sources. While PubChem currently includes approximately 16 million document-extracted structures (15 million from patents) the extent of public inter-document and document-to-database links is still well below any estimated total, especially for journal articles. A major expansion in access to text-entombed chemistry is enabled by chemicalize.org. This on-line resource can process IUPAC names, SMILES, InChI strings, CAS numbers and drug names from pasted text, PDFs or URLs to generate structures, calculate properties and launch searches. Here, we explore its utility for answering questions related to chemical structures in documents and where these overlap with database records. These aspects are illustrated using a common theme of Dipeptidyl Peptidase 4 (DPPIV) inhibitors. Full-text open URL sources facilitated the download of over 1400 structures from a DPPIV patent and the alignment of specific examples with IC50 data. Uploading the SMILES to PubChem revealed extensive linking to patents and papers, including prior submissions from chemicalize.org as submitting source. A DPPIV medicinal chemistry paper was completely extracted and structures were aligned to the activity results table, as well as linked to other documents via PubChem. In both cases, key structures with data were partitioned from common chemistry by dividing them into individual new PDFs for conversion. Over 500 structures were also extracted from a batch of PubMed abstracts related to DPPIV inhibition. The drug structures could be stepped through each text occurrence and included some converted MeSH-only IUPAC names not linked in PubChem. Performing set intersections proved effective for detecting compounds-in-common between documents and merged extractions. This work demonstrates the utility of chemicalize.org for the exploration of chemical structure connectivity between documents and databases, including structure searches in PubChem, InChIKey searches in Google and the chemicalize.org archive. It has the flexibility to extract text from any internal, external or Web source. It synergizes with other open tools and the application is undergoing continued development. It should thus facilitate progress in medicinal chemistry, chemical biology and other bioactive chemistry domains.
Impacts and Viability of Open Source Software on Earth Science Metadata Clearing House and Service Registry Applications

NASA Astrophysics Data System (ADS)

Pilone, D.; Cechini, M. F.; Mitchell, A.

2011-12-01

Earth Science applications typically deal with large amounts of data and high throughput rates, if not also high transaction rates. While Open Source is frequently used for smaller scientific applications, large scale, highly available systems frequently fall back to "enterprise" class solutions like Oracle RAC or commercial grade JEE Application Servers. NASA's Earth Observing System Data and Information System (EOSDIS) provides end-to-end capabilities for managing NASA's Earth science data from multiple sources - satellites, aircraft, field measurements, and various other programs. A core capability of EOSDIS, the Earth Observing System (EOS) Clearinghouse (ECHO), is a highly available search and order clearinghouse of over 100 million pieces of science data that has evolved from its early R&D days to a fully operational system. Over the course of this maturity ECHO has largely transitioned from commercial frameworks, databases, and operating systems to Open Source solutions...and in some cases, back. In this talk we discuss the progression of our technological solutions and our lessons learned in the areas of: ? High performance, large scale searching solutions ? GeoSpatial search capabilities and dealing with multiple coordinate systems ? Search and storage of variable format source (science) data ? Highly available deployment solutions ? Scalable (elastic) solutions to visual searching and image handling Throughout the evolution of the ECHO system we have had to evaluate solutions with respect to performance, cost, developer productivity, reliability, and maintainability in the context of supporting global science users. Open Source solutions have played a significant role in our architecture and development but several critical commercial components remain (or have been reinserted) to meet our operational demands.
Preliminary Geologic Map of the Topanga 7.5' Quadrangle, Southern California: A Digital Database

USGS Publications Warehouse

Yerkes, R.F.; Campbell, R.H.

1995-01-01

INTRODUCTION This Open-File report is a digital geologic map database. This pamphlet serves to introduce and describe the digital data. There is no paper map included in the Open-File report. This digital map database is compiled from previously published sources combined with some new mapping and modifications in nomenclature. The geologic map database delineates map units that are identified by general age and lithology following the stratigraphic nomenclature of the U. S. Geological Survey. For detailed descriptions of the units, their stratigraphic relations and sources of geologic mapping consult Yerkes and Campbell (1994). More specific information about the units may be available in the original sources. The content and character of the database and methods of obtaining it are described herein. The geologic map database itself, consisting of three ARC coverages and one base layer, can be obtained over the Internet or by magnetic tape copy as described below. The processes of extracting the geologic map database from the tar file, and importing the ARC export coverages (procedure described herein), will result in the creation of an ARC workspace (directory) called 'topnga.' The database was compiled using ARC/INFO version 7.0.3, a commercial Geographic Information System (Environmental Systems Research Institute, Redlands, California), with version 3.0 of the menu interface ALACARTE (Fitzgibbon and Wentworth, 1991, Fitzgibbon, 1991, Wentworth and Fitzgibbon, 1991). It is stored in uncompressed ARC export format (ARC/INFO version 7.x) in a compressed UNIX tar (tape archive) file. The tar file was compressed with gzip, and may be uncompressed with gzip, which is available free of charge via the Internet from the gzip Home Page (http://w3.teaser.fr/~jlgailly/gzip). A tar utility is required to extract the database from the tar file. This utility is included in most UNIX systems, and can be obtained free of charge via the Internet from Internet Literacy's Common Internet File Formats Webpage http://www.matisse.net/files/formats.html). ARC/INFO export files (files with the .e00 extension) can be converted into ARC/INFO coverages in ARC/INFO (see below) and can be read by some other Geographic Information Systems, such as MapInfo via ArcLink and ESRI's ArcView (version 1.0 for Windows 3.1 to 3.11 is available for free from ESRI's web site: http://www.esri.com). 1. Different base layer - The original digital database included separates clipped out of the Los Angeles 1:100,000 sheet. This release includes a vectorized scan of a scale-stable negative of the Topanga 7.5 minute quadrangle. 2. Map projection - The files in the original release were in polyconic projection. The projection used in this release is state plane, which allows for the tiling of adjacent quadrangles. 3. File compression - The files in the original release were compressed with UNIX compression. The files in this release are compressed with gzip.
Regional Geologic Map of San Andreas and Related Faults in Carrizo Plain, Temblor, Caliente and La Panza Ranges and Vicinity, California; A Digital Database

USGS Publications Warehouse

Dibblee, T. W.; Digital database compiled by Graham, S. E.; Mahony, T.M.; Blissenbach, J.L.; Mariant, J.J.; Wentworth, C.M.

1999-01-01

This Open-File Report is a digital geologic map database. The report serves to introduce and describe the digital data. There is no paper map included in the Open-File Report. The report includes PostScript and PDF plot files that can be used to plot images of the geologic map sheet and explanation sheet. This digital map database is prepared from a previously published map by Dibblee (1973). The geologic map database delineates map units that are identified by general age, lithology, and clast size following the stratigraphic nomenclature of the U.S. Geological Survey. For descriptions of the units, their stratigraphic relations, and sources of geologic mapping, consult the explanation sheet (of99-14_4b.ps or of99-14_4d.pdf), or the original published paper map (Dibblee, 1973). The scale of the source map limits the spatial resolution (scale) of the database to 1:125,000 or smaller. For those interested in the geology of Carrizo Plain and vicinity who do not use an ARC/INFO compatible Geographic Information System (GIS), but would like to obtain a paper map and explanation, PDF and PostScript plot files containing map images of the data in the digital database, as well as PostScript and PDF plot files of the explanation sheet and explanatory text, have been included in the database package (please see the section 'Digital Plot Files', page 5). The PostScript plot files require a gzip utility to access them. For those without computer capability, we can provide users with the PostScript or PDF files on tape that can be taken to a vendor for plotting. Paper plots can also be ordered directly from the USGS (please see the section 'Obtaining Plots from USGS Open-File Services', page 5). The content and character of the database, methods of obtaining it, and processes of extracting the map database from the tar (tape archive) file are described herein. The map database itself, consisting of six ARC/INFO coverages, can be obtained over the Internet or by magnetic tape copy as described below. The database was compiled using ARC/INFO, a commercial Geographic Information System (Environmental Systems Research Institute, Redlands, California), with version 3.0 of the menu interface ALACARTE (Fitzgibbon and Wentworth, 1991, Fitzgibbon, 1991, Wentworth and Fitzgibbon, 1991). The ARC/INFO coverages are stored in uncompressed ARC export format (ARC/INFO version 7.x). All data files have been compressed, and may be uncompressed with gzip, which is available free of charge over the Internet via links from the USGS Public Domain Software page (http://edcwww.cr.usgs.gov/doc/edchome/ndcdb/public.html). ARC/INFO export files (files with the .e00 extension) can be converted into ARC/INFO coverages in ARC/INFO (see below) and can be read by some other Geographic Information Systems, such as MapInfo via ArcLink and ESRI's ArcView.
A Python package for parsing, validating, mapping and formatting sequence variants using HGVS nomenclature.

PubMed

Hart, Reece K; Rico, Rudolph; Hare, Emily; Garcia, John; Westbrook, Jody; Fusaro, Vincent A

2015-01-15

Biological sequence variants are commonly represented in scientific literature, clinical reports and databases of variation using the mutation nomenclature guidelines endorsed by the Human Genome Variation Society (HGVS). Despite the widespread use of the standard, no freely available and comprehensive programming libraries are available. Here we report an open-source and easy-to-use Python library that facilitates the parsing, manipulation, formatting and validation of variants according to the HGVS specification. The current implementation focuses on the subset of the HGVS recommendations that precisely describe sequence-level variation relevant to the application of high-throughput sequencing to clinical diagnostics. The package is released under the Apache 2.0 open-source license. Source code, documentation and issue tracking are available at http://bitbucket.org/hgvs/hgvs/. Python packages are available at PyPI (https://pypi.python.org/pypi/hgvs). Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press.
A Python package for parsing, validating, mapping and formatting sequence variants using HGVS nomenclature

PubMed Central

Hart, Reece K.; Rico, Rudolph; Hare, Emily; Garcia, John; Westbrook, Jody; Fusaro, Vincent A.

2015-01-01

Summary: Biological sequence variants are commonly represented in scientific literature, clinical reports and databases of variation using the mutation nomenclature guidelines endorsed by the Human Genome Variation Society (HGVS). Despite the widespread use of the standard, no freely available and comprehensive programming libraries are available. Here we report an open-source and easy-to-use Python library that facilitates the parsing, manipulation, formatting and validation of variants according to the HGVS specification. The current implementation focuses on the subset of the HGVS recommendations that precisely describe sequence-level variation relevant to the application of high-throughput sequencing to clinical diagnostics. Availability and implementation: The package is released under the Apache 2.0 open-source license. Source code, documentation and issue tracking are available at http://bitbucket.org/hgvs/hgvs/. Python packages are available at PyPI (https://pypi.python.org/pypi/hgvs). Contact: reecehart@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25273102
eQuilibrator--the biochemical thermodynamics calculator.

PubMed

Flamholz, Avi; Noor, Elad; Bar-Even, Arren; Milo, Ron

2012-01-01

The laws of thermodynamics constrain the action of biochemical systems. However, thermodynamic data on biochemical compounds can be difficult to find and is cumbersome to perform calculations with manually. Even simple thermodynamic questions like 'how much Gibbs energy is released by ATP hydrolysis at pH 5?' are complicated excessively by the search for accurate data. To address this problem, eQuilibrator couples a comprehensive and accurate database of thermodynamic properties of biochemical compounds and reactions with a simple and powerful online search and calculation interface. The web interface to eQuilibrator (http://equilibrator.weizmann.ac.il) enables easy calculation of Gibbs energies of compounds and reactions given arbitrary pH, ionic strength and metabolite concentrations. The eQuilibrator code is open-source and all thermodynamic source data are freely downloadable in standard formats. Here we describe the database characteristics and implementation and demonstrate its use.
eQuilibrator—the biochemical thermodynamics calculator

PubMed Central

Flamholz, Avi; Noor, Elad; Bar-Even, Arren; Milo, Ron

2012-01-01

The laws of thermodynamics constrain the action of biochemical systems. However, thermodynamic data on biochemical compounds can be difficult to find and is cumbersome to perform calculations with manually. Even simple thermodynamic questions like ‘how much Gibbs energy is released by ATP hydrolysis at pH 5?’ are complicated excessively by the search for accurate data. To address this problem, eQuilibrator couples a comprehensive and accurate database of thermodynamic properties of biochemical compounds and reactions with a simple and powerful online search and calculation interface. The web interface to eQuilibrator (http://equilibrator.weizmann.ac.il) enables easy calculation of Gibbs energies of compounds and reactions given arbitrary pH, ionic strength and metabolite concentrations. The eQuilibrator code is open-source and all thermodynamic source data are freely downloadable in standard formats. Here we describe the database characteristics and implementation and demonstrate its use. PMID:22064852
An open-source wireless sensor stack: from Arduino to SDI-12 to Water One Flow

NASA Astrophysics Data System (ADS)

Hicks, S.; Damiano, S. G.; Smith, K. M.; Olexy, J.; Horsburgh, J. S.; Mayorga, E.; Aufdenkampe, A. K.

2013-12-01

Implementing a large-scale streaming environmental sensor network has previously been limited by the high cost of the datalogging and data communication infrastructure. The Christina River Basin Critical Zone Observatory (CRB-CZO) is overcoming the obstacles to large near-real-time data collection networks by using Arduino, an open source electronics platform, in combination with XBee ZigBee wireless radio modules. These extremely low-cost and easy-to-use open source electronics are at the heart of the new DIY movement and have provided solutions to countless projects by over half a million users worldwide. However, their use in environmental sensing is in its infancy. At present a primary limitation to widespread deployment of open-source electronics for environmental sensing is the lack of a simple, open-source software stack to manage streaming data from heterogeneous sensor networks. Here we present a functioning prototype software stack that receives sensor data over a self-meshing ZigBee wireless network from over a hundred sensors, stores the data locally and serves it on demand as a CUAHSI Water One Flow (WOF) web service. We highlight a few new, innovative components, including: (1) a versatile open data logger design based the Arduino electronics platform and ZigBee radios; (2) a software library implementing SDI-12 communication protocol between any Arduino platform and SDI12-enabled sensors without the need for additional hardware (https://github.com/StroudCenter/Arduino-SDI-12); and (3) 'midStream', a light-weight set of Python code that receives streaming sensor data, appends it with metadata on the fly by querying a relational database structured on an early version of the Observations Data Model version 2.0 (ODM2), and uses the WOFpy library to serve the data as WaterML via SOAP and REST web services.
The Global Earthquake Model - Past, Present, Future

NASA Astrophysics Data System (ADS)

Smolka, Anselm; Schneider, John; Stein, Ross

2014-05-01

The Global Earthquake Model (GEM) is a unique collaborative effort that aims to provide organizations and individuals with tools and resources for transparent assessment of earthquake risk anywhere in the world. By pooling data, knowledge and people, GEM acts as an international forum for collaboration and exchange. Sharing of data and risk information, best practices, and approaches across the globe are key to assessing risk more effectively. Through consortium driven global projects, open-source IT development and collaborations with more than 10 regions, leading experts are developing unique global datasets, best practice, open tools and models for seismic hazard and risk assessment. The year 2013 has seen the completion of ten global data sets or components addressing various aspects of earthquake hazard and risk, as well as two GEM-related, but independently managed regional projects SHARE and EMME. Notably, the International Seismological Centre (ISC) led the development of a new ISC-GEM global instrumental earthquake catalogue, which was made publicly available in early 2013. It has set a new standard for global earthquake catalogues and has found widespread acceptance and application in the global earthquake community. By the end of 2014, GEM's OpenQuake computational platform will provide the OpenQuake hazard/risk assessment software and integrate all GEM data and information products. The public release of OpenQuake is planned for the end of this 2014, and will comprise the following datasets and models: • ISC-GEM Instrumental Earthquake Catalogue (released January 2013) • Global Earthquake History Catalogue [1000-1903] • Global Geodetic Strain Rate Database and Model • Global Active Fault Database • Tectonic Regionalisation Model • Global Exposure Database • Buildings and Population Database • Earthquake Consequences Database • Physical Vulnerabilities Database • Socio-Economic Vulnerability and Resilience Indicators • Seismic Source Models • Ground Motion (Attenuation) Models • Physical Exposure Models • Physical Vulnerability Models • Composite Index Models (social vulnerability, resilience, indirect loss) • Repository of national hazard models • Uniform global hazard model Armed with these tools and databases, stakeholders worldwide will then be able to calculate, visualise and investigate earthquake risk, capture new data and to share their findings for joint learning. Earthquake hazard information will be able to be combined with data on exposure (buildings, population) and data on their vulnerability, for risk assessment around the globe. Furthermore, for a truly integrated view of seismic risk, users will be able to add social vulnerability and resilience indices and estimate the costs and benefits of different risk management measures. Having finished its first five-year Work Program at the end of 2013, GEM has entered into its second five-year Work Program 2014-2018. Beyond maintaining and enhancing the products developed in Work Program 1, the second phase will have a stronger focus on regional hazard and risk activities, and on seeing GEM products used for risk assessment and risk management practice at regional, national and local scales. Furthermore GEM intends to partner with similar initiatives underway for other natural perils, which together are needed to meet the need for advanced risk assessment methods, tools and data to underpin global disaster risk reduction efforts under the Hyogo Framework for Action #2 to be launched in Sendai/Japan in spring 2015

Software reuse example and challenges at NSIDC

NASA Astrophysics Data System (ADS)

Billingsley, B. W.; Brodzik, M.; Collins, J. A.

2009-12-01

NSIDC has created a new data discovery and access system, Searchlight, to provide users with the data they want in the format they want. NSIDC Searchlight supports discovery and access to disparate data types with on-the-fly reprojection, regridding and reformatting. Architected to both reuse open source systems and be reused itself, Searchlight reuses GDAL and Proj4 for manipulating data and format conversions, the netCDF Java library for creating netCDF output, MapServer and OpenLayers for defining spatial criteria and the JTS Topology Suite (JTS) in conjunction with Hibernate Spatial for database interaction and rich OGC-compliant spatial objects. The application reuses popular Java and Java Script libraries including Struts 2, Spring, JPA (Hibernate), Sitemesh, JFreeChart, JQuery, DOJO and a PostGIS PostgreSQL database. Future reuse of Searchlight components is supported at varying architecture levels, ranging from the database and model components to web services. We present the tools, libraries and programs that Searchlight has reused. We describe the architecture of Searchlight and explain the strategies deployed for reusing existing software and how Searchlight is built for reuse. We will discuss NSIDC reuse of the Searchlight components to support rapid development of new data delivery systems.
PRGdb: a bioinformatics platform for plant resistance gene analysis

PubMed Central

Sanseverino, Walter; Roma, Guglielmo; De Simone, Marco; Faino, Luigi; Melito, Sara; Stupka, Elia; Frusciante, Luigi; Ercolano, Maria Raffaella

2010-01-01

PRGdb is a web accessible open-source (http://www.prgdb.org) database that represents the first bioinformatic resource providing a comprehensive overview of resistance genes (R-genes) in plants. PRGdb holds more than 16 000 known and putative R-genes belonging to 192 plant species challenged by 115 different pathogens and linked with useful biological information. The complete database includes a set of 73 manually curated reference R-genes, 6308 putative R-genes collected from NCBI and 10463 computationally predicted putative R-genes. Thanks to a user-friendly interface, data can be examined using different query tools. A home-made prediction pipeline called Disease Resistance Analysis and Gene Orthology (DRAGO), based on reference R-gene sequence data, was developed to search for plant resistance genes in public datasets such as Unigene and Genbank. New putative R-gene classes containing unknown domain combinations were discovered and characterized. The development of the PRG platform represents an important starting point to conduct various experimental tasks. The inferred cross-link between genomic and phenotypic information allows access to a large body of information to find answers to several biological questions. The database structure also permits easy integration with other data types and opens up prospects for future implementations. PMID:19906694
Extensible Probabilistic Repository Technology (XPRT)

DTIC Science & Technology

2004-10-01

projects, such as, Centaurus , Evidence Data Base (EDB), etc., others were fabricated, such as INS and FED, while others contain data from the open...Google Web Report Unlimited SOAP API News BBC News Unlimited WEB RSS 1.0 Centaurus Person Demographics 204,402 people from 240 countries...objects of the domain ontology map to the various simulated data-sources. For example, the PersonDemographics are stored in the Centaurus database, while
Open-Source Multi-Language Audio Database for Spoken Language Processing Applications

DTIC Science & Technology

2012-12-01

Mandarin, and Russian . Approximately 30 hours of speech were collected for each language. Each passage has been carefully transcribed at the...manual and automatic methods. The Russian passages have not yet been marked at the phonetic level. Another phase of the work was to explore...You Tube. 300 passages were collected in each of three languages—English, Mandarin, and Russian . Approximately 30 hours of speech were
Earthworm in the 21st century

NASA Astrophysics Data System (ADS)

Friberg, Paul; Lisowski, Stefan; Dricker, Ilya; Hellman, Sidney

2010-05-01

Earthworm (Johnson et al., 1995) is a fully open-source earthquake data acquisition and processing package that is in widespread use through out the world. Earthworm includes basic seismic data acquistion for the majority of the dataloggers currently available and provides network transport mechanisms and common formats as output for data transferral. In addition, it comes with network seismology tools to compute network detections, perform automated arrival picking, and automated hypocentral and magnitude estimations. More importantly it is an open and free framework in the C-programming language that can be used to create new modules that process waveform and earthquake data in near real time. The number of Earthworm installations is growing annually as are the number of contributions to the system. Furthermore its growth into other areas of waveform data acquistion (namely Geomagnetic observatories and Infrasound arrays) show its adaptability to other waveform technologies and processing strategies. In this presentation we discuss the coming challenges to growing Earthworm and new developments in its use; namely the open source add-ons that have become interfaces to Earthworm's core. These add-ons include GlowWorm, MagWorm, Hydra, SWARM, Winston, EarlyBird, Iworm, and most importantly, AQMS (formerly known as CHEETAH). The AQMS, ANSS Quake Monitoring System, is the Earthworm system created in California which has now been installed in the majority of Regional Seismic Networks (RSNs) in the United States. AQMS allows additional real-time and post-processing of Earthworm generated data to be stored and manipulated in a database using numerous database oriented tools. The use of a relational database for persistence provides users with the ability to implement configuration control and research capabilities not available in earlier Earthworm add-ons. By centralizing on AQMS, the RSNs will be able to leverage new developments by easily sharing Earthworm and AQMS modules and avoid the duplication and one-off/custom developments of the past.
Inconsistencies Between Physician-Reported Disclosures at the AAOS Annual Meeting and Industry-Reported Financial Disclosures in the Open Payments Database.

PubMed

Hannon, Charles P; Chalmers, Peter N; Carpiniello, Matthew F; Cvetanovich, Gregory L; Cole, Brian J; Bach, Bernard R

2016-10-19

The purpose of this study was to determine the rate and type of inconsistencies between disclosures self-reported by physicians at a major academic meeting in the United States and industry-reported disclosures in the Open Payments database for a concordant time period. Disclosures for every first and last author from the United States with a medical degree of a podium or poster presentation at the 2014 American Academy of Orthopaedic Surgeons (AAOS) Annual Meeting were collected and were compared with the disclosures reported in the Open Payments database to determine if any inconsistencies were present and, if so, within which category. In total, 1,925 total AAOS presenters were identified, and 1,113 met the inclusion criteria. Based on AAOS disclosures, 432 (39%) should have been listed within the Open Payments database. There were 125 presenters (11%) who reported an AAOS disclosure and thus should have been included in the Open Payments database, but were not included. An additional 259 presenters (23%) had ≥1 AAOS disclosures that were not reported or were improperly categorized in the Open Payments database. Inconsistencies were more common for authors who had significantly more poster presentations (p < 0.001), podium presentations (p = 0.01), total presentations (p < 0.001), and AAOS disclosures (p < 0.001) and a significantly higher value of payments in the Open Payments database (p < 0.001). In this sample, there was a 35% rate of inconsistency between physician-reported financial relationships for presenters at the AAOS Annual Meeting and industry-reported relationships published in the Open Payments database. Copyright © 2016 by The Journal of Bone and Joint Surgery, Incorporated.
Screening the Medicines for Malaria Venture Pathogen Box across Multiple Pathogens Reclassifies Starting Points for Open-Source Drug Discovery

PubMed Central

Sykes, Melissa L.; Jones, Amy J.; Shelper, Todd B.; Simpson, Moana; Lang, Rebecca; Poulsen, Sally-Ann; Sleebs, Brad E.

2017-01-01

ABSTRACT Open-access drug discovery provides a substantial resource for diseases primarily affecting the poor and disadvantaged. The open-access Pathogen Box collection is comprised of compounds with demonstrated biological activity against specific pathogenic organisms. The supply of this resource by the Medicines for Malaria Venture has the potential to provide new chemical starting points for a number of tropical and neglected diseases, through repurposing of these compounds for use in drug discovery campaigns for these additional pathogens. We tested the Pathogen Box against kinetoplastid parasites and malaria life cycle stages in vitro. Consequently, chemical starting points for malaria, human African trypanosomiasis, Chagas disease, and leishmaniasis drug discovery efforts have been identified. Inclusive of this in vitro biological evaluation, outcomes from extensive literature reviews and database searches are provided. This information encompasses commercial availability, literature reference citations, other aliases and ChEMBL number with associated biological activity, where available. The release of this new data for the Pathogen Box collection into the public domain will aid the open-source model of drug discovery. Importantly, this will provide novel chemical starting points for drug discovery and target identification in tropical disease research. PMID:28674055
Numerical simulation of the SOFIA flow field

NASA Technical Reports Server (NTRS)

Klotz, Stephen P.

1995-01-01

This report provides a concise summary of the contribution of computational fluid dynamics (CFD) to the SOFIA (Stratospheric Observatory for Infrared Astronomy) project at NASA Ames and presents results obtained from closed- and open-cavity SOFIA simulations. The aircraft platform is a Boeing 747SP and these are the first SOFIA simulations run with the aircraft empennage included in the geometry database. In the open-cavity runs the telescope is mounted behind the wings. Results suggest that the cavity markedly influences the mean pressure distribution on empennage surfaces and that 110-140 decibel (db) sound pressure levels are typical in the cavity and on the horizontal and vertical stabilizers. A strong source of sound was found to exist on the rim of the open telescope cavity. The presence of this source suggests that additional design work needs to be performed in order to minimize the sound emanating from that location. A fluid dynamic analysis of the engine plumes is also contained in this report. The analysis was part of an effort to quantify the degradation of telescope performance resulting from the proximity of the port engine exhaust plumes to the open telescope bay.
Screening the Medicines for Malaria Venture Pathogen Box across Multiple Pathogens Reclassifies Starting Points for Open-Source Drug Discovery.

PubMed

Duffy, Sandra; Sykes, Melissa L; Jones, Amy J; Shelper, Todd B; Simpson, Moana; Lang, Rebecca; Poulsen, Sally-Ann; Sleebs, Brad E; Avery, Vicky M

2017-09-01

Open-access drug discovery provides a substantial resource for diseases primarily affecting the poor and disadvantaged. The open-access Pathogen Box collection is comprised of compounds with demonstrated biological activity against specific pathogenic organisms. The supply of this resource by the Medicines for Malaria Venture has the potential to provide new chemical starting points for a number of tropical and neglected diseases, through repurposing of these compounds for use in drug discovery campaigns for these additional pathogens. We tested the Pathogen Box against kinetoplastid parasites and malaria life cycle stages in vitro Consequently, chemical starting points for malaria, human African trypanosomiasis, Chagas disease, and leishmaniasis drug discovery efforts have been identified. Inclusive of this in vitro biological evaluation, outcomes from extensive literature reviews and database searches are provided. This information encompasses commercial availability, literature reference citations, other aliases and ChEMBL number with associated biological activity, where available. The release of this new data for the Pathogen Box collection into the public domain will aid the open-source model of drug discovery. Importantly, this will provide novel chemical starting points for drug discovery and target identification in tropical disease research. Copyright © 2017 Duffy et al.
Databases and Electronic Resources - Betty Petersen Memorial Library

Science.gov Websites

of NOAA-Wide and Open Access Databases on the NOAA Central Library website. American Meteorological to a nonfederal website. Open Science Directory Open Science Directory contains collections of Open Access Journals (e.g. Directory of Open Access Journals) and journals in the special programs (Hinari
NoSQL: collection document and cloud by using a dynamic web query form

NASA Astrophysics Data System (ADS)

Abdalla, Hemn B.; Lin, Jinzhao; Li, Guoquan

2015-07-01

Mongo-DB (from "humongous") is an open-source document database and the leading NoSQL database. A NoSQL (Not Only SQL, next generation databases, being non-relational, deal, open-source and horizontally scalable) presenting a mechanism for storage and retrieval of documents. Previously, we stored and retrieved the data using the SQL queries. Here, we use the MonogoDB that means we are not utilizing the MySQL and SQL queries. Directly importing the documents into our Drives, retrieving the documents on that drive by not applying the SQL queries, using the IO BufferReader and Writer, BufferReader for importing our type of document files to my folder (Drive). For retrieving the document files, the usage is BufferWriter from the particular folder (or) Drive. In this sense, providing the security for those storing files for what purpose means if we store the documents in our local folder means all or views that file and modified that file. So preventing that file, we are furnishing the security. The original document files will be changed to another format like in this paper; Binary format is used. Our documents will be converting to the binary format after that direct storing in one of our folder, that time the storage space will provide the private key for accessing that file. Wherever any user tries to discover the Document files means that file data are in the binary format, the document's file owner simply views that original format using that personal key from receive the secret key from the cloud.
Crystallography Open Database (COD): an open-access collection of crystal structures and platform for world-wide collaboration

PubMed Central

Gražulis, Saulius; Daškevič, Adriana; Merkys, Andrius; Chateigner, Daniel; Lutterotti, Luca; Quirós, Miguel; Serebryanaya, Nadezhda R.; Moeck, Peter; Downs, Robert T.; Le Bail, Armel

2012-01-01

Using an open-access distribution model, the Crystallography Open Database (COD, http://www.crystallography.net) collects all known ‘small molecule / small to medium sized unit cell’ crystal structures and makes them available freely on the Internet. As of today, the COD has aggregated ∼150 000 structures, offering basic search capabilities and the possibility to download the whole database, or parts thereof using a variety of standard open communication protocols. A newly developed website provides capabilities for all registered users to deposit published and so far unpublished structures as personal communications or pre-publication depositions. Such a setup enables extension of the COD database by many users simultaneously. This increases the possibilities for growth of the COD database, and is the first step towards establishing a world wide Internet-based collaborative platform dedicated to the collection and curation of structural knowledge. PMID:22070882
Generic HTML Form Processor: A versatile PHP script to save web-collected data into a MySQL database.

PubMed

Göritz, Anja S; Birnbaum, Michael H

2005-11-01

The customizable PHP script Generic HTML Form Processor is intended to assist researchers and students in quickly setting up surveys and experiments that can be administered via the Web. This script relieves researchers from the burdens of writing new CGI scripts and building databases for each Web study. Generic HTML Form Processor processes any syntactically correct HTML forminput and saves it into a dynamically created open-source database. We describe five modes for usage of the script that allow increasing functionality but require increasing levels of knowledge of PHP and Web servers: The first two modes require no previous knowledge, and the fifth requires PHP programming expertise. Use of Generic HTML Form Processor is free for academic purposes, and its Web address is www.goeritz.net/brmic.
The visible human project®: From body to bits.

PubMed

Ackerman, Michael J

2016-08-01

In the middle 1990's the U.S. National Library sponsored the acquisition and development of the Visible Human Project® data base. This image database contains anatomical cross-sectional images which allow the reconstruction of three dimensional male and female anatomy to an accuracy of less than 1.0 mm. The male anatomy is contained in a 15 gigabyte database, the female in a 39 gigabyte database. This talk will describe why and how this project was accomplished and demonstrate some of the products which the Visible Human dataset has made possible. I will conclude by describing how the Visible Human Project, completed over 20 years ago, has led the National Library of Medicine to a series of image research projects including an open source image processing toolkit which is included in several commercial products.
DianaHealth.com, an On-Line Database Containing Appraisals of the Clinical Value and Appropriateness of Healthcare Interventions: Database Development and Retrospective Analysis.

PubMed

Bonfill, Xavier; Osorio, Dimelza; Solà, Ivan; Pijoan, Jose Ignacio; Balasso, Valentina; Quintana, Maria Jesús; Puig, Teresa; Bolibar, Ignasi; Urrútia, Gerard; Zamora, Javier; Emparanza, José Ignacio; Gómez de la Cámara, Agustín; Ferreira-González, Ignacio

2016-01-01

To describe the development of a novel on-line database aimed to serve as a source of information concerning healthcare interventions appraised for their clinical value and appropriateness by several initiatives worldwide, and to present a retrospective analysis of the appraisals already included in the database. Database development and a retrospective analysis. The database DianaHealth.com is already on-line and it is regularly updated, independent, open access and available in English and Spanish. Initiatives are identified in medical news, in article references, and by contacting experts in the field. We include appraisals in the form of clinical recommendations, expert analyses, conclusions from systematic reviews, and original research that label any health care intervention as low-value or inappropriate. We obtain the information necessary to classify the appraisals according to type of intervention, specialties involved, publication year, authoring initiative, and key words. The database is accessible through a search engine which retrieves a list of appraisals and a link to the website where they were published. DianaHealth.com also provides a brief description of the initiatives and a section where users can report new appraisals or suggest new initiatives. From January 2014 to July 2015, the on-line database included 2940 appraisals from 22 initiatives: eleven campaigns gathering clinical recommendations from scientific societies, five sets of conclusions from literature review, three sets of recommendations from guidelines, two collections of articles on low clinical value in medical journals, and an initiative of our own. We have developed an open access on-line database of appraisals about healthcare interventions considered of low clinical value or inappropriate. DianaHealth.com could help physicians and other stakeholders make better decisions concerning patient care and healthcare systems sustainability. Future efforts should be focused on assessing the impact of these appraisals in the clinical practice.
DianaHealth.com, an On-Line Database Containing Appraisals of the Clinical Value and Appropriateness of Healthcare Interventions: Database Development and Retrospective Analysis

PubMed Central

Bonfill, Xavier; Osorio, Dimelza; Solà, Ivan; Pijoan, Jose Ignacio; Balasso, Valentina; Quintana, Maria Jesús; Puig, Teresa; Bolibar, Ignasi; Urrútia, Gerard; Zamora, Javier; Emparanza, José Ignacio; Gómez de la Cámara, Agustín; Ferreira-González, Ignacio

2016-01-01

Objective To describe the development of a novel on-line database aimed to serve as a source of information concerning healthcare interventions appraised for their clinical value and appropriateness by several initiatives worldwide, and to present a retrospective analysis of the appraisals already included in the database. Methods and Findings Database development and a retrospective analysis. The database DianaHealth.com is already on-line and it is regularly updated, independent, open access and available in English and Spanish. Initiatives are identified in medical news, in article references, and by contacting experts in the field. We include appraisals in the form of clinical recommendations, expert analyses, conclusions from systematic reviews, and original research that label any health care intervention as low-value or inappropriate. We obtain the information necessary to classify the appraisals according to type of intervention, specialties involved, publication year, authoring initiative, and key words. The database is accessible through a search engine which retrieves a list of appraisals and a link to the website where they were published. DianaHealth.com also provides a brief description of the initiatives and a section where users can report new appraisals or suggest new initiatives. From January 2014 to July 2015, the on-line database included 2940 appraisals from 22 initiatives: eleven campaigns gathering clinical recommendations from scientific societies, five sets of conclusions from literature review, three sets of recommendations from guidelines, two collections of articles on low clinical value in medical journals, and an initiative of our own. Conclusions We have developed an open access on-line database of appraisals about healthcare interventions considered of low clinical value or inappropriate. DianaHealth.com could help physicians and other stakeholders make better decisions concerning patient care and healthcare systems sustainability. Future efforts should be focused on assessing the impact of these appraisals in the clinical practice. PMID:26840451
Developing a low-cost open-source CTD for research and outreach

NASA Astrophysics Data System (ADS)

Thaler, A. D.; Sturdivant, K.

2013-12-01

Developing a low-cost open-source CTD for research and outreach Andrew David Thaler and Kersey Sturdivant Conductivity, temperature, and depth (CTD). With these three measurements, marine scientists can unlock ocean patterns hidden beneath the waves. The ocean is not uniform, it its filled with swirling eddies, temperature boundaries, layers of high and low salinity, changing densities, and many other physical characteristics. To reveal these patterns, oceanographers use a tool called the CTD. A CTD is found on almost every major research vessel. Rare is the scientific expedition-whether it be coastal work in shallow estuaries or journeys to the deepest ocean trenches-that doesn't begin with the humble CTD cast. The CTD is not cheap. Commercial CTD's start at more the 5,000 and can climb as high as 25,000 or more. We believe that the prohibitive cost of a CTD is an unacceptable barrier to open science. The price tag excludes individuals and groups who lack research grants or significant private funds from conducting oceanographic research. We want to make this tool-the workhorse of oceanographic research-available to anyone with an interest in the oceans. The OpenCTD is a low-cost, open-source CTD suitable for both educators and scientists. The platform is built using readily available parts and is powered by an Arduino-based microcontroller. Our goal is to create a device that is accurate enough to be used for scientific research and can be constructed for less than $200. Source codes, circuit diagrams, and building plans will be freely available. The final instrument will be effective to 200 meters depth. Why 200 meters? For many coastal regions, 200 meters of water depth covers the majority of the ocean that is accessible by small boat. The OpenCTD is targeted to people working in this niche, where entire research projects can be conducted for less than the cost of a commercial CTD. However, the Open CTD is scalable, and anyone with the inclination can adapt our plans to operate in deeper waters. Through a crowdfunding initiative and collaboration with numerous interested scientists, researchers, educators, and developers, we developed the framework for a low-cost, open-source, CTD that is appropriate for both scientific research and public outreach. We envision a network or researchers and educators using the OpenCTD to contribute to local and region scientific programs through open-source databases.
Analysis of recreational closed-circuit rebreather deaths 1998-2010.

PubMed

Fock, Andrew W

2013-06-01

Since the introduction of recreational closed-circuit rebreathers (CCRs) in 1998, there have been many recorded deaths. Rebreather deaths have been quoted to be as high as 1 in 100 users. Rebreather fatalities between 1998 and 2010 were extracted from the Deeplife rebreather mortality database, and inaccuracies were corrected where known. Rebreather absolute numbers were derived from industry discussions and training agency statistics. Relative numbers and brands were extracted from the Rebreather World website database and a Dutch rebreather survey. Mortality was compared with data from other databases. A fault-tree analysis of rebreathers was compared to that of open-circuit scuba of various configurations. Finally, a risk analysis was applied to the mortality database. The 181 recorded recreational rebreather deaths occurred at about 10 times the rate of deaths amongst open-circuit recreational scuba divers. No particular brand or type of rebreather was over-represented. Closed-circuit rebreathers have a 25-fold increased risk of component failure compared to a manifolded twin-cylinder open-circuit system. This risk can be offset by carrying a redundant 'bailout' system. Two-thirds of fatal dives were associated with a high-risk dive or high-risk behaviour. There are multiple points in the human-machine interface (HMI) during the use of rebreathers that can result in errors that may lead to a fatality. While rebreathers have an intrinsically higher risk of mechanical failure as a result of their complexity, this can be offset by good design incorporating redundancy and by carrying adequate 'bailout' or alternative gas sources for decompression in the event of a failure. Designs that minimize the chances of HMI errors and training that highlights this area may help to minimize fatalities.
CONNJUR Workflow Builder: A software integration environment for spectral reconstruction

PubMed Central

Fenwick, Matthew; Weatherby, Gerard; Vyas, Jay; Sesanker, Colbert; Martyn, Timothy O.; Ellis, Heidi J.C.; Gryk, Michael R.

2015-01-01

CONNJUR Workflow Builder (WB) is an open-source software integration environment that leverages existing spectral reconstruction tools to create a synergistic, coherent platform for converting biomolecular NMR data from the time domain to the frequency domain. WB provides data integration of primary data and metadata using a relational database, and includes a library of pre-built workflows for processing time domain data. WB simplifies maximum entropy reconstruction, facilitating the processing of non-uniformly sampled time domain data. As will be shown in the paper, the unique features of WB provide it with novel abilities to enhance the quality, accuracy, and fidelity of the spectral reconstruction process. WB also provides features which promote collaboration, education, parameterization, and non-uniform data sets along with processing integrated with the Rowland NMR Toolkit (RNMRTK) and NMRPipe software packages. WB is available free of charge in perpetuity, dual-licensed under the MIT and GPL open source licenses. PMID:26066803
CHRONOS architecture: Experiences with an open-source services-oriented architecture for geoinformatics

USGS Publications Warehouse

Fils, D.; Cervato, C.; Reed, J.; Diver, P.; Tang, X.; Bohling, G.; Greer, D.

2009-01-01

CHRONOS's purpose is to transform Earth history research by seamlessly integrating stratigraphic databases and tools into a virtual on-line stratigraphic record. In this paper, we describe the various components of CHRONOS's distributed data system, including the encoding of semantic and descriptive data into a service-based architecture. We give examples of how we have integrated well-tested resources available from the open-source and geoinformatic communities, like the GeoSciML schema and the simple knowledge organization system (SKOS), into the services-oriented architecture to encode timescale and phylogenetic synonymy data. We also describe on-going efforts to use geospatially enhanced data syndication and informally including semantic information by embedding it directly into the XHTML Document Object Model (DOM). XHTML DOM allows machine-discoverable descriptive data such as licensing and citation information to be incorporated directly into data sets retrieved by users. ?? 2008 Elsevier Ltd. All rights reserved.

ArrayNinja: An Open Source Platform for Unified Planning and Analysis of Microarray Experiments.

PubMed

Dickson, B M; Cornett, E M; Ramjan, Z; Rothbart, S B

2016-01-01

Microarray-based proteomic platforms have emerged as valuable tools for studying various aspects of protein function, particularly in the field of chromatin biochemistry. Microarray technology itself is largely unrestricted in regard to printable material and platform design, and efficient multidimensional optimization of assay parameters requires fluidity in the design and analysis of custom print layouts. This motivates the need for streamlined software infrastructure that facilitates the combined planning and analysis of custom microarray experiments. To this end, we have developed ArrayNinja as a portable, open source, and interactive application that unifies the planning and visualization of microarray experiments and provides maximum flexibility to end users. Array experiments can be planned, stored to a private database, and merged with the imaged results for a level of data interaction and centralization that is not currently attainable with available microarray informatics tools. © 2016 Elsevier Inc. All rights reserved.
CONNJUR Workflow Builder: a software integration environment for spectral reconstruction.

PubMed

Fenwick, Matthew; Weatherby, Gerard; Vyas, Jay; Sesanker, Colbert; Martyn, Timothy O; Ellis, Heidi J C; Gryk, Michael R

2015-07-01

CONNJUR Workflow Builder (WB) is an open-source software integration environment that leverages existing spectral reconstruction tools to create a synergistic, coherent platform for converting biomolecular NMR data from the time domain to the frequency domain. WB provides data integration of primary data and metadata using a relational database, and includes a library of pre-built workflows for processing time domain data. WB simplifies maximum entropy reconstruction, facilitating the processing of non-uniformly sampled time domain data. As will be shown in the paper, the unique features of WB provide it with novel abilities to enhance the quality, accuracy, and fidelity of the spectral reconstruction process. WB also provides features which promote collaboration, education, parameterization, and non-uniform data sets along with processing integrated with the Rowland NMR Toolkit (RNMRTK) and NMRPipe software packages. WB is available free of charge in perpetuity, dual-licensed under the MIT and GPL open source licenses.
Extending Marine Species Distribution Maps Using Non-Traditional Sources

PubMed Central

Moretzsohn, Fabio; Gibeaut, James

2015-01-01

Abstract Background Traditional sources of species occurrence data such as peer-reviewed journal articles and museum-curated collections are included in species databases after rigorous review by species experts and evaluators. The distribution maps created in this process are an important component of species survival evaluations, and are used to adapt, extend and sometimes contract polygons used in the distribution mapping process. New information During an IUCN Red List Gulf of Mexico Fishes Assessment Workshop held at The Harte Research Institute for Gulf of Mexico Studies, a session included an open discussion on the topic of including other sources of species occurrence data. During the last decade, advances in portable electronic devices and applications enable 'citizen scientists' to record images, location and data about species sightings, and submit that data to larger species databases. These applications typically generate point data. Attendees of the workshop expressed an interest in how that data could be incorporated into existing datasets, how best to ascertain the quality and value of that data, and what other alternate data sources are available. This paper addresses those issues, and provides recommendations to ensure quality data use. PMID:25941453
Spectral-Element Seismic Wave Propagation Codes for both Forward Modeling in Complex Media and Adjoint Tomography

NASA Astrophysics Data System (ADS)

Smith, J. A.; Peter, D. B.; Tromp, J.; Komatitsch, D.; Lefebvre, M. P.

2015-12-01

We present both SPECFEM3D_Cartesian and SPECFEM3D_GLOBE open-source codes, representing high-performance numerical wave solvers simulating seismic wave propagation for local-, regional-, and global-scale application. These codes are suitable for both forward propagation in complex media and tomographic imaging. Both solvers compute highly accurate seismic wave fields using the continuous Galerkin spectral-element method on unstructured meshes. Lateral variations in compressional- and shear-wave speeds, density, as well as 3D attenuation Q models, topography and fluid-solid coupling are all readily included in both codes. For global simulations, effects due to rotation, ellipticity, the oceans, 3D crustal models, and self-gravitation are additionally included. Both packages provide forward and adjoint functionality suitable for adjoint tomography on high-performance computing architectures. We highlight the most recent release of the global version which includes improved performance, simultaneous MPI runs, OpenCL and CUDA support via an automatic source-to-source transformation library (BOAST), parallel I/O readers and writers for databases using ADIOS and seismograms using the recently developed Adaptable Seismic Data Format (ASDF) with built-in provenance. This makes our spectral-element solvers current state-of-the-art, open-source community codes for high-performance seismic wave propagation on arbitrarily complex 3D models. Together with these solvers, we provide full-waveform inversion tools to image the Earth's interior at unprecedented resolution.
Chemotion ELN: an Open Source electronic lab notebook for chemists in academia.

PubMed

Tremouilhac, Pierre; Nguyen, An; Huang, Yu-Chieh; Kotov, Serhii; Lütjohann, Dominic Sebastian; Hübsch, Florian; Jung, Nicole; Bräse, Stefan

2017-09-25

The development of an electronic lab notebook (ELN) for researchers working in the field of chemical sciences is presented. The web based application is available as an Open Source software that offers modern solutions for chemical researchers. The Chemotion ELN is equipped with the basic functionalities necessary for the acquisition and processing of chemical data, in particular the work with molecular structures and calculations based on molecular properties. The ELN supports planning, description, storage, and management for the routine work of organic chemists. It also provides tools for communicating and sharing the recorded research data among colleagues. Meeting the requirements of a state of the art research infrastructure, the ELN allows the search for molecules and reactions not only within the user's data but also in conventional external sources as provided by SciFinder and PubChem. The presented development makes allowance for the growing dependency of scientific activity on the availability of digital information by providing Open Source instruments to record and reuse research data. The current version of the ELN has been using for over half of a year in our chemistry research group, serves as a common infrastructure for chemistry research and enables chemistry researchers to build their own databases of digital information as a prerequisite for the detailed, systematic investigation and evaluation of chemical reactions and mechanisms.
SU-E-T-103: Development and Implementation of Web Based Quality Control Software

DOE Office of Scientific and Technical Information (OSTI.GOV)

Studinski, R; Taylor, R; Angers, C

Purpose: Historically many radiation medicine programs have maintained their Quality Control (QC) test results in paper records or Microsoft Excel worksheets. Both these approaches represent significant logistical challenges, and are not predisposed to data review and approval. It has been our group's aim to develop and implement web based software designed not just to record and store QC data in a centralized database, but to provide scheduling and data review tools to help manage a radiation therapy clinics Equipment Quality control program. Methods: The software was written in the Python programming language using the Django web framework. In order tomore » promote collaboration and validation from other centres the code was made open source and is freely available to the public via an online source code repository. The code was written to provide a common user interface for data entry, formalize the review and approval process, and offer automated data trending and process control analysis of test results. Results: As of February 2014, our installation of QAtrack+ has 180 tests defined in its database and has collected ∼22 000 test results, all of which have been reviewed and approved by a physicist via QATrack+'s review tools. These results include records for quality control of Elekta accelerators, CT simulators, our brachytherapy programme, TomoTherapy and Cyberknife units. Currently at least 5 other centres are known to be running QAtrack+ clinically, forming the start of an international user community. Conclusion: QAtrack+ has proven to be an effective tool for collecting radiation therapy QC data, allowing for rapid review and trending of data for a wide variety of treatment units. As free and open source software, all source code, documentation and a bug tracker are available to the public at https://bitbucket.org/tohccmedphys/qatrackplus/.« less
Virtual GEOINT Center: C2ISR through an avatar's eyes

NASA Astrophysics Data System (ADS)

Seibert, Mark; Tidbal, Travis; Basil, Maureen; Muryn, Tyler; Scupski, Joseph; Williams, Robert

2013-05-01

As the number of devices collecting and sending data in the world are increasing, finding ways to visualize and understand that data is becoming more and more of a problem. This has often been coined as the problem of "Big Data." The Virtual Geoint Center (VGC) aims to aid in solving that problem by providing a way to combine the use of the virtual world with outside tools. Using open-source software such as OpenSim and Blender, the VGC uses a visually stunning 3D environment to display the data sent to it. The VGC is broken up into two major components: The Kinect Minimap, and the Geoint Map. The Kinect Minimap uses the Microsoft Kinect and its open-source software to make a miniature display of people the Kinect detects in front of it. The Geoint Map collect smartphone sensor information from online databases and displays them in real time onto a map generated by Google Maps. By combining outside tools and the virtual world, the VGC can help a user "visualize" data, and provide additional tools to "understand" the data.
A prospective international cooperative information technology platform built using open-source tools for improving the access to and safety of bone marrow transplantation in low- and middle-income countries.

PubMed

Agarwal, Rajat Kumar; Sedai, Amit; Dhimal, Sunil; Ankita, Kumari; Clemente, Luigi; Siddique, Sulman; Yaqub, Naila; Khalid, Sadaf; Itrat, Fatima; Khan, Anwar; Gilani, Sarah Khan; Marwah, Priya; Soni, Rajpreet; Missiry, Mohamed El; Hussain, Mohamed Hamed; Uderzo, Cornelio; Faulkner, Lawrence

2014-01-01

Jagriti Innovations developed a collaboration tool in partnership with the Cure2Children Foundation that has been used by health professionals in Italy, Pakistan, and India for the collaborative management of patients undergoing bone marrow transplantation (BMT) for thalassemia major since August 2008. This online open-access database covers data recording, analyzing, and reporting besides enabling knowledge exchange, telemedicine, capacity building, and quality assurance. As of February 2014, over 2400 patients have been registered and 112 BMTs have been performed with outcomes comparable to international standards, but at a fraction of the cost. This approach avoids medical emigration and contributes to local healthcare strengthening and competitiveness. This paper presents the experience and clinical outcomes associated with the use of this platform built using open-source tools and focusing on a locally pertinent tertiary care procedure-BMT. Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions.
HDDTOOLS: an R package serving Hydrological Data Discovery Tools

NASA Astrophysics Data System (ADS)

Vitolo, C.; Buytaert, W.

2014-12-01

Many governmental bodies and institutions are currently committed to publish open data as the result of a trend of increasing transparency, based on which a wide variety of information produced at public expense is now becoming open and freely available to improve public involvement in the process of decision and policy making. Discovery, access and retrieval of information is, however, not always a simple task. Especially when programmatic access to data resources is not allowed, downloading metadata catalogue, select the information needed, request datasets, de-compression, conversion, manual filtering and parsing can become rather tedious. The R package "hddtools" is an open source project, designed to make all the above operations more efficient by means of re-usable functions. The package facilitate non programmatic access to various online data sources such as the Global Runoff Data Centre, NASA's TRMM mission, the Data60UK database amongst others. This package complements R's growing functionality in environmental web technologies to bridge the gap between data providers and data consumers and it is designed to be the starting building block of scientific workflows for linking data and models in a seamless fashion.
Standard Port-Visit Cost Forecasting Model for U.S. Navy Husbanding Contracts

DTIC Science & Technology

2009-12-01

Protocol (HTTP) server.35 2. MySQL . An open-source database.36 3. PHP . A common scripting language used for Web development.37 E. IMPLEMENTATION OF...Inc. (2009). MySQL Community Server (Version 5.1) [Software]. Available from http://dev.mysql.com/downloads/ 37 The PHP Group (2009). PHP (Version...Logistics Services MySQL My Structured Query Language NAVSUP Navy Supply Systems Command NC Non-Contract Items NPS Naval Postgraduate
The Brainomics/Localizer database.

PubMed

Papadopoulos Orfanos, Dimitri; Michel, Vincent; Schwartz, Yannick; Pinel, Philippe; Moreno, Antonio; Le Bihan, Denis; Frouin, Vincent

2017-01-01

The Brainomics/Localizer database exposes part of the data collected by the in-house Localizer project, which planned to acquire four types of data from volunteer research subjects: anatomical MRI scans, functional MRI data, behavioral and demographic data, and DNA sampling. Over the years, this local project has been collecting such data from hundreds of subjects. We had selected 94 of these subjects for their complete datasets, including all four types of data, as the basis for a prior publication; the Brainomics/Localizer database publishes the data associated with these 94 subjects. Since regulatory rules prevent us from making genetic data available for download, the database serves only anatomical MRI scans, functional MRI data, behavioral and demographic data. To publish this set of heterogeneous data, we use dedicated software based on the open-source CubicWeb semantic web framework. Through genericity in the data model and flexibility in the display of data (web pages, CSV, JSON, XML), CubicWeb helps us expose these complex datasets in original and efficient ways. Copyright © 2015 Elsevier Inc. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Sanchez del Rio, Manuel; Bianchi, Davide; Cocco, Daniele

An open-source database containing metrology data for X-ray mirrors is presented. It makes available metrology data (mirror heights and slopes profiles) that can be used with simulation tools for calculating the effects of optical surface errors in the performances of an optical instrument, such as a synchrotron beamline. A typical case is the degradation of the intensity profile at the focal position in a beamline due to mirror surface errors. This database for metrology (DABAM) aims to provide to the users of simulation tools the data of real mirrors. The data included in the database are described in this paper,more » with details of how the mirror parameters are stored. An accompanying software is provided to allow simple access and processing of these data, calculate the most usual statistical parameters, and also include the option of creating input files for most used simulation codes. Some optics simulations are presented and discussed to illustrate the real use of the profiles from the database.« less
BioMart Central Portal: an open database network for the biological community.

PubMed

Guberman, Jonathan M; Ai, J; Arnaiz, O; Baran, Joachim; Blake, Andrew; Baldock, Richard; Chelala, Claude; Croft, David; Cros, Anthony; Cutts, Rosalind J; Di Génova, A; Forbes, Simon; Fujisawa, T; Gadaleta, E; Goodstein, D M; Gundem, Gunes; Haggarty, Bernard; Haider, Syed; Hall, Matthew; Harris, Todd; Haw, Robin; Hu, S; Hubbard, Simon; Hsu, Jack; Iyer, Vivek; Jones, Philip; Katayama, Toshiaki; Kinsella, R; Kong, Lei; Lawson, Daniel; Liang, Yong; Lopez-Bigas, Nuria; Luo, J; Lush, Michael; Mason, Jeremy; Moreews, Francois; Ndegwa, Nelson; Oakley, Darren; Perez-Llamas, Christian; Primig, Michael; Rivkin, Elena; Rosanoff, S; Shepherd, Rebecca; Simon, Reinhard; Skarnes, B; Smedley, Damian; Sperling, Linda; Spooner, William; Stevenson, Peter; Stone, Kevin; Teague, J; Wang, Jun; Wang, Jianxin; Whitty, Brett; Wong, D T; Wong-Erasmus, Marie; Yao, L; Youens-Clark, Ken; Yung, Christina; Zhang, Junjun; Kasprzyk, Arek

2011-01-01

BioMart Central Portal is a first of its kind, community-driven effort to provide unified access to dozens of biological databases spanning genomics, proteomics, model organisms, cancer data, ontology information and more. Anybody can contribute an independently maintained resource to the Central Portal, allowing it to be exposed to and shared with the research community, and linking it with the other resources in the portal. Users can take advantage of the common interface to quickly utilize different sources without learning a new system for each. The system also simplifies cross-database searches that might otherwise require several complicated steps. Several integrated tools streamline common tasks, such as converting between ID formats and retrieving sequences. The combination of a wide variety of databases, an easy-to-use interface, robust programmatic access and the array of tools make Central Portal a one-stop shop for biological data querying. Here, we describe the structure of Central Portal and show example queries to demonstrate its capabilities.
The Coral Trait Database, a curated database of trait information for coral species from the global oceans

NASA Astrophysics Data System (ADS)

Madin, Joshua S.; Anderson, Kristen D.; Andreasen, Magnus Heide; Bridge, Tom C. L.; Cairns, Stephen D.; Connolly, Sean R.; Darling, Emily S.; Diaz, Marcela; Falster, Daniel S.; Franklin, Erik C.; Gates, Ruth D.; Hoogenboom, Mia O.; Huang, Danwei; Keith, Sally A.; Kosnik, Matthew A.; Kuo, Chao-Yang; Lough, Janice M.; Lovelock, Catherine E.; Luiz, Osmar; Martinelli, Julieta; Mizerek, Toni; Pandolfi, John M.; Pochon, Xavier; Pratchett, Morgan S.; Putnam, Hollie M.; Roberts, T. Edward; Stat, Michael; Wallace, Carden C.; Widman, Elizabeth; Baird, Andrew H.

2016-03-01

Trait-based approaches advance ecological and evolutionary research because traits provide a strong link to an organism’s function and fitness. Trait-based research might lead to a deeper understanding of the functions of, and services provided by, ecosystems, thereby improving management, which is vital in the current era of rapid environmental change. Coral reef scientists have long collected trait data for corals; however, these are difficult to access and often under-utilized in addressing large-scale questions. We present the Coral Trait Database initiative that aims to bring together physiological, morphological, ecological, phylogenetic and biogeographic trait information into a single repository. The database houses species- and individual-level data from published field and experimental studies alongside contextual data that provide important framing for analyses. In this data descriptor, we release data for 56 traits for 1547 species, and present a collaborative platform on which other trait data are being actively federated. Our overall goal is for the Coral Trait Database to become an open-source, community-led data clearinghouse that accelerates coral reef research.
The Coral Trait Database, a curated database of trait information for coral species from the global oceans

PubMed Central

Madin, Joshua S.; Anderson, Kristen D.; Andreasen, Magnus Heide; Bridge, Tom C.L.; Cairns, Stephen D.; Connolly, Sean R.; Darling, Emily S.; Diaz, Marcela; Falster, Daniel S.; Franklin, Erik C.; Gates, Ruth D.; Hoogenboom, Mia O.; Huang, Danwei; Keith, Sally A.; Kosnik, Matthew A.; Kuo, Chao-Yang; Lough, Janice M.; Lovelock, Catherine E.; Luiz, Osmar; Martinelli, Julieta; Mizerek, Toni; Pandolfi, John M.; Pochon, Xavier; Pratchett, Morgan S.; Putnam, Hollie M.; Roberts, T. Edward; Stat, Michael; Wallace, Carden C.; Widman, Elizabeth; Baird, Andrew H.

2016-01-01

Trait-based approaches advance ecological and evolutionary research because traits provide a strong link to an organism’s function and fitness. Trait-based research might lead to a deeper understanding of the functions of, and services provided by, ecosystems, thereby improving management, which is vital in the current era of rapid environmental change. Coral reef scientists have long collected trait data for corals; however, these are difficult to access and often under-utilized in addressing large-scale questions. We present the Coral Trait Database initiative that aims to bring together physiological, morphological, ecological, phylogenetic and biogeographic trait information into a single repository. The database houses species- and individual-level data from published field and experimental studies alongside contextual data that provide important framing for analyses. In this data descriptor, we release data for 56 traits for 1547 species, and present a collaborative platform on which other trait data are being actively federated. Our overall goal is for the Coral Trait Database to become an open-source, community-led data clearinghouse that accelerates coral reef research. PMID:27023900
The Coral Trait Database, a curated database of trait information for coral species from the global oceans.

PubMed

Madin, Joshua S; Anderson, Kristen D; Andreasen, Magnus Heide; Bridge, Tom C L; Cairns, Stephen D; Connolly, Sean R; Darling, Emily S; Diaz, Marcela; Falster, Daniel S; Franklin, Erik C; Gates, Ruth D; Harmer, Aaron; Hoogenboom, Mia O; Huang, Danwei; Keith, Sally A; Kosnik, Matthew A; Kuo, Chao-Yang; Lough, Janice M; Lovelock, Catherine E; Luiz, Osmar; Martinelli, Julieta; Mizerek, Toni; Pandolfi, John M; Pochon, Xavier; Pratchett, Morgan S; Putnam, Hollie M; Roberts, T Edward; Stat, Michael; Wallace, Carden C; Widman, Elizabeth; Baird, Andrew H

2016-03-29

Trait-based approaches advance ecological and evolutionary research because traits provide a strong link to an organism's function and fitness. Trait-based research might lead to a deeper understanding of the functions of, and services provided by, ecosystems, thereby improving management, which is vital in the current era of rapid environmental change. Coral reef scientists have long collected trait data for corals; however, these are difficult to access and often under-utilized in addressing large-scale questions. We present the Coral Trait Database initiative that aims to bring together physiological, morphological, ecological, phylogenetic and biogeographic trait information into a single repository. The database houses species- and individual-level data from published field and experimental studies alongside contextual data that provide important framing for analyses. In this data descriptor, we release data for 56 traits for 1547 species, and present a collaborative platform on which other trait data are being actively federated. Our overall goal is for the Coral Trait Database to become an open-source, community-led data clearinghouse that accelerates coral reef research.
Search extension transforms Wiki into a relational system: a case for flavonoid metabolite database.

PubMed

Arita, Masanori; Suwa, Kazuhiro

2008-09-17

In computer science, database systems are based on the relational model founded by Edgar Codd in 1970. On the other hand, in the area of biology the word 'database' often refers to loosely formatted, very large text files. Although such bio-databases may describe conflicts or ambiguities (e.g. a protein pair do and do not interact, or unknown parameters) in a positive sense, the flexibility of the data format sacrifices a systematic query mechanism equivalent to the widely used SQL. To overcome this disadvantage, we propose embeddable string-search commands on a Wiki-based system and designed a half-formatted database. As proof of principle, a database of flavonoid with 6902 molecular structures from over 1687 plant species was implemented on MediaWiki, the background system of Wikipedia. Registered users can describe any information in an arbitrary format. Structured part is subject to text-string searches to realize relational operations. The system was written in PHP language as the extension of MediaWiki. All modifications are open-source and publicly available. This scheme benefits from both the free-formatted Wiki style and the concise and structured relational-database style. MediaWiki supports multi-user environments for document management, and the cost for database maintenance is alleviated.
Search extension transforms Wiki into a relational system: A case for flavonoid metabolite database

PubMed Central

Arita, Masanori; Suwa, Kazuhiro

2008-01-01

Background In computer science, database systems are based on the relational model founded by Edgar Codd in 1970. On the other hand, in the area of biology the word 'database' often refers to loosely formatted, very large text files. Although such bio-databases may describe conflicts or ambiguities (e.g. a protein pair do and do not interact, or unknown parameters) in a positive sense, the flexibility of the data format sacrifices a systematic query mechanism equivalent to the widely used SQL. Results To overcome this disadvantage, we propose embeddable string-search commands on a Wiki-based system and designed a half-formatted database. As proof of principle, a database of flavonoid with 6902 molecular structures from over 1687 plant species was implemented on MediaWiki, the background system of Wikipedia. Registered users can describe any information in an arbitrary format. Structured part is subject to text-string searches to realize relational operations. The system was written in PHP language as the extension of MediaWiki. All modifications are open-source and publicly available. Conclusion This scheme benefits from both the free-formatted Wiki style and the concise and structured relational-database style. MediaWiki supports multi-user environments for document management, and the cost for database maintenance is alleviated. PMID:18822113
An Offline-Online Android Application for Hazard Event Mapping Using WebGIS Open Source Technologies

NASA Astrophysics Data System (ADS)

Olyazadeh, Roya; Jaboyedoff, Michel; Sudmeier-Rieux, Karen; Derron, Marc-Henri; Devkota, Sanjaya

2016-04-01

Nowadays, Free and Open Source Software (FOSS) plays an important role in better understanding and managing disaster risk reduction around the world. National and local government, NGOs and other stakeholders are increasingly seeking and producing data on hazards. Most of the hazard event inventories and land use mapping are based on remote sensing data, with little ground truthing, creating difficulties depending on the terrain and accessibility. Open Source WebGIS tools offer an opportunity for quicker and easier ground truthing of critical areas in order to analyse hazard patterns and triggering factors. This study presents a secure mobile-map application for hazard event mapping using Open Source WebGIS technologies such as Postgres database, Postgis, Leaflet, Cordova and Phonegap. The objectives of this prototype are: 1. An Offline-Online android mobile application with advanced Geospatial visualisation; 2. Easy Collection and storage of events information applied services; 3. Centralized data storage with accessibility by all the service (smartphone, standard web browser); 4. Improving data management by using active participation in hazard event mapping and storage. This application has been implemented as a low-cost, rapid and participatory method for recording impacts from hazard events and includes geolocation (GPS data and Internet), visualizing maps with overlay of satellite images, viewing uploaded images and events as cluster points, drawing and adding event information. The data can be recorded in offline (Android device) or online version (all browsers) and consequently uploaded through the server whenever internet is available. All the events and records can be visualized by an administrator and made public after approval. Different user levels can be defined to access the data for communicating the information. This application was tested for landslides in post-earthquake Nepal but can be used for any other type of hazards such as flood, avalanche, etc. Keywords: Offline, Online, WebGIS Open source, Android, Hazard Event Mapping
The Application of the Open Pharmacological Concepts Triple Store (Open PHACTS) to Support Drug Discovery Research

PubMed Central

Ratnam, Joseline; Zdrazil, Barbara; Digles, Daniela; Cuadrado-Rodriguez, Emiliano; Neefs, Jean-Marc; Tipney, Hannah; Siebes, Ronald; Waagmeester, Andra; Bradley, Glyn; Chau, Chau Han; Richter, Lars; Brea, Jose; Evelo, Chris T.; Jacoby, Edgar; Senger, Stefan; Loza, Maria Isabel; Ecker, Gerhard F.; Chichester, Christine

2014-01-01

Integration of open access, curated, high-quality information from multiple disciplines in the Life and Biomedical Sciences provides a holistic understanding of the domain. Additionally, the effective linking of diverse data sources can unearth hidden relationships and guide potential research strategies. However, given the lack of consistency between descriptors and identifiers used in different resources and the absence of a simple mechanism to link them, gathering and combining relevant, comprehensive information from diverse databases remains a challenge. The Open Pharmacological Concepts Triple Store (Open PHACTS) is an Innovative Medicines Initiative project that uses semantic web technology approaches to enable scientists to easily access and process data from multiple sources to solve real-world drug discovery problems. The project draws together sources of publicly-available pharmacological, physicochemical and biomolecular data, represents it in a stable infrastructure and provides well-defined information exploration and retrieval methods. Here, we highlight the utility of this platform in conjunction with workflow tools to solve pharmacological research questions that require interoperability between target, compound, and pathway data. Use cases presented herein cover 1) the comprehensive identification of chemical matter for a dopamine receptor drug discovery program 2) the identification of compounds active against all targets in the Epidermal growth factor receptor (ErbB) signaling pathway that have a relevance to disease and 3) the evaluation of established targets in the Vitamin D metabolism pathway to aid novel Vitamin D analogue design. The example workflows presented illustrate how the Open PHACTS Discovery Platform can be used to exploit existing knowledge and generate new hypotheses in the process of drug discovery. PMID:25522365

Intercomparison of Open-Path Trace Gas Measurements with Two Dual Frequency Comb Spectrometers

PubMed Central

Waxman, Eleanor M.; Cossel, Kevin C.; Truong, Gar-Wing; Giorgetta, Fabrizio R.; Swann, William C.; Coburn, Sean; Wright, Robert J.; Rieker, Gregory B.; Coddington, Ian; Newbury, Nathan R.

2017-01-01

We present the first quantitative intercomparison between two open-path dual comb spectroscopy (DCS) instruments which were operated across adjacent 2-km open-air paths over a two-week period. We used DCS to measure the atmospheric absorption spectrum in the near infrared from 6021 to 6388 cm−1 (1565 to 1661 nm), corresponding to a 367 cm−1 bandwidth, at 0.0067 cm−1 sample spacing. The measured absorption spectra agree with each other to within 5×10−4 without any external calibration of either instrument. The absorption spectra are fit to retrieve concentrations for carbon dioxide (CO2), methane (CH4), water (H2O), and deuterated water (HDO). The retrieved dry mole fractions agree to 0.14% (0.57 ppm) for CO2, 0.35% (7 ppb) for CH4, and 0.40% (36 ppm) for H2O over the two-week measurement campaign, which included 23 °C outdoor temperature variations and periods of strong atmospheric turbulence. This agreement is at least an order of magnitude better than conventional active-source open-path instrument intercomparisons and is particularly relevant to future regional flux measurements as it allows accurate comparisons of open-path DCS data across locations and time. We additionally compare the open-path DCS retrievals to a WMO-calibrated cavity ringdown point sensor located along the path with good agreement. Short-term and long-term differences between the two systems are attributed, respectively, to spatial sampling discrepancies and to inaccuracies in the current spectral database used to fit the DCS data. Finally, the two-week measurement campaign yields diurnal cycles of CO2 and CH4 that are consistent with the presence of local sources of CO2 and absence of local sources of CH4. PMID:29276547
OPEN-SOURCE SOFTWARE IN DENTISTRY: A SYSTEMATIC REVIEW.

PubMed

Chruściel-Nogalska, Małgorzata; Smektała, Tomasz; Tutak, Marcin; Sporniak-Tutak, Katarzyna; Olszewski, Raphael

2017-01-01

Technological development and the need for electronic health records management resulted in the need for a computer with dedicated, commercial software in daily dental practice. The alternative for commercial software may be open-source solutions. Therefore, this study reviewed the current literature on the availability and use of open-source software (OSS) in dentistry. A comprehensive database search was performed on February 1, 2017. Only articles published in peer-reviewed journals with a focus on the use or description of OSS were retrieved. The level of evidence, according to Oxford EBM Centre Levels of Evidence Scale was classified for all studies. Experimental studies underwent additional quality reporting assessment. The screening and evaluation process resulted in twenty-one studies from 1,940 articles found, with 10 of them being experimental studies. None of the articles provided level 1 evidence, and only one study was considered high quality following quality assessment. Twenty-six different OSS programs were described in the included studies of which ten were used for image visualization, five were used for healthcare records management, four were used for educations processes, one was used for remote consultation and simulation, and six were used for general purposes. Our analysis revealed that the dental literature on OSS consists of scarce, incomplete, and methodologically low quality information.
Software Application Profile: Opal and Mica: open-source software solutions for epidemiological data management, harmonization and dissemination.

PubMed

Doiron, Dany; Marcon, Yannick; Fortier, Isabel; Burton, Paul; Ferretti, Vincent

2017-10-01

Improving the dissemination of information on existing epidemiological studies and facilitating the interoperability of study databases are essential to maximizing the use of resources and accelerating improvements in health. To address this, Maelstrom Research proposes Opal and Mica, two inter-operable open-source software packages providing out-of-the-box solutions for epidemiological data management, harmonization and dissemination. Opal and Mica are two standalone but inter-operable web applications written in Java, JavaScript and PHP. They provide web services and modern user interfaces to access them. Opal allows users to import, manage, annotate and harmonize study data. Mica is used to build searchable web portals disseminating study and variable metadata. When used conjointly, Mica users can securely query and retrieve summary statistics on geographically dispersed Opal servers in real-time. Integration with the DataSHIELD approach allows conducting more complex federated analyses involving statistical models. Opal and Mica are open-source and freely available at [www.obiba.org] under a General Public License (GPL) version 3, and the metadata models and taxonomies that accompany them are available under a Creative Commons licence. © The Author 2017; all rights reserved. Published by Oxford University Press on behalf of the International Epidemiological Association
Akuna: An Open Source User Environment for Managing Subsurface Simulation Workflows

NASA Astrophysics Data System (ADS)

Freedman, V. L.; Agarwal, D.; Bensema, K.; Finsterle, S.; Gable, C. W.; Keating, E. H.; Krishnan, H.; Lansing, C.; Moeglein, W.; Pau, G. S. H.; Porter, E.; Scheibe, T. D.

2014-12-01

The U.S. Department of Energy (DOE) is investing in development of a numerical modeling toolset called ASCEM (Advanced Simulation Capability for Environmental Management) to support modeling analyses at legacy waste sites. ASCEM is an open source and modular computing framework that incorporates new advances and tools for predicting contaminant fate and transport in natural and engineered systems. The ASCEM toolset includes both a Platform with Integrated Toolsets (called Akuna) and a High-Performance Computing multi-process simulator (called Amanzi). The focus of this presentation is on Akuna, an open-source user environment that manages subsurface simulation workflows and associated data and metadata. In this presentation, key elements of Akuna are demonstrated, which includes toolsets for model setup, database management, sensitivity analysis, parameter estimation, uncertainty quantification, and visualization of both model setup and simulation results. A key component of the workflow is in the automated job launching and monitoring capabilities, which allow a user to submit and monitor simulation runs on high-performance, parallel computers. Visualization of large outputs can also be performed without moving data back to local resources. These capabilities make high-performance computing accessible to the users who might not be familiar with batch queue systems and usage protocols on different supercomputers and clusters.
Open source software projects of the caBIG In Vivo Imaging Workspace Software special interest group.

PubMed

Prior, Fred W; Erickson, Bradley J; Tarbox, Lawrence

2007-11-01

The Cancer Bioinformatics Grid (caBIG) program was created by the National Cancer Institute to facilitate sharing of IT infrastructure, data, and applications among the National Cancer Institute-sponsored cancer research centers. The program was launched in February 2004 and now links more than 50 cancer centers. In April 2005, the In Vivo Imaging Workspace was added to promote the use of imaging in cancer clinical trials. At the inaugural meeting, four special interest groups (SIGs) were established. The Software SIG was charged with identifying projects that focus on open-source software for image visualization and analysis. To date, two projects have been defined by the Software SIG. The eXtensible Imaging Platform project has produced a rapid application development environment that researchers may use to create targeted workflows customized for specific research projects. The Algorithm Validation Tools project will provide a set of tools and data structures that will be used to capture measurement information and associated needed to allow a gold standard to be defined for the given database against which change analysis algorithms can be tested. Through these and future efforts, the caBIG In Vivo Imaging Workspace Software SIG endeavors to advance imaging informatics and provide new open-source software tools to advance cancer research.
Transparent Global Seismic Hazard and Risk Assessment

NASA Astrophysics Data System (ADS)

Smolka, Anselm; Schneider, John; Pinho, Rui; Crowley, Helen

2013-04-01

Vulnerability to earthquakes is increasing, yet advanced reliable risk assessment tools and data are inaccessible to most, despite being a critical basis for managing risk. Also, there are few, if any, global standards that allow us to compare risk between various locations. The Global Earthquake Model (GEM) is a unique collaborative effort that aims to provide organizations and individuals with tools and resources for transparent assessment of earthquake risk anywhere in the world. By pooling data, knowledge and people, GEM acts as an international forum for collaboration and exchange, and leverages the knowledge of leading experts for the benefit of society. Sharing of data and risk information, best practices, and approaches across the globe is key to assessing risk more effectively. Through global projects, open-source IT development and collaborations with more than 10 regions, leading experts are collaboratively developing unique global datasets, best practice, open tools and models for seismic hazard and risk assessment. Guided by the needs and experiences of governments, companies and citizens at large, they work in continuous interaction with the wider community. A continuously expanding public-private partnership constitutes the GEM Foundation, which drives the collaborative GEM effort. An integrated and holistic approach to risk is key to GEM's risk assessment platform, OpenQuake, that integrates all above-mentioned contributions and will become available towards the end of 2014. Stakeholders worldwide will be able to calculate, visualise and investigate earthquake risk, capture new data and to share their findings for joint learning. Homogenized information on hazard can be combined with data on exposure (buildings, population) and data on their vulnerability, for loss assessment around the globe. Furthermore, for a true integrated view of seismic risk, users can add social vulnerability and resilience indices to maps and estimate the costs and benefits of different risk management measures. The following global data, models and methodologies will be available in the platform. Some of these will be released to the public already before, such as the ISC-GEM global instrumental catalogue (released January 2013). Datasets: • Global Earthquake History Catalogue [1000-1903] • Global Instrumental Catalogue [1900-2009] • Global Geodetic Strain Rate Model • Global Active Fault Database • Tectonic Regionalisation • Buildings and Population Database • Earthquake Consequences Database • Physical Vulnerability Database • Socio-Economic Vulnerability and Resilience Indicators Models: • Seismic Source Models • Ground Motion (Attenuation) Models • Physical Exposure Models • Physical Vulnerability Models • Composite Index Models (social vulnerability, resilience, indirect loss) The aforementioned models developed under the GEM framework will be combined to produce estimates of hazard and risk at a global scale. Furthermore, building on many ongoing efforts and knowledge of scientists worldwide, GEM will integrate state-of-the-art data, models, results and open-source tools into a single platform that is to serve as a "clearinghouse" on seismic risk. The platform will continue to increase in value, in particular for use in local contexts, through contributions and collaborations with scientists and organisations worldwide.
Creating Access to Data of Worldwide Volcanic Unrest

NASA Astrophysics Data System (ADS)

Venezky, D. Y.; Newhall, C. G.; Malone, S. D.

2003-12-01

We are creating a pilot database (WOVOdat - the World Organization of Volcano Observatories database) using an open source database and content generation software, allowing web access to data of worldwide volcanic seismicity, ground deformation, fumarolic activity, and other changes within or adjacent to a volcanic system. After three years of discussions with volcano observatories of the WOVO community and institutional databases such as IRIS, UNAVCO, and the Smithsonian's Global Volcanism Program about how to link global data of volcanic unrest for use during crisis situations and for research, we are now developing the pilot database. We already have created the core tables and have written simple queries that access some of the available data using pull-down menus on a website. Over the next year, we plan to complete schema realization, expand querying capabilities, and then open the pilot database for a multi-year data-loading process. Many of the challenges we are encountering are common to multidisciplinary projects and include determining standard data formats, choosing levels of data detail (raw vs. minimally processed data, summary intervals vs. continuous data, etc.), and organizing the extant but variable data into a useable schema. Additionally, we are working on how best to enter the varied data into the database (scripts for digital data and web-entry tools for non-digital data) and what standard sets of queries are most important. An essential during an evolving volcanic crisis would be: `Has any volcano shown the behavior being observed here and what happened?'. We believe that with a systematic aggregation of all datasets on volcanic unrest, we should be able to find patterns that were previously inaccessible or unrecognized. The second WOVOdat workshop in 2002 provided a recent forum for discussion of data formats, database access, and schemas. The formats and units for the discussed parameters can be viewed at http://www.wovo.org/WOVOdat/parameters.htm. Comments, suggestions, and participation in all aspects of the WOVOdat project are welcome and appreciated.
Multicriteria analysis using open-source data and software for the implementation of a centralized biomedical waste management system in a developing country (Guinea, Conakry).

NASA Astrophysics Data System (ADS)

Pérez Peña, José Vicente; Baldó, Mane; Acosta, Yarci; Verschueren, Laurent; Thibaud, Kenmognie; Bilivogui, Pépé; Jean-Paul Ngandu, Alain; Beavogui, Maoro

2017-04-01

In the last decade the increasing interest for public health has promoted specific regulations for the transport, storage, transformation and/or elimination of potentially toxic waste. A special concern should focus on the effective management of biomedical waste, due to the environmental and health risk associated with them. The first stage for the effective management these waste includes the selection of the best sites for the location of facilities for its storage and/or elimination. Best-site selection is accomplished by means of multi-criteria decision analyses (MCDA) that aim to minimize the social and environmental impact, and to maximize management efficiency. In this work we presented a methodology that uses open-source software and data to analyze the best location for the implantation of a centralized waste management system in a developing country (Guinea, Conakry). We applied an analytical hierarchy process (AHP) using different thematic layers such as land use (derived from up-to-date Sentinel 2 remote sensing images), soil type, distance and type of roads, hydrography, distance to dense populated areas, etc. Land-use data were derived from up-to-date Sentinel 2 remote sensing images, whereas roads and hydrography were obtained from the Open Street Map database and latter validated with administrative data. We performed the AHP analysis with the aid of QGIS open-software Geospatial Information System. This methodology is very effective for developing countries as it uses open-source software and data for the MCDA analysis, thus reducing costs in these first stages of the integrated analysis.
Data harmonization and federated analysis of population-based studies: the BioSHaRE project

PubMed Central

2013-01-01

Abstracts Background Individual-level data pooling of large population-based studies across research centres in international research projects faces many hurdles. The BioSHaRE (Biobank Standardisation and Harmonisation for Research Excellence in the European Union) project aims to address these issues by building a collaborative group of investigators and developing tools for data harmonization, database integration and federated data analyses. Methods Eight population-based studies in six European countries were recruited to participate in the BioSHaRE project. Through workshops, teleconferences and electronic communications, participating investigators identified a set of 96 variables targeted for harmonization to answer research questions of interest. Using each study’s questionnaires, standard operating procedures, and data dictionaries, harmonization potential was assessed. Whenever harmonization was deemed possible, processing algorithms were developed and implemented in an open-source software infrastructure to transform study-specific data into the target (i.e. harmonized) format. Harmonized datasets located on server in each research centres across Europe were interconnected through a federated database system to perform statistical analysis. Results Retrospective harmonization led to the generation of common format variables for 73% of matches considered (96 targeted variables across 8 studies). Authenticated investigators can now perform complex statistical analyses of harmonized datasets stored on distributed servers without actually sharing individual-level data using the DataSHIELD method. Conclusion New Internet-based networking technologies and database management systems are providing the means to support collaborative, multi-center research in an efficient and secure manner. The results from this pilot project show that, given a strong collaborative relationship between participating studies, it is possible to seamlessly co-analyse internationally harmonized research databases while allowing each study to retain full control over individual-level data. We encourage additional collaborative research networks in epidemiology, public health, and the social sciences to make use of the open source tools presented herein. PMID:24257327
Ensembl 2004.

PubMed

Birney, E; Andrews, D; Bevan, P; Caccamo, M; Cameron, G; Chen, Y; Clarke, L; Coates, G; Cox, T; Cuff, J; Curwen, V; Cutts, T; Down, T; Durbin, R; Eyras, E; Fernandez-Suarez, X M; Gane, P; Gibbins, B; Gilbert, J; Hammond, M; Hotz, H; Iyer, V; Kahari, A; Jekosch, K; Kasprzyk, A; Keefe, D; Keenan, S; Lehvaslaiho, H; McVicker, G; Melsopp, C; Meidl, P; Mongin, E; Pettett, R; Potter, S; Proctor, G; Rae, M; Searle, S; Slater, G; Smedley, D; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Storey, R; Ureta-Vidal, A; Woodwark, C; Clamp, M; Hubbard, T

2004-01-01

The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organize biology around the sequences of large genomes. It is a comprehensive and integrated source of annotation of large genome sequences, available via interactive website, web services or flat files. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. The facilities of the system range from sequence analysis to data storage and visualization and installations exist around the world both in companies and at academic sites. With a total of nine genome sequences available from Ensembl and more genomes to follow, recent developments have focused mainly on closer integration between genomes and external data.
Cyclone: java-based querying and computing with Pathway/Genome databases.

PubMed

Le Fèvre, François; Smidtas, Serge; Schächter, Vincent

2007-05-15

Cyclone aims at facilitating the use of BioCyc, a collection of Pathway/Genome Databases (PGDBs). Cyclone provides a fully extensible Java Object API to analyze and visualize these data. Cyclone can read and write PGDBs, and can write its own data in the CycloneML format. This format is automatically generated from the BioCyc ontology by Cyclone itself, ensuring continued compatibility. Cyclone objects can also be stored in a relational database CycloneDB. Queries can be written in SQL, and in an intuitive and concise object-oriented query language, Hibernate Query Language (HQL). In addition, Cyclone interfaces easily with Java software including the Eclipse IDE for HQL edition, the Jung API for graph algorithms or Cytoscape for graph visualization. Cyclone is freely available under an open source license at: http://sourceforge.net/projects/nemo-cyclone. For download and installation instructions, tutorials, use cases and examples, see http://nemo-cyclone.sourceforge.net.
Catalogue of HI PArameters (CHIPA)

NASA Astrophysics Data System (ADS)

Saponara, J.; Benaglia, P.; Koribalski, B.; Andruchow, I.

2015-08-01

The catalogue of HI parameters of galaxies HI (CHIPA) is the natural continuation of the compilation by M.C. Martin in 1998. CHIPA provides the most important parameters of nearby galaxies derived from observations of the neutral Hydrogen line. The catalogue contains information of 1400 galaxies across the sky and different morphological types. Parameters like the optical diameter of the galaxy, the blue magnitude, the distance, morphological type, HI extension are listed among others. Maps of the HI distribution, velocity and velocity dispersion can also be display for some cases. The main objective of this catalogue is to facilitate the bibliographic queries, through searching in a database accessible from the internet that will be available in 2015 (the website is under construction). The database was built using the open source `` mysql (SQL, Structured Query Language, management system relational database) '', while the website was built with ''HTML (Hypertext Markup Language)'' and ''PHP (Hypertext Preprocessor)''.
Whose genome is it, anyway?

DOE Office of Scientific and Technical Information (OSTI.GOV)

Marshall, E.

1996-09-27

The genome program has issued guidelines to ensure that sequencing is done on DNA from diverse sources who have given informed consent and are anonymous. Most current sources don`t meet those criteria. It may be the first question every nonexpert asks on learning about the Human Genome Project: Whose genome are we studying, anyway? It sounds naive, says one government scientist-so naive, in fact, that {open_quotes}we chuckle as we explain that we aren`t sequencing anyone`s genome in particular; we`re sequencing a representative genome{close_quotes} made up of a mosaic of DNA from a variety of anonymous sources. And Bruce Birren, amore » clone-maker now at the Massachusetts Institute of Technology`s (MIT`s) Whitehead Center for Genome Research says: {open_quotes}We spent many years pooh-poohing the question{close_quotes} of whose genome would be stored in the database. But now that labs have begun working on large stretches of human DNA-aiming to identify all 3 billion base pairs in the genetic code-the question no longer seems to laughable. To the distress of program managers in Bethesda, Maryland, the initial sources of DNA are not as diverse or as anonymous as they had assumed.« less
Demonstrating the Open Data Repository's Data Publisher: The CheMin Database

NASA Astrophysics Data System (ADS)

Stone, N.; Lafuente, B.; Bristow, T.; Pires, A.; Keller, R. M.; Downs, R. T.; Blake, D.; Dateo, C. E.; Fonda, M.

2018-04-01

The Open Data Repository's Data Publisher aims to provide an easy-to-use software tool that will allow researchers to create and publish database templates and related data. The CheMin Database developed using this framework is shown as an example.
PropeR revisited.

PubMed

van der Linden, Helma; Talmon, Jan; Tange, Huibert; Grimson, Jane; Hasman, Arie

2005-03-01

The PropeR EHR system (PropeRWeb) is a multidisciplinary electronic health record (EHR) system for multidisciplinary use in extramural patient care for stroke patients. The system is built using existing open source components and is based on open standards. It is implemented as a web application using servlets and Java Server Pages (JSP's) with a CORBA connection to the database servers, which are based on the OMG HDTF specifications. PropeRWeb is a generic system which can be readily customized for use in a variety of clinical domains. The system proved to be stable and flexible, although some aspects (a.o. user friendliness) could be improved. These improvements are currently under development in a second version.
HEPData: a repository for high energy physics data

NASA Astrophysics Data System (ADS)

Maguire, Eamonn; Heinrich, Lukas; Watt, Graeme

2017-10-01

The Durham High Energy Physics Database (HEPData) has been built up over the past four decades as a unique open-access repository for scattering data from experimental particle physics papers. It comprises data points underlying several thousand publications. Over the last two years, the HEPData software has been completely rewritten using modern computing technologies as an overlay on the Invenio v3 digital library framework. The software is open source with the new site available at https://hepdata.net now replacing the previous site at http://hepdata.cedar.ac.uk. In this write-up, we describe the development of the new site and explain some of the advantages it offers over the previous platform.
Brunn: an open source laboratory information system for microplates with a graphical plate layout design process.

PubMed

Alvarsson, Jonathan; Andersson, Claes; Spjuth, Ola; Larsson, Rolf; Wikberg, Jarl E S

2011-05-20

Compound profiling and drug screening generates large amounts of data and is generally based on microplate assays. Current information systems used for handling this are mainly commercial, closed source, expensive, and heavyweight and there is a need for a flexible lightweight open system for handling plate design, and validation and preparation of data. A Bioclipse plugin consisting of a client part and a relational database was constructed. A multiple-step plate layout point-and-click interface was implemented inside Bioclipse. The system contains a data validation step, where outliers can be removed, and finally a plate report with all relevant calculated data, including dose-response curves. Brunn is capable of handling the data from microplate assays. It can create dose-response curves and calculate IC50 values. Using a system of this sort facilitates work in the laboratory. Being able to reuse already constructed plates and plate layouts by starting out from an earlier step in the plate layout design process saves time and cuts down on error sources.
An overview on integrated data system for archiving and sharing marine geology and geophysical data in Korea Institute of Ocean Science & Technology (KIOST)

NASA Astrophysics Data System (ADS)

Choi, Sang-Hwa; Kim, Sung Dae; Park, Hyuk Min; Lee, SeungHa

2016-04-01

We established and have operated an integrated data system for managing, archiving and sharing marine geology and geophysical data around Korea produced from various research projects and programs in Korea Institute of Ocean Science & Technology (KIOST). First of all, to keep the consistency of data system with continuous data updates, we set up standard operating procedures (SOPs) for data archiving, data processing and converting, data quality controls, and data uploading, DB maintenance, etc. Database of this system comprises two databases, ARCHIVE DB and GIS DB for the purpose of this data system. ARCHIVE DB stores archived data as an original forms and formats from data providers for data archive and GIS DB manages all other compilation, processed and reproduction data and information for data services and GIS application services. Relational data management system, Oracle 11g, adopted for DBMS and open source GIS techniques applied for GIS services such as OpenLayers for user interface, GeoServer for application server, PostGIS and PostgreSQL for GIS database. For the sake of convenient use of geophysical data in a SEG Y format, a viewer program was developed and embedded in this system. Users can search data through GIS user interface and save the results as a report.
Revitalizing the drug pipeline: AntibioticDB, an open access database to aid antibacterial research and development.

PubMed

Farrell, L J; Lo, R; Wanford, J J; Jenkins, A; Maxwell, A; Piddock, L J V

2018-06-11

The current state of antibiotic discovery, research and development is insufficient to respond to the need for new treatments for drug-resistant bacterial infections. The process has changed over the last decade, with most new agents that are in Phases 1-3, or recently approved, having been discovered in small- and medium-sized enterprises or academia. These agents have then been licensed or sold to large companies for further development with the goal of taking them to market. However, early drug discovery and development, including the possibility of developing previously discontinued agents, would benefit from a database of antibacterial compounds for scrutiny by the developers. This article describes the first free, open-access searchable database of antibacterial compounds, including discontinued agents, drugs under pre-clinical development and those in clinical trials: AntibioticDB (AntibioticDB.com). Data were obtained from publicly available sources. This article summarizes the compounds and drugs in AntibioticDB, including their drug class, mode of action, development status and propensity to select drug-resistant bacteria. AntibioticDB includes compounds currently in pre-clinical development and 834 that have been discontinued and that reached varying stages of development. These may serve as starting points for future research and development.
The BioExtract Server: a web-based bioinformatic workflow platform

PubMed Central

Lushbough, Carol M.; Jennewein, Douglas M.; Brendel, Volker P.

2011-01-01

The BioExtract Server (bioextract.org) is an open, web-based system designed to aid researchers in the analysis of genomic data by providing a platform for the creation of bioinformatic workflows. Scientific workflows are created within the system by recording tasks performed by the user. These tasks may include querying multiple, distributed data sources, saving query results as searchable data extracts, and executing local and web-accessible analytic tools. The series of recorded tasks can then be saved as a reproducible, sharable workflow available for subsequent execution with the original or modified inputs and parameter settings. Integrated data resources include interfaces to the National Center for Biotechnology Information (NCBI) nucleotide and protein databases, the European Molecular Biology Laboratory (EMBL-Bank) non-redundant nucleotide database, the Universal Protein Resource (UniProt), and the UniProt Reference Clusters (UniRef) database. The system offers access to numerous preinstalled, curated analytic tools and also provides researchers with the option of selecting computational tools from a large list of web services including the European Molecular Biology Open Software Suite (EMBOSS), BioMoby, and the Kyoto Encyclopedia of Genes and Genomes (KEGG). The system further allows users to integrate local command line tools residing on their own computers through a client-side Java applet. PMID:21546552

An offline-online Web-GIS Android application for fast data acquisition of landslide hazard and risk

NASA Astrophysics Data System (ADS)

Olyazadeh, Roya; Sudmeier-Rieux, Karen; Jaboyedoff, Michel; Derron, Marc-Henri; Devkota, Sanjaya

2017-04-01

Regional landslide assessments and mapping have been effectively pursued by research institutions, national and local governments, non-governmental organizations (NGOs), and different stakeholders for some time, and a wide range of methodologies and technologies have consequently been proposed. Land-use mapping and hazard event inventories are mostly created by remote-sensing data, subject to difficulties, such as accessibility and terrain, which need to be overcome. Likewise, landslide data acquisition for the field navigation can magnify the accuracy of databases and analysis. Open-source Web and mobile GIS tools can be used for improved ground-truthing of critical areas to improve the analysis of hazard patterns and triggering factors. This paper reviews the implementation and selected results of a secure mobile-map application called ROOMA (Rapid Offline-Online Mapping Application) for the rapid data collection of landslide hazard and risk. This prototype assists the quick creation of landslide inventory maps (LIMs) by collecting information on the type, feature, volume, date, and patterns of landslides using open-source Web-GIS technologies such as Leaflet maps, Cordova, GeoServer, PostgreSQL as the real DBMS (database management system), and PostGIS as its plug-in for spatial database management. This application comprises Leaflet maps coupled with satellite images as a base layer, drawing tools, geolocation (using GPS and the Internet), photo mapping, and event clustering. All the features and information are recorded into a GeoJSON text file in an offline version (Android) and subsequently uploaded to the online mode (using all browsers) with the availability of Internet. Finally, the events can be accessed and edited after approval by an administrator and then be visualized by the general public.
Gunshot identification system by integration of open source consumer electronics

NASA Astrophysics Data System (ADS)

López R., Juan Manuel; Marulanda B., Jose Ignacio

2014-05-01

This work presents a prototype of low-cost gunshots identification system that uses consumer electronics in order to ensure the existence of gunshots and then classify it according to a previously established database. The implementation of this tool in the urban areas is to set records that support the forensics, hence improving law enforcement also on developing countries. An analysis of its effectiveness is presented in comparison with theoretical results obtained with numerical simulations.
Mining collections of compounds with Screening Assistant 2

PubMed Central

2012-01-01

Background High-throughput screening assays have become the starting point of many drug discovery programs for large pharmaceutical companies as well as academic organisations. Despite the increasing throughput of screening technologies, the almost infinite chemical space remains out of reach, calling for tools dedicated to the analysis and selection of the compound collections intended to be screened. Results We present Screening Assistant 2 (SA2), an open-source JAVA software dedicated to the storage and analysis of small to very large chemical libraries. SA2 stores unique molecules in a MySQL database, and encapsulates several chemoinformatics methods, among which: providers management, interactive visualisation, scaffold analysis, diverse subset creation, descriptors calculation, sub-structure / SMART search, similarity search and filtering. We illustrate the use of SA2 by analysing the composition of a database of 15 million compounds collected from 73 providers, in terms of scaffolds, frameworks, and undesired properties as defined by recently proposed HTS SMARTS filters. We also show how the software can be used to create diverse libraries based on existing ones. Conclusions Screening Assistant 2 is a user-friendly, open-source software that can be used to manage collections of compounds and perform simple to advanced chemoinformatics analyses. Its modular design and growing documentation facilitate the addition of new functionalities, calling for contributions from the community. The software can be downloaded at http://sa2.sourceforge.net/. PMID:23327565
Mining collections of compounds with Screening Assistant 2.

PubMed

Guilloux, Vincent Le; Arrault, Alban; Colliandre, Lionel; Bourg, Stéphane; Vayer, Philippe; Morin-Allory, Luc

2012-08-31

High-throughput screening assays have become the starting point of many drug discovery programs for large pharmaceutical companies as well as academic organisations. Despite the increasing throughput of screening technologies, the almost infinite chemical space remains out of reach, calling for tools dedicated to the analysis and selection of the compound collections intended to be screened. We present Screening Assistant 2 (SA2), an open-source JAVA software dedicated to the storage and analysis of small to very large chemical libraries. SA2 stores unique molecules in a MySQL database, and encapsulates several chemoinformatics methods, among which: providers management, interactive visualisation, scaffold analysis, diverse subset creation, descriptors calculation, sub-structure / SMART search, similarity search and filtering. We illustrate the use of SA2 by analysing the composition of a database of 15 million compounds collected from 73 providers, in terms of scaffolds, frameworks, and undesired properties as defined by recently proposed HTS SMARTS filters. We also show how the software can be used to create diverse libraries based on existing ones. Screening Assistant 2 is a user-friendly, open-source software that can be used to manage collections of compounds and perform simple to advanced chemoinformatics analyses. Its modular design and growing documentation facilitate the addition of new functionalities, calling for contributions from the community. The software can be downloaded at http://sa2.sourceforge.net/.
Pharmacology Portal: An Open Database for Clinical Pharmacologic Laboratory Services.

PubMed

Karlsen Bjånes, Tormod; Mjåset Hjertø, Espen; Lønne, Lars; Aronsen, Lena; Andsnes Berg, Jon; Bergan, Stein; Otto Berg-Hansen, Grim; Bernard, Jean-Paul; Larsen Burns, Margrete; Toralf Fosen, Jan; Frost, Joachim; Hilberg, Thor; Krabseth, Hege-Merete; Kvan, Elena; Narum, Sigrid; Austgulen Westin, Andreas

2016-01-01

More than 50 Norwegian public and private laboratories provide one or more analyses for therapeutic drug monitoring or testing for drugs of abuse. Practices differ among laboratories, and analytical repertoires can change rapidly as new substances become available for analysis. The Pharmacology Portal was developed to provide an overview of these activities and to standardize the practices and terminology among laboratories. The Pharmacology Portal is a modern dynamic web database comprising all available analyses within therapeutic drug monitoring and testing for drugs of abuse in Norway. Content can be retrieved by using the search engine or by scrolling through substance lists. The core content is a substance registry updated by a national editorial board of experts within the field of clinical pharmacology. This ensures quality and consistency regarding substance terminologies and classification. All laboratories publish their own repertoires in a user-friendly workflow, adding laboratory-specific details to the core information in the substance registry. The user management system ensures that laboratories are restricted from editing content in the database core or in repertoires within other laboratory subpages. The portal is for nonprofit use, and has been fully funded by the Norwegian Medical Association, the Norwegian Society of Clinical Pharmacology, and the 8 largest pharmacologic institutions in Norway. The database server runs an open-source content management system that ensures flexibility with respect to further development projects, including the potential expansion of the Pharmacology Portal to other countries. Copyright © 2016 Elsevier HS Journals, Inc. All rights reserved.
Crystallography Open Databases and Preservation: a World-wide Initiative

NASA Astrophysics Data System (ADS)

Chateigner, Daniel

In 2003, an international team of crystallographers proposed the Crystallography Open Database (COD), a fully-free collection of crystal structure data, in the aim of ensuring their preservation. With nearly 250000 entries, this database represents a large open set of data for crystallographers, academics and industrials, located at five different places world-wide, and included in Thomson-Reuters’ ISI. As a large step towards data preservation, raw data can now be uploaded along with «digested» structure files, and COD can be questioned by most of the crystallography-linked industrial software. The COD initiative work deserves several other open developments.
Visualization of Vgi Data Through the New NASA Web World Wind Virtual Globe

NASA Astrophysics Data System (ADS)

Brovelli, M. A.; Kilsedar, C. E.; Zamboni, G.

2016-06-01

GeoWeb 2.0, laying the foundations of Volunteered Geographic Information (VGI) systems, has led to platforms where users can contribute to the geographic knowledge that is open to access. Moreover, as a result of the advancements in 3D visualization, virtual globes able to visualize geographic data even on browsers emerged. However the integration of VGI systems and virtual globes has not been fully realized. The study presented aims to visualize volunteered data in 3D, considering also the ease of use aspects for general public, using Free and Open Source Software (FOSS). The new Application Programming Interface (API) of NASA, Web World Wind, written in JavaScript and based on Web Graphics Library (WebGL) is cross-platform and cross-browser, so that the virtual globe created using this API can be accessible through any WebGL supported browser on different operating systems and devices, as a result not requiring any installation or configuration on the client-side, making the collected data more usable to users, which is not the case with the World Wind for Java as installation and configuration of the Java Virtual Machine (JVM) is required. Furthermore, the data collected through various VGI platforms might be in different formats, stored in a traditional relational database or in a NoSQL database. The project developed aims to visualize and query data collected through Open Data Kit (ODK) platform and a cross-platform application, where data is stored in a relational PostgreSQL and NoSQL CouchDB databases respectively.
ExpressionDB: An open source platform for distributing genome-scale datasets.

PubMed

Hughes, Laura D; Lewis, Scott A; Hughes, Michael E

2017-01-01

RNA-sequencing (RNA-seq) and microarrays are methods for measuring gene expression across the entire transcriptome. Recent advances have made these techniques practical and affordable for essentially any laboratory with experience in molecular biology. A variety of computational methods have been developed to decrease the amount of bioinformatics expertise necessary to analyze these data. Nevertheless, many barriers persist which discourage new labs from using functional genomics approaches. Since high-quality gene expression studies have enduring value as resources to the entire research community, it is of particular importance that small labs have the capacity to share their analyzed datasets with the research community. Here we introduce ExpressionDB, an open source platform for visualizing RNA-seq and microarray data accommodating virtually any number of different samples. ExpressionDB is based on Shiny, a customizable web application which allows data sharing locally and online with customizable code written in R. ExpressionDB allows intuitive searches based on gene symbols, descriptions, or gene ontology terms, and it includes tools for dynamically filtering results based on expression level, fold change, and false-discovery rates. Built-in visualization tools include heatmaps, volcano plots, and principal component analysis, ensuring streamlined and consistent visualization to all users. All of the scripts for building an ExpressionDB with user-supplied data are freely available on GitHub, and the Creative Commons license allows fully open customization by end-users. We estimate that a demo database can be created in under one hour with minimal programming experience, and that a new database with user-supplied expression data can be completed and online in less than one day.
WebCN: A web-based computation tool for in situ-produced cosmogenic nuclides

NASA Astrophysics Data System (ADS)

Ma, Xiuzeng; Li, Yingkui; Bourgeois, Mike; Caffee, Marc; Elmore, David; Granger, Darryl; Muzikar, Paul; Smith, Preston

2007-06-01

Cosmogenic nuclide techniques are increasingly being utilized in geoscience research. For this it is critical to establish an effective, easily accessible and well defined tool for cosmogenic nuclide computations. We have been developing a web-based tool (WebCN) to calculate surface exposure ages and erosion rates based on the nuclide concentrations measured by the accelerator mass spectrometry. WebCN for 10Be and 26Al has been finished and published at http://www.physics.purdue.edu/primelab/for_users/rockage.html. WebCN for 36Cl is under construction. WebCN is designed as a three-tier client/server model and uses the open source PostgreSQL for the database management and PHP for the interface design and calculations. On the client side, an internet browser and Microsoft Access are used as application interfaces to access the system. Open Database Connectivity is used to link PostgreSQL and Microsoft Access. WebCN accounts for both spatial and temporal distributions of the cosmic ray flux to calculate the production rates of in situ-produced cosmogenic nuclides at the Earth's surface.
Use of the initial trauma CT scan to aid in diagnosis of open pelvic fractures.

PubMed

Scolaro, John A; Wilson, David J; Routt, Milton Lee Chip; Firoozabadi, Reza

2015-10-01

Open pelvic disruptions represent high-energy injuries. The prompt identification and management of these injuries decreases their associated morbidity and mortality. Computed tomography (CT) scans are routinely obtained in the initial evaluation of patients with pelvic injuries. The purpose of this study is to identify the incidence and source of air densities noted on computed tomography (CT) scans of the abdominal and pelvic region in patients with pelvic fractures and evaluate the use of initial CT imaging as an adjunctive diagnostic tool to identify open injuries. A retrospective review of a prospectively collected database was performed at a single institution. Seven hundred and twenty-two consecutive patients with a pelvic disruption over a two-year period were included. Review of initial injury CT scans was performed using bone and lung viewing algorithms to identify the presence of extra-luminal air. The primary outcome was the presence, location and source of air identified on pre-operative CT scans. Secondary measurements were identification of air by plain radiograph and correlation between identified air densities on CT and clinically diagnosed open pelvic fractures. Ninety-eight patients were identified as having extra-luminal air densities on CT scans. Eighty-one patients were included in the final analysis following application of inclusion and exclusion criteria. Air was noted by the radiologist in forty-five (55.6%) instances. Six patients (7.4%) were clinically diagnosed with an open pelvic ring disruption; in two patients (2.4%) this diagnosis was delayed. In all patients, the CT was able to track air from its origin. In patients with pelvic disruptions, the injury CT should also be evaluated for the presence and source of extra-luminal air. In some patients, this finding may represent an open pelvic ring disruption. A complete physical exam and CT evaluation should be used to decrease the missed or delayed diagnosis of an open pelvic ring injury. Copyright © 2015 Elsevier Ltd. All rights reserved.
Overcoming Species Boundaries in Peptide Identification with Bayesian Information Criterion-driven Error-tolerant Peptide Search (BICEPS)*

PubMed Central

Renard, Bernhard Y.; Xu, Buote; Kirchner, Marc; Zickmann, Franziska; Winter, Dominic; Korten, Simone; Brattig, Norbert W.; Tzur, Amit; Hamprecht, Fred A.; Steen, Hanno

2012-01-01

Currently, the reliable identification of peptides and proteins is only feasible when thoroughly annotated sequence databases are available. Although sequencing capacities continue to grow, many organisms remain without reliable, fully annotated reference genomes required for proteomic analyses. Standard database search algorithms fail to identify peptides that are not exactly contained in a protein database. De novo searches are generally hindered by their restricted reliability, and current error-tolerant search strategies are limited by global, heuristic tradeoffs between database and spectral information. We propose a Bayesian information criterion-driven error-tolerant peptide search (BICEPS) and offer an open source implementation based on this statistical criterion to automatically balance the information of each single spectrum and the database, while limiting the run time. We show that BICEPS performs as well as current database search algorithms when such algorithms are applied to sequenced organisms, whereas BICEPS only uses a remotely related organism database. For instance, we use a chicken instead of a human database corresponding to an evolutionary distance of more than 300 million years (International Chicken Genome Sequencing Consortium (2004) Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature 432, 695–716). We demonstrate the successful application to cross-species proteomics with a 33% increase in the number of identified proteins for a filarial nematode sample of Litomosoides sigmodontis. PMID:22493179
Application of China's National Forest Continuous Inventory database.

PubMed

Xie, Xiaokui; Wang, Qingli; Dai, Limin; Su, Dongkai; Wang, Xinchuang; Qi, Guang; Ye, Yujing

2011-12-01

The maintenance of a timely, reliable and accurate spatial database on current forest ecosystem conditions and changes is essential to characterize and assess forest resources and support sustainable forest management. Information for such a database can be obtained only through a continuous forest inventory. The National Forest Continuous Inventory (NFCI) is the first level of China's three-tiered inventory system. The NFCI is administered by the State Forestry Administration; data are acquired by five inventory institutions around the country. Several important components of the database include land type, forest classification and ageclass/ age-group. The NFCI database in China is constructed based on 5-year inventory periods, resulting in some of the data not being timely when reports are issued. To address this problem, a forest growth simulation model has been developed to update the database for years between the periodic inventories. In order to aid in forest plan design and management, a three-dimensional virtual reality system of forest landscapes for selected units in the database (compartment or sub-compartment) has also been developed based on Virtual Reality Modeling Language. In addition, a transparent internet publishing system for a spatial database based on open source WebGIS (UMN Map Server) has been designed and utilized to enhance public understanding and encourage free participation of interested parties in the development, implementation, and planning of sustainable forest management.
The Protein Information Resource: an integrated public resource of functional annotation of proteins

PubMed Central

Wu, Cathy H.; Huang, Hongzhan; Arminski, Leslie; Castro-Alvear, Jorge; Chen, Yongxing; Hu, Zhang-Zhi; Ledley, Robert S.; Lewis, Kali C.; Mewes, Hans-Werner; Orcutt, Bruce C.; Suzek, Baris E.; Tsugita, Akira; Vinayaka, C. R.; Yeh, Lai-Su L.; Zhang, Jian; Barker, Winona C.

2002-01-01

The Protein Information Resource (PIR) serves as an integrated public resource of functional annotation of protein data to support genomic/proteomic research and scientific discovery. The PIR, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the PIR-International Protein Sequence Database (PSD), the major annotated protein sequence database in the public domain, containing about 250 000 proteins. To improve protein annotation and the coverage of experimentally validated data, a bibliography submission system is developed for scientists to submit, categorize and retrieve literature information. Comprehensive protein information is available from iProClass, which includes family classification at the superfamily, domain and motif levels, structural and functional features of proteins, as well as cross-references to over 40 biological databases. To provide timely and comprehensive protein data with source attribution, we have introduced a non-redundant reference protein database, PIR-NREF. The database consists of about 800 000 proteins collected from PIR-PSD, SWISS-PROT, TrEMBL, GenPept, RefSeq and PDB, with composite protein names and literature data. To promote database interoperability, we provide XML data distribution and open database schema, and adopt common ontologies. The PIR web site (http://pir.georgetown.edu/) features data mining and sequence analysis tools for information retrieval and functional identification of proteins based on both sequence and annotation information. The PIR databases and other files are also available by FTP (ftp://nbrfa.georgetown.edu/pir_databases). PMID:11752247
Public Access to Digital Material; A Call to Researchers: Digital Libraries Need Collaboration across Disciplines; Greenstone: Open-Source Digital Library Software; Retrieval Issues for the Colorado Digitization Project's Heritage Database; Report on the 5th European Conference on Digital Libraries, ECDL 2001; Report on the First Joint Conference on Digital Libraries.

ERIC Educational Resources Information Center

Kahle, Brewster; Prelinger, Rick; Jackson, Mary E.; Boyack, Kevin W.; Wylie, Brian N.; Davidson, George S.; Witten, Ian H.; Bainbridge, David; Boddie, Stefan J.; Garrison, William A.; Cunningham, Sally Jo; Borgman, Christine L.; Hessel, Heather

2001-01-01

These six articles discuss various issues relating to digital libraries. Highlights include public access to digital materials; intellectual property concerns; the need for collaboration across disciplines; Greenstone software for construction and presentation of digital information collections; the Colorado Digitization Project; and conferences…
Trends in the Evolution of the Public Web, 1998-2002; The Fedora Project: An Open-source Digital Object Repository Management System; State of the Dublin Core Metadata Initiative, April 2003; Preservation Metadata; How Many People Search the ERIC Database Each Day?

ERIC Educational Resources Information Center

O'Neill, Edward T.; Lavoie, Brian F.; Bennett, Rick; Staples, Thornton; Wayland, Ross; Payette, Sandra; Dekkers, Makx; Weibel, Stuart; Searle, Sam; Thompson, Dave; Rudner, Lawrence M.

2003-01-01

Includes five articles that examine key trends in the development of the public Web: size and growth, internationalization, and metadata usage; Flexible Extensible Digital Object and Repository Architecture (Fedora) for use in digital libraries; developments in the Dublin Core Metadata Initiative (DCMI); the National Library of New Zealand Te Puna…
Tripal v1.1: a standards-based toolkit for construction of online genetic and genomic databases.

PubMed

Sanderson, Lacey-Anne; Ficklin, Stephen P; Cheng, Chun-Huai; Jung, Sook; Feltus, Frank A; Bett, Kirstin E; Main, Dorrie

2013-01-01

Tripal is an open-source freely available toolkit for construction of online genomic and genetic databases. It aims to facilitate development of community-driven biological websites by integrating the GMOD Chado database schema with Drupal, a popular website creation and content management software. Tripal provides a suite of tools for interaction with a Chado database and display of content therein. The tools are designed to be generic to support the various ways in which data may be stored in Chado. Previous releases of Tripal have supported organisms, genomic libraries, biological stocks, stock collections and genomic features, their alignments and annotations. Also, Tripal and its extension modules provided loaders for commonly used file formats such as FASTA, GFF, OBO, GAF, BLAST XML, KEGG heir files and InterProScan XML. Default generic templates were provided for common views of biological data, which could be customized using an open Application Programming Interface to change the way data are displayed. Here, we report additional tools and functionality that are part of release v1.1 of Tripal. These include (i) a new bulk loader that allows a site curator to import data stored in a custom tab delimited format; (ii) full support of every Chado table for Drupal Views (a powerful tool allowing site developers to construct novel displays and search pages); (iii) new modules including 'Feature Map', 'Genetic', 'Publication', 'Project', 'Contact' and the 'Natural Diversity' modules. Tutorials, mailing lists, download and set-up instructions, extension modules and other documentation can be found at the Tripal website located at http://tripal.info. DATABASE URL: http://tripal.info/.
Tripal v1.1: a standards-based toolkit for construction of online genetic and genomic databases

PubMed Central

Sanderson, Lacey-Anne; Ficklin, Stephen P.; Cheng, Chun-Huai; Jung, Sook; Feltus, Frank A.; Bett, Kirstin E.; Main, Dorrie

2013-01-01

Tripal is an open-source freely available toolkit for construction of online genomic and genetic databases. It aims to facilitate development of community-driven biological websites by integrating the GMOD Chado database schema with Drupal, a popular website creation and content management software. Tripal provides a suite of tools for interaction with a Chado database and display of content therein. The tools are designed to be generic to support the various ways in which data may be stored in Chado. Previous releases of Tripal have supported organisms, genomic libraries, biological stocks, stock collections and genomic features, their alignments and annotations. Also, Tripal and its extension modules provided loaders for commonly used file formats such as FASTA, GFF, OBO, GAF, BLAST XML, KEGG heir files and InterProScan XML. Default generic templates were provided for common views of biological data, which could be customized using an open Application Programming Interface to change the way data are displayed. Here, we report additional tools and functionality that are part of release v1.1 of Tripal. These include (i) a new bulk loader that allows a site curator to import data stored in a custom tab delimited format; (ii) full support of every Chado table for Drupal Views (a powerful tool allowing site developers to construct novel displays and search pages); (iii) new modules including ‘Feature Map’, ‘Genetic’, ‘Publication’, ‘Project’, ‘Contact’ and the ‘Natural Diversity’ modules. Tutorials, mailing lists, download and set-up instructions, extension modules and other documentation can be found at the Tripal website located at http://tripal.info. Database URL: http://tripal.info/ PMID:24163125
Influenza Research Database: an integrated bioinformatics resource for influenza research and surveillance

PubMed Central

Squires, R. Burke; Noronha, Jyothi; Hunt, Victoria; García‐Sastre, Adolfo; Macken, Catherine; Baumgarth, Nicole; Suarez, David; Pickett, Brett E.; Zhang, Yun; Larsen, Christopher N.; Ramsey, Alvin; Zhou, Liwei; Zaremba, Sam; Kumar, Sanjeev; Deitrich, Jon; Klem, Edward; Scheuermann, Richard H.

2012-01-01

Please cite this paper as: Squires et al. (2012) Influenza research database: an integrated bioinformatics resource for influenza research and surveillance. Influenza and Other Respiratory Viruses 6(6), 404–416. Background The recent emergence of the 2009 pandemic influenza A/H1N1 virus has highlighted the value of free and open access to influenza virus genome sequence data integrated with information about other important virus characteristics. Design The Influenza Research Database (IRD, http://www.fludb.org) is a free, open, publicly‐accessible resource funded by the U.S. National Institute of Allergy and Infectious Diseases through the Bioinformatics Resource Centers program. IRD provides a comprehensive, integrated database and analysis resource for influenza sequence, surveillance, and research data, including user‐friendly interfaces for data retrieval, visualization and comparative genomics analysis, together with personal log in‐protected ‘workbench’ spaces for saving data sets and analysis results. IRD integrates genomic, proteomic, immune epitope, and surveillance data from a variety of sources, including public databases, computational algorithms, external research groups, and the scientific literature. Results To demonstrate the utility of the data and analysis tools available in IRD, two scientific use cases are presented. A comparison of hemagglutinin sequence conservation and epitope coverage information revealed highly conserved protein regions that can be recognized by the human adaptive immune system as possible targets for inducing cross‐protective immunity. Phylogenetic and geospatial analysis of sequences from wild bird surveillance samples revealed a possible evolutionary connection between influenza virus from Delaware Bay shorebirds and Alberta ducks. Conclusions The IRD provides a wealth of integrated data and information about influenza virus to support research of the genetic determinants dictating virus pathogenicity, host range restriction and transmission, and to facilitate development of vaccines, diagnostics, and therapeutics. PMID:22260278
The Astrobiology Habitable Environments Database (AHED)

NASA Astrophysics Data System (ADS)

Lafuente, B.; Stone, N.; Downs, R. T.; Blake, D. F.; Bristow, T.; Fonda, M.; Pires, A.

2015-12-01

The Astrobiology Habitable Environments Database (AHED) is a central, high quality, long-term searchable repository for archiving and collaborative sharing of astrobiologically relevant data, including, morphological, textural and contextural images, chemical, biochemical, isotopic, sequencing, and mineralogical information. The aim of AHED is to foster long-term innovative research by supporting integration and analysis of diverse datasets in order to: 1) help understand and interpret planetary geology; 2) identify and characterize habitable environments and pre-biotic/biotic processes; 3) interpret returned data from present and past missions; 4) provide a citable database of NASA-funded published and unpublished data (after an agreed-upon embargo period). AHED uses the online open-source software "The Open Data Repository's Data Publisher" (ODR - http://www.opendatarepository.org) [1], which provides a user-friendly interface that research teams or individual scientists can use to design, populate and manage their own database according to the characteristics of their data and the need to share data with collaborators or the broader scientific community. This platform can be also used as a laboratory notebook. The database will have the capability to import and export in a variety of standard formats. Advanced graphics will be implemented including 3D graphing, multi-axis graphs, error bars, and similar scientific data functions together with advanced online tools for data analysis (e. g. the statistical package, R). A permissions system will be put in place so that as data are being actively collected and interpreted, they will remain proprietary. A citation system will allow research data to be used and appropriately referenced by other researchers after the data are made public. This project is supported by the Science-Enabling Research Activity (SERA) and NASA NNX11AP82A, Mars Science Laboratory Investigations. [1] Nate et al. (2015) AGU, submitted.
Increasing the value of geospatial informatics with open approaches for Big Data

NASA Astrophysics Data System (ADS)

Percivall, G.; Bermudez, L. E.

2017-12-01

Open approaches to big data provide geoscientists with new capabilities to address problems of unmatched size and complexity. Consensus approaches for Big Geo Data have been addressed in multiple international workshops and testbeds organized by the Open Geospatial Consortium (OGC) in the past year. Participants came from government (NASA, ESA, USGS, NOAA, DOE); research (ORNL, NCSA, IU, JPL, CRIM, RENCI); industry (ESRI, Digital Globe, IBM, rasdaman); standards (JTC 1/NIST); and open source software communities. Results from the workshops and testbeds are documented in Testbed reports and a White Paper published by the OGC. The White Paper identifies the following set of use cases: Collection and Ingest: Remote sensed data processing; Data stream processing Prepare and Structure: SQL and NoSQL databases; Data linking; Feature identification Analytics and Visualization: Spatial-temporal analytics; Machine Learning; Data Exploration Modeling and Prediction: Integrated environmental models; Urban 4D models. Open implementations were developed in the Arctic Spatial Data Pilot using Discrete Global Grid Systems (DGGS) and in Testbeds using WPS and ESGF to publish climate predictions. Further development activities to advance open implementations of Big Geo Data include the following: Open Cloud Computing: Avoid vendor lock-in through API interoperability and Application portability. Open Source Extensions: Implement geospatial data representations in projects from Apache, Location Tech, and OSGeo. Investigate parallelization strategies for N-Dimensional spatial data. Geospatial Data Representations: Schemas to improve processing and analysis using geospatial concepts: Features, Coverages, DGGS. Use geospatial encodings like NetCDF and GeoPackge. Big Linked Geodata: Use linked data methods scaled to big geodata. Analysis Ready Data: Support "Download as last resort" and "Analytics as a service". Promote elements common to "datacubes."

The Ultracool Typing Kit - An Open-Source, Qualitative Spectral Typing GUI for L Dwarfs

NASA Astrophysics Data System (ADS)

Schwab, Ellianna; Cruz, Kelle; Núñez, Alejandro; Burgasser, Adam J.; Rice, Emily; Reid, Neill; Faherty, Jacqueline K.; BDNYC

2018-01-01

The Ultracool Typing Kit (UTK) is an open-source graphical user interface for classifying the NIR spectral types of L dwarfs, including field and low-gravity dwarfs spanning L0-L9. The user is able to input an NIR spectrum and qualitatively compare the input spectrum to a full suite of spectral templates, including low-gravity beta and gamma templates. The user can choose to view the input spectrum as both a band-by-band comparison with the templates and a full bandwidth comparison with NIR spectral standards. Once an optimal qualitative comparison is selected, the user can save their spectral type selection both graphically and to a database. Using UTK to classify 78 previously typed L dwarfs, we show that a band-by-band classification method more accurately agrees with optical spectral typing systems than previous L dwarf NIR classification schemes. UTK is written in python, released on Zenodo with a BSD-3 clause license and publicly available on the BDNYC Github page.
ProDaMa: an open source Python library to generate protein structure datasets.

PubMed

Armano, Giuliano; Manconi, Andrea

2009-10-02

The huge difference between the number of known sequences and known tertiary structures has justified the use of automated methods for protein analysis. Although a general methodology to solve these problems has not been yet devised, researchers are engaged in developing more accurate techniques and algorithms whose training plays a relevant role in determining their performance. From this perspective, particular importance is given to the training data used in experiments, and researchers are often engaged in the generation of specialized datasets that meet their requirements. To facilitate the task of generating specialized datasets we devised and implemented ProDaMa, an open source Python library than provides classes for retrieving, organizing, updating, analyzing, and filtering protein data. ProDaMa has been used to generate specialized datasets useful for secondary structure prediction and to develop a collaborative web application aimed at generating and sharing protein structure datasets. The library, the related database, and the documentation are freely available at the URL http://iasc.diee.unica.it/prodama.
Special population planner 4 : an open source release.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kuiper, J.; Metz, W.; Tanzman, E.

2008-01-01

Emergencies like Hurricane Katrina and the recent California wildfires underscore the critical need to meet the complex challenge of planning for individuals with special needs and for institutionalized special populations. People with special needs and special populations often have difficulty responding to emergencies or taking protective actions, and emergency responders may be unaware of their existence and situations during a crisis. Special Population Planner (SPP) is an ArcGIS-based emergency planning system released as an open source product. SPP provides for easy production of maps, reports, and analyses to develop and revise emergency response plans. It includes tools to manage amore » voluntary registry of data for people with special needs, integrated links to plans and documents, tools for response planning and analysis, preformatted reports and maps, and data on locations of special populations, facility and resource characteristics, and contacts. The system can be readily adapted for new settings without programming and is broadly applicable. Full documentation and a demonstration database are included in the release.« less
Automatic detection of adverse events to predict drug label changes using text and data mining techniques.

PubMed

Gurulingappa, Harsha; Toldo, Luca; Rajput, Abdul Mateen; Kors, Jan A; Taweel, Adel; Tayrouz, Yorki

2013-11-01

The aim of this study was to assess the impact of automatically detected adverse event signals from text and open-source data on the prediction of drug label changes. Open-source adverse effect data were collected from FAERS, Yellow Cards and SIDER databases. A shallow linguistic relation extraction system (JSRE) was applied for extraction of adverse effects from MEDLINE case reports. Statistical approach was applied on the extracted datasets for signal detection and subsequent prediction of label changes issued for 29 drugs by the UK Regulatory Authority in 2009. 76% of drug label changes were automatically predicted. Out of these, 6% of drug label changes were detected only by text mining. JSRE enabled precise identification of four adverse drug events from MEDLINE that were undetectable otherwise. Changes in drug labels can be predicted automatically using data and text mining techniques. Text mining technology is mature and well-placed to support the pharmacovigilance tasks. Copyright © 2013 John Wiley & Sons, Ltd.
Low Cost, Scalable Proteomics Data Analysis Using Amazon's Cloud Computing Services and Open Source Search Algorithms

PubMed Central

Halligan, Brian D.; Geiger, Joey F.; Vallejos, Andrew K.; Greene, Andrew S.; Twigger, Simon N.

2009-01-01

One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step by step instructions on how to implement the virtual proteomics analysis clusters as well as a list of current available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center website (http://proteomics.mcw.edu/vipdac). PMID:19358578
Dynamic online surveys and experiments with the free open-source software dynQuest.

PubMed

Rademacher, Jens D M; Lippke, Sonia

2007-08-01

With computers and the World Wide Web widely available, collecting data through Web browsers is an attractive method utilized by the social sciences. In this article, conducting PC- and Web-based trials with the software package dynQuest is described. The software manages dynamic questionnaire-based trials over the Internet or on single computers, possibly as randomized control trials (RCT), if two or more groups are involved. The choice of follow-up questions can depend on previous responses, as needed for matched interventions. Data are collected in a simple text-based database that can be imported easily into other programs for postprocessing and statistical analysis. The software consists of platform-independent scripts written in the programming language PERL that use the common gateway interface between Web browser and server for submission of data through HTML forms. Advantages of dynQuest are parsimony, simplicity in use and installation, transparency, and reliability. The program is available as open-source freeware from the authors.
Low cost, scalable proteomics data analysis using Amazon's cloud computing services and open source search algorithms.

PubMed

Halligan, Brian D; Geiger, Joey F; Vallejos, Andrew K; Greene, Andrew S; Twigger, Simon N

2009-06-01

One of the major difficulties for many laboratories setting up proteomics programs has been obtaining and maintaining the computational infrastructure required for the analysis of the large flow of proteomics data. We describe a system that combines distributed cloud computing and open source software to allow laboratories to set up scalable virtual proteomics analysis clusters without the investment in computational hardware or software licensing fees. Additionally, the pricing structure of distributed computing providers, such as Amazon Web Services, allows laboratories or even individuals to have large-scale computational resources at their disposal at a very low cost per run. We provide detailed step-by-step instructions on how to implement the virtual proteomics analysis clusters as well as a list of current available preconfigured Amazon machine images containing the OMSSA and X!Tandem search algorithms and sequence databases on the Medical College of Wisconsin Proteomics Center Web site ( http://proteomics.mcw.edu/vipdac ).
Neuroinformatics Database (NiDB) – A Modular, Portable Database for the Storage, Analysis, and Sharing of Neuroimaging Data

PubMed Central

Anderson, Beth M.; Stevens, Michael C.; Glahn, David C.; Assaf, Michal; Pearlson, Godfrey D.

2013-01-01

We present a modular, high performance, open-source database system that incorporates popular neuroimaging database features with novel peer-to-peer sharing, and a simple installation. An increasing number of imaging centers have created a massive amount of neuroimaging data since fMRI became popular more than 20 years ago, with much of that data unshared. The Neuroinformatics Database (NiDB) provides a stable platform to store and manipulate neuroimaging data and addresses several of the impediments to data sharing presented by the INCF Task Force on Neuroimaging Datasharing, including 1) motivation to share data, 2) technical issues, and 3) standards development. NiDB solves these problems by 1) minimizing PHI use, providing a cost effective simple locally stored platform, 2) storing and associating all data (including genome) with a subject and creating a peer-to-peer sharing model, and 3) defining a sample, normalized definition of a data storage structure that is used in NiDB. NiDB not only simplifies the local storage and analysis of neuroimaging data, but also enables simple sharing of raw data and analysis methods, which may encourage further sharing. PMID:23912507
Plant Reactome: a resource for plant pathways and comparative analysis

PubMed Central

Naithani, Sushma; Preece, Justin; D'Eustachio, Peter; Gupta, Parul; Amarasinghe, Vindhya; Dharmawardhana, Palitha D.; Wu, Guanming; Fabregat, Antonio; Elser, Justin L.; Weiser, Joel; Keays, Maria; Fuentes, Alfonso Munoz-Pomer; Petryszak, Robert; Stein, Lincoln D.; Ware, Doreen; Jaiswal, Pankaj

2017-01-01

Plant Reactome (http://plantreactome.gramene.org/) is a free, open-source, curated plant pathway database portal, provided as part of the Gramene project. The database provides intuitive bioinformatics tools for the visualization, analysis and interpretation of pathway knowledge to support genome annotation, genome analysis, modeling, systems biology, basic research and education. Plant Reactome employs the structural framework of a plant cell to show metabolic, transport, genetic, developmental and signaling pathways. We manually curate molecular details of pathways in these domains for reference species Oryza sativa (rice) supported by published literature and annotation of well-characterized genes. Two hundred twenty-two rice pathways, 1025 reactions associated with 1173 proteins, 907 small molecules and 256 literature references have been curated to date. These reference annotations were used to project pathways for 62 model, crop and evolutionarily significant plant species based on gene homology. Database users can search and browse various components of the database, visualize curated baseline expression of pathway-associated genes provided by the Expression Atlas and upload and analyze their Omics datasets. The database also offers data access via Application Programming Interfaces (APIs) and in various standardized pathway formats, such as SBML and BioPAX. PMID:27799469
iCAVE: an open source tool for visualizing biomolecular networks in 3D, stereoscopic 3D and immersive 3D

PubMed Central

Liluashvili, Vaja; Kalayci, Selim; Fluder, Eugene; Wilson, Manda; Gabow, Aaron

2017-01-01

Abstract Visualizations of biomolecular networks assist in systems-level data exploration in many cellular processes. Data generated from high-throughput experiments increasingly inform these networks, yet current tools do not adequately scale with concomitant increase in their size and complexity. We present an open source software platform, interactome-CAVE (iCAVE), for visualizing large and complex biomolecular interaction networks in 3D. Users can explore networks (i) in 3D using a desktop, (ii) in stereoscopic 3D using 3D-vision glasses and a desktop, or (iii) in immersive 3D within a CAVE environment. iCAVE introduces 3D extensions of known 2D network layout, clustering, and edge-bundling algorithms, as well as new 3D network layout algorithms. Furthermore, users can simultaneously query several built-in databases within iCAVE for network generation or visualize their own networks (e.g., disease, drug, protein, metabolite). iCAVE has modular structure that allows rapid development by addition of algorithms, datasets, or features without affecting other parts of the code. Overall, iCAVE is the first freely available open source tool that enables 3D (optionally stereoscopic or immersive) visualizations of complex, dense, or multi-layered biomolecular networks. While primarily designed for researchers utilizing biomolecular networks, iCAVE can assist researchers in any field. PMID:28814063
Free and open-source software application for the evaluation of coronary computed tomography angiography images.

PubMed

Hadlich, Marcelo Souza; Oliveira, Gláucia Maria Moraes; Feijóo, Raúl A; Azevedo, Clerio F; Tura, Bernardo Rangel; Ziemer, Paulo Gustavo Portela; Blanco, Pablo Javier; Pina, Gustavo; Meira, Márcio; Souza e Silva, Nelson Albuquerque de

2012-10-01

The standardization of images used in Medicine in 1993 was performed using the DICOM (Digital Imaging and Communications in Medicine) standard. Several tests use this standard and it is increasingly necessary to design software applications capable of handling this type of image; however, these software applications are not usually free and open-source, and this fact hinders their adjustment to most diverse interests. To develop and validate a free and open-source software application capable of handling DICOM coronary computed tomography angiography images. We developed and tested the ImageLab software in the evaluation of 100 tests randomly selected from a database. We carried out 600 tests divided between two observers using ImageLab and another software sold with Philips Brilliance computed tomography appliances in the evaluation of coronary lesions and plaques around the left main coronary artery (LMCA) and the anterior descending artery (ADA). To evaluate intraobserver, interobserver and intersoftware agreements, we used simple and kappa statistics agreements. The agreements observed between software applications were generally classified as substantial or almost perfect in most comparisons. The ImageLab software agreed with the Philips software in the evaluation of coronary computed tomography angiography tests, especially in patients without lesions, with lesions < 50% in the LMCA and < 70% in the ADA. The agreement for lesions > 70% in the ADA was lower, but this is also observed when the anatomical reference standard is used.
iCAVE: an open source tool for visualizing biomolecular networks in 3D, stereoscopic 3D and immersive 3D.

PubMed

Liluashvili, Vaja; Kalayci, Selim; Fluder, Eugene; Wilson, Manda; Gabow, Aaron; Gümüs, Zeynep H

2017-08-01

Visualizations of biomolecular networks assist in systems-level data exploration in many cellular processes. Data generated from high-throughput experiments increasingly inform these networks, yet current tools do not adequately scale with concomitant increase in their size and complexity. We present an open source software platform, interactome-CAVE (iCAVE), for visualizing large and complex biomolecular interaction networks in 3D. Users can explore networks (i) in 3D using a desktop, (ii) in stereoscopic 3D using 3D-vision glasses and a desktop, or (iii) in immersive 3D within a CAVE environment. iCAVE introduces 3D extensions of known 2D network layout, clustering, and edge-bundling algorithms, as well as new 3D network layout algorithms. Furthermore, users can simultaneously query several built-in databases within iCAVE for network generation or visualize their own networks (e.g., disease, drug, protein, metabolite). iCAVE has modular structure that allows rapid development by addition of algorithms, datasets, or features without affecting other parts of the code. Overall, iCAVE is the first freely available open source tool that enables 3D (optionally stereoscopic or immersive) visualizations of complex, dense, or multi-layered biomolecular networks. While primarily designed for researchers utilizing biomolecular networks, iCAVE can assist researchers in any field. © The Authors 2017. Published by Oxford University Press.
ClearedLeavesDB: an online database of cleared plant leaf images

PubMed Central

2014-01-01

Background Leaf vein networks are critical to both the structure and function of leaves. A growing body of recent work has linked leaf vein network structure to the physiology, ecology and evolution of land plants. In the process, multiple institutions and individual researchers have assembled collections of cleared leaf specimens in which vascular bundles (veins) are rendered visible. In an effort to facilitate analysis and digitally preserve these specimens, high-resolution images are usually created, either of entire leaves or of magnified leaf subsections. In a few cases, collections of digital images of cleared leaves are available for use online. However, these collections do not share a common platform nor is there a means to digitally archive cleared leaf images held by individual researchers (in addition to those held by institutions). Hence, there is a growing need for a digital archive that enables online viewing, sharing and disseminating of cleared leaf image collections held by both institutions and individual researchers. Description The Cleared Leaf Image Database (ClearedLeavesDB), is an online web-based resource for a community of researchers to contribute, access and share cleared leaf images. ClearedLeavesDB leverages resources of large-scale, curated collections while enabling the aggregation of small-scale collections within the same online platform. ClearedLeavesDB is built on Drupal, an open source content management platform. It allows plant biologists to store leaf images online with corresponding meta-data, share image collections with a user community and discuss images and collections via a common forum. We provide tools to upload processed images and results to the database via a web services client application that can be downloaded from the database. Conclusions We developed ClearedLeavesDB, a database focusing on cleared leaf images that combines interactions between users and data via an intuitive web interface. The web interface allows storage of large collections and integrates with leaf image analysis applications via an open application programming interface (API). The open API allows uploading of processed images and other trait data to the database, further enabling distribution and documentation of analyzed data within the community. The initial database is seeded with nearly 19,000 cleared leaf images representing over 40 GB of image data. Extensible storage and growth of the database is ensured by using the data storage resources of the iPlant Discovery Environment. ClearedLeavesDB can be accessed at http://clearedleavesdb.org. PMID:24678985
ClearedLeavesDB: an online database of cleared plant leaf images.

PubMed

Das, Abhiram; Bucksch, Alexander; Price, Charles A; Weitz, Joshua S

2014-03-28

Leaf vein networks are critical to both the structure and function of leaves. A growing body of recent work has linked leaf vein network structure to the physiology, ecology and evolution of land plants. In the process, multiple institutions and individual researchers have assembled collections of cleared leaf specimens in which vascular bundles (veins) are rendered visible. In an effort to facilitate analysis and digitally preserve these specimens, high-resolution images are usually created, either of entire leaves or of magnified leaf subsections. In a few cases, collections of digital images of cleared leaves are available for use online. However, these collections do not share a common platform nor is there a means to digitally archive cleared leaf images held by individual researchers (in addition to those held by institutions). Hence, there is a growing need for a digital archive that enables online viewing, sharing and disseminating of cleared leaf image collections held by both institutions and individual researchers. The Cleared Leaf Image Database (ClearedLeavesDB), is an online web-based resource for a community of researchers to contribute, access and share cleared leaf images. ClearedLeavesDB leverages resources of large-scale, curated collections while enabling the aggregation of small-scale collections within the same online platform. ClearedLeavesDB is built on Drupal, an open source content management platform. It allows plant biologists to store leaf images online with corresponding meta-data, share image collections with a user community and discuss images and collections via a common forum. We provide tools to upload processed images and results to the database via a web services client application that can be downloaded from the database. We developed ClearedLeavesDB, a database focusing on cleared leaf images that combines interactions between users and data via an intuitive web interface. The web interface allows storage of large collections and integrates with leaf image analysis applications via an open application programming interface (API). The open API allows uploading of processed images and other trait data to the database, further enabling distribution and documentation of analyzed data within the community. The initial database is seeded with nearly 19,000 cleared leaf images representing over 40 GB of image data. Extensible storage and growth of the database is ensured by using the data storage resources of the iPlant Discovery Environment. ClearedLeavesDB can be accessed at http://clearedleavesdb.org.
ProXL (Protein Cross-Linking Database): A Platform for Analysis, Visualization, and Sharing of Protein Cross-Linking Mass Spectrometry Data

PubMed Central

2016-01-01

ProXL is a Web application and accompanying database designed for sharing, visualizing, and analyzing bottom-up protein cross-linking mass spectrometry data with an emphasis on structural analysis and quality control. ProXL is designed to be independent of any particular software pipeline. The import process is simplified by the use of the ProXL XML data format, which shields developers of data importers from the relative complexity of the relational database schema. The database and Web interfaces function equally well for any software pipeline and allow data from disparate pipelines to be merged and contrasted. ProXL includes robust public and private data sharing capabilities, including a project-based interface designed to ensure security and facilitate collaboration among multiple researchers. ProXL provides multiple interactive and highly dynamic data visualizations that facilitate structural-based analysis of the observed cross-links as well as quality control. ProXL is open-source, well-documented, and freely available at https://github.com/yeastrc/proxl-web-app. PMID:27302480
ProXL (Protein Cross-Linking Database): A Platform for Analysis, Visualization, and Sharing of Protein Cross-Linking Mass Spectrometry Data.

PubMed

Riffle, Michael; Jaschob, Daniel; Zelter, Alex; Davis, Trisha N

2016-08-05

ProXL is a Web application and accompanying database designed for sharing, visualizing, and analyzing bottom-up protein cross-linking mass spectrometry data with an emphasis on structural analysis and quality control. ProXL is designed to be independent of any particular software pipeline. The import process is simplified by the use of the ProXL XML data format, which shields developers of data importers from the relative complexity of the relational database schema. The database and Web interfaces function equally well for any software pipeline and allow data from disparate pipelines to be merged and contrasted. ProXL includes robust public and private data sharing capabilities, including a project-based interface designed to ensure security and facilitate collaboration among multiple researchers. ProXL provides multiple interactive and highly dynamic data visualizations that facilitate structural-based analysis of the observed cross-links as well as quality control. ProXL is open-source, well-documented, and freely available at https://github.com/yeastrc/proxl-web-app .
PineElm_SSRdb: a microsatellite marker database identified from genomic, chloroplast, mitochondrial and EST sequences of pineapple (Ananas comosus (L.) Merrill).

PubMed

Chaudhary, Sakshi; Mishra, Bharat Kumar; Vivek, Thiruvettai; Magadum, Santoshkumar; Yasin, Jeshima Khan

2016-01-01

Simple Sequence Repeats or microsatellites are resourceful molecular genetic markers. There are only few reports of SSR identification and development in pineapple. Complete genome sequence of pineapple available in the public domain can be used to develop numerous novel SSRs. Therefore, an attempt was made to identify SSRs from genomic, chloroplast, mitochondrial and EST sequences of pineapple which will help in deciphering genetic makeup of its germplasm resources. A total of 359511 SSRs were identified in pineapple (356385 from genome sequence, 45 from chloroplast sequence, 249 in mitochondrial sequence and 2832 from EST sequences). The list of EST-SSR markers and their details are available in the database. PineElm_SSRdb is an open source database available for non-commercial academic purpose at http://app.bioelm.com/ with a mapping tool which can develop circular maps of selected marker set. This database will be of immense use to breeders, researchers and graduates working on Ananas spp. and to others working on cross-species transferability of markers, investigating diversity, mapping and DNA fingerprinting.
Text mining for metabolic pathways, signaling cascades, and protein networks.

PubMed

Hoffmann, Robert; Krallinger, Martin; Andres, Eduardo; Tamames, Javier; Blaschke, Christian; Valencia, Alfonso

2005-05-10

The complexity of the information stored in databases and publications on metabolic and signaling pathways, the high throughput of experimental data, and the growing number of publications make it imperative to provide systems to help the researcher navigate through these interrelated information resources. Text-mining methods have started to play a key role in the creation and maintenance of links between the information stored in biological databases and its original sources in the literature. These links will be extremely useful for database updating and curation, especially if a number of technical problems can be solved satisfactorily, including the identification of protein and gene names (entities in general) and the characterization of their types of interactions. The first generation of openly accessible text-mining systems, such as iHOP (Information Hyperlinked over Proteins), provides additional functions to facilitate the reconstruction of protein interaction networks, combine database and text information, and support the scientist in the formulation of novel hypotheses. The next challenge is the generation of comprehensive information regarding the general function of signaling pathways and protein interaction networks.
Biopython: freely available Python tools for computational molecular biology and bioinformatics

PubMed Central

Cock, Peter J. A.; Antao, Tiago; Chang, Jeffrey T.; Chapman, Brad A.; Cox, Cymon J.; Dalke, Andrew; Friedberg, Iddo; Hamelryck, Thomas; Kauff, Frank; Wilczynski, Bartek; de Hoon, Michiel J. L.

2009-01-01

Summary: The Biopython project is a mature open source international collaboration of volunteer developers, providing Python libraries for a wide range of bioinformatics problems. Biopython includes modules for reading and writing different sequence file formats and multiple sequence alignments, dealing with 3D macro molecular structures, interacting with common tools such as BLAST, ClustalW and EMBOSS, accessing key online databases, as well as providing numerical methods for statistical learning. Availability: Biopython is freely available, with documentation and source code at www.biopython.org under the Biopython license. Contact: All queries should be directed to the Biopython mailing lists, see www.biopython.org/wiki/_Mailing_listspeter.cock@scri.ac.uk. PMID:19304878
Condensing Massive Satellite Datasets For Rapid Interactive Analysis

NASA Astrophysics Data System (ADS)

Grant, G.; Gallaher, D. W.; Lv, Q.; Campbell, G. G.; Fowler, C.; LIU, Q.; Chen, C.; Klucik, R.; McAllister, R. A.

2015-12-01

Our goal is to enable users to interactively analyze massive satellite datasets, identifying anomalous data or values that fall outside of thresholds. To achieve this, the project seeks to create a derived database containing only the most relevant information, accelerating the analysis process. The database is designed to be an ancillary tool for the researcher, not an archival database to replace the original data. This approach is aimed at improving performance by reducing the overall size by way of condensing the data. The primary challenges of the project include: - The nature of the research question(s) may not be known ahead of time. - The thresholds for determining anomalies may be uncertain. - Problems associated with processing cloudy, missing, or noisy satellite imagery. - The contents and method of creation of the condensed dataset must be easily explainable to users. The architecture of the database will reorganize spatially-oriented satellite imagery into temporally-oriented columns of data (a.k.a., "data rods") to facilitate time-series analysis. The database itself is an open-source parallel database, designed to make full use of clustered server technologies. A demonstration of the system capabilities will be shown. Applications for this technology include quick-look views of the data, as well as the potential for on-board satellite processing of essential information, with the goal of reducing data latency.

SNPversity: a web-based tool for visualizing diversity

PubMed Central

Schott, David A; Vinnakota, Abhinav G; Portwood, John L; Andorf, Carson M

2018-01-01

Abstract Many stand-alone desktop software suites exist to visualize single nucleotide polymorphism (SNP) diversity, but web-based software that can be easily implemented and used for biological databases is absent. SNPversity was created to answer this need by building an open-source visualization tool that can be implemented on a Unix-like machine and served through a web browser that can be accessible worldwide. SNPversity consists of a HDF5 database back-end for SNPs, a data exchange layer powered by TASSEL libraries that represent data in JSON format, and an interface layer using PHP to visualize SNP information. SNPversity displays data in real-time through a web browser in grids that are color-coded according to a given SNP’s allelic status and mutational state. SNPversity is currently available at MaizeGDB, the maize community’s database, and will be soon available at GrainGenes, the clade-oriented database for Triticeae and Avena species, including wheat, barley, rye, and oat. The code and documentation are uploaded onto github, and they are freely available to the public. We expect that the tool will be highly useful for other biological databases with a similar need to display SNP diversity through their web interfaces. Database URL: https://www.maizegdb.org/snpversity PMID:29688387
An open source web interface for linking models to infrastructure system databases

NASA Astrophysics Data System (ADS)

Knox, S.; Mohamed, K.; Harou, J. J.; Rheinheimer, D. E.; Medellin-Azuara, J.; Meier, P.; Tilmant, A.; Rosenberg, D. E.

2016-12-01

Models of networked engineered resource systems such as water or energy systems are often built collaboratively with developers from different domains working at different locations. These models can be linked to large scale real world databases, and they are constantly being improved and extended. As the development and application of these models becomes more sophisticated, and the computing power required for simulations and/or optimisations increases, so has the need for online services and tools which enable the efficient development and deployment of these models. Hydra Platform is an open source, web-based data management system, which allows modellers of network-based models to remotely store network topology and associated data in a generalised manner, allowing it to serve multiple disciplines. Hydra Platform uses a web API using JSON to allow external programs (referred to as `Apps') to interact with its stored networks and perform actions such as importing data, running models, or exporting the networks to different formats. Hydra Platform supports multiple users accessing the same network and has a suite of functions for managing users and data. We present ongoing development in Hydra Platform, the Hydra Web User Interface, through which users can collaboratively manage network data and models in a web browser. The web interface allows multiple users to graphically access, edit and share their networks, run apps and view results. Through apps, which are located on the server, the web interface can give users access to external data sources and models without the need to install or configure any software. This also ensures model results can be reproduced by removing platform or version dependence. Managing data and deploying models via the web interface provides a way for multiple modellers to collaboratively manage data, deploy and monitor model runs and analyse results.
Environmental asbestos exposure sources in Korea

PubMed Central

2016-01-01

Background Because of the long asbestos-related disease latencies (10–50 years), detection, diagnosis, and epidemiologic studies require asbestos exposure history. However, environmental asbestos exposure source (EAES) data are lacking. Objectives To survey the available data for past EAES and supplement these data with interviews. Methods We constructed an EAES database using a literature review and interviews of experts, former traders, and workers. Exposure sources by time period and type were visualized using a geographic information system (ArcGIS), web-based mapping (Google Maps), and OpenWeatherMap. The data were mounted in the GIS to show the exposure source location and trend. Results The majority of asbestos mines, factories, and consumption was located in Chungnam; Gyeonggi, Busan, and Gyeongnam; and Gyeonggi, Daejeon, and Busan, respectively. Shipbuilding and repair companies were mostly located in Busan and Gyeongnam. Conclusions These tools might help evaluate past exposure from EAES and estimate the future asbestos burden in Korea. PMID:27726756
Environmental asbestos exposure sources in Korea.

PubMed

Kang, Dong-Mug; Kim, Jong-Eun; Kim, Ju-Young; Lee, Hyun-Hee; Hwang, Young-Sik; Kim, Young-Ki; Lee, Yong-Jin

2016-10-01

Because of the long asbestos-related disease latencies (10-50 years), detection, diagnosis, and epidemiologic studies require asbestos exposure history. However, environmental asbestos exposure source (EAES) data are lacking. To survey the available data for past EAES and supplement these data with interviews. We constructed an EAES database using a literature review and interviews of experts, former traders, and workers. Exposure sources by time period and type were visualized using a geographic information system (ArcGIS), web-based mapping (Google Maps), and OpenWeatherMap. The data were mounted in the GIS to show the exposure source location and trend. The majority of asbestos mines, factories, and consumption was located in Chungnam; Gyeonggi, Busan, and Gyeongnam; and Gyeonggi, Daejeon, and Busan, respectively. Shipbuilding and repair companies were mostly located in Busan and Gyeongnam. These tools might help evaluate past exposure from EAES and estimate the future asbestos burden in Korea.
Ensembl 2002: accommodating comparative genomics.

PubMed

Clamp, M; Andrews, D; Barker, D; Bevan, P; Cameron, G; Chen, Y; Clark, L; Cox, T; Cuff, J; Curwen, V; Down, T; Durbin, R; Eyras, E; Gilbert, J; Hammond, M; Hubbard, T; Kasprzyk, A; Keefe, D; Lehvaslaiho, H; Iyer, V; Melsopp, C; Mongin, E; Pettett, R; Potter, S; Rust, A; Schmidt, E; Searle, S; Slater, G; Smith, J; Spooner, W; Stabenau, A; Stalker, J; Stupka, E; Ureta-Vidal, A; Vastrik, I; Birney, E

2003-01-01

The Ensembl (http://www.ensembl.org/) database project provides a bioinformatics framework to organise biology around the sequences of large genomes. It is a comprehensive source of stable automatic annotation of human, mouse and other genome sequences, available as either an interactive web site or as flat files. Ensembl also integrates manually annotated gene structures from external sources where available. As well as being one of the leading sources of genome annotation, Ensembl is an open source software engineering project to develop a portable system able to handle very large genomes and associated requirements. These range from sequence analysis to data storage and visualisation and installations exist around the world in both companies and at academic sites. With both human and mouse genome sequences available and more vertebrate sequences to follow, many of the recent developments in Ensembl have focusing on developing automatic comparative genome analysis and visualisation.
Vanderbilt University Institute of Imaging Science Center for Computational Imaging XNAT: A multimodal data archive and processing environment.

PubMed

Harrigan, Robert L; Yvernault, Benjamin C; Boyd, Brian D; Damon, Stephen M; Gibney, Kyla David; Conrad, Benjamin N; Phillips, Nicholas S; Rogers, Baxter P; Gao, Yurui; Landman, Bennett A

2016-01-01

The Vanderbilt University Institute for Imaging Science (VUIIS) Center for Computational Imaging (CCI) has developed a database built on XNAT housing over a quarter of a million scans. The database provides framework for (1) rapid prototyping, (2) large scale batch processing of images and (3) scalable project management. The system uses the web-based interfaces of XNAT and REDCap to allow for graphical interaction. A python middleware layer, the Distributed Automation for XNAT (DAX) package, distributes computation across the Vanderbilt Advanced Computing Center for Research and Education high performance computing center. All software are made available in open source for use in combining portable batch scripting (PBS) grids and XNAT servers. Copyright © 2015 Elsevier Inc. All rights reserved.
The Saccharomyces Genome Database Variant Viewer

PubMed Central

Sheppard, Travis K.; Hitz, Benjamin C.; Engel, Stacia R.; Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C.; Dalusag, Kyla S.; Demeter, Janos; Hellerstedt, Sage T.; Karra, Kalpana; Nash, Robert S.; Paskov, Kelley M.; Skrzypek, Marek S.; Weng, Shuai; Wong, Edith D.; Cherry, J. Michael

2016-01-01

The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. In recent years, we have moved toward increased representation of sequence variation and allelic differences within S. cerevisiae. The publication of numerous additional genomes has motivated the creation of new tools for their annotation and analysis. Here we present the Variant Viewer: a dynamic open-source web application for the visualization of genomic and proteomic differences. Multiple sequence alignments have been constructed across high quality genome sequences from 11 different S. cerevisiae strains and stored in the SGD. The alignments and summaries are encoded in JSON and used to create a two-tiered dynamic view of the budding yeast pan-genome, available at http://www.yeastgenome.org/variant-viewer. PMID:26578556
Social media based NPL system to find and retrieve ARM data: Concept paper

DOE Office of Scientific and Technical Information (OSTI.GOV)

Devarakonda, Ranjeet; Giansiracusa, Michael T.; Kumar, Jitendra

Information connectivity and retrieval has a role in our daily lives. The most pervasive source of online information is databases. The amount of data is growing at rapid rate and database technology is improving and having a profound effect. Almost all online applications are storing and retrieving information from databases. One challenge in supplying the public with wider access to informational databases is the need for knowledge of database languages like Structured Query Language (SQL). Although the SQL language has been published in many forms, not everybody is able to write SQL queries. Another challenge is that it may notmore » be practical to make the public aware of the structure of the database. There is a need for novice users to query relational databases using their natural language. To solve this problem, many natural language interfaces to structured databases have been developed. The goal is to provide more intuitive method for generating database queries and delivering responses. Social media makes it possible to interact with a wide section of the population. Through this medium, and with the help of Natural Language Processing (NLP) we can make the data of the Atmospheric Radiation Measurement Data Center (ADC) more accessible to the public. We propose an architecture for using Apache Lucene/Solr [1], OpenML [2,3], and Kafka [4] to generate an automated query/response system with inputs from Twitter5, our Cassandra DB, and our log database. Using the Twitter API and NLP we can give the public the ability to ask questions of our database and get automated responses.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)

Devarakonda, Ranjeet; Giansiracusa, Michael T.; Kumar, Jitendra

Information connectivity and retrieval has a role in our daily lives. The most pervasive source of online information is databases. The amount of data is growing at rapid rate and database technology is improving and having a profound effect. Almost all online applications are storing and retrieving information from databases. One challenge in supplying the public with wider access to informational databases is the need for knowledge of database languages like Structured Query Language (SQL). Although the SQL language has been published in many forms, not everybody is able to write SQL queries. Another challenge is that it may notmore » be practical to make the public aware of the structure of the database. There is a need for novice users to query relational databases using their natural language. To solve this problem, many natural language interfaces to structured databases have been developed. The goal is to provide more intuitive method for generating database queries and delivering responses. Social media makes it possible to interact with a wide section of the population. Through this medium, and with the help of Natural Language Processing (NLP) we can make the data of the Atmospheric Radiation Measurement Data Center (ADC) more accessible to the public. We propose an architecture for using Apache Lucene/Solr [1], OpenML [2,3], and Kafka [4] to generate an automated query/response system with inputs from Twitter5, our Cassandra DB, and our log database. Using the Twitter API and NLP we can give the public the ability to ask questions of our database and get automated responses.« less
Open-access MIMIC-II database for intensive care research.

PubMed

Lee, Joon; Scott, Daniel J; Villarroel, Mauricio; Clifford, Gari D; Saeed, Mohammed; Mark, Roger G

2011-01-01

The critical state of intensive care unit (ICU) patients demands close monitoring, and as a result a large volume of multi-parameter data is collected continuously. This represents a unique opportunity for researchers interested in clinical data mining. We sought to foster a more transparent and efficient intensive care research community by building a publicly available ICU database, namely Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II). The data harnessed in MIMIC-II were collected from the ICUs of Beth Israel Deaconess Medical Center from 2001 to 2008 and represent 26,870 adult hospital admissions (version 2.6). MIMIC-II consists of two major components: clinical data and physiological waveforms. The clinical data, which include patient demographics, intravenous medication drip rates, and laboratory test results, were organized into a relational database. The physiological waveforms, including 125 Hz signals recorded at bedside and corresponding vital signs, were stored in an open-source format. MIMIC-II data were also deidentified in order to remove protected health information. Any interested researcher can gain access to MIMIC-II free of charge after signing a data use agreement and completing human subjects training. MIMIC-II can support a wide variety of research studies, ranging from the development of clinical decision support algorithms to retrospective clinical studies. We anticipate that MIMIC-II will be an invaluable resource for intensive care research by stimulating fair comparisons among different studies.
Sensing Slow Mobility and Interesting Locations for Lombardy Region (italy): a Case Study Using Pointwise Geolocated Open Data

NASA Astrophysics Data System (ADS)

Brovelli, M. A.; Oxoli, D.; Zurbarán, M. A.

2016-06-01

During the past years Web 2.0 technologies have caused the emergence of platforms where users can share data related to their activities which in some cases are then publicly released with open licenses. Popular categories for this include community platforms where users can upload GPS tracks collected during slow travel activities (e.g. hiking, biking and horse riding) and platforms where users share their geolocated photos. However, due to the high heterogeneity of the information available on the Web, the sole use of these user-generated contents makes it an ambitious challenge to understand slow mobility flows as well as to detect the most visited locations in a region. Exploiting the available data on community sharing websites allows to collect near real-time open data streams and enables rigorous spatial-temporal analysis. This work presents an approach for collecting, unifying and analysing pointwise geolocated open data available from different sources with the aim of identifying the main locations and destinations of slow mobility activities. For this purpose, we collected pointwise open data from the Wikiloc platform, Twitter, Flickr and Foursquare. The analysis was confined to the data uploaded in Lombardy Region (Northern Italy) - corresponding to millions of pointwise data. Collected data was processed through the use of Free and Open Source Software (FOSS) in order to organize them into a suitable database. This allowed to run statistical analyses on data distribution in both time and space by enabling the detection of users' slow mobility preferences as well as places of interest at a regional scale.
Comparison of Conflicts of Interest among Published Hernia Researchers Self-Reported with the Centers for Medicare and Medicaid Services Open Payments Database.

PubMed

Olavarria, Oscar A; Holihan, Julie L; Cherla, Deepa; Perez, Cristina A; Kao, Lillian S; Ko, Tien C; Liang, Mike K

2017-05-01

Many healthcare providers have financial interests and relationships with healthcare companies. To maintain transparency, investigators are expected to disclose their conflicts of interest (COIs). Recently, the Centers for Medicare and Medicaid Services developed the Open Payment database of COIs reported by industry. We hypothesize that there is discordance between industry-reported and physician self-reported COIs in ventral hernia publications. PubMed was searched for ventral hernia studies accepted for publication between June 2013 and October 2015 and published by authors from the US. Conflicts of interest were defined as payments received as honoraria, consulting fees, compensation for serving as faculty or as a speaker at a venue, research funding payments, or having ownerships/partnerships in companies. Conflicts of interest disclosed on the published articles were compared with the financial relationships in the Open Payments database. A total of 100 studies were selected with 497 participating authors. Information was available from the Open Payments database for 245 (49.2%) authors, of which 134 (26.9%) met the definition for COI. When comparing COIs disclosed by authors and data in the Open Payments database, 81 (16.3%) authors had at least 1 COI but did not declare any, 35 (7.0%) authors had COIs other than what they declared, and 20 (4.0%) declared a COI not listed in the Open Payments database, for a combined discordance rate of 27.3%. There is substantial discordance between self-reported COI in published articles compared with those in the Centers for Medicare and Medicaid Services Open Payments database. Additional studies are needed to determine the reasons for these differences, as COI can influence the validity of the design, conduct, and results of a study. Copyright © 2017 American College of Surgeons. Published by Elsevier Inc. All rights reserved.
Effectiveness of open versus endovascular abdominal aortic aneurysm repair in population settings: A systematic review of statewide databases.

PubMed

Williams, Christopher R; Brooke, Benjamin S

2017-10-01

Patient outcomes after open abdominal aortic aneurysm and endovascular aortic aneurysm repair have been widely reported from several large, randomized, controlled trials. It is not clear whether these trial outcomes are representative of abdominal aortic aneurysm repair procedures performed in real-world hospital settings across the United States. This study was designed to evaluate population-based outcomes after endovascular aortic aneurysm repair versus open abdominal aortic aneurysm repair using statewide inpatient databases and examine how they have helped improve our understanding of abdominal aortic aneurysm repair. A systematic search of MEDLINE, EMBASE, and CINAHL databases was performed to identify articles comparing endovascular aortic aneurysm repair and open abdominal aortic aneurysm repair using data from statewide inpatient databases. This search was limited to studies published in the English language after 1990, and abstracts were screened and abstracted by 2 authors. Our search yielded 17 studies published between 2004 and 2016 that used data from 29 different statewide inpatient databases to compare endovascular aortic aneurysm repair versus open abdominal aortic aneurysm repair. These studies support the randomized, controlled trial results, including a lower mortality associated with endovascular aortic aneurysm repair extended from the perioperative period up to 3 years after operation, as well as a higher complication rate after endovascular aortic aneurysm repair. The evidence from statewide inpatient database analyses has also elucidated trends in procedure volume, patient case mix, volume-outcome relationships, and health care disparities associated with endovascular aortic aneurysm repair versus open abdominal aortic aneurysm repair. Population analyses of endovascular aortic aneurysm repair and open abdominal aortic aneurysm repair using statewide inpatient databases have confirmed short- and long-term mortality outcomes obtained from large, randomized, controlled trials. Moreover, these analyses have allowed us to assess the effect of endovascular aortic aneurysm repair adoption on population outcomes and patient case mix over time. Published by Elsevier Inc.
Database citation in supplementary data linked to Europe PubMed Central full text biomedical articles.

PubMed

Kafkas, Şenay; Kim, Jee-Hyub; Pi, Xingjun; McEntyre, Johanna R

2015-01-01

In this study, we present an analysis of data citation practices in full text research articles and their corresponding supplementary data files, made available in the Open Access set of articles from Europe PubMed Central. Our aim is to investigate whether supplementary data files should be considered as a source of information for integrating the literature with biomolecular databases. Using text-mining methods to identify and extract a variety of core biological database accession numbers, we found that the supplemental data files contain many more database citations than the body of the article, and that those citations often take the form of a relatively small number of articles citing large collections of accession numbers in text-based files. Moreover, citation of value-added databases derived from submission databases (such as Pfam, UniProt or Ensembl) is common, demonstrating the reuse of these resources as datasets in themselves. All the database accession numbers extracted from the supplementary data are publicly accessible from http://dx.doi.org/10.5281/zenodo.11771. Our study suggests that supplementary data should be considered when linking articles with data, in curation pipelines, and in information retrieval tasks in order to make full use of the entire research article. These observations highlight the need to improve the management of supplemental data in general, in order to make this information more discoverable and useful.
The Finnish disease heritage database (FinDis) update-a database for the genes mutated in the Finnish disease heritage brought to the next-generation sequencing era.

PubMed

Polvi, Anne; Linturi, Henna; Varilo, Teppo; Anttonen, Anna-Kaisa; Byrne, Myles; Fokkema, Ivo F A C; Almusa, Henrikki; Metzidis, Anthony; Avela, Kristiina; Aula, Pertti; Kestilä, Marjo; Muilu, Juha

2013-11-01

The Finnish Disease Heritage Database (FinDis) (http://findis.org) was originally published in 2004 as a centralized information resource for rare monogenic diseases enriched in the Finnish population. The FinDis database originally contained 405 causative variants for 30 diseases. At the time, the FinDis database was a comprehensive collection of data, but since 1994, a large amount of new information has emerged, making the necessity to update the database evident. We collected information and updated the database to contain genes and causative variants for 35 diseases, including six more genes and more than 1,400 additional disease-causing variants. Information for causative variants for each gene is collected under the LOVD 3.0 platform, enabling easy updating. The FinDis portal provides a centralized resource and user interface to link information on each disease and gene with variant data in the LOVD 3.0 platform. The software written to achieve this has been open-sourced and made available on GitHub (http://github.com/findis-db), allowing biomedical institutions in other countries to present their national data in a similar way, and to both contribute to, and benefit from, standardized variation data. The updated FinDis portal provides a unique resource to assist patient diagnosis, research, and the development of new cures. © 2013 WILEY PERIODICALS, INC.
Certifiable database generation for SVS

NASA Astrophysics Data System (ADS)

Schiefele, Jens; Damjanovic, Dejan; Kubbat, Wolfgang

2000-06-01

In future aircraft cockpits SVS will be used to display 3D physical and virtual information to pilots. A review of prototype and production Synthetic Vision Displays (SVD) from Euro Telematic, UPS Advanced Technologies, Universal Avionics, VDO-Luftfahrtgeratewerk, and NASA, are discussed. As data sources terrain, obstacle, navigation, and airport data is needed, Jeppesen-Sanderson, Inc. and Darmstadt Univ. of Technology currently develop certifiable methods for acquisition, validation, and processing methods for terrain, obstacle, and airport databases. The acquired data will be integrated into a High-Quality Database (HQ-DB). This database is the master repository. It contains all information relevant for all types of aviation applications. From the HQ-DB SVS relevant data is retried, converted, decimated, and adapted into a SVS Real-Time Onboard Database (RTO-DB). The process of data acquisition, verification, and data processing will be defined in a way that allows certication within DO-200a and new RTCA/EUROCAE standards for airport and terrain data. The open formats proposed will be established and evaluated for industrial usability. Finally, a NASA-industry cooperation to develop industrial SVS products under the umbrella of the NASA Aviation Safety Program (ASP) is introduced. A key element of the SVS NASA-ASP is the Jeppesen lead task to develop methods for world-wide database generation and certification. Jeppesen will build three airport databases that will be used in flight trials with NASA aircraft.
Assessment of landslide distribution map reliability in Niigata prefecture - Japan using frequency ratio approach

NASA Astrophysics Data System (ADS)

Rahardianto, Trias; Saputra, Aditya; Gomez, Christopher

2017-07-01

Research on landslide susceptibility has evolved rapidly over the few last decades thanks to the availability of large databases. Landslide research used to be focused on discreet events but the usage of large inventory dataset has become a central pillar of landslide susceptibility, hazard, and risk assessment. Indeed, extracting meaningful information from the large database is now at the forth of geoscientific research, following the big-data research trend. Indeed, the more comprehensive information of the past landslide available in a particular area is, the better the produced map will be, in order to support the effective decision making, planning, and engineering practice. The landslide inventory data which is freely accessible online gives an opportunity for many researchers and decision makers to prevent casualties and economic loss caused by future landslides. This data is advantageous especially for areas with poor landslide historical data. Since the construction criteria of landslide inventory map and its quality evaluation remain poorly defined, the assessment of open source landslide inventory map reliability is required. The present contribution aims to assess the reliability of open-source landslide inventory data based on the particular topographical setting of the observed area in Niigata prefecture, Japan. Geographic Information System (GIS) platform and statistical approach are applied to analyze the data. Frequency ratio method is utilized to model and assess the landslide map. The outcomes of the generated model showed unsatisfactory results with AUC value of 0.603 indicate the low prediction accuracy and unreliability of the model.
DOE Office of Scientific and Technical Information (OSTI.GOV)

None

Assessing the impact of energy efficiency technologies at a district or city scale is of great interest to local governments, real estate developers, utility companies, and policymakers. This paper describes a flexible framework that can be used to create and run district and city scale building energy simulations. The framework is built around the new OpenStudio City Database (CityDB). Building footprints, building height, building type, and other data can be imported from public records or other sources. Missing data can be inferred or assigned from a statistical sampling of other datasets. Once all required data is available, OpenStudio Measures aremore » used to create starting point energy models and to model energy efficiency measures for each building. Together this framework allows a user to pose several scenarios such as 'what if 30% of the commercial retail buildings added rooftop solar' or 'what if all elementary schools converted to ground source heat pumps' and then visualize the impacts at a district or city scale. This paper focuses on modeling existing building stock using public records. However, the framework is capable of supporting the evaluation of new construction, district systems, and the use of proprietary data sources.« less
The state and profile of open source software projects in health and medical informatics.

PubMed

Janamanchi, Balaji; Katsamakas, Evangelos; Raghupathi, Wullianallur; Gao, Wei

2009-07-01

Little has been published about the application profiles and development patterns of open source software (OSS) in health and medical informatics. This study explores these issues with an analysis of health and medical informatics related OSS projects on SourceForge, a large repository of open source projects. A search was conducted on the SourceForge website during the period from May 1 to 15, 2007, to identify health and medical informatics OSS projects. This search resulted in a sample of 174 projects. A Java-based parser was written to extract data for several of the key variables of each project. Several visually descriptive statistics were generated to analyze the profiles of the OSS projects. Many of the projects have sponsors, implying a growing interest in OSS among organizations. Sponsorship, we discovered, has a significant impact on project success metrics. Nearly two-thirds of the projects have a restrictive license type. Restrictive licensing may indicate tighter control over the development process. Our sample includes a wide range of projects that are at various stages of development (status). Projects targeted towards the advanced end user are primarily focused on bio-informatics, data formats, database and medical science applications. We conclude that there exists an active and thriving OSS development community that is focusing on health and medical informatics. A wide range of OSS applications are in development, from bio-informatics to hospital information systems. A profile of OSS in health and medical informatics emerges that is distinct and unique to the health care field. Future research can focus on OSS acceptance and diffusion and impact on cost, efficiency and quality of health care.
The Database Query Support Processor (QSP)

NASA Technical Reports Server (NTRS)

1993-01-01

The number and diversity of databases available to users continues to increase dramatically. Currently, the trend is towards decentralized, client server architectures that (on the surface) are less expensive to acquire, operate, and maintain than information architectures based on centralized, monolithic mainframes. The database query support processor (QSP) effort evaluates the performance of a network level, heterogeneous database access capability. Air Force Material Command's Rome Laboratory has developed an approach, based on ANSI standard X3.138 - 1988, 'The Information Resource Dictionary System (IRDS)' to seamless access to heterogeneous databases based on extensions to data dictionary technology. To successfully query a decentralized information system, users must know what data are available from which source, or have the knowledge and system privileges necessary to find out this information. Privacy and security considerations prohibit free and open access to every information system in every network. Even in completely open systems, time required to locate relevant data (in systems of any appreciable size) would be better spent analyzing the data, assuming the original question was not forgotten. Extensions to data dictionary technology have the potential to more fully automate the search and retrieval for relevant data in a decentralized environment. Substantial amounts of time and money could be saved by not having to teach users what data resides in which systems and how to access each of those systems. Information describing data and how to get it could be removed from the application and placed in a dedicated repository where it belongs. The result simplified applications that are less brittle and less expensive to build and maintain. Software technology providing the required functionality is off the shelf. The key difficulty is in defining the metadata required to support the process. The database query support processor effort will provide quantitative data on the amount of effort required to implement an extended data dictionary at the network level, add new systems, adapt to changing user needs, and provide sound estimates on operations and maintenance costs and savings.

DBATE: database of alternative transcripts expression.

PubMed

Bianchi, Valerio; Colantoni, Alessio; Calderone, Alberto; Ausiello, Gabriele; Ferrè, Fabrizio; Helmer-Citterich, Manuela

2013-01-01

The use of high-throughput RNA sequencing technology (RNA-seq) allows whole transcriptome analysis, providing an unbiased and unabridged view of alternative transcript expression. Coupling splicing variant-specific expression with its functional inference is still an open and difficult issue for which we created the DataBase of Alternative Transcripts Expression (DBATE), a web-based repository storing expression values and functional annotation of alternative splicing variants. We processed 13 large RNA-seq panels from human healthy tissues and in disease conditions, reporting expression levels and functional annotations gathered and integrated from different sources for each splicing variant, using a variant-specific annotation transfer pipeline. The possibility to perform complex queries by cross-referencing different functional annotations permits the retrieval of desired subsets of splicing variant expression values that can be visualized in several ways, from simple to more informative. DBATE is intended as a novel tool to help appreciate how, and possibly why, the transcriptome expression is shaped. DATABASE URL: http://bioinformatica.uniroma2.it/DBATE/.
ERAIZDA: a model for holistic annotation of animal infectious and zoonotic diseases

PubMed Central

Buza, Teresia M.; Jack, Sherman W.; Kirunda, Halid; Khaitsa, Margaret L.; Lawrence, Mark L.; Pruett, Stephen; Peterson, Daniel G.

2015-01-01

There is an urgent need for a unified resource that integrates trans-disciplinary annotations of emerging and reemerging animal infectious and zoonotic diseases. Such data integration will provide wonderful opportunity for epidemiologists, researchers and health policy makers to make data-driven decisions designed to improve animal health. Integrating emerging and reemerging animal infectious and zoonotic disease data from a large variety of sources into a unified open-access resource provides more plausible arguments to achieve better understanding of infectious and zoonotic diseases. We have developed a model for interlinking annotations of these diseases. These diseases are of particular interest because of the threats they pose to animal health, human health and global health security. We demonstrated the application of this model using brucellosis, an infectious and zoonotic disease. Preliminary annotations were deposited into VetBioBase database (http://vetbiobase.igbb.msstate.edu). This database is associated with user-friendly tools to facilitate searching, retrieving and downloading of disease-related information. Database URL: http://vetbiobase.igbb.msstate.edu PMID:26581408
Investigating the Potential Impacts of Energy Production in the Marcellus Shale Region Using the Shale Network Database

NASA Astrophysics Data System (ADS)

Brantley, S.; Brazil, L.

2017-12-01

The Shale Network's extensive database of water quality observations enables educational experiences about the potential impacts of resource extraction with real data. Through tools that are open source and free to use, researchers, educators, and citizens can access and analyze the very same data that the Shale Network team has used in peer-reviewed publications about the potential impacts of hydraulic fracturing on water. The development of the Shale Network database has been made possible through efforts led by an academic team and involving numerous individuals from government agencies, citizen science organizations, and private industry. Thus far, these tools and data have been used to engage high school students, university undergraduate and graduate students, as well as citizens so that all can discover how energy production impacts the Marcellus Shale region, which includes Pennsylvania and other nearby states. This presentation will describe these data tools, how the Shale Network has used them in developing lesson plans, and the resources available to learn more.
Advanced Cyberinfrastructure for Geochronology as a Collaborative Endeavor: A Decade of Progress, A Decade of Plans

NASA Astrophysics Data System (ADS)

Bowring, J. F.; McLean, N. M.; Walker, J. D.; Gehrels, G. E.; Rubin, K. H.; Dutton, A.; Bowring, S. A.; Rioux, M. E.

2015-12-01

The Cyber Infrastructure Research and Development Lab for the Earth Sciences (CIRDLES.org) has worked collaboratively for the last decade with geochronologists from EARTHTIME and EarthChem to build cyberinfrastructure geared to ensuring transparency and reproducibility in geoscience workflows and is engaged in refining and extending that work to serve additional geochronology domains during the next decade. ET_Redux (formerly U-Pb_Redux) is a free open-source software system that provides end-to-end support for the analysis of U-Pb geochronological data. The system reduces raw mass spectrometer (TIMS and LA-ICPMS) data to U-Pb dates, allows users to interpret ages from these data, and then facilitates the seamless federation of the results from one or more labs into a community web-accessible database using standard and open techniques. This EarthChem database - GeoChron.org - depends on keyed references to the System for Earth Sample Registration (SESAR) database that stores metadata about registered samples. These keys are each a unique International Geo Sample Number (IGSN) assigned to a sample and to its derivatives. ET_Redux provides for interaction with this archive, allowing analysts to store, maintain, retrieve, and share their data and analytical results electronically with whomever they choose. This initiative has created an open standard for the data elements of a complete reduction and analysis of U-Pb data, and is currently working to complete the same for U-series geochronology. We have demonstrated the utility of interdisciplinary collaboration between computer scientists and geoscientists in achieving a working and useful system that provides transparency and supports reproducibility, allowing geochemists to focus on their specialties. The software engineering community also benefits by acquiring research opportunities to improve development process methodologies used in the design, implementation, and sustainability of domain-specific software.
a Webgis for the Knowledge and Conservation of the Historical Wall Structures of the 13TH-18TH Centuries

NASA Astrophysics Data System (ADS)

Vacca, G.; Pili, D.; Fiorino, D. R.; Pintus, V.

2017-05-01

The presented work is part of the research project, titled "Tecniche murarie tradizionali: conoscenza per la conservazione ed il miglioramento prestazionale" (Traditional building techniques: from knowledge to conservation and performance improvement), with the purpose of studying the building techniques of the 13th-18th centuries in the Sardinia Region (Italy) for their knowledge, conservation, and promotion. The end purpose of the entire study is to improve the performance of the examined structures. In particular, the task of the authors within the research project was to build a WebGIS to manage the data collected during the examination and study phases. This infrastructure was entirely built using Open Source software. The work consisted of designing a database built in PostgreSQL and its spatial extension PostGIS, which allows to store and manage feature geometries and spatial data. The data input is performed via a form built in HTML and PHP. The HTML part is based on Bootstrap, an open tools library for websites and web applications. The implementation of this template used both PHP and Javascript code. The PHP code manages the reading and writing of data to the database, using embedded SQL queries. As of today, we surveyed and archived more than 300 buildings, belonging to three main macro categories: fortification architectures, religious architectures, residential architectures. The masonry samples investigated in relation to the construction techniques are more than 150. The database is published on the Internet as a WebGIS built using the Leaflet Javascript open libraries, which allows creating map sites with background maps and navigation, input and query tools. This too uses an interaction of HTML, Javascript, PHP and SQL code.
DOE technology information management system database study report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Widing, M.A.; Blodgett, D.W.; Braun, M.D.

1994-11-01

To support the missions of the US Department of Energy (DOE) Special Technologies Program, Argonne National Laboratory is defining the requirements for an automated software system that will search electronic databases on technology. This report examines the work done and results to date. Argonne studied existing commercial and government sources of technology databases in five general areas: on-line services, patent database sources, government sources, aerospace technology sources, and general technology sources. First, it conducted a preliminary investigation of these sources to obtain information on the content, cost, frequency of updates, and other aspects of their databases. The Laboratory then performedmore » detailed examinations of at least one source in each area. On this basis, Argonne recommended which databases should be incorporated in DOE`s Technology Information Management System.« less
Critical Care Health Informatics Collaborative (CCHIC): Data, tools and methods for reproducible research: A multi-centre UK intensive care database.

PubMed

Harris, Steve; Shi, Sinan; Brealey, David; MacCallum, Niall S; Denaxas, Spiros; Perez-Suarez, David; Ercole, Ari; Watkinson, Peter; Jones, Andrew; Ashworth, Simon; Beale, Richard; Young, Duncan; Brett, Stephen; Singer, Mervyn

2018-04-01

To build and curate a linkable multi-centre database of high resolution longitudinal electronic health records (EHR) from adult Intensive Care Units (ICU). To develop a set of open-source tools to make these data 'research ready' while protecting patient's privacy with a particular focus on anonymisation. We developed a scalable EHR processing pipeline for extracting, linking, normalising and curating and anonymising EHR data. Patient and public involvement was sought from the outset, and approval to hold these data was granted by the NHS Health Research Authority's Confidentiality Advisory Group (CAG). The data are held in a certified Data Safe Haven. We followed sustainable software development principles throughout, and defined and populated a common data model that links to other clinical areas. Longitudinal EHR data were loaded into the CCHIC database from eleven adult ICUs at 5 UK teaching hospitals. From January 2014 to January 2017, this amounted to 21,930 and admissions (18,074 unique patients). Typical admissions have 70 data-items pertaining to admission and discharge, and a median of 1030 (IQR 481-2335) time-varying measures. Training datasets were made available through virtual machine images emulating the data processing environment. An open source R package, cleanEHR, was developed and released that transforms the data into a square table readily analysable by most statistical packages. A simple language agnostic configuration file will allow the user to select and clean variables, and impute missing data. An audit trail makes clear the provenance of the data at all times. Making health care data available for research is problematic. CCHIC is a unique multi-centre longitudinal and linkable resource that prioritises patient privacy through the highest standards of data security, but also provides tools to clean, organise, and anonymise the data. We believe the development of such tools are essential if we are to meet the twin requirements of respecting patient privacy and working for patient benefit. The CCHIC database is now in use by health care researchers from academia and industry. The 'research ready' suite of data preparation tools have facilitated access, and linkage to national databases of secondary care is underway. Copyright © 2018 Elsevier B.V. All rights reserved.
An open experimental database for exploring inorganic materials

DOE PAGES

Zakutayev, Andriy; Wunder, Nick; Schwarting, Marcus; ...

2018-04-03

The use of advanced machine learning algorithms in experimental materials science is limited by the lack of sufficiently large and diverse datasets amenable to data mining. If publicly open, such data resources would also enable materials research by scientists without access to expensive experimental equipment. Here, we report on our progress towards a publicly open High Throughput Experimental Materials (HTEM) Database (htem.nrel.gov). This database currently contains 140,000 sample entries, characterized by structural (100,000), synthetic (80,000), chemical (70,000), and optoelectronic (50,000) properties of inorganic thin film materials, grouped in >4,000 sample entries across >100 materials systems; more than a half ofmore » these data are publicly available. This article shows how the HTEM database may enable scientists to explore materials by browsing web-based user interface and an application programming interface. This paper also describes a HTE approach to generating materials data, and discusses the laboratory information management system (LIMS), that underpin HTEM database. Finally, this manuscript illustrates how advanced machine learning algorithms can be adopted to materials science problems using this open data resource.« less
An open experimental database for exploring inorganic materials.

PubMed

Zakutayev, Andriy; Wunder, Nick; Schwarting, Marcus; Perkins, John D; White, Robert; Munch, Kristin; Tumas, William; Phillips, Caleb

2018-04-03

The use of advanced machine learning algorithms in experimental materials science is limited by the lack of sufficiently large and diverse datasets amenable to data mining. If publicly open, such data resources would also enable materials research by scientists without access to expensive experimental equipment. Here, we report on our progress towards a publicly open High Throughput Experimental Materials (HTEM) Database (htem.nrel.gov). This database currently contains 140,000 sample entries, characterized by structural (100,000), synthetic (80,000), chemical (70,000), and optoelectronic (50,000) properties of inorganic thin film materials, grouped in >4,000 sample entries across >100 materials systems; more than a half of these data are publicly available. This article shows how the HTEM database may enable scientists to explore materials by browsing web-based user interface and an application programming interface. This paper also describes a HTE approach to generating materials data, and discusses the laboratory information management system (LIMS), that underpin HTEM database. Finally, this manuscript illustrates how advanced machine learning algorithms can be adopted to materials science problems using this open data resource.
An open experimental database for exploring inorganic materials

PubMed Central

Zakutayev, Andriy; Wunder, Nick; Schwarting, Marcus; Perkins, John D.; White, Robert; Munch, Kristin; Tumas, William; Phillips, Caleb

2018-01-01

The use of advanced machine learning algorithms in experimental materials science is limited by the lack of sufficiently large and diverse datasets amenable to data mining. If publicly open, such data resources would also enable materials research by scientists without access to expensive experimental equipment. Here, we report on our progress towards a publicly open High Throughput Experimental Materials (HTEM) Database (htem.nrel.gov). This database currently contains 140,000 sample entries, characterized by structural (100,000), synthetic (80,000), chemical (70,000), and optoelectronic (50,000) properties of inorganic thin film materials, grouped in >4,000 sample entries across >100 materials systems; more than a half of these data are publicly available. This article shows how the HTEM database may enable scientists to explore materials by browsing web-based user interface and an application programming interface. This paper also describes a HTE approach to generating materials data, and discusses the laboratory information management system (LIMS), that underpin HTEM database. Finally, this manuscript illustrates how advanced machine learning algorithms can be adopted to materials science problems using this open data resource. PMID:29611842
An open experimental database for exploring inorganic materials

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zakutayev, Andriy; Wunder, Nick; Schwarting, Marcus

The use of advanced machine learning algorithms in experimental materials science is limited by the lack of sufficiently large and diverse datasets amenable to data mining. If publicly open, such data resources would also enable materials research by scientists without access to expensive experimental equipment. Here, we report on our progress towards a publicly open High Throughput Experimental Materials (HTEM) Database (htem.nrel.gov). This database currently contains 140,000 sample entries, characterized by structural (100,000), synthetic (80,000), chemical (70,000), and optoelectronic (50,000) properties of inorganic thin film materials, grouped in >4,000 sample entries across >100 materials systems; more than a half ofmore » these data are publicly available. This article shows how the HTEM database may enable scientists to explore materials by browsing web-based user interface and an application programming interface. This paper also describes a HTE approach to generating materials data, and discusses the laboratory information management system (LIMS), that underpin HTEM database. Finally, this manuscript illustrates how advanced machine learning algorithms can be adopted to materials science problems using this open data resource.« less
IgRepertoireConstructor: a novel algorithm for antibody repertoire construction and immunoproteogenomics analysis

PubMed Central

Safonova, Yana; Bonissone, Stefano; Kurpilyansky, Eugene; Starostina, Ekaterina; Lapidus, Alla; Stinson, Jeremy; DePalatis, Laura; Sandoval, Wendy; Lill, Jennie; Pevzner, Pavel A.

2015-01-01

The analysis of concentrations of circulating antibodies in serum (antibody repertoire) is a fundamental, yet poorly studied, problem in immunoinformatics. The two current approaches to the analysis of antibody repertoires [next generation sequencing (NGS) and mass spectrometry (MS)] present difficult computational challenges since antibodies are not directly encoded in the germline but are extensively diversified by somatic recombination and hypermutations. Therefore, the protein database required for the interpretation of spectra from circulating antibodies is custom for each individual. Although such a database can be constructed via NGS, the reads generated by NGS are error-prone and even a single nucleotide error precludes identification of a peptide by the standard proteomics tools. Here, we present the IgRepertoireConstructor algorithm that performs error-correction of immunosequencing reads and uses mass spectra to validate the constructed antibody repertoires. Availability and implementation: IgRepertoireConstructor is open source and freely available as a C++ and Python program running on all Unix-compatible platforms. The source code is available from http://bioinf.spbau.ru/igtools. Contact: ppevzner@ucsd.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26072509
Using GIS databases for simulated nightlight imagery

NASA Astrophysics Data System (ADS)

Zollweg, Joshua D.; Gartley, Michael; Roskovensky, John; Mercier, Jeffery

2012-06-01

Proposed is a new technique for simulating nighttime scenes with realistically-modelled urban radiance. While nightlight imagery is commonly used to measure urban sprawl,1 it is uncommon to use urbanization as metric to develop synthetic nighttime scenes. In the developed methodology, the open-source Open Street Map (OSM) Geographic Information System (GIS) database is used. The database is comprised of many nodes, which are used to dene the position of dierent types of streets, buildings, and other features. These nodes are the driver used to model urban nightlights, given several assumptions. The rst assumption is that the spatial distribution of nodes is closely related to the spatial distribution of nightlights. Work by Roychowdhury et al has demonstrated the relationship between urban lights and development. 2 So, the real assumption being made is that the density of nodes corresponds to development, which is reasonable. Secondly, the local density of nodes must relate directly to the upwelled radiance within the given locality. Testing these assumptions using Albuquerque and Indianapolis as example cities revealed that dierent types of nodes produce more realistic results than others. Residential street nodes oered the best performance for any single node type, among the types tested in this investigation. Other node types, however, still provide useful supplementary data. Using streets and buildings dened in the OSM database allowed automated generation of simulated nighttime scenes of Albuquerque and Indianapolis in the Digital Imaging and Remote Sensing Image Generation (DIRSIG) model. The simulation was compared to real data from the recently deployed National Polar-orbiting Operational Environmental Satellite System(NPOESS) Visible Infrared Imager Radiometer Suite (VIIRS) platform. As a result of the comparison, correction functions were used to correct for discrepancies between simulated and observed radiance. Future work will include investigating more advanced approaches for mapping the spatial extent of nightlights, based on the distribution of dierent node types in local neighbourhoods. This will allow the spectral prole of each region to be dynamically adjusted, in addition to simply modifying the magnitude of a single source type.
Possible costs associated with investigating and mitigating geologic hazards in rural areas of western San Mateo County, California with a section on using the USGS website to determine the cost of developing property for residences in rural parts of San Mateo County, California

USGS Publications Warehouse

Brabb, Earl E.; Roberts, Sebastian; Cotton, William R.; Kropp, Alan L.; Wright, Robert H.; Zinn, Erik N.; Digital database by Roberts, Sebastian; Mills, Suzanne K.; Barnes, Jason B.; Marsolek, Joanna E.

2000-01-01

This publication consists of a digital map database on a geohazards web site, http://kaibab.wr.usgs.gov/geohazweb/intro.htm, this text, and 43 digital map images available for downloading at this site. The report is stored as several digital files, in ARC export (uncompressed) format for the database, and Postscript and PDF formats for the map images. Several of the source data layers for the images have already been released in other publications by the USGS and are available for downloading on the Internet. These source layers are not included in this digital database, but rather a reference is given for the web site where the data can be found in digital format. The exported ARC coverages and grids lie in UTM zone 10 projection. The pamphlet, which only describes the content and character of the digital map database, is included as Postscript, PDF, and ASCII text files and is also available on paper as USGS Open-File Report 00-127. The full versatility of the spatial database is realized by importing the ARC export files into ARC/INFO or an equivalent GIS. Other GIS packages, including MapInfo and ARCVIEW, can also use the ARC export files. The Postscript map image can be used for viewing or plotting in computer systems with sufficient capacity, and the considerably smaller PDF image files can be viewed or plotted in full or in part from Adobe ACROBAT software running on Macintosh, PC, or UNIX platforms.
MOSAIC: An organic geochemical and sedimentological database for marine surface sediments

NASA Astrophysics Data System (ADS)

Tavagna, Maria Luisa; Usman, Muhammed; De Avelar, Silvania; Eglinton, Timothy

2015-04-01

Modern ocean sediments serve as the interface between the biosphere and the geosphere, play a key role in biogeochemical cycles and provide a window on how contemporary processes are written into the sedimentary record. Research over past decades has resulted in a wealth of information on the content and composition of organic matter in marine sediments, with ever-more sophisticated techniques continuing to yield information of greater detail and as an accelerating pace. However, there has been no attempt to synthesize this wealth of information. We are establishing a new database that incorporates information relevant to local, regional and global-scale assessment of the content, source and fate of organic materials accumulating in contemporary marine sediments. In the MOSAIC (Modern Ocean Sediment Archive and Inventory of Carbon) database, particular emphasis is placed on molecular and isotopic information, coupled with relevant contextual information (e.g., sedimentological properties) relevant to elucidating factors that influence the efficiency and nature of organic matter burial. The main features of MOSAIC include: (i) Emphasis on continental margin sediments as major loci of carbon burial, and as the interface between terrestrial and oceanic realms; (ii) Bulk to molecular-level organic geochemical properties and parameters, including concentration and isotopic compositions; (iii) Inclusion of extensive contextual data regarding the depositional setting, in particular with respect to sedimentological and redox characteristics. The ultimate goal is to create an open-access instrument, available on the web, to be utilized for research and education by the international community who can both contribute to, and interrogate the database. The submission will be accomplished by means of a pre-configured table available on the MOSAIC webpage. The information on the filled tables will be checked and eventually imported, via the Structural Query Language (SQL), into MOSAIC. MOSAIC is programmed with PostgreSQL, an open-source database management system. In order to locate geographically the data, each element/datum is associated to a latitude, longitude and depth, facilitating creation of a geospatial database which can be easily interfaced to a Geographic Information System (GIS). In order to make the database broadly accessible, a HTML-PHP language-based website will ultimately be created and linked to the database. Consulting the website will allow for both data visualization as well as export of data in txt format for utilization with common software solutions (e.g. ODV, Excel, Matlab, Python, Word, PPT, Illustrator…). In this very early stage, MOSAIC presently contains approximately 10000 analyses conducted on more than 1800 samples which were collected from over 1600 different geographical locations around the world. Through participation of the international research community, MOSAIC will rapidly develop into a rich archive and versatile tool for investigation of distribution and composition of organic matter accumulating in seafloor sediments. The present contribution will outline the structure of MOSAIC, provide examples of data output, and solicit feedback on desirable features to be included in the database and associated software tools.
Volatile organic compounds (VOCs) source profiles of on-road vehicle emissions in China.

PubMed

Hong-Li, Wang; Sheng-Ao, Jing; Sheng-Rong, Lou; Qing-Yao, Hu; Li, Li; Shi-Kang, Tao; Cheng, Huang; Li-Ping, Qiao; Chang-Hong, Chen

2017-12-31

Volatile Organic Compounds (VOCs) source profiles of on-road vehicles were widely studied as their critical roles in VOCs source apportionment and abatement measures in megacities. Studies of VOCs source profiles from on-road motor vehicles from 2001 to 2016 were summarized in this study, with a focus on the comparisons among different studies and the potential impact of different factors. Generally, non-methane hydrocarbons dominated the source profile of on-road vehicle emissions. Carbonyls, potential important components of vehicle emission, were seldom considered in VOCs emissions of vehicles in the past and should be paid more attention to in further study. VOCs source profiles showed some variations among different studies, and 6 factors were extracted and studied due to their impact to VOCs source profile of on-road vehicles. Vehicle types, being dependent on engine types, and fuel types were two dominant factors impacting VOCs sources profiles of vehicles. In comparison, impacts of ignitions, driving conditions and accumulated mileage were mainly due to their influence on the combustion efficiency. An opening and interactive database of VOCs from vehicle emissions was critically essential in future, and mechanisms of sharing and inputting relative research results should be formed to encourage researchers join the database establishment. Correspondingly, detailed quality assurance and quality control procedures were also very important, which included the information of test vehicles and test methods as detailed as possible. Based on the community above, a better uncertainty analysis could be carried out for the VOCs emissions profiles, which was critically important to understand the VOCs emission characteristics of the vehicle emissions. Copyright © 2017 Elsevier B.V. All rights reserved.
Actionable, long-term stable and semantic web compatible identifiers for access to biological collection objects

PubMed Central

Hyam, Roger; Hagedorn, Gregor; Chagnoux, Simon; Röpert, Dominik; Casino, Ana; Droege, Gabi; Glöckler, Falko; Gödderz, Karsten; Groom, Quentin; Hoffmann, Jana; Holleman, Ayco; Kempa, Matúš; Koivula, Hanna; Marhold, Karol; Nicolson, Nicky; Smith, Vincent S.; Triebel, Dagmar

2017-01-01

With biodiversity research activities being increasingly shifted to the web, the need for a system of persistent and stable identifiers for physical collection objects becomes increasingly pressing. The Consortium of European Taxonomic Facilities agreed on a common system of HTTP-URI-based stable identifiers which is now rolled out to its member organizations. The system follows Linked Open Data principles and implements redirection mechanisms to human-readable and machine-readable representations of specimens facilitating seamless integration into the growing semantic web. The implementation of stable identifiers across collection organizations is supported with open source provider software scripts, best practices documentations and recommendations for RDF metadata elements facilitating harmonized access to collection information in web portals. Database URL: http://cetaf.org/cetaf-stable-identifiers PMID:28365724
The Open AUC Project.

PubMed

Cölfen, Helmut; Laue, Thomas M; Wohlleben, Wendel; Schilling, Kristian; Karabudak, Engin; Langhorst, Bradley W; Brookes, Emre; Dubbs, Bruce; Zollars, Dan; Rocco, Mattia; Demeler, Borries

2010-02-01

Progress in analytical ultracentrifugation (AUC) has been hindered by obstructions to hardware innovation and by software incompatibility. In this paper, we announce and outline the Open AUC Project. The goals of the Open AUC Project are to stimulate AUC innovation by improving instrumentation, detectors, acquisition and analysis software, and collaborative tools. These improvements are needed for the next generation of AUC-based research. The Open AUC Project combines on-going work from several different groups. A new base instrument is described, one that is designed from the ground up to be an analytical ultracentrifuge. This machine offers an open architecture, hardware standards, and application programming interfaces for detector developers. All software will use the GNU Public License to assure that intellectual property is available in open source format. The Open AUC strategy facilitates collaborations, encourages sharing, and eliminates the chronic impediments that have plagued AUC innovation for the last 20 years. This ultracentrifuge will be equipped with multiple and interchangeable optical tracks so that state-of-the-art electronics and improved detectors will be available for a variety of optical systems. The instrument will be complemented by a new rotor, enhanced data acquisition and analysis software, as well as collaboration software. Described here are the instrument, the modular software components, and a standardized database that will encourage and ease integration of data analysis and interpretation software.
Pyteomics--a Python framework for exploratory data analysis and rapid software prototyping in proteomics.

PubMed

Goloborodko, Anton A; Levitsky, Lev I; Ivanov, Mark V; Gorshkov, Mikhail V

2013-02-01

Pyteomics is a cross-platform, open-source Python library providing a rich set of tools for MS-based proteomics. It provides modules for reading LC-MS/MS data, search engine output, protein sequence databases, theoretical prediction of retention times, electrochemical properties of polypeptides, mass and m/z calculations, and sequence parsing. Pyteomics is available under Apache license; release versions are available at the Python Package Index http://pypi.python.org/pyteomics, the source code repository at http://hg.theorchromo.ru/pyteomics, documentation at http://packages.python.org/pyteomics. Pyteomics.biolccc documentation is available at http://packages.python.org/pyteomics.biolccc/. Questions on installation and usage can be addressed to pyteomics mailing list: pyteomics@googlegroups.com.
Practical Value of Food Pathogen Traceability through Building a Whole-Genome Sequencing Network and Database

PubMed Central

Strain, Errol; Melka, David; Bunning, Kelly; Musser, Steven M.; Brown, Eric W.; Timme, Ruth

2016-01-01

The FDA has created a United States-based open-source whole-genome sequencing network of state, federal, international, and commercial partners. The GenomeTrakr network represents a first-of-its-kind distributed genomic food shield for characterizing and tracing foodborne outbreak pathogens back to their sources. The GenomeTrakr network is leading investigations of outbreaks of foodborne illnesses and compliance actions with more accurate and rapid recalls of contaminated foods as well as more effective monitoring of preventive controls for food manufacturing environments. An expanded network would serve to provide an international rapid surveillance system for pathogen traceback, which is critical to support an effective public health response to bacterial outbreaks. PMID:27008877

mantisGRID: a grid platform for DICOM medical images management in Colombia and Latin America.

PubMed

Garcia Ruiz, Manuel; Garcia Chaves, Alvin; Ruiz Ibañez, Carlos; Gutierrez Mazo, Jorge Mario; Ramirez Giraldo, Juan Carlos; Pelaez Echavarria, Alejandro; Valencia Diaz, Edison; Pelaez Restrepo, Gustavo; Montoya Munera, Edwin Nelson; Garcia Loaiza, Bernardo; Gomez Gonzalez, Sebastian

2011-04-01

This paper presents the mantisGRID project, an interinstitutional initiative from Colombian medical and academic centers aiming to provide medical grid services for Colombia and Latin America. The mantisGRID is a GRID platform, based on open source grid infrastructure that provides the necessary services to access and exchange medical images and associated information following digital imaging and communications in medicine (DICOM) and health level 7 standards. The paper focuses first on the data abstraction architecture, which is achieved via Open Grid Services Architecture Data Access and Integration (OGSA-DAI) services and supported by the Globus Toolkit. The grid currently uses a 30-Mb bandwidth of the Colombian High Technology Academic Network, RENATA, connected to Internet 2. It also includes a discussion on the relational database created to handle the DICOM objects that were represented using Extensible Markup Language Schema documents, as well as other features implemented such as data security, user authentication, and patient confidentiality. Grid performance was tested using the three current operative nodes and the results demonstrated comparable query times between the mantisGRID (OGSA-DAI) and Distributed mySQL databases, especially for a large number of records.
Climate Signals: An On-Line Digital Platform for Mapping Climate Change Impacts in Real Time

NASA Astrophysics Data System (ADS)

Cutting, H.

2016-12-01

Climate Signals is an on-line digital platform for cataloging and mapping the impacts of climate change. The CS platform specifies and details the chains of connections between greenhouse gas emissions and individual climate events. Currently in open-beta release, the platform is designed to to engage and serve the general public, news media, and policy-makers, particularly in real-time during extreme climate events. Climate Signals consists of a curated relational database of events and their links to climate change, a mapping engine, and a gallery of climate change monitors offering real-time data. For each event in the database, an infographic engine provides a custom attribution "tree" that illustrates the connections to climate change. In addition, links to key contextual resources are aggregated and curated for each event. All event records are fully annotated with detailed source citations and corresponding hyper links. The system of attribution used to link events to climate change in real-time is detailed here. This open-beta release is offered for public user testing and engagement. Launched in May 2016, the operation of this platform offers lessons for public engagement in climate change impacts.
An annotated corpus with nanomedicine and pharmacokinetic parameters

PubMed Central

Lewinski, Nastassja A; Jimenez, Ivan; McInnes, Bridget T

2017-01-01

A vast amount of data on nanomedicines is being generated and published, and natural language processing (NLP) approaches can automate the extraction of unstructured text-based data. Annotated corpora are a key resource for NLP and information extraction methods which employ machine learning. Although corpora are available for pharmaceuticals, resources for nanomedicines and nanotechnology are still limited. To foster nanotechnology text mining (NanoNLP) efforts, we have constructed a corpus of annotated drug product inserts taken from the US Food and Drug Administration’s Drugs@FDA online database. In this work, we present the development of the Engineered Nanomedicine Database corpus to support the evaluation of nanomedicine entity extraction. The data were manually annotated for 21 entity mentions consisting of nanomedicine physicochemical characterization, exposure, and biologic response information of 41 Food and Drug Administration-approved nanomedicines. We evaluate the reliability of the manual annotations and demonstrate the use of the corpus by evaluating two state-of-the-art named entity extraction systems, OpenNLP and Stanford NER. The annotated corpus is available open source and, based on these results, guidelines and suggestions for future development of additional nanomedicine corpora are provided. PMID:29066897
Security of patient data when decommissioning ultrasound systems.

PubMed

Moggridge, James

2017-02-01

Although ultrasound systems generally archive to Picture Archiving and Communication Systems (PACS), their archiving workflow typically involves storage to an internal hard disk before data are transferred onwards. Deleting records from the local system will delete entries in the database and from the file allocation table or equivalent but, as with a PC, files can be recovered. Great care is taken with disposal of media from a healthcare organisation to prevent data breaches, but ultrasound systems are routinely returned to lease companies, sold on or donated to third parties without such controls. In this project, five methods of hard disk erasure were tested on nine ultrasound systems being decommissioned: the system's own delete function; full reinstallation of system software; the manufacturer's own disk wiping service; open source disk wiping software for full and just blank space erasure. Attempts were then made to recover data using open source recovery tools. All methods deleted patient data as viewable from the ultrasound system and from browsing the disk from a PC. However, patient identifiable data (PID) could be recovered following the system's own deletion and the reinstallation methods. No PID could be recovered after using the manufacturer's wiping service or the open source wiping software. The typical method of reinstalling an ultrasound system's software may not prevent PID from being recovered. When transferring ownership, care should be taken that an ultrasound system's hard disk has been wiped to a sufficient level, particularly if the scanner is to be returned with approved parts and in a fully working state.
Automatic analysis of online image data for law enforcement agencies by concept detection and instance search

NASA Astrophysics Data System (ADS)

de Boer, Maaike H. T.; Bouma, Henri; Kruithof, Maarten C.; ter Haar, Frank B.; Fischer, Noëlle M.; Hagendoorn, Laurens K.; Joosten, Bart; Raaijmakers, Stephan

2017-10-01

The information available on-line and off-line, from open as well as from private sources, is growing at an exponential rate and places an increasing demand on the limited resources of Law Enforcement Agencies (LEAs). The absence of appropriate tools and techniques to collect, process, and analyze the volumes of complex and heterogeneous data has created a severe information overload. If a solution is not found, the impact on law enforcement will be dramatic, e.g. because important evidence is missed or the investigation time is too long. Furthermore, there is an uneven level of capabilities to deal with the large volumes of complex and heterogeneous data that come from multiple open and private sources at national level across the EU, which hinders cooperation and information sharing. Consequently, there is a pertinent need to develop tools, systems and processes which expedite online investigations. In this paper, we describe a suite of analysis tools to identify and localize generic concepts, instances of objects and logos in images, which constitutes a significant portion of everyday law enforcement data. We describe how incremental learning based on only a few examples and large-scale indexing are addressed in both concept detection and instance search. Our search technology allows querying of the database by visual examples and by keywords. Our tools are packaged in a Docker container to guarantee easy deployment on a system and our tools exploit possibilities provided by open source toolboxes, contributing to the technical autonomy of LEAs.
PRIMED: PRIMEr Database for Deleting and Tagging All Fission and Budding Yeast Genes Developed Using the Open-Source Genome Retrieval Script (GRS)

PubMed Central

Cummings, Michael T.; Joh, Richard I.; Motamedi, Mo

2015-01-01

The fission (Schizosaccharomyces pombe) and budding (Saccharomyces cerevisiae) yeasts have served as excellent models for many seminal discoveries in eukaryotic biology. In these organisms, genes are deleted or tagged easily by transforming cells with PCR-generated DNA inserts, flanked by short (50-100bp) regions of gene homology. These PCR reactions use especially designed long primers, which, in addition to the priming sites, carry homology for gene targeting. Primer design follows a fixed method but is tedious and time-consuming especially when done for a large number of genes. To automate this process, we developed the Python-based Genome Retrieval Script (GRS), an easily customizable open-source script for genome analysis. Using GRS, we created PRIMED, the complete PRIMEr D atabase for deleting and C-terminal tagging genes in the main S. pombe and five of the most commonly used S. cerevisiae strains. Because of the importance of noncoding RNAs (ncRNAs) in many biological processes, we also included the deletion primer set for these features in each genome. PRIMED are accurate and comprehensive and are provided as downloadable Excel files, removing the need for future primer design, especially for large-scale functional analyses. Furthermore, the open-source GRS can be used broadly to retrieve genome information from custom or other annotated genomes, thus providing a suitable platform for building other genomic tools by the yeast or other research communities. PMID:25643023
Identification and evaluation of fluvial-dominated deltaic (Class 1 oil) reservoirs in Oklahoma. Yearly technical progress report, January 1--December 31, 1993

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mankin, C.J.; Banken, M.K.

The Oklahoma Geological Survey (OGS), the Geological Information Systems department, and the School of Petroleum and Geological Engineering at the University of Oklahoma are engaged in a five-year program to identify and address Oklahoma`s oil recovery opportunities in fluvial-dominated deltaic (FDD) reservoirs. This program includes the systematic and comprehensive collection, evaluation, and distribution of information on all of Oklahoma`s FDD oil reservoirs and the recovery technologies that can be applied to those reservoirs with commercial success. Exhaustive literature searches are being conducted for these plays, both through published sources and through unpublished theses from regional universities. A bibliographic database hasmore » been developed to record these literature sources and their related plays. Trend maps are being developed to identify the FDD portions of the relevant reservoirs, through accessing current production databases and through compiling the literature results. A reservoir database system also has been developed, to record specific reservoir data elements that are identified through the literature, and through public and private data sources. Thus far, the initial demonstration for one has been completed, and second is nearly completed. All of the information gathered through these efforts will be transferred to the Oklahoma petroleum industry through a series of publications and workshops. Additionally, plans are being developed, and hardware and software resources are being acquired, in preparation for the opening of a publicly-accessible computer users laboratory, one component of the technology transfer program.« less
Extension modules for storage, visualization and querying of genomic, genetic and breeding data in Tripal databases

PubMed Central

Lee, Taein; Cheng, Chun-Huai; Ficklin, Stephen; Yu, Jing; Humann, Jodi; Main, Dorrie

2017-01-01

Abstract Tripal is an open-source database platform primarily used for development of genomic, genetic and breeding databases. We report here on the release of the Chado Loader, Chado Data Display and Chado Search modules to extend the functionality of the core Tripal modules. These new extension modules provide additional tools for (1) data loading, (2) customized visualization and (3) advanced search functions for supported data types such as organism, marker, QTL/Mendelian Trait Loci, germplasm, map, project, phenotype, genotype and their respective metadata. The Chado Loader module provides data collection templates in Excel with defined metadata and data loaders with front end forms. The Chado Data Display module contains tools to visualize each data type and the metadata which can be used as is or customized as desired. The Chado Search module provides search and download functionality for the supported data types. Also included are the tools to visualize map and species summary. The use of materialized views in the Chado Search module enables better performance as well as flexibility of data modeling in Chado, allowing existing Tripal databases with different metadata types to utilize the module. These Tripal Extension modules are implemented in the Genome Database for Rosaceae (rosaceae.org), CottonGen (cottongen.org), Citrus Genome Database (citrusgenomedb.org), Genome Database for Vaccinium (vaccinium.org) and the Cool Season Food Legume Database (coolseasonfoodlegume.org). Database URL: https://www.citrusgenomedb.org/, https://www.coolseasonfoodlegume.org/, https://www.cottongen.org/, https://www.rosaceae.org/, https://www.vaccinium.org/
An integrated photogrammetric and spatial database management system for producing fully structured data using aerial and remote sensing images.

PubMed

Ahmadi, Farshid Farnood; Ebadi, Hamid

2009-01-01

3D spatial data acquired from aerial and remote sensing images by photogrammetric techniques is one of the most accurate and economic data sources for GIS, map production, and spatial data updating. However, there are still many problems concerning storage, structuring and appropriate management of spatial data obtained using these techniques. According to the capabilities of spatial database management systems (SDBMSs); direct integration of photogrammetric and spatial database management systems can save time and cost of producing and updating digital maps. This integration is accomplished by replacing digital maps with a single spatial database. Applying spatial databases overcomes the problem of managing spatial and attributes data in a coupled approach. This management approach is one of the main problems in GISs for using map products of photogrammetric workstations. Also by the means of these integrated systems, providing structured spatial data, based on OGC (Open GIS Consortium) standards and topological relations between different feature classes, is possible at the time of feature digitizing process. In this paper, the integration of photogrammetric systems and SDBMSs is evaluated. Then, different levels of integration are described. Finally design, implementation and test of a software package called Integrated Photogrammetric and Oracle Spatial Systems (IPOSS) is presented.
Pseudomonas Genome Database: facilitating user-friendly, comprehensive comparisons of microbial genomes.

PubMed

Winsor, Geoffrey L; Van Rossum, Thea; Lo, Raymond; Khaira, Bhavjinder; Whiteside, Matthew D; Hancock, Robert E W; Brinkman, Fiona S L

2009-01-01

Pseudomonas aeruginosa is a well-studied opportunistic pathogen that is particularly known for its intrinsic antimicrobial resistance, diverse metabolic capacity, and its ability to cause life threatening infections in cystic fibrosis patients. The Pseudomonas Genome Database (http://www.pseudomonas.com) was originally developed as a resource for peer-reviewed, continually updated annotation for the Pseudomonas aeruginosa PAO1 reference strain genome. In order to facilitate cross-strain and cross-species genome comparisons with other Pseudomonas species of importance, we have now expanded the database capabilities to include all Pseudomonas species, and have developed or incorporated methods to facilitate high quality comparative genomics. The database contains robust assessment of orthologs, a novel ortholog clustering method, and incorporates five views of the data at the sequence and annotation levels (Gbrowse, Mauve and custom views) to facilitate genome comparisons. A choice of simple and more flexible user-friendly Boolean search features allows researchers to search and compare annotations or sequences within or between genomes. Other features include more accurate protein subcellular localization predictions and a user-friendly, Boolean searchable log file of updates for the reference strain PAO1. This database aims to continue to provide a high quality, annotated genome resource for the research community and is available under an open source license.
Plant Reactome: a resource for plant pathways and comparative analysis.

PubMed

Naithani, Sushma; Preece, Justin; D'Eustachio, Peter; Gupta, Parul; Amarasinghe, Vindhya; Dharmawardhana, Palitha D; Wu, Guanming; Fabregat, Antonio; Elser, Justin L; Weiser, Joel; Keays, Maria; Fuentes, Alfonso Munoz-Pomer; Petryszak, Robert; Stein, Lincoln D; Ware, Doreen; Jaiswal, Pankaj

2017-01-04

Plant Reactome (http://plantreactome.gramene.org/) is a free, open-source, curated plant pathway database portal, provided as part of the Gramene project. The database provides intuitive bioinformatics tools for the visualization, analysis and interpretation of pathway knowledge to support genome annotation, genome analysis, modeling, systems biology, basic research and education. Plant Reactome employs the structural framework of a plant cell to show metabolic, transport, genetic, developmental and signaling pathways. We manually curate molecular details of pathways in these domains for reference species Oryza sativa (rice) supported by published literature and annotation of well-characterized genes. Two hundred twenty-two rice pathways, 1025 reactions associated with 1173 proteins, 907 small molecules and 256 literature references have been curated to date. These reference annotations were used to project pathways for 62 model, crop and evolutionarily significant plant species based on gene homology. Database users can search and browse various components of the database, visualize curated baseline expression of pathway-associated genes provided by the Expression Atlas and upload and analyze their Omics datasets. The database also offers data access via Application Programming Interfaces (APIs) and in various standardized pathway formats, such as SBML and BioPAX. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Toward a mtDNA locus-specific mutation database using the LOVD platform.

PubMed

Elson, Joanna L; Sweeney, Mary G; Procaccio, Vincent; Yarham, John W; Salas, Antonio; Kong, Qing-Peng; van der Westhuizen, Francois H; Pitceathly, Robert D S; Thorburn, David R; Lott, Marie T; Wallace, Douglas C; Taylor, Robert W; McFarland, Robert

2012-09-01

The Human Variome Project (HVP) is a global effort to collect and curate all human genetic variation affecting health. Mutations of mitochondrial DNA (mtDNA) are an important cause of neurogenetic disease in humans; however, identification of the pathogenic mutations responsible can be problematic. In this article, we provide explanations as to why and suggest how such difficulties might be overcome. We put forward a case in support of a new Locus Specific Mutation Database (LSDB) implemented using the Leiden Open-source Variation Database (LOVD) system that will not only list primary mutations, but also present the evidence supporting their role in disease. Critically, we feel that this new database should have the capacity to store information on the observed phenotypes alongside the genetic variation, thereby facilitating our understanding of the complex and variable presentation of mtDNA disease. LOVD supports fast queries of both seen and hidden data and allows storage of sequence variants from high-throughput sequence analysis. The LOVD platform will allow construction of a secure mtDNA database; one that can fully utilize currently available data, as well as that being generated by high-throughput sequencing, to link genotype with phenotype enhancing our understanding of mitochondrial disease, with a view to providing better prognostic information. © 2012 Wiley Periodicals, Inc.
Toward a mtDNA Locus-Specific Mutation Database Using the LOVD Platform

PubMed Central

Elson, Joanna L.; Sweeney, Mary G.; Procaccio, Vincent; Yarham, John W.; Salas, Antonio; Kong, Qing-Peng; van der Westhuizen, Francois H.; Pitceathly, Robert D.S.; Thorburn, David R.; Lott, Marie T.; Wallace, Douglas C.; Taylor, Robert W.; McFarland, Robert

2015-01-01

The Human Variome Project (HVP) is a global effort to collect and curate all human genetic variation affecting health. Mutations of mitochondrial DNA (mtDNA) are an important cause of neurogenetic disease in humans; however, identification of the pathogenic mutations responsible can be problematic. In this article, we provide explanations as to why and suggest how such difficulties might be overcome. We put forward a case in support of a new Locus Specific Mutation Database (LSDB) implemented using the Leiden Open-source Variation Database (LOVD) system that will not only list primary mutations, but also present the evidence supporting their role in disease. Critically, we feel that this new database should have the capacity to store information on the observed phenotypes alongside the genetic variation, thereby facilitating our understanding of the complex and variable presentation of mtDNA disease. LOVD supports fast queries of both seen and hidden data and allows storage of sequence variants from high-throughput sequence analysis. The LOVD platform will allow construction of a secure mtDNA database; one that can fully utilize currently available data, as well as that being generated by high-throughput sequencing, to link genotype with phenotype enhancing our understanding of mitochondrial disease, with a view to providing better prognostic information. PMID:22581690
Open-access databases as unprecedented resources and drivers of cultural change in fisheries science

DOE Office of Scientific and Technical Information (OSTI.GOV)

McManamay, Ryan A; Utz, Ryan

2014-01-01

Open-access databases with utility in fisheries science have grown exponentially in quantity and scope over the past decade, with profound impacts to our discipline. The management, distillation, and sharing of an exponentially growing stream of open-access data represents several fundamental challenges in fisheries science. Many of the currently available open-access resources may not be universally known among fisheries scientists. We therefore introduce many national- and global-scale open-access databases with applications in fisheries science and provide an example of how they can be harnessed to perform valuable analyses without additional field efforts. We also discuss how the development, maintenance, and utilizationmore » of open-access data are likely to pose technical, financial, and educational challenges to fisheries scientists. Such cultural implications that will coincide with the rapidly increasing availability of free data should compel the American Fisheries Society to actively address these problems now to help ease the forthcoming cultural transition.« less
Enabling systematic, harmonised and large-scale biofilms data computation: the Biofilms Experiment Workbench.

PubMed

Pérez-Rodríguez, Gael; Glez-Peña, Daniel; Azevedo, Nuno F; Pereira, Maria Olívia; Fdez-Riverola, Florentino; Lourenço, Anália

2015-03-01

Biofilms are receiving increasing attention from the biomedical community. Biofilm-like growth within human body is considered one of the key microbial strategies to augment resistance and persistence during infectious processes. The Biofilms Experiment Workbench is a novel software workbench for the operation and analysis of biofilms experimental data. The goal is to promote the interchange and comparison of data among laboratories, providing systematic, harmonised and large-scale data computation. The workbench was developed with AIBench, an open-source Java desktop application framework for scientific software development in the domain of translational biomedicine. Implementation favours free and open-source third-parties, such as the R statistical package, and reaches for the Web services of the BiofOmics database to enable public experiment deposition. First, we summarise the novel, free, open, XML-based interchange format for encoding biofilms experimental data. Then, we describe the execution of common scenarios of operation with the new workbench, such as the creation of new experiments, the importation of data from Excel spreadsheets, the computation of analytical results, the on-demand and highly customised construction of Web publishable reports, and the comparison of results between laboratories. A considerable and varied amount of biofilms data is being generated, and there is a critical need to develop bioinformatics tools that expedite the interchange and comparison of microbiological and clinical results among laboratories. We propose a simple, open-source software infrastructure which is effective, extensible and easy to understand. The workbench is freely available for non-commercial use at http://sing.ei.uvigo.es/bew under LGPL license. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
The Bioperl Toolkit: Perl Modules for the Life Sciences

PubMed Central

Stajich, Jason E.; Block, David; Boulez, Kris; Brenner, Steven E.; Chervitz, Stephen A.; Dagdigian, Chris; Fuellen, Georg; Gilbert, James G.R.; Korf, Ian; Lapp, Hilmar; Lehväslaiho, Heikki; Matsalla, Chad; Mungall, Chris J.; Osborne, Brian I.; Pocock, Matthew R.; Schattner, Peter; Senger, Martin; Stein, Lincoln D.; Stupka, Elia; Wilkinson, Mark D.; Birney, Ewan

2002-01-01

The Bioperl project is an international open-source collaboration of biologists, bioinformaticians, and computer scientists that has evolved over the past 7 yr into the most comprehensive library of Perl modules available for managing and manipulating life-science information. Bioperl provides an easy-to-use, stable, and consistent programming interface for bioinformatics application programmers. The Bioperl modules have been successfully and repeatedly used to reduce otherwise complex tasks to only a few lines of code. The Bioperl object model has been proven to be flexible enough to support enterprise-level applications such as EnsEMBL, while maintaining an easy learning curve for novice Perl programmers. Bioperl is capable of executing analyses and processing results from programs such as BLAST, ClustalW, or the EMBOSS suite. Interoperation with modules written in Python and Java is supported through the evolving BioCORBA bridge. Bioperl provides access to data stores such as GenBank and SwissProt via a flexible series of sequence input/output modules, and to the emerging common sequence data storage format of the Open Bioinformatics Database Access project. This study describes the overall architecture of the toolkit, the problem domains that it addresses, and gives specific examples of how the toolkit can be used to solve common life-sciences problems. We conclude with a discussion of how the open-source nature of the project has contributed to the development effort. [Supplemental material is available online at www.genome.org. Bioperl is available as open-source software free of charge and is licensed under the Perl Artistic License (http://www.perl.com/pub/a/language/misc/Artistic.html). It is available for download at http://www.bioperl.org. Support inquiries should be addressed to bioperl-l@bioperl.org.] PMID:12368254
A case study in open source innovation: developing the Tidepool Platform for interoperability in type 1 diabetes management.

PubMed

Neinstein, Aaron; Wong, Jenise; Look, Howard; Arbiter, Brandon; Quirk, Kent; McCanne, Steve; Sun, Yao; Blum, Michael; Adi, Saleh

2016-03-01

Develop a device-agnostic cloud platform to host diabetes device data and catalyze an ecosystem of software innovation for type 1 diabetes (T1D) management. An interdisciplinary team decided to establish a nonprofit company, Tidepool, and build open-source software. Through a user-centered design process, the authors created a software platform, the Tidepool Platform, to upload and host T1D device data in an integrated, device-agnostic fashion, as well as an application ("app"), Blip, to visualize the data. Tidepool's software utilizes the principles of modular components, modern web design including REST APIs and JavaScript, cloud computing, agile development methodology, and robust privacy and security. By consolidating the currently scattered and siloed T1D device data ecosystem into one open platform, Tidepool can improve access to the data and enable new possibilities and efficiencies in T1D clinical care and research. The Tidepool Platform decouples diabetes apps from diabetes devices, allowing software developers to build innovative apps without requiring them to design a unique back-end (e.g., database and security) or unique ways of ingesting device data. It allows people with T1D to choose to use any preferred app regardless of which device(s) they use. The authors believe that the Tidepool Platform can solve two current problems in the T1D device landscape: 1) limited access to T1D device data and 2) poor interoperability of data from different devices. If proven effective, Tidepool's open source, cloud model for health data interoperability is applicable to other healthcare use cases. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
A case study in open source innovation: developing the Tidepool Platform for interoperability in type 1 diabetes management

PubMed Central

Wong, Jenise; Look, Howard; Arbiter, Brandon; Quirk, Kent; McCanne, Steve; Sun, Yao; Blum, Michael; Adi, Saleh

2016-01-01

Objective Develop a device-agnostic cloud platform to host diabetes device data and catalyze an ecosystem of software innovation for type 1 diabetes (T1D) management. Materials and Methods An interdisciplinary team decided to establish a nonprofit company, Tidepool, and build open-source software. Results Through a user-centered design process, the authors created a software platform, the Tidepool Platform, to upload and host T1D device data in an integrated, device-agnostic fashion, as well as an application (“app”), Blip, to visualize the data. Tidepool’s software utilizes the principles of modular components, modern web design including REST APIs and JavaScript, cloud computing, agile development methodology, and robust privacy and security. Discussion By consolidating the currently scattered and siloed T1D device data ecosystem into one open platform, Tidepool can improve access to the data and enable new possibilities and efficiencies in T1D clinical care and research. The Tidepool Platform decouples diabetes apps from diabetes devices, allowing software developers to build innovative apps without requiring them to design a unique back-end (e.g., database and security) or unique ways of ingesting device data. It allows people with T1D to choose to use any preferred app regardless of which device(s) they use. Conclusion The authors believe that the Tidepool Platform can solve two current problems in the T1D device landscape: 1) limited access to T1D device data and 2) poor interoperability of data from different devices. If proven effective, Tidepool’s open source, cloud model for health data interoperability is applicable to other healthcare use cases. PMID:26338218
TU-F-CAMPUS-I-05: Semi-Automated, Open Source MRI Quality Assurance and Quality Control Program for Multi-Unit Institution

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yung, J; Stefan, W; Reeve, D

2015-06-15

Purpose: Phantom measurements allow for the performance of magnetic resonance (MR) systems to be evaluated. Association of Physicists in Medicine (AAPM) Report No. 100 Acceptance Testing and Quality Assurance Procedures for MR Imaging Facilities, American College of Radiology (ACR) MR Accreditation Program MR phantom testing, and ACR MRI quality control (QC) program documents help to outline specific tests for establishing system performance baselines as well as system stability over time. Analyzing and processing tests from multiple systems can be time-consuming for medical physicists. Besides determining whether tests are within predetermined limits or criteria, monitoring longitudinal trends can also help preventmore » costly downtime of systems during clinical operation. In this work, a semi-automated QC program was developed to analyze and record measurements in a database that allowed for easy access to historical data. Methods: Image analysis was performed on 27 different MR systems of 1.5T and 3.0T field strengths from GE and Siemens manufacturers. Recommended measurements involved the ACR MRI Accreditation Phantom, spherical homogenous phantoms, and a phantom with an uniform hole pattern. Measurements assessed geometric accuracy and linearity, position accuracy, image uniformity, signal, noise, ghosting, transmit gain, center frequency, and magnetic field drift. The program was designed with open source tools, employing Linux, Apache, MySQL database and Python programming language for the front and backend. Results: Processing time for each image is <2 seconds. Figures are produced to show regions of interests (ROIs) for analysis. Historical data can be reviewed to compare previous year data and to inspect for trends. Conclusion: A MRI quality assurance and QC program is necessary for maintaining high quality, ACR MRI Accredited MR programs. A reviewable database of phantom measurements assists medical physicists with processing and monitoring of large datasets. Longitudinal data can reveal trends that although are within passing criteria indicate underlying system issues.« less
WE-E-BRB-11: Riview a Web-Based Viewer for Radiotherapy.

PubMed

Apte, A; Wang, Y; Deasy, J

2012-06-01

Collaborations involving radiotherapy data collection, such as the recently proposed international radiogenomics consortium, require robust, web-based tools to facilitate reviewing treatment planning information. We present the architecture and prototype characteristics for a web-based radiotherapy viewer. The web-based environment developed in this work consists of the following components: 1) Import of DICOM/RTOG data: CERR was leveraged to import DICOM/RTOG data and to convert to database friendly RT objects. 2) Extraction and Storage of RT objects: The scan and dose distributions were stored as .png files per slice and view plane. The file locations were written to the MySQL database. Structure contours and DVH curves were written to the database as numeric data. 3) Web interfaces to query, retrieve and visualize the RT objects: The Web application was developed using HTML 5 and Ruby on Rails (RoR) technology following the MVC philosophy. The open source ImageMagick library was utilized to overlay scan, dose and structures. The application allows users to (i) QA the treatment plans associated with a study, (ii) Query and Retrieve patients matching anonymized ID and study, (iii) Review up to 4 plans simultaneously in 4 window panes (iv) Plot DVH curves for the selected structures and dose distributions. A subset of data for lung cancer patients was used to prototype the system. Five user accounts were created to have access to this study. The scans, doses, structures and DVHs for 10 patients were made available via the web application. A web-based system to facilitate QA, and support Query, Retrieve and the Visualization of RT data was prototyped. The RIVIEW system was developed using open source and free technology like MySQL and RoR. We plan to extend the RIVIEW system further to be useful in clinical trial data collection, outcomes research, cohort plan review and evaluation. © 2012 American Association of Physicists in Medicine.

Collaborative data model and data base development for paleoenvironmental and archaeological domain using Semantic MediaWiki

NASA Astrophysics Data System (ADS)

Willmes, C.

2017-12-01

In the frame of the Collaborative Research Centre 806 (CRC 806) an interdisciplinary research project, that needs to manage data, information and knowledge from heterogeneous domains, such as archeology, cultural sciences, and the geosciences, a collaborative internal knowledge base system was developed. The system is based on the open source MediaWiki software, that is well known as the software that enables Wikipedia, for its facilitation of a web based collaborative knowledge and information management platform. This software is additionally enhanced with the Semantic MediaWiki (SMW) extension, that allows to store and manage structural data within the Wiki platform, as well as it facilitates complex query and API interfaces to the structured data stored in the SMW data base. Using an additional open source software called mobo, it is possible to improve the data model development process, as well as automated data imports, from small spreadsheets to large relational databases. Mobo is a command line tool that helps building and deploying SMW structure in an agile, Schema-Driven Development way, and allows to manage and collaboratively develop the data model formalizations, that are formalized in JSON-Schema format, using version control systems like git. The combination of a well equipped collaborative web platform facilitated by Mediawiki, the possibility to store and query structured data in this collaborative database provided by SMW, as well as the possibility for automated data import and data model development enabled by mobo, result in a powerful but flexible system to build and develop a collaborative knowledge base system. Furthermore, SMW allows the application of Semantic Web technology, the structured data can be exported into RDF, thus it is possible to set a triple-store including a SPARQL endpoint on top of the database. The JSON-Schema based data models, can be enhanced into JSON-LD, to facilitate and profit from the possibilities of Linked Data technology.
ToxicDocs (www.ToxicDocs.org): from history buried in stacks of paper to open, searchable archives online.

PubMed

Rosner, David; Markowitz, Gerald; Chowkwanyun, Merlin

2018-02-01

As a result of a legal mechanism called discovery, the authors accumulated millions of internal corporate and trade association documents related to the introduction of new products and chemicals into workplaces and commerce. What did these private entities discuss among themselves and with their experts? The plethora of documents, both a blessing and a curse, opened new sources and interesting questions about corporate and regulatory histories. But they also posed an almost insurmountable challenge to historians. Thus emerged ToxicDocs, possible only with a technological innovation known as "Big Data." That refers to the sheer volume of new digital data and to the computational power to analyze them. Users will be able to identify what firms knew (or did not know) about the dangers of toxic substances in their products-and when. The database opens many areas to inquiry including environmental studies, business history, government regulation, and public policy. ToxicDocs will remain a resource free and open to all, anywhere in the world.
Intercomparison of open-path trace gas measurements with two dual-frequency-comb spectrometers

DOE PAGES

Waxman, Eleanor M.; Cossel, Kevin C.; Truong, Gar-Wing; ...

2017-09-11

We present the first quantitative intercomparison between two open-path dual-comb spectroscopy (DCS) instruments which were operated across adjacent 2 km open-air paths over a 2-week period. We used DCS to measure the atmospheric absorption spectrum in the near infrared from 6023 to 6376 cm −1 (1568 to 1660 nm), corresponding to a 355 cm −1 bandwidth, at 0.0067 cm −1 sample spacing. The measured absorption spectra agree with each other to within 5 × 10 −4 in absorbance without any external calibration of either instrument. The absorption spectra are fit to retrieve path-integrated concentrations for carbon dioxide (CO 2), methane (CH 4), water (H 2O), and deuteratedmore » water (HDO). The retrieved dry mole fractions agree to 0.14 % (0.57 ppm) for CO 2, 0.35 % (7 ppb) for CH 4, and 0.40 % (36 ppm) for H 2O at ∼  30 s integration time over the 2-week measurement campaign, which included 24 °C outdoor temperature variations and periods of strong atmospheric turbulence. This agreement is at least an order of magnitude better than conventional active-source open-path instrument intercomparisons and is particularly relevant to future regional flux measurements as it allows accurate comparisons of open-path DCS data across locations and time. We additionally compare the open-path DCS retrievals to a World Meteorological Organization (WMO)-calibrated cavity ring-down point sensor located along the path with good agreement. Short-term and long-term differences between the open-path DCS and point sensor are attributed, respectively, to spatial sampling discrepancies and to inaccuracies in the current spectral database used to fit the DCS data. Finally, the 2-week measurement campaign yields diurnal cycles of CO 2 and CH 4 that are consistent with the presence of local sources of CO 2 and absence of local sources of CH 4.« less
Intercomparison of open-path trace gas measurements with two dual-frequency-comb spectrometers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Waxman, Eleanor M.; Cossel, Kevin C.; Truong, Gar-Wing

We present the first quantitative intercomparison between two open-path dual-comb spectroscopy (DCS) instruments which were operated across adjacent 2 km open-air paths over a 2-week period. We used DCS to measure the atmospheric absorption spectrum in the near infrared from 6023 to 6376 cm −1 (1568 to 1660 nm), corresponding to a 355 cm −1 bandwidth, at 0.0067 cm −1 sample spacing. The measured absorption spectra agree with each other to within 5 × 10 −4 in absorbance without any external calibration of either instrument. The absorption spectra are fit to retrieve path-integrated concentrations for carbon dioxide (CO 2), methane (CH 4), water (H 2O), and deuteratedmore » water (HDO). The retrieved dry mole fractions agree to 0.14 % (0.57 ppm) for CO 2, 0.35 % (7 ppb) for CH 4, and 0.40 % (36 ppm) for H 2O at ∼  30 s integration time over the 2-week measurement campaign, which included 24 °C outdoor temperature variations and periods of strong atmospheric turbulence. This agreement is at least an order of magnitude better than conventional active-source open-path instrument intercomparisons and is particularly relevant to future regional flux measurements as it allows accurate comparisons of open-path DCS data across locations and time. We additionally compare the open-path DCS retrievals to a World Meteorological Organization (WMO)-calibrated cavity ring-down point sensor located along the path with good agreement. Short-term and long-term differences between the open-path DCS and point sensor are attributed, respectively, to spatial sampling discrepancies and to inaccuracies in the current spectral database used to fit the DCS data. Finally, the 2-week measurement campaign yields diurnal cycles of CO 2 and CH 4 that are consistent with the presence of local sources of CO 2 and absence of local sources of CH 4.« less
Acquire: an open-source comprehensive cancer biobanking system.

PubMed

Dowst, Heidi; Pew, Benjamin; Watkins, Chris; McOwiti, Apollo; Barney, Jonathan; Qu, Shijing; Becnel, Lauren B

2015-05-15

The probability of effective treatment of cancer with a targeted therapeutic can be improved for patients with defined genotypes containing actionable mutations. To this end, many human cancer biobanks are integrating more tightly with genomic sequencing facilities and with those creating and maintaining patient-derived xenografts (PDX) and cell lines to provide renewable resources for translational research. To support the complex data management needs and workflows of several such biobanks, we developed Acquire. It is a robust, secure, web-based, database-backed open-source system that supports all major needs of a modern cancer biobank. Its modules allow for i) up-to-the-minute 'scoreboard' and graphical reporting of collections; ii) end user roles and permissions; iii) specimen inventory through caTissue Suite; iv) shipping forms for distribution of specimens to pathology, genomic analysis and PDX/cell line creation facilities; v) robust ad hoc querying; vi) molecular and cellular quality control metrics to track specimens' progress and quality; vii) public researcher request; viii) resource allocation committee distribution request review and oversight and ix) linkage to available derivatives of specimen. © The Author 2015. Published by Oxford University Press.
Role of Open Source Tools and Resources in Virtual Screening for Drug Discovery.

PubMed

Karthikeyan, Muthukumarasamy; Vyas, Renu

2015-01-01

Advancement in chemoinformatics research in parallel with availability of high performance computing platform has made handling of large scale multi-dimensional scientific data for high throughput drug discovery easier. In this study we have explored publicly available molecular databases with the help of open-source based integrated in-house molecular informatics tools for virtual screening. The virtual screening literature for past decade has been extensively investigated and thoroughly analyzed to reveal interesting patterns with respect to the drug, target, scaffold and disease space. The review also focuses on the integrated chemoinformatics tools that are capable of harvesting chemical data from textual literature information and transform them into truly computable chemical structures, identification of unique fragments and scaffolds from a class of compounds, automatic generation of focused virtual libraries, computation of molecular descriptors for structure-activity relationship studies, application of conventional filters used in lead discovery along with in-house developed exhaustive PTC (Pharmacophore, Toxicophores and Chemophores) filters and machine learning tools for the design of potential disease specific inhibitors. A case study on kinase inhibitors is provided as an example.
Installation of the National Transport Code Collaboration Data Server at the ITPA International Multi-tokamak Confinement Profile Database

NASA Astrophysics Data System (ADS)

Roach, Colin; Carlsson, Johan; Cary, John R.; Alexander, David A.

2002-11-01

The National Transport Code Collaboration (NTCC) has developed an array of software, including a data client/server. The data server, which is written in C++, serves local data (in the ITER Profile Database format) as well as remote data (by accessing one or several MDS+ servers). The client, a web-invocable Java applet, provides a uniform, intuitive, user-friendly, graphical interface to the data server. The uniformity of the interface relieves the user from the trouble of mastering the differences between different data formats and lets him/her focus on the essentials: plotting and viewing the data. The user runs the client by visiting a web page using any Java capable Web browser. The client is automatically downloaded and run by the browser. A reference to the data server is then retrieved via the standard Web protocol (HTTP). The communication between the client and the server is then handled by the mature, industry-standard CORBA middleware. CORBA has bindings for all common languages and many high-quality implementations are available (both Open Source and commercial). The NTCC data server has been installed at the ITPA International Multi-tokamak Confinement Profile Database, which is hosted by the UKAEA at Culham Science Centre. The installation of the data server is protected by an Internet firewall. To make it accessible to clients outside the firewall some modifications of the server were required. The working version of the ITPA confinement profile database is not open to the public. Authentification of legitimate users is done utilizing built-in Java security features to demand a password to download the client. We present an overview of the NTCC data client/server and some details of how the CORBA firewall-traversal issues were resolved and how the user authentification is implemented.
DamaGIS: a multisource geodatabase for collection of flood-related damage data

NASA Astrophysics Data System (ADS)

Saint-Martin, Clotilde; Javelle, Pierre; Vinet, Freddy

2018-06-01

Every year in France, recurring flood events result in several million euros of damage, and reducing the heavy consequences of floods has become a high priority. However, actions to reduce the impact of floods are often hindered by the lack of damage data on past flood events. The present paper introduces a new database for collection and assessment of flood-related damage. The DamaGIS database offers an innovative bottom-up approach to gather and identify damage data from multiple sources, including new media. The study area has been defined as the south of France considering the high frequency of floods over the past years. This paper presents the structure and contents of the database. It also presents operating instructions in order to keep collecting damage data within the database. This paper also describes an easily reproducible method to assess the severity of flood damage regardless of the location or date of occurrence. A first analysis of the damage contents is also provided in order to assess data quality and the relevance of the database. According to this analysis, despite its lack of comprehensiveness, the DamaGIS database presents many advantages. Indeed, DamaGIS provides a high accuracy of data as well as simplicity of use. It also has the additional benefit of being accessible in multiple formats and is open access. The DamaGIS database is available at https://doi.org/10.5281/zenodo.1241089.
A 868MHz-based wireless sensor network for ground truthing of soil moisture for a hyperspectral remote sensing campaign - design and preliminary results

NASA Astrophysics Data System (ADS)

Näthe, Paul; Becker, Rolf

2014-05-01

Soil moisture and plant available water are important environmental parameters that affect plant growth and crop yield. Hence, they are significant parameters for vegetation monitoring and precision agriculture. However, validation through ground-based soil moisture measurements is necessary for accessing soil moisture, plant canopy temperature, soil temperature and soil roughness with airborne hyperspectral imaging systems in a corresponding hyperspectral imaging campaign as a part of the INTERREG IV A-Project SMART INSPECTORS. At this point, commercially available sensors for matric potential, plant available water and volumetric water content are utilized for automated measurements with smart sensor nodes which are developed on the basis of open-source 868MHz radio modules, featuring a full-scale microcontroller unit that allows an autarkic operation of the sensor nodes on batteries in the field. The generated data from each of these sensor nodes is transferred wirelessly with an open-source protocol to a central node, the so-called "gateway". This gateway collects, interprets and buffers the sensor readings and, eventually, pushes the data-time series onto a server-based database. The entire data processing chain from the sensor reading to the final storage of data-time series on a server is realized with open-source hardware and software in such a way that the recorded data can be accessed from anywhere through the internet. It will be presented how this open-source based wireless sensor network is developed and specified for the application of ground truthing. In addition, the system's perspectives and potentials with respect to usability and applicability for vegetation monitoring and precision agriculture shall be pointed out. Regarding the corresponding hyperspectral imaging campaign, results from ground measurements will be discussed in terms of their contributing aspects to the remote sensing system. Finally, the significance of the wireless sensor network for the application of ground truthing shall be determined.
Open source hardware solutions for low-cost, do-it-yourself environmental monitoring, citizen science, and STEM education

NASA Astrophysics Data System (ADS)

Hicks, S. D.; Aufdenkampe, A. K.; Horsburgh, J. S.; Arscott, D. B.; Muenz, T.; Bressler, D. W.

2016-12-01

The explosion in DIY open-source hardware and software has resulted in the development of affordable and accessible technologies, like drones and weather stations, that can greatly assist the general public in monitoring environmental health and its degradation. It is widely recognized that education and support of audiences in pursuit of STEM literacy and the application of emerging technologies is a challenge for the future of citizen science and for preparing high school graduates to be actively engaged in environmental stewardship. It is also clear that detecting environmental change/degradation over time and space will be greatly enhanced with expanded use of networked, remote monitoring technologies by watershed organizations and citizen scientists if data collection and reporting are properly carried out and curated. However, there are few focused efforts to link citizen scientists and school programs with these emerging tools. We have started a multi-year program to develop hardware and teaching materials for training students and citizen scientists about the use of open source hardware in environmental monitoring. Scientists and educators around the world have started building their own dataloggers and devices using a variety of boards based on open source electronics. This new hardware is now providing researchers with an inexpensive alternative to commercial data logging and transmission hardware. We will present a variety of hardware solutions using the Arduino-compatible EnviroDIY Mayfly board (http://envirodiy.org/mayfly) that can be used to build and deploy a rugged environmental monitoring station using a wide variety of sensors and options, giving the users a fully customizable device for making measurements almost anywhere. A database and visualization system is being developed that will allow the users to view and manage the data their devices are collecting. We will also present our plan for developing curricula and leading workshops to various school programs and citizen scientist groups to teach them how to build, deploy, and maintain their own environmental monitoring stations.
HELIOS-R: An Ultrafast, Open-Source Retrieval Code For Exoplanetary Atmosphere Characterization

NASA Astrophysics Data System (ADS)

LAVIE, Baptiste

2015-12-01

Atmospheric retrieval is a growing, new approach in the theory of exoplanet atmosphere characterization. Unlike self-consistent modeling it allows us to fully explore the parameter space, as well as the degeneracies between the parameters using a Bayesian framework. We present HELIOS-R, a very fast retrieving code written in Python and optimized for GPU computation. Once it is ready, HELIOS-R will be the first open-source atmospheric retrieval code accessible to the exoplanet community. As the new generation of direct imaging instruments (SPHERE, GPI) have started to gather data, the first version of HELIOS-R focuses on emission spectra. We use a 1D two-stream forward model for computing fluxes and couple it to an analytical temperature-pressure profile that is constructed to be in radiative equilibrium. We use our ultra-fast opacity calculator HELIOS-K (also open-source) to compute the opacities of CO2, H2O, CO and CH4 from the HITEMP database. We test both opacity sampling (which is typically used by other workers) and the method of k-distributions. Using this setup, we compute a grid of synthetic spectra and temperature-pressure profiles, which is then explored using a nested sampling algorithm. By focusing on model selection (Occam’s razor) through the explicit computation of the Bayesian evidence, nested sampling allows us to deal with current sparse data as well as upcoming high-resolution observations. Once the best model is selected, HELIOS-R provides posterior distributions of the parameters. As a test for our code we studied HR8799 system and compared our results with the previous analysis of Lee, Heng & Irwin (2013), which used the proprietary NEMESIS retrieval code. HELIOS-R and HELIOS-K are part of the set of open-source community codes we named the Exoclimes Simulation Platform (www.exoclime.org).
Mobile Source Observation Database (MSOD)

EPA Pesticide Factsheets

The Mobile Source Observation Database (MSOD) is a relational database developed by the Assessment and Standards Division (ASD) of the U.S. EPA Office of Transportation and Air Quality (formerly the Office of Mobile Sources).
Pathogen metadata platform: software for accessing and analyzing pathogen strain information.

PubMed

Chang, Wenling E; Peterson, Matthew W; Garay, Christopher D; Korves, Tonia

2016-09-15

Pathogen metadata includes information about where and when a pathogen was collected and the type of environment it came from. Along with genomic nucleotide sequence data, this metadata is growing rapidly and becoming a valuable resource not only for research but for biosurveillance and public health. However, current freely available tools for analyzing this data are geared towards bioinformaticians and/or do not provide summaries and visualizations needed to readily interpret results. We designed a platform to easily access and summarize data about pathogen samples. The software includes a PostgreSQL database that captures metadata useful for disease outbreak investigations, and scripts for downloading and parsing data from NCBI BioSample and BioProject into the database. The software provides a user interface to query metadata and obtain standardized results in an exportable, tab-delimited format. To visually summarize results, the user interface provides a 2D histogram for user-selected metadata types and mapping of geolocated entries. The software is built on the LabKey data platform, an open-source data management platform, which enables developers to add functionalities. We demonstrate the use of the software in querying for a pathogen serovar and for genome sequence identifiers. This software enables users to create a local database for pathogen metadata, populate it with data from NCBI, easily query the data, and obtain visual summaries. Some of the components, such as the database, are modular and can be incorporated into other data platforms. The source code is freely available for download at https://github.com/wchangmitre/bioattribution .
New mutations and an updated database for the patched-1 (PTCH1) gene.

PubMed

Reinders, Marie G; van Hout, Antonius F; Cosgun, Betûl; Paulussen, Aimée D; Leter, Edward M; Steijlen, Peter M; Mosterd, Klara; van Geel, Michel; Gille, Johan J

2018-05-01

Basal cell nevus syndrome (BCNS) is an autosomal dominant disorder characterized by multiple basal cell carcinomas (BCCs), maxillary keratocysts, and cerebral calcifications. BCNS most commonly is caused by a germline mutation in the patched-1 (PTCH1) gene. PTCH1 mutations are also described in patients with holoprosencephaly. We have established a locus-specific database for the PTCH1 gene using the Leiden Open Variation Database (LOVD). We included 117 new PTCH1 variations, in addition to 331 previously published unique PTCH1 mutations. These new mutations were found in 141 patients who had a positive PTCH1 mutation analysis in either the VU University Medical Centre (VUMC) or Maastricht University Medical Centre (MUMC) between 1995 and 2015. The database contains 331 previously published unique PTCH1 mutations and 117 new PTCH1 variations. We have established a locus-specific database for the PTCH1 gene using the Leiden Open Variation Database (LOVD). The database provides an open collection for both clinicians and researchers and is accessible online at http://www.lovd.nl/PTCH1. © 2018 The Authors. Molecular Genetics & Genomic Medicine published by Wiley Periodicals, Inc.
ASEAN Mineral Database and Information System (AMDIS)

NASA Astrophysics Data System (ADS)

Okubo, Y.; Ohno, T.; Bandibas, J. C.; Wakita, K.; Oki, Y.; Takahashi, Y.

2014-12-01

AMDIS has lunched officially since the Fourth ASEAN Ministerial Meeting on Minerals on 28 November 2013. In cooperation with Geological Survey of Japan, the web-based GIS was developed using Free and Open Source Software (FOSS) and the Open Geospatial Consortium (OGC) standards. The system is composed of the local databases and the centralized GIS. The local databases created and updated using the centralized GIS are accessible from the portal site. The system introduces distinct advantages over traditional GIS. Those are a global reach, a large number of users, better cross-platform capability, charge free for users, charge free for provider, easy to use, and unified updates. Raising transparency of mineral information to mining companies and to the public, AMDIS shows that mineral resources are abundant throughout the ASEAN region; however, there are many datum vacancies. We understand that such problems occur because of insufficient governance of mineral resources. Mineral governance we refer to is a concept that enforces and maximizes the capacity and systems of government institutions that manages minerals sector. The elements of mineral governance include a) strengthening of information infrastructure facility, b) technological and legal capacities of state-owned mining companies to fully-engage with mining sponsors, c) government-led management of mining projects by supporting the project implementation units, d) government capacity in mineral management such as the control and monitoring of mining operations, and e) facilitation of regional and local development plans and its implementation with the private sector.
Apollo: a community resource for genome annotation editing

PubMed Central

Ed, Lee; Nomi, Harris; Mark, Gibson; Raymond, Chetty; Suzanna, Lewis

2009-01-01

Summary: Apollo is a genome annotation-editing tool with an easy to use graphical interface. It is a component of the GMOD project, with ongoing development driven by the community. Recent additions to the software include support for the generic feature format version 3 (GFF3), continuous transcriptome data, a full Chado database interface, integration with remote services for on-the-fly BLAST and Primer BLAST analyses, graphical interfaces for configuring user preferences and full undo of all edit operations. Apollo's user community continues to grow, including its use as an educational tool for college and high-school students. Availability: Apollo is a Java application distributed under a free and open source license. Installers for Windows, Linux, Unix, Solaris and Mac OS X are available at http://apollo.berkeleybop.org, and the source code is available from the SourceForge CVS repository at http://gmod.cvs.sourceforge.net/gmod/apollo. Contact: elee@berkeleybop.org PMID:19439563
Apollo: a community resource for genome annotation editing.

PubMed

Lee, Ed; Harris, Nomi; Gibson, Mark; Chetty, Raymond; Lewis, Suzanna

2009-07-15

Apollo is a genome annotation-editing tool with an easy to use graphical interface. It is a component of the GMOD project, with ongoing development driven by the community. Recent additions to the software include support for the generic feature format version 3 (GFF3), continuous transcriptome data, a full Chado database interface, integration with remote services for on-the-fly BLAST and Primer BLAST analyses, graphical interfaces for configuring user preferences and full undo of all edit operations. Apollo's user community continues to grow, including its use as an educational tool for college and high-school students. Apollo is a Java application distributed under a free and open source license. Installers for Windows, Linux, Unix, Solaris and Mac OS X are available at http://apollo.berkeleybop.org, and the source code is available from the SourceForge CVS repository at http://gmod.cvs.sourceforge.net/gmod/apollo.
Nitrates in drinking water: relation with intensive livestock production.

PubMed

Giammarino, M; Quatto, P

2015-01-01

An excess of nitrates causes environmental pollution in receiving water bodies and health risk for human, if contaminated water is source of drinking water. The directive 91/676/ CEE [1] aims to reduce the nitrogen pressure in Europe from agriculture sources and identifies the livestock population as one of the predominant sources of surplus of nutrients that could be released in water and air. Directive is concerned about cattle, sheep, pigs and poultry and their territorial loads, but it does not deal with fish farms. Fish farms effluents may contain pollutants affecting ecosystem water quality. On the basis of multivariate statistical analysis, this paper aims to establish what types of farming affect the presence of nitrates in drinking water in the province of Cuneo, Piedmont, Italy. In this regard, we have used data from official sources on nitrates in drinking water and data Arvet database, concerning the presence of intensive farming in the considered area. For model selection we have employed automatic variable selection algorithm. We have identified fish farms as a major source of nitrogen released into the environment, while pollution from sheep and poultry has appeared negligible. We would like to emphasize the need to include in the "Nitrate Vulnerable Zones" (as defined in Directive 91/676/CEE [1]), all areas where there are intensive farming of fish with open-system type of water use. Besides, aquaculture open-system should be equipped with adequate downstream system of filtering for removing nitrates in the wastewater.
PubMed Central

QUATTO, P.

2015-01-01

Summary Introduction. An excess of nitrates causes environmental pollution in receiving water bodies and health risk for human, if contaminated water is source of drinking water. The directive 91/676/ CEE [1] aims to reduce the nitrogen pressure in Europe from agriculture sources and identifies the livestock population as one of the predominant sources of surplus of nutrients that could be released in water and air. Directive is concerned about cattle, sheep, pigs and poultry and their territorial loads, but it does not deal with fish farms. Fish farms effluents may contain pollutants affecting ecosystem water quality. Methods. On the basis of multivariate statistical analysis, this paper aims to establish what types of farming affect the presence of nitrates in drinking water in the province of Cuneo, Piedmont, Italy. In this regard, we have used data from official sources on nitrates in drinking water and data Arvet database, concerning the presence of intensive farming in the considered area. For model selection we have employed automatic variable selection algorithm. Results and discussion. We have identified fish farms as a major source of nitrogen released into the environment, while pollution from sheep and poultry has appeared negligible. We would like to emphasize the need to include in the "Nitrate Vulnerable Zones" (as defined in Directive 91/676/CEE [1]), all areas where there are intensive farming of fish with open-system type of water use. Besides, aquaculture open-system should be equipped with adequate downstream system of filtering for removing nitrates in the wastewater. PMID:26900335
MOBBED: a computational data infrastructure for handling large collections of event-rich time series datasets in MATLAB

PubMed Central

Cockfield, Jeremy; Su, Kyungmin; Robbins, Kay A.

2013-01-01

Experiments to monitor human brain activity during active behavior record a variety of modalities (e.g., EEG, eye tracking, motion capture, respiration monitoring) and capture a complex environmental context leading to large, event-rich time series datasets. The considerable variability of responses within and among subjects in more realistic behavioral scenarios requires experiments to assess many more subjects over longer periods of time. This explosion of data requires better computational infrastructure to more systematically explore and process these collections. MOBBED is a lightweight, easy-to-use, extensible toolkit that allows users to incorporate a computational database into their normal MATLAB workflow. Although capable of storing quite general types of annotated data, MOBBED is particularly oriented to multichannel time series such as EEG that have event streams overlaid with sensor data. MOBBED directly supports access to individual events, data frames, and time-stamped feature vectors, allowing users to ask questions such as what types of events or features co-occur under various experimental conditions. A database provides several advantages not available to users who process one dataset at a time from the local file system. In addition to archiving primary data in a central place to save space and avoid inconsistencies, such a database allows users to manage, search, and retrieve events across multiple datasets without reading the entire dataset. The database also provides infrastructure for handling more complex event patterns that include environmental and contextual conditions. The database can also be used as a cache for expensive intermediate results that are reused in such activities as cross-validation of machine learning algorithms. MOBBED is implemented over PostgreSQL, a widely used open source database, and is freely available under the GNU general public license at http://visual.cs.utsa.edu/mobbed. Source and issue reports for MOBBED are maintained at http://vislab.github.com/MobbedMatlab/ PMID:24124417

MOBBED: a computational data infrastructure for handling large collections of event-rich time series datasets in MATLAB.

PubMed

Cockfield, Jeremy; Su, Kyungmin; Robbins, Kay A

2013-01-01

Experiments to monitor human brain activity during active behavior record a variety of modalities (e.g., EEG, eye tracking, motion capture, respiration monitoring) and capture a complex environmental context leading to large, event-rich time series datasets. The considerable variability of responses within and among subjects in more realistic behavioral scenarios requires experiments to assess many more subjects over longer periods of time. This explosion of data requires better computational infrastructure to more systematically explore and process these collections. MOBBED is a lightweight, easy-to-use, extensible toolkit that allows users to incorporate a computational database into their normal MATLAB workflow. Although capable of storing quite general types of annotated data, MOBBED is particularly oriented to multichannel time series such as EEG that have event streams overlaid with sensor data. MOBBED directly supports access to individual events, data frames, and time-stamped feature vectors, allowing users to ask questions such as what types of events or features co-occur under various experimental conditions. A database provides several advantages not available to users who process one dataset at a time from the local file system. In addition to archiving primary data in a central place to save space and avoid inconsistencies, such a database allows users to manage, search, and retrieve events across multiple datasets without reading the entire dataset. The database also provides infrastructure for handling more complex event patterns that include environmental and contextual conditions. The database can also be used as a cache for expensive intermediate results that are reused in such activities as cross-validation of machine learning algorithms. MOBBED is implemented over PostgreSQL, a widely used open source database, and is freely available under the GNU general public license at http://visual.cs.utsa.edu/mobbed. Source and issue reports for MOBBED are maintained at http://vislab.github.com/MobbedMatlab/
ATLAS (Automatic Tool for Local Assembly Structures) - A Comprehensive Infrastructure for Assembly, Annotation, and Genomic Binning of Metagenomic and Metaranscripomic Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

White, Richard A.; Brown, Joseph M.; Colby, Sean M.

ATLAS (Automatic Tool for Local Assembly Structures) is a comprehensive multiomics data analysis pipeline that is massively parallel and scalable. ATLAS contains a modular analysis pipeline for assembly, annotation, quantification and genome binning of metagenomics and metatranscriptomics data and a framework for reference metaproteomic database construction. ATLAS transforms raw sequence data into functional and taxonomic data at the microbial population level and provides genome-centric resolution through genome binning. ATLAS provides robust taxonomy based on majority voting of protein coding open reading frames rolled-up at the contig level using modified lowest common ancestor (LCA) analysis. ATLAS provides robust taxonomy based onmore » majority voting of protein coding open reading frames rolled-up at the contig level using modified lowest common ancestor (LCA) analysis. ATLAS is user-friendly, easy install through bioconda maintained as open-source on GitHub, and is implemented in Snakemake for modular customizable workflows.« less
Online Databases in Physics.

ERIC Educational Resources Information Center

Sievert, MaryEllen C.; Verbeck, Alison F.

1984-01-01

This overview of 47 online sources for physics information available in the United States--including sub-field databases, transdisciplinary databases, and multidisciplinary databases-- notes content, print source, language, time coverage, and databank. Two discipline-specific databases (SPIN and PHYSICS BRIEFS) are also discussed. (EJS)
STARS 2.0: 2nd-generation open-source archiving and query software

NASA Astrophysics Data System (ADS)

Winegar, Tom

2008-07-01

The Subaru Telescope is in process of developing an open-source alternative to the 1st-generation software and databases (STARS 1) used for archiving and query. For STARS 2, we have chosen PHP and Python for scripting and MySQL as the database software. We have collected feedback from staff and observers, and used this feedback to significantly improve the design and functionality of our future archiving and query software. Archiving - We identified two weaknesses in 1st-generation STARS archiving software: a complex and inflexible table structure and uncoordinated system administration for our business model: taking pictures from the summit and archiving them in both Hawaii and Japan. We adopted a simplified and normalized table structure with passive keyword collection, and we are designing an archive-to-archive file transfer system that automatically reports real-time status and error conditions and permits error recovery. Query - We identified several weaknesses in 1st-generation STARS query software: inflexible query tools, poor sharing of calibration data, and no automatic file transfer mechanisms to observers. We are developing improved query tools and sharing of calibration data, and multi-protocol unassisted file transfer mechanisms for observers. In the process, we have redefined a 'query': from an invisible search result that can only transfer once in-house right now, with little status and error reporting and no error recovery - to a stored search result that can be monitored, transferred to different locations with multiple protocols, reporting status and error conditions and permitting recovery from errors.
Microbe-ID: an open source toolbox for microbial genotyping and species identification.

PubMed

Tabima, Javier F; Everhart, Sydney E; Larsen, Meredith M; Weisberg, Alexandra J; Kamvar, Zhian N; Tancos, Matthew A; Smart, Christine D; Chang, Jeff H; Grünwald, Niklaus J

2016-01-01

Development of tools to identify species, genotypes, or novel strains of invasive organisms is critical for monitoring emergence and implementing rapid response measures. Molecular markers, although critical to identifying species or genotypes, require bioinformatic tools for analysis. However, user-friendly analytical tools for fast identification are not readily available. To address this need, we created a web-based set of applications called Microbe-ID that allow for customizing a toolbox for rapid species identification and strain genotyping using any genetic markers of choice. Two components of Microbe-ID, named Sequence-ID and Genotype-ID, implement species and genotype identification, respectively. Sequence-ID allows identification of species by using BLAST to query sequences for any locus of interest against a custom reference sequence database. Genotype-ID allows placement of an unknown multilocus marker in either a minimum spanning network or dendrogram with bootstrap support from a user-created reference database. Microbe-ID can be used for identification of any organism based on nucleotide sequences or any molecular marker type and several examples are provided. We created a public website for demonstration purposes called Microbe-ID (microbe-id.org) and provided a working implementation for the genus Phytophthora (phytophthora-id.org). In Phytophthora-ID, the Sequence-ID application allows identification based on ITS or cox spacer sequences. Genotype-ID groups individuals into clonal lineages based on simple sequence repeat (SSR) markers for the two invasive plant pathogen species P. infestans and P. ramorum. All code is open source and available on github and CRAN. Instructions for installation and use are provided at https://github.com/grunwaldlab/Microbe-ID.
An Android based location service using GSMCellID and GPS to obtain a graphical guide to the nearest cash machine

NASA Astrophysics Data System (ADS)

Jacobsen, Jurma; Edlich, Stefan

2009-02-01

There is a broad range of potential useful mobile location-based applications. One crucial point seems to be to make them available to the public at large. This case illuminates the abilities of Android - the operating system for mobile devices - to fulfill this demand in the mashup way by use of some special geocoding web services and one integrated web service for getting the nearest cash machines data. It shows an exemplary approach for building mobile location-based mashups for everyone: 1. As a basis for reaching as many people as possible the open source Android OS is assumed to spread widely. 2. Everyone also means that the handset has not to be an expensive GPS device. This is realized by re-utilization of the existing GSM infrastructure with the Cell of Origin (COO) method which makes a lookup of the CellID in one of the growing web available CellID databases. Some of these databases are still undocumented and not yet published. Furthermore the Google Maps API for Mobile (GMM) and the open source counterpart OpenCellID are used. The user's current position localization via lookup of the closest cell to which the handset is currently connected to (COO) is not as precise as GPS, but appears to be sufficient for lots of applications. For this reason the GPS user is the most pleased one - for this user the system is fully automated. In contrary there could be some users who doesn't own a GPS cellular. This user should refine his/her location by one click on the map inside of the determined circular region. The users are then shown and guided by a path to the nearest cash machine by integrating Google Maps API with an overlay. Additionally, the GPS user can keep track of him- or herself by getting a frequently updated view via constantly requested precise GPS data for his or her position.
Using the STOQS Web Application for Access to in situ Oceanographic Data

NASA Astrophysics Data System (ADS)

McCann, M. P.

2012-12-01

Using the STOQS Web Application for Access to in situ Oceanographic Data Mike McCann 7 August 2012 With increasing measurement and sampling capabilities of autonomous oceanographic platforms (e.g. Gliders, Autonomous Underwater Vehicles, Wavegliders), the need to efficiently access and visualize the data they collect is growing. The Monterey Bay Aquarium Research Institute has designed and built the Spatial Temporal Oceanographic Query System (STOQS) specifically to address this issue. The need for STOQS arises from inefficiencies discovered from using CF-NetCDF point observation conventions for these data. The problem is that access efficiency decreases with decreasing dimension of CF-NetCDF data. For example, the Trajectory Common Data Model feature type has only one coordinate dimension, usually Time - positions of the trajectory (Depth, Latitude, Longitude) are stored as non-indexed record variables within the NetCDF file. If client software needs to access data between two depth values or from a bounded geographic area, then the whole data set must be read and the selection made within the client software. This is very inefficient. What is needed is a way to easily select data of interest from an archive given any number of spatial, temporal, or other constraints. Geospatial relational database technology provides this capability. The full STOQS application consists of a Postgres/PostGIS database, Mapserver, and Python-Django running on a server and Web 2.0 technology (jQuery, OpenLayers, Twitter Bootstrap) running in a modern web browser. The web application provides faceted search capabilities allowing a user to quickly drill into the data of interest. Data selection can be constrained by spatial, temporal, and depth selections as well as by parameter value and platform name. The web application layer also provides a REST (Representational State Transfer) Application Programming Interface allowing tools such as the Matlab stoqstoolbox to retrieve data directly from the database. STOQS is an open source software project built upon a framework of free and open source software and is available for anyone to use for making their data more accessible and usable. For more information please see: http://code.google.com/p/stoqs/.; In the above screen grab a user has selected the "mass_concentrtion_of_chlorophyll_in_sea_water" parameter and a time depth range that includes three weeks of AUV missions of just the upper 5 meters.
RefPrimeCouch—a reference gene primer CouchApp

PubMed Central

Silbermann, Jascha; Wernicke, Catrin; Pospisil, Heike; Frohme, Marcus

2013-01-01

To support a quantitative real-time polymerase chain reaction standardization project, a new reference gene database application was required. The new database application was built with the explicit goal of simplifying not only the development process but also making the user interface more responsive and intuitive. To this end, CouchDB was used as the backend with a lightweight dynamic user interface implemented client-side as a one-page web application. Data entry and curation processes were streamlined using an OpenRefine-based workflow. The new RefPrimeCouch database application provides its data online under an Open Database License. Database URL: http://hpclife.th-wildau.de:5984/rpc/_design/rpc/view.html PMID:24368831
RefPrimeCouch--a reference gene primer CouchApp.

PubMed

Silbermann, Jascha; Wernicke, Catrin; Pospisil, Heike; Frohme, Marcus

2013-01-01

To support a quantitative real-time polymerase chain reaction standardization project, a new reference gene database application was required. The new database application was built with the explicit goal of simplifying not only the development process but also making the user interface more responsive and intuitive. To this end, CouchDB was used as the backend with a lightweight dynamic user interface implemented client-side as a one-page web application. Data entry and curation processes were streamlined using an OpenRefine-based workflow. The new RefPrimeCouch database application provides its data online under an Open Database License. Database URL: http://hpclife.th-wildau.de:5984/rpc/_design/rpc/view.html.
Archetype relational mapping - a practical openEHR persistence solution.

PubMed

Wang, Li; Min, Lingtong; Wang, Rui; Lu, Xudong; Duan, Huilong

2015-11-05

One of the primary obstacles to the widespread adoption of openEHR methodology is the lack of practical persistence solutions for future-proof electronic health record (EHR) systems as described by the openEHR specifications. This paper presents an archetype relational mapping (ARM) persistence solution for the archetype-based EHR systems to support healthcare delivery in the clinical environment. First, the data requirements of the EHR systems are analysed and organized into archetype-friendly concepts. The Clinical Knowledge Manager (CKM) is queried for matching archetypes; when necessary, new archetypes are developed to reflect concepts that are not encompassed by existing archetypes. Next, a template is designed for each archetype to apply constraints related to the local EHR context. Finally, a set of rules is designed to map the archetypes to data tables and provide data persistence based on the relational database. A comparison study was conducted to investigate the differences among the conventional database of an EHR system from a tertiary Class A hospital in China, the generated ARM database, and the Node + Path database. Five data-retrieving tests were designed based on clinical workflow to retrieve exams and laboratory tests. Additionally, two patient-searching tests were designed to identify patients who satisfy certain criteria. The ARM database achieved better performance than the conventional database in three of the five data-retrieving tests, but was less efficient in the remaining two tests. The time difference of query executions conducted by the ARM database and the conventional database is less than 130 %. The ARM database was approximately 6-50 times more efficient than the conventional database in the patient-searching tests, while the Node + Path database requires far more time than the other two databases to execute both the data-retrieving and the patient-searching tests. The ARM approach is capable of generating relational databases using archetypes and templates for archetype-based EHR systems, thus successfully adapting to changes in data requirements. ARM performance is similar to that of conventionally-designed EHR systems, and can be applied in a practical clinical environment. System components such as ARM can greatly facilitate the adoption of openEHR architecture within EHR systems.
A new Volcanic managEment Risk Database desIgn (VERDI): Application to El Hierro Island (Canary Islands)

NASA Astrophysics Data System (ADS)

Bartolini, S.; Becerril, L.; Martí, J.

2014-11-01

One of the most important issues in modern volcanology is the assessment of volcanic risk, which will depend - among other factors - on both the quantity and quality of the available data and an optimum storage mechanism. This will require the design of purpose-built databases that take into account data format and availability and afford easy data storage and sharing, and will provide for a more complete risk assessment that combines different analyses but avoids any duplication of information. Data contained in any such database should facilitate spatial and temporal analysis that will (1) produce probabilistic hazard models for future vent opening, (2) simulate volcanic hazards and (3) assess their socio-economic impact. We describe the design of a new spatial database structure, VERDI (Volcanic managEment Risk Database desIgn), which allows different types of data, including geological, volcanological, meteorological, monitoring and socio-economic information, to be manipulated, organized and managed. The root of the question is to ensure that VERDI will serve as a tool for connecting different kinds of data sources, GIS platforms and modeling applications. We present an overview of the database design, its components and the attributes that play an important role in the database model. The potential of the VERDI structure and the possibilities it offers in regard to data organization are here shown through its application on El Hierro (Canary Islands). The VERDI database will provide scientists and decision makers with a useful tool that will assist to conduct volcanic risk assessment and management.
Modular Chemical Descriptor Language (MCDL): Stereochemical modules

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gakh, Andrei A; Burnett, Michael N; Trepalin, Sergei V.

2011-01-01

In our previous papers we introduced the Modular Chemical Descriptor Language (MCDL) for providing a linear representation of chemical information. A subsequent development was the MCDL Java Chemical Structure Editor which is capable of drawing chemical structures from linear representations and generating MCDL descriptors from structures. In this paper we present MCDL modules and accompanying software that incorporate unique representation of molecular stereochemistry based on Cahn-Ingold-Prelog and Fischer ideas in constructing stereoisomer descriptors. The paper also contains additional discussions regarding canonical representation of stereochemical isomers, and brief algorithm descriptions of the open source LINDES, Java applet, and Open Babel MCDLmore » processing module software packages. Testing of the upgraded MCDL Java Chemical Structure Editor on compounds taken from several large and diverse chemical databases demonstrated satisfactory performance for storage and processing of stereochemical information in MCDL format.« less
Trends in Solar energy Driven Vertical Ground Source Heat Pump Systems in Sweden - An Analysis Based on the Swedish Well Database

NASA Astrophysics Data System (ADS)

Juhlin, K.; Gehlin, S.

2016-12-01

Sweden is a world leader in developing and using vertical ground source heat pump (GSHP) technology. GSHP systems extract passively stored solar energy in the ground and the Earth's natural geothermal energy. Geothermal energy is an admitted renewable energy source in Sweden since 2007 and is the third largest renewable energy source in the country today. The Geological Survey of Sweden (SGU) is the authority in Sweden that provides open access geological data of rock, soil and groundwater for the public. All wells drilled must be registered in the SGU Well Database and it is the well driller's duty to submit registration of drilled wells.Both active and passive geothermal energy systems are in use. Large GSHP systems, with at least 20 boreholes, are active geothermal energy systems. Energy is stored in the ground which allows both comfort heating and cooling to be extracted. Active systems are therefore relevant for larger properties and industrial buildings. Since 1978 more than 600 000 wells (water wells, GSHP boreholes etc) have been registered in the Well Database, with around 20 000 new registrations per year. Of these wells an estimated 320 000 wells are registered as GSHP boreholes. The vast majority of these boreholes are single boreholes for single-family houses. The number of properties with registered vertical borehole GSHP installations amounts to approximately 243 000. Of these sites between 300-350 are large GSHP systems with at least 20 boreholes. While the increase in number of new registrations for smaller homes and households has slowed down after the rapid development in the 80's and 90's, the larger installations for commercial and industrial buildings have increased in numbers over the last ten years. This poster uses data from the SGU Well Database to quantify and analyze the trends in vertical GSHP systems reported between 1978-2015 in Sweden, with special focus on large systems. From the new aggregated data, conclusions can be drawn about the development of larger vertical GSHP system installments over the years and the geographical distribution in Sweden.
49 CFR 1104.3 - Copies.

Code of Federal Regulations, 2013 CFR

2013-10-01

... fully evaluate evidence, all spreadsheets must be fully accessible and manipulable. Electronic databases... Microsoft Open Database Connectivity (ODBC) standard. ODBC is a Windows technology that allows a database software package to import data from a database created using a different software package. We currently...
49 CFR 1104.3 - Copies.

Code of Federal Regulations, 2014 CFR

2014-10-01

... fully evaluate evidence, all spreadsheets must be fully accessible and manipulable. Electronic databases... Microsoft Open Database Connectivity (ODBC) standard. ODBC is a Windows technology that allows a database software package to import data from a database created using a different software package. We currently...
49 CFR 1104.3 - Copies.

Code of Federal Regulations, 2012 CFR

2012-10-01

... fully evaluate evidence, all spreadsheets must be fully accessible and manipulable. Electronic databases... Microsoft Open Database Connectivity (ODBC) standard. ODBC is a Windows technology that allows a database software package to import data from a database created using a different software package. We currently...
49 CFR 1104.3 - Copies.

Code of Federal Regulations, 2010 CFR

2010-10-01

... fully evaluate evidence, all spreadsheets must be fully accessible and manipulable. Electronic databases... Microsoft Open Database Connectivity (ODBC) standard. ODBC is a Windows technology that allows a database software package to import data from a database created using a different software package. We currently...
Data Sharing in Astrobiology: the Astrobiology Habitable Environments Database (AHED)

NASA Astrophysics Data System (ADS)

Bristow, T.; Lafuente Valverde, B.; Keller, R.; Stone, N.; Downs, R. T.; Blake, D. F.; Fonda, M.; Pires, A.

2016-12-01

Astrobiology is a multidisciplinary area of scientific research focused on studying the origins of life on Earth and the conditions under which life might have emerged elsewhere in the universe. The understanding of complex questions in astrobiology requires integration and analysis of data spanning a range of disciplines including biology, chemistry, geology, astronomy and planetary science. However, the lack of a centralized repository makes it difficult for astrobiology teams to share data and benefit from resultant synergies. Moreover, in recent years, federal agencies are requiring that results of any federally funded scientific research must be available and useful for the public and the science community. Astrobiology, as any other scientific discipline, needs to respond to these mandates. The Astrobiology Habitable Environments Database (AHED) is a central, high quality, long-term searchable repository designed to help the community by promoting the integration and sharing of all the data generated by these diverse disciplines. AHED provides public and open-access to astrobiology-related research data through a user-managed web portal implemented using the open-source software The Open Data Repository's (ODR) Data Publisher [1]. ODR-DP provides a user-friendly interface that research teams or individual scientists can use to design, populate and manage their own databases or laboratory notebooks according to the characteristics of their data. AHED is then a collection of databases housed in the ODR framework that store information about samples, along with associated measurements, analyses, and contextual information about field sites where samples were collected, the instruments or equipment used for analysis, and people and institutions involved in their collection. Advanced graphics are implemented together with advanced online tools for data analysis (e.g. R, MATLAB, Project Jupyter-http://jupyter.org). A permissions system will be put in place so that as data are being actively collected and interpreted, they will remain proprietary. A citation system will allow research data to be used and appropriately referenced by other researchers after the data are made public. This project is supported by SERA and NASA NNX11AP82A, MSL. [1] Stone et al. (2016) AGU, submitted.
Linked Patient-Reported Outcomes Data From Patients With Multiple Sclerosis Recruited on an Open Internet Platform to Health Care Claims Databases Identifies a Representative Population for Real-Life Data Analysis in Multiple Sclerosis.

PubMed

Risson, Valery; Ghodge, Bhaskar; Bonzani, Ian C; Korn, Jonathan R; Medin, Jennie; Saraykar, Tanmay; Sengupta, Souvik; Saini, Deepanshu; Olson, Melvin

2016-09-22

An enormous amount of information relevant to public health is being generated directly by online communities. To explore the feasibility of creating a dataset that links patient-reported outcomes data, from a Web-based survey of US patients with multiple sclerosis (MS) recruited on open Internet platforms, to health care utilization information from health care claims databases. The dataset was generated by linkage analysis to a broader MS population in the United States using both pharmacy and medical claims data sources. US Facebook users with an interest in MS were alerted to a patient-reported survey by targeted advertisements. Eligibility criteria were diagnosis of MS by a specialist (primary progressive, relapsing-remitting, or secondary progressive), ≥12-month history of disease, age 18-65 years, and commercial health insurance. Participants completed a questionnaire including data on demographic and disease characteristics, current and earlier therapies, relapses, disability, health-related quality of life, and employment status and productivity. A unique anonymous profile was generated for each survey respondent. Each anonymous profile was linked to a number of medical and pharmacy claims datasets in the United States. Linkage rates were assessed and survey respondents' representativeness was evaluated based on differences in the distribution of characteristics between the linked survey population and the general MS population in the claims databases. The advertisement was placed on 1,063,973 Facebook users' pages generating 68,674 clicks, 3719 survey attempts, and 651 successfully completed surveys, of which 440 could be linked to any of the claims databases for 2014 or 2015 (67.6% linkage rate). Overall, no significant differences were found between patients who were linked and not linked for educational status, ethnicity, current or prior disease-modifying therapy (DMT) treatment, or presence of a relapse in the last 12 months. The frequencies of the most common MS symptoms did not differ significantly between linked patients and the general MS population in the databases. Linked patients were slightly younger and less likely to be men than those who were not linkable. Linking patient-reported outcomes data, from a Web-based survey of US patients with MS recruited on open Internet platforms, to health care utilization information from claims databases may enable rapid generation of a large population of representative patients with MS suitable for outcomes analysis.
Practical Value of Food Pathogen Traceability through Building a Whole-Genome Sequencing Network and Database.

PubMed

Allard, Marc W; Strain, Errol; Melka, David; Bunning, Kelly; Musser, Steven M; Brown, Eric W; Timme, Ruth

2016-08-01

The FDA has created a United States-based open-source whole-genome sequencing network of state, federal, international, and commercial partners. The GenomeTrakr network represents a first-of-its-kind distributed genomic food shield for characterizing and tracing foodborne outbreak pathogens back to their sources. The GenomeTrakr network is leading investigations of outbreaks of foodborne illnesses and compliance actions with more accurate and rapid recalls of contaminated foods as well as more effective monitoring of preventive controls for food manufacturing environments. An expanded network would serve to provide an international rapid surveillance system for pathogen traceback, which is critical to support an effective public health response to bacterial outbreaks. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

Epidemiological considerations for the use of databases in transfusion research: a Scandinavian perspective.

PubMed

Edgren, Gustaf; Hjalgrim, Henrik

2010-11-01

At current safety levels, with adverse events from transfusions being relatively rare, further progress in risk reductions will require large-scale investigations. Thus, truly prospective studies may prove unfeasible and other alternatives deserve consideration. In this review, we will try to give an overview of recent and historical developments in the use of blood donation and transfusion databases in research. In addition, we will go over important methodological issues. There are at least three nationwide or near-nationwide donation/transfusion databases with the possibility for long-term follow-up of donors and recipients. During the past few years, a large number of reports have been published utilizing such data sources to investigate transfusion-associated risks. In addition, numerous clinics systematically collect and use such data on a smaller scale. Combining systematically recorded donation and transfusion data with long-term health follow-up opens up exciting opportunities for transfusion medicine research. However, the correct analysis of such data requires close attention to methodological issues, especially including the indication for transfusion and reverse causality.
Connecting proteins with drug-like compounds: Open source drug discovery workflows with BindingDB and KNIME

PubMed Central

Berthold, Michael R.; Hedrick, Michael P.; Gilson, Michael K.

2015-01-01

Today’s large, public databases of protein–small molecule interaction data are creating important new opportunities for data mining and integration. At the same time, new graphical user interface-based workflow tools offer facile alternatives to custom scripting for informatics and data analysis. Here, we illustrate how the large protein-ligand database BindingDB may be incorporated into KNIME workflows as a step toward the integration of pharmacological data with broader biomolecular analyses. Thus, we describe a collection of KNIME workflows that access BindingDB data via RESTful webservices and, for more intensive queries, via a local distillation of the full BindingDB dataset. We focus in particular on the KNIME implementation of knowledge-based tools to generate informed hypotheses regarding protein targets of bioactive compounds, based on notions of chemical similarity. A number of variants of this basic approach are tested for seven existing drugs with relatively ill-defined therapeutic targets, leading to replication of some previously confirmed results and discovery of new, high-quality hits. Implications for future development are discussed. Database URL: www.bindingdb.org PMID:26384374
ERAIZDA: a model for holistic annotation of animal infectious and zoonotic diseases.

PubMed

Buza, Teresia M; Jack, Sherman W; Kirunda, Halid; Khaitsa, Margaret L; Lawrence, Mark L; Pruett, Stephen; Peterson, Daniel G

2015-01-01

There is an urgent need for a unified resource that integrates trans-disciplinary annotations of emerging and reemerging animal infectious and zoonotic diseases. Such data integration will provide wonderful opportunity for epidemiologists, researchers and health policy makers to make data-driven decisions designed to improve animal health. Integrating emerging and reemerging animal infectious and zoonotic disease data from a large variety of sources into a unified open-access resource provides more plausible arguments to achieve better understanding of infectious and zoonotic diseases. We have developed a model for interlinking annotations of these diseases. These diseases are of particular interest because of the threats they pose to animal health, human health and global health security. We demonstrated the application of this model using brucellosis, an infectious and zoonotic disease. Preliminary annotations were deposited into VetBioBase database (http://vetbiobase.igbb.msstate.edu). This database is associated with user-friendly tools to facilitate searching, retrieving and downloading of disease-related information. Database URL: http://vetbiobase.igbb.msstate.edu. © The Author(s) 2015. Published by Oxford University Press.
Investigating the Potential Impacts of Energy Production in the Marcellus Shale Region Using the Shale Network Database

NASA Astrophysics Data System (ADS)

Brantley, S.; Pollak, J.

2016-12-01

The Shale Network's extensive database of water quality observations in the Marcellus Shale region enables educational experiences about the potential impacts of resource extraction and energy production with real data. Through tools that are open source and free to use, interested parties can access and analyze the very same data that the Shale Network team has used in peer-reviewed publications about the potential impacts of hydraulic fracturing on water. The development of the Shale Network database has been made possible through efforts led by an academic team and involving numerous individuals from government agencies, citizen science organizations, and private industry. With these tools and data, the Shale Network team has engaged high school students, university undergraduate and graduate students, as well as citizens so that all can discover how energy production impacts the Marcellus Shale region, which includes Pennsylvania and other nearby states. This presentation will describe these data tools, how the Shale Network has used them in educational settings, and the resources available to learn more.
Investigating the Potential Impacts of Energy Production in the Marcellus Shale Region Using the Shale Network Database and CUAHSI-Supported Data Tools

NASA Astrophysics Data System (ADS)

Brazil, L.

2017-12-01

The Shale Network's extensive database of water quality observations enables educational experiences about the potential impacts of resource extraction with real data. Through open source tools that are developed and maintained by the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI), researchers, educators, and citizens can access and analyze the very same data that the Shale Network team has used in peer-reviewed publications about the potential impacts of hydraulic fracturing on water. The development of the Shale Network database has been made possible through collection efforts led by an academic team and involving numerous individuals from government agencies, citizen science organizations, and private industry. Thus far, CUAHSI-supported data tools have been used to engage high school students, university undergraduate and graduate students, as well as citizens so that all can discover how energy production impacts the Marcellus Shale region, which includes Pennsylvania and other nearby states. This presentation will describe these data tools, how the Shale Network has used them in developing educational material, and the resources available to learn more.
FilTer BaSe: A web accessible chemical database for small compound libraries.

PubMed

Kolte, Baban S; Londhe, Sanjay R; Solanki, Bhushan R; Gacche, Rajesh N; Meshram, Rohan J

2018-03-01

Finding novel chemical agents for targeting disease associated drug targets often requires screening of large number of new chemical libraries. In silico methods are generally implemented at initial stages for virtual screening. Filtering of such compound libraries on physicochemical and substructure ground is done to ensure elimination of compounds with undesired chemical properties. Filtering procedure, is redundant, time consuming and requires efficient bioinformatics/computer manpower along with high end software involving huge capital investment that forms a major obstacle in drug discovery projects in academic setup. We present an open source resource, FilTer BaSe- a chemoinformatics platform (http://bioinfo.net.in/filterbase/) that host fully filtered, ready to use compound libraries with workable size. The resource also hosts a database that enables efficient searching the chemical space of around 348,000 compounds on the basis of physicochemical and substructure properties. Ready to use compound libraries and database presented here is expected to aid a helping hand for new drug developers and medicinal chemists. Copyright © 2017 Elsevier Inc. All rights reserved.
A framework for integration of scientific applications into the OpenTopography workflow

NASA Astrophysics Data System (ADS)

Nandigam, V.; Crosby, C.; Baru, C.

2012-12-01

The NSF-funded OpenTopography facility provides online access to Earth science-oriented high-resolution LIDAR topography data, online processing tools, and derivative products. The underlying cyberinfrastructure employs a multi-tier service oriented architecture that is comprised of an infrastructure tier, a processing services tier, and an application tier. The infrastructure tier consists of storage, compute resources as well as supporting databases. The services tier consists of the set of processing routines each deployed as a Web service. The applications tier provides client interfaces to the system. (e.g. Portal). We propose a "pluggable" infrastructure design that will allow new scientific algorithms and processing routines developed and maintained by the community to be integrated into the OpenTopography system so that the wider earth science community can benefit from its availability. All core components in OpenTopography are available as Web services using a customized open-source Opal toolkit. The Opal toolkit provides mechanisms to manage and track job submissions, with the help of a back-end database. It allows monitoring of job and system status by providing charting tools. All core components in OpenTopography have been developed, maintained and wrapped as Web services using Opal by OpenTopography developers. However, as the scientific community develops new processing and analysis approaches this integration approach is not scalable efficiently. Most of the new scientific applications will have their own active development teams performing regular updates, maintenance and other improvements. It would be optimal to have the application co-located where its developers can continue to actively work on it while still making it accessible within the OpenTopography workflow for processing capabilities. We will utilize a software framework for remote integration of these scientific applications into the OpenTopography system. This will be accomplished by virtually extending the OpenTopography service over the various infrastructures running these scientific applications and processing routines. This involves packaging and distributing a customized instance of the Opal toolkit that will wrap the software application as an OPAL-based web service and integrate it into the OpenTopography framework. We plan to make this as automated as possible. A structured specification of service inputs and outputs along with metadata annotations encoded in XML can be utilized to automate the generation of user interfaces, with appropriate tools tips and user help features, and generation of other internal software. The OpenTopography Opal toolkit will also include the customizations that will enable security authentication, authorization and the ability to write application usage and job statistics back to the OpenTopography databases. This usage information could then be reported to the original service providers and used for auditing and performance improvements. This pluggable framework will enable the application developers to continue to work on enhancing their application while making the latest iteration available in a timely manner to the earth sciences community. This will also help us establish an overall framework that other scientific application providers will also be able to use going forward.
A comparative concept analysis of centring vs. opening meditation processes in health care.

PubMed

Birx, Ellen

2013-08-01

To report an analysis and comparison of the concepts centring and opening meditation processes in health care. Centring and opening meditation processes are included in nursing theories and frequently recommended in health care for stress management. These meditation processes are integrated into emerging psychotherapy approaches and there is a rapidly expanding body of neuroscience research distinguishing brain activity associated with different types of meditation. Currently, there is a lack of theoretical and conceptual clarity needed to guide meditation research in health care. A search of healthcare literature between 2006-2011 was conducted using Alt HealthWatch, CINAHL, PsychNET and PubMed databases using the keywords 'centring' and 'opening' alone and in combination with the term 'meditation.' For the concept centring, 10 articles and 11 books and for the concept opening 13 articles and 10 books were included as data sources. Rodgers' evolutionary method of concept analysis was used. Centring and opening are similar in that they both involve awareness in the present moment; both use a gentle, effortless approach; and both have a calming effect. Key differences include centring's focus on the individual's inner experience compared with the non-dual, spacious awareness of opening. Centring and opening are overlapping, yet distinct meditation processes. The term meditation cannot be used in a generic way in health care. The differences between centring and opening have important implications for the further development of unitary-transformative nursing theories. © 2012 Blackwell Publishing Ltd.
JPRS Report, China.

DTIC Science & Technology

1991-06-24

52 Gross Industrial Output in April [CEI Database ...Jan-Apr Statistics on Payments to Employees [CEI Database ] ....................................................... 56 Jan-Apr Statistics on Labor...Productivity [CEI Database ] .............................................................. 56 TRANSPORTATION Hebei Province Opens Two Air Routes [HEBEI
Faculty Views of Open Web Resource Use by College Students

ERIC Educational Resources Information Center

Tomaiuolo, Nicholas G.

2005-01-01

This article assesses both the extent of students' use of open Web resources and library subscription databases and professors' satisfaction with that use as reported by a survey of 120 community college and university English faculty. It was concluded that although library budgets allocate significant funds to offer subscription databases,…
The Papillomavirus Episteme: a central resource for papillomavirus sequence data and analysis.

PubMed

Van Doorslaer, Koenraad; Tan, Qina; Xirasagar, Sandhya; Bandaru, Sandya; Gopalan, Vivek; Mohamoud, Yasmin; Huyen, Yentram; McBride, Alison A

2013-01-01

The goal of the Papillomavirus Episteme (PaVE) is to provide an integrated resource for the analysis of papillomavirus (PV) genome sequences and related information. The PaVE is a freely accessible, web-based tool (http://pave.niaid.nih.gov) created around a relational database, which enables storage, analysis and exchange of sequence information. From a design perspective, the PaVE adopts an Open Source software approach and stresses the integration and reuse of existing tools. Reference PV genome sequences have been extracted from publicly available databases and reannotated using a custom-created tool. To date, the PaVE contains 241 annotated PV genomes, 2245 genes and regions, 2004 protein sequences and 47 protein structures, which users can explore, analyze or download. The PaVE provides scientists with the data and tools needed to accelerate scientific progress for the study and treatment of diseases caused by PVs.
Integrated Array/Metadata Analytics

NASA Astrophysics Data System (ADS)

Misev, Dimitar; Baumann, Peter

2015-04-01

Data comes in various forms and types, and integration usually presents a problem that is often simply ignored and solved with ad-hoc solutions. Multidimensional arrays are an ubiquitous data type, that we find at the core of virtually all science and engineering domains, as sensor, model, image, statistics data. Naturally, arrays are richly described by and intertwined with additional metadata (alphanumeric relational data, XML, JSON, etc). Database systems, however, a fundamental building block of what we call "Big Data", lack adequate support for modelling and expressing these array data/metadata relationships. Array analytics is hence quite primitive or non-existent at all in modern relational DBMS. Recognizing this, we extended SQL with a new SQL/MDA part seamlessly integrating multidimensional array analytics into the standard database query language. We demonstrate the benefits of SQL/MDA with real-world examples executed in ASQLDB, an open-source mediator system based on HSQLDB and rasdaman, that already implements SQL/MDA.
The Saccharomyces Genome Database Variant Viewer.

PubMed

Sheppard, Travis K; Hitz, Benjamin C; Engel, Stacia R; Song, Giltae; Balakrishnan, Rama; Binkley, Gail; Costanzo, Maria C; Dalusag, Kyla S; Demeter, Janos; Hellerstedt, Sage T; Karra, Kalpana; Nash, Robert S; Paskov, Kelley M; Skrzypek, Marek S; Weng, Shuai; Wong, Edith D; Cherry, J Michael

2016-01-04

The Saccharomyces Genome Database (SGD; http://www.yeastgenome.org) is the authoritative community resource for the Saccharomyces cerevisiae reference genome sequence and its annotation. In recent years, we have moved toward increased representation of sequence variation and allelic differences within S. cerevisiae. The publication of numerous additional genomes has motivated the creation of new tools for their annotation and analysis. Here we present the Variant Viewer: a dynamic open-source web application for the visualization of genomic and proteomic differences. Multiple sequence alignments have been constructed across high quality genome sequences from 11 different S. cerevisiae strains and stored in the SGD. The alignments and summaries are encoded in JSON and used to create a two-tiered dynamic view of the budding yeast pan-genome, available at http://www.yeastgenome.org/variant-viewer. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
STINGRAY: system for integrated genomic resources and analysis.

PubMed

Wagner, Glauber; Jardim, Rodrigo; Tschoeke, Diogo A; Loureiro, Daniel R; Ocaña, Kary A C S; Ribeiro, Antonio C B; Emmel, Vanessa E; Probst, Christian M; Pitaluga, André N; Grisard, Edmundo C; Cavalcanti, Maria C; Campos, Maria L M; Mattoso, Marta; Dávila, Alberto M R

2014-03-07

The STINGRAY system has been conceived to ease the tasks of integrating, analyzing, annotating and presenting genomic and expression data from Sanger and Next Generation Sequencing (NGS) platforms. STINGRAY includes: (a) a complete and integrated workflow (more than 20 bioinformatics tools) ranging from functional annotation to phylogeny; (b) a MySQL database schema, suitable for data integration and user access control; and (c) a user-friendly graphical web-based interface that makes the system intuitive, facilitating the tasks of data analysis and annotation. STINGRAY showed to be an easy to use and complete system for analyzing sequencing data. While both Sanger and NGS platforms are supported, the system could be faster using Sanger data, since the large NGS datasets could potentially slow down the MySQL database usage. STINGRAY is available at http://stingray.biowebdb.org and the open source code at http://sourceforge.net/projects/stingray-biowebdb/.
EcoliWiki: a wiki-based community resource for Escherichia coli

PubMed Central

McIntosh, Brenley K.; Renfro, Daniel P.; Knapp, Gwendowlyn S.; Lairikyengbam, Chanchala R.; Liles, Nathan M.; Niu, Lili; Supak, Amanda M.; Venkatraman, Anand; Zweifel, Adrienne E.; Siegele, Deborah A.; Hu, James C.

2012-01-01

EcoliWiki is the community annotation component of the PortEco (http://porteco.org; formerly EcoliHub) project, an online data resource that integrates information on laboratory strains of Escherichia coli, its phages, plasmids and mobile genetic elements. As one of the early adopters of the wiki approach to model organism databases, EcoliWiki was designed to not only facilitate community-driven sharing of biological knowledge about E. coli as a model organism, but also to be interoperable with other data resources. EcoliWiki content currently covers genes from five laboratory E. coli strains, 21 bacteriophage genomes, F plasmid and eight transposons. EcoliWiki integrates the Mediawiki wiki platform with other open-source software tools and in-house software development to extend how wikis can be used for model organism databases. EcoliWiki can be accessed online at http://ecoliwiki.net. PMID:22064863
Chemical and isotopic database of water and gas from hydrothermal systems with an emphasis for the western United States

USGS Publications Warehouse

Mariner, R.H.; Venezky, D.Y.; Hurwitz, S.

2006-01-01

Chemical and isotope data accumulated by two USGS Projects (led by I. Barnes and R. Mariner) over a time period of about 40 years can now be found using a basic web search or through an image search (left). The data are primarily chemical and isotopic analyses of waters (thermal, mineral, or fresh) and associated gas (free and/or dissolved) collected from hot springs, mineral springs, cold springs, geothermal wells, fumaroles, and gas seeps. Additional information is available about the collection methods and analysis procedures.The chemical and isotope data are stored in a MySQL database and accessed using PHP from a basic search form below. Data can also be accessed using an Open Source GIS called WorldKit by clicking on the image to the left. Additional information is available about WorldKit including the files used to set up the site.
STINGRAY: system for integrated genomic resources and analysis

PubMed Central

2014-01-01

Background The STINGRAY system has been conceived to ease the tasks of integrating, analyzing, annotating and presenting genomic and expression data from Sanger and Next Generation Sequencing (NGS) platforms. Findings STINGRAY includes: (a) a complete and integrated workflow (more than 20 bioinformatics tools) ranging from functional annotation to phylogeny; (b) a MySQL database schema, suitable for data integration and user access control; and (c) a user-friendly graphical web-based interface that makes the system intuitive, facilitating the tasks of data analysis and annotation. Conclusion STINGRAY showed to be an easy to use and complete system for analyzing sequencing data. While both Sanger and NGS platforms are supported, the system could be faster using Sanger data, since the large NGS datasets could potentially slow down the MySQL database usage. STINGRAY is available at http://stingray.biowebdb.org and the open source code at http://sourceforge.net/projects/stingray-biowebdb/. PMID:24606808
Fracture Systems - Digital Field Data Capture

NASA Astrophysics Data System (ADS)

Haslam, Richard

2017-04-01

Fracture systems play a key role in subsurface resources and developments including groundwater and nuclear waste repositories. There is increasing recognition that there is a need to record and quantify fracture systems to better understand the potential risks and opportunities. With the advent of smart phones and digital field geology there have been numerous systems designed for field data collection. Digital field data collection allows for rapid data collection and interpretations. However, many of the current systems have principally been designed to cover the full range of field mapping and data needs, making them large and complex, plus many do not offer the tools necessary for the collection of fracture specific data. A new multiplatform data recording app has been developed for the collection of field data on faults and joint/fracture systems and a relational database designed for storage and retrieval. The app has been developed to collect fault data and joint/fracture data based on an open source platform. Data is captured in a form-based approach including validity checks to ensure data is collected systematically. In addition to typical structural data collection, the International Society of Rock Mechanics' (ISRM) "Suggested Methods for the Quantitative Description of Discontinuities in Rock Masses" is included allowing for industry standards to be followed and opening up the tools to industry as well as research. All data is uploaded automatically to a secure server and users can view their data and open access data as required. Users can decide if the data they produce should remain private or be open access. A series of automatic reports can be produced and/or the data downloaded. The database will hold a national archive and data retrieval will be made through a web interface.
40 CFR Table 3 to Subpart Wwww of... - Organic HAP Emissions Limits for Existing Open Molding Sources, New Open Molding Sources Emitting...

Code of Federal Regulations, 2010 CFR

2010-07-01

... Existing Open Molding Sources, New Open Molding Sources Emitting Less Than 100 TPY of HAP, and New and... CATEGORIES National Emissions Standards for Hazardous Air Pollutants: Reinforced Plastic Composites... Existing Open Molding Sources, New Open Molding Sources Emitting Less Than 100 TPY of HAP, and New and...
Improving Software Sustainability: Lessons Learned from Profiles in Science.

PubMed

Gallagher, Marie E

2013-01-01

The Profiles in Science® digital library features digitized surrogates of historical items selected from the archival collections of the U.S. National Library of Medicine as well as collaborating institutions. In addition, it contains a database of descriptive, technical and administrative metadata. It also contains various software components that allow creation of the metadata, management of the digital items, and access to the items and metadata through the Profiles in Science Web site [1]. The choices made building the digital library were designed to maximize the sustainability and long-term survival of all of the components of the digital library [2]. For example, selecting standard and open digital file formats rather than proprietary formats increases the sustainability of the digital files [3]. Correspondingly, using non-proprietary software may improve the sustainability of the software--either through in-house expertise or through the open source community. Limiting our digital library software exclusively to open source software or to software developed in-house has not been feasible. For example, we have used proprietary operating systems, scanning software, a search engine, and office productivity software. We did this when either lack of essential capabilities or the cost-benefit trade-off favored using proprietary software. We also did so knowing that in the future we would need to replace or upgrade some of our proprietary software, analogous to migrating from an obsolete digital file format to a new format as the technological landscape changes. Since our digital library's start in 1998, all of its software has been upgraded or replaced, but the digitized items have not yet required migration to other formats. Technological changes that compelled us to replace proprietary software included the cost of product licensing, product support, incompatibility with other software, prohibited use due to evolving security policies, and product abandonment. Sometimes these changes happen on short notice, so we continually monitor our library's software for signs of endangerment. We have attempted to replace proprietary software with suitable in-house or open source software. When the replacement involves a standalone piece of software with a nearly equivalent version, such as replacing a commercial HTTP server with an open source HTTP server, the replacement is straightforward. Recently we replaced software that functioned not only as our search engine but also as the backbone of the architecture of our Web site. In this paper, we describe the lessons learned and the pros and cons of replacing this software with open source software.

49 CFR 1104.3 - Copies.

Code of Federal Regulations, 2011 CFR

2011-10-01

... Microsoft Open Database Connectivity (ODBC) standard. ODBC is a Windows technology that allows a database software package to import data from a database created using a different software package. We currently...-compatible format. All databases must be supported with adequate documentation on data attributes, SQL...
Mobile Source Observation Database (MSOD)

EPA Pesticide Factsheets

The Mobile Source Observation Database (MSOD) is a relational database being developed by the Assessment and Standards Division (ASD) of the US Environmental Protection Agency Office of Transportation and Air Quality (formerly the Office of Mobile Sources). The MSOD contains emission test data from in-use mobile air- pollution sources such as cars, trucks, and engines from trucks and nonroad vehicles. Data in the database was collected from 1982 to the present. The data is intended to be representative of in-use vehicle emissions in the United States.
Bioinformatics Pertinent to Lipid Analysis in Biological Samples.

PubMed

Ma, Justin; Arbelo, Ulises; Guerra, Yenifer; Aribindi, Katyayini; Bhattacharya, Sanjoy K; Pelaez, Daniel

2017-01-01

Electrospray ionization mass spectrometry has revolutionized the way lipids are studied. In this work, we present a tutorial for analyzing class-specific lipid spectra obtained from a triple quadrupole mass spectrometer. The open-source software MZmine 2.21 is used, coupled with LIPID MAPS databases. Here, we describe the steps for lipid identification, ratiometric quantification, and briefly address the differences to the analyses when using direct infusion versus tandem liquid chromatography-mass spectrometry (LC-MS). We also provide a tutorial and equations for quantification of lipid amounts using synthetic lipid standards and normalization to a protein amount.
IgRepertoireConstructor: a novel algorithm for antibody repertoire construction and immunoproteogenomics analysis.

PubMed

Safonova, Yana; Bonissone, Stefano; Kurpilyansky, Eugene; Starostina, Ekaterina; Lapidus, Alla; Stinson, Jeremy; DePalatis, Laura; Sandoval, Wendy; Lill, Jennie; Pevzner, Pavel A

2015-06-15

The analysis of concentrations of circulating antibodies in serum (antibody repertoire) is a fundamental, yet poorly studied, problem in immunoinformatics. The two current approaches to the analysis of antibody repertoires [next generation sequencing (NGS) and mass spectrometry (MS)] present difficult computational challenges since antibodies are not directly encoded in the germline but are extensively diversified by somatic recombination and hypermutations. Therefore, the protein database required for the interpretation of spectra from circulating antibodies is custom for each individual. Although such a database can be constructed via NGS, the reads generated by NGS are error-prone and even a single nucleotide error precludes identification of a peptide by the standard proteomics tools. Here, we present the IgRepertoireConstructor algorithm that performs error-correction of immunosequencing reads and uses mass spectra to validate the constructed antibody repertoires. IgRepertoireConstructor is open source and freely available as a C++ and Python program running on all Unix-compatible platforms. The source code is available from http://bioinf.spbau.ru/igtools. ppevzner@ucsd.edu Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
R-Syst::diatom: an open-access and curated barcode database for diatoms and freshwater monitoring.

PubMed

Rimet, Frédéric; Chaumeil, Philippe; Keck, François; Kermarrec, Lenaïg; Vasselon, Valentin; Kahlert, Maria; Franc, Alain; Bouchez, Agnès

2016-01-01

Diatoms are micro-algal indicators of freshwater pollution. Current standardized methodologies are based on microscopic determinations, which is time consuming and prone to identification uncertainties. The use of DNA-barcoding has been proposed as a way to avoid these flaws. Combining barcoding with next-generation sequencing enables collection of a large quantity of barcodes from natural samples. These barcodes are identified as certain diatom taxa by comparing the sequences to a reference barcoding library using algorithms. Proof of concept was recently demonstrated for synthetic and natural communities and underlined the importance of the quality of this reference library. We present an open-access and curated reference barcoding database for diatoms, called R-Syst::diatom, developed in the framework of R-Syst, the network of systematic supported by INRA (French National Institute for Agricultural Research), see http://www.rsyst.inra.fr/en. R-Syst::diatom links DNA-barcodes to their taxonomical identifications, and is dedicated to identify barcodes from natural samples. The data come from two sources, a culture collection of freshwater algae maintained in INRA in which new strains are regularly deposited and barcoded and from the NCBI (National Center for Biotechnology Information) nucleotide database. Two kinds of barcodes were chosen to support the database: 18S (18S ribosomal RNA) and rbcL (Ribulose-1,5-bisphosphate carboxylase/oxygenase), because of their efficiency. Data are curated using innovative (Declic) and classical bioinformatic tools (Blast, classical phylogenies) and up-to-date taxonomy (Catalogues and peer reviewed papers). Every 6 months R-Syst::diatom is updated. The database is available through the R-Syst microalgae website (http://www.rsyst.inra.fr/) and a platform dedicated to next-generation sequencing data analysis, virtual_BiodiversityL@b (https://galaxy-pgtp.pierroton.inra.fr/). We present here the content of the library regarding the number of barcodes and diatom taxa. In addition to these information, morphological features (e.g. biovolumes, chloroplasts…), life-forms (mobility, colony-type) or ecological features (taxa preferenda to pollution) are indicated in R-Syst::diatom. Database URL: http://www.rsyst.inra.fr/. © The Author(s) 2016. Published by Oxford University Press.
R-Syst::diatom: an open-access and curated barcode database for diatoms and freshwater monitoring

PubMed Central

Rimet, Frédéric; Chaumeil, Philippe; Keck, François; Kermarrec, Lenaïg; Vasselon, Valentin; Kahlert, Maria; Franc, Alain; Bouchez, Agnès

2016-01-01

Diatoms are micro-algal indicators of freshwater pollution. Current standardized methodologies are based on microscopic determinations, which is time consuming and prone to identification uncertainties. The use of DNA-barcoding has been proposed as a way to avoid these flaws. Combining barcoding with next-generation sequencing enables collection of a large quantity of barcodes from natural samples. These barcodes are identified as certain diatom taxa by comparing the sequences to a reference barcoding library using algorithms. Proof of concept was recently demonstrated for synthetic and natural communities and underlined the importance of the quality of this reference library. We present an open-access and curated reference barcoding database for diatoms, called R-Syst::diatom, developed in the framework of R-Syst, the network of systematic supported by INRA (French National Institute for Agricultural Research), see http://www.rsyst.inra.fr/en. R-Syst::diatom links DNA-barcodes to their taxonomical identifications, and is dedicated to identify barcodes from natural samples. The data come from two sources, a culture collection of freshwater algae maintained in INRA in which new strains are regularly deposited and barcoded and from the NCBI (National Center for Biotechnology Information) nucleotide database. Two kinds of barcodes were chosen to support the database: 18S (18S ribosomal RNA) and rbcL (Ribulose-1,5-bisphosphate carboxylase/oxygenase), because of their efficiency. Data are curated using innovative (Declic) and classical bioinformatic tools (Blast, classical phylogenies) and up-to-date taxonomy (Catalogues and peer reviewed papers). Every 6 months R-Syst::diatom is updated. The database is available through the R-Syst microalgae website (http://www.rsyst.inra.fr/) and a platform dedicated to next-generation sequencing data analysis, virtual_BiodiversityL@b (https://galaxy-pgtp.pierroton.inra.fr/). We present here the content of the library regarding the number of barcodes and diatom taxa. In addition to these information, morphological features (e.g. biovolumes, chloroplasts…), life-forms (mobility, colony-type) or ecological features (taxa preferenda to pollution) are indicated in R-Syst::diatom. Database URL: http://www.rsyst.inra.fr/ PMID:26989149
Data Collection, Collaboration, Analysis, and Publication Using the Open Data Repository's (ODR) Data Publisher

NASA Astrophysics Data System (ADS)

Lafuente, B.; Stone, N.; Bristow, T.; Keller, R. M.; Blake, D. F.; Downs, R. T.; Pires, A.; Dateo, C. E.; Fonda, M.

2017-12-01

In development for nearly four years, the Open Data Repository's (ODR) Data Publisher software has become a useful tool for researchers' data needs. Data Publisher facilitates the creation of customized databases with flexible permission sets that allow researchers to share data collaboratively while improving data discovery and maintaining ownership rights. The open source software provides an end-to-end solution from collection to final repository publication. A web-based interface allows researchers to enter data, view data, and conduct analysis using any programming language supported by JupyterHub (http://www.jupyterhub.org). This toolset makes it possible for a researcher to store and manipulate their data in the cloud from any internet capable device. Data can be embargoed in the system until a date selected by the researcher. For instance, open publication can be set to a date that coincides with publication of data analysis in a third party journal. In conjunction with teams at NASA Ames and the University of Arizona, a number of pilot studies are being conducted to guide the software development so that it allows them to publish and share their data. These pilots include (1) the Astrobiology Habitable Environments Database (AHED), a central searchable repository designed to promote and facilitate the integration and sharing of all the data generated by the diverse disciplines in astrobiology; (2) a database containing the raw and derived data products from the CheMin instrument on the MSL rover Curiosity (http://odr.io/CheMin), featuring a versatile graphing system, instructions and analytical tools to process the data, and a capability to download data in different formats; and (3) the Mineral Evolution project, which by correlating the diversity of mineral species with their ages, localities, and other measurable properties aims to understand how the episodes of planetary accretion and differentiation, plate tectonics, and origin of life lead to a selective evolution of mineral species through changes in temperature, pressure, and composition. Ongoing development will complete integration of third party meta-data standards and publishing data to the semantic web. This project is supported by the Science-Enabling Research Activity (SERA) and NASA NNX11AP82A, MSL.
Vertical distribution of chlorophyll a concentration and phytoplankton community composition from in situ fluorescence profiles: a first database for the global ocean

NASA Astrophysics Data System (ADS)

Sauzède, R.; Lavigne, H.; Claustre, H.; Uitz, J.; Schmechtig, C.; D'Ortenzio, F.; Guinet, C.; Pesant, S.

2015-04-01

In vivo chlorophyll a fluorescence is a proxy of chlorophyll a concentration, and is one of the most frequently measured biogeochemical properties in the ocean. Thousands of profiles are available from historical databases and the integration of fluorescence sensors to autonomous platforms led to a significant increase of chlorophyll fluorescence profile acquisition. To our knowledge, this important source of environmental data has not yet been included in global analyses. A total of 268 127 chlorophyll fluorescence profiles from several databases as well as published and unpublished individual sources were compiled. Following a robust quality control procedure detailed in the present paper, about 49 000 chlorophyll fluorescence profiles were converted in phytoplankton biomass (i.e. chlorophyll a concentration) and size-based community composition (i.e. microphytoplankton, nanophytoplankton and picophytoplankton), using a~method specifically developed to harmonize fluorescence profiles from diverse sources. The data span over five decades from 1958 to 2015, including observations from all major oceanic basins and all seasons, and depths ranging from surface to a median maximum sampling depth of around 700 m. Global maps of chlorophyll a concentration and phytoplankton community composition are presented here for the first time. Monthly climatologies were computed for three of Longhurst's ecological provinces in order to exemplify the potential use of the data product. Original data sets (raw fluorescence profiles) as well as calibrated profiles of phytoplankton biomass and community composition are available in open access at PANGAEA, Data Publisher for Earth and Environmental Science. Raw fluorescence profiles: http://doi.pangaea.de/10.1594/PANGAEA.844212 and Phytoplankton biomass and community composition: http://doi.pangaea.de/10.1594/PANGAEA.844485.
The world’s user-generated road map is more than 80% complete

PubMed Central

2017-01-01

OpenStreetMap, a crowdsourced geographic database, provides the only global-level, openly licensed source of geospatial road data, and the only national-level source in many countries. However, researchers, policy makers, and citizens who want to make use of OpenStreetMap (OSM) have little information about whether it can be relied upon in a particular geographic setting. In this paper, we use two complementary, independent methods to assess the completeness of OSM road data in each country in the world. First, we undertake a visual assessment of OSM data against satellite imagery, which provides the input for estimates based on a multilevel regression and poststratification model. Second, we fit sigmoid curves to the cumulative length of contributions, and use them to estimate the saturation level for each country. Both techniques may have more general use for assessing the development and saturation of crowd-sourced data. Our results show that in many places, researchers and policymakers can rely on the completeness of OSM, or will soon be able to do so. We find (i) that globally, OSM is ∼83% complete, and more than 40% of countries—including several in the developing world—have a fully mapped street network; (ii) that well-governed countries with good Internet access tend to be more complete, and that completeness has a U-shaped relationship with population density—both sparsely populated areas and dense cities are the best mapped; and (iii) that existing global datasets used by the World Bank undercount roads by more than 30%. PMID:28797037
Duke Surgery Patient Safety: an open-source application for anonymous reporting of adverse and near-miss surgical events

PubMed Central

Pietrobon, Ricardo; Lima, Raquel; Shah, Anand; Jacobs, Danny O; Harker, Matthew; McCready, Mariana; Martins, Henrique; Richardson, William

2007-01-01

Background Studies have shown that 4% of hospitalized patients suffer from an adverse event caused by the medical treatment administered. Some institutions have created systems to encourage medical workers to report these adverse events. However, these systems often prove to be inadequate and/or ineffective for reviewing the data collected and improving the outcomes in patient safety. Objective To describe the Web-application Duke Surgery Patient Safety, designed for the anonymous reporting of adverse and near-miss events as well as scheduled reporting to surgeons and hospital administration. Software architecture DSPS was developed primarily using Java language running on a Tomcat server and with MySQL database as its backend. Results Formal and field usability tests were used to aid in development of DSPS. Extensive experience with DSPS at our institution indicate that DSPS is easy to learn and use, has good speed, provides needed functionality, and is well received by both adverse-event reporters and administrators. Discussion This is the first description of an open-source application for reporting patient safety, which allows the distribution of the application to other institutions in addition for its ability to adapt to the needs of different departments. DSPS provides a mechanism for anonymous reporting of adverse events and helps to administer Patient Safety initiatives. Conclusion The modifiable framework of DSPS allows adherence to evolving national data standards. The open-source design of DSPS permits surgical departments with existing reporting mechanisms to integrate them with DSPS. The DSPS application is distributed under the GNU General Public License. PMID:17472749
Duke Surgery Patient Safety: an open-source application for anonymous reporting of adverse and near-miss surgical events.

PubMed

Pietrobon, Ricardo; Lima, Raquel; Shah, Anand; Jacobs, Danny O; Harker, Matthew; McCready, Mariana; Martins, Henrique; Richardson, William

2007-05-01

Studies have shown that 4% of hospitalized patients suffer from an adverse event caused by the medical treatment administered. Some institutions have created systems to encourage medical workers to report these adverse events. However, these systems often prove to be inadequate and/or ineffective for reviewing the data collected and improving the outcomes in patient safety. To describe the Web-application Duke Surgery Patient Safety, designed for the anonymous reporting of adverse and near-miss events as well as scheduled reporting to surgeons and hospital administration. SOFTWARE ARCHITECTURE: DSPS was developed primarily using Java language running on a Tomcat server and with MySQL database as its backend. Formal and field usability tests were used to aid in development of DSPS. Extensive experience with DSPS at our institution indicate that DSPS is easy to learn and use, has good speed, provides needed functionality, and is well received by both adverse-event reporters and administrators. This is the first description of an open-source application for reporting patient safety, which allows the distribution of the application to other institutions in addition for its ability to adapt to the needs of different departments. DSPS provides a mechanism for anonymous reporting of adverse events and helps to administer Patient Safety initiatives. The modifiable framework of DSPS allows adherence to evolving national data standards. The open-source design of DSPS permits surgical departments with existing reporting mechanisms to integrate them with DSPS. The DSPS application is distributed under the GNU General Public License.
The nature of creativity: The roles of genetic factors, personality traits, cognitive abilities, and environmental sources.

PubMed

Kandler, Christian; Riemann, Rainer; Angleitner, Alois; Spinath, Frank M; Borkenau, Peter; Penke, Lars

2016-08-01

This multitrait multimethod twin study examined the structure and sources of individual differences in creativity. According to different theoretical and metrological perspectives, as well as suggestions based on previous research, we expected 2 aspects of individual differences, which can be described as perceived creativity and creative test performance. We hypothesized that perceived creativity, reflecting typical creative thinking and behavior, should be linked to specific personality traits, whereas test creativity, reflecting maximum task-related creative performance, should show specific associations with cognitive abilities. Moreover, we tested whether genetic variance in intelligence and personality traits account for the genetic component of creativity. Multiple-rater and multimethod data (self- and peer reports, observer ratings, and test scores) from 2 German twin studies-the Bielefeld Longitudinal Study of Adult Twins and the German Observational Study of Adult Twins-were analyzed. Confirmatory factor analyses yielded the expected 2 correlated aspects of creativity. Perceived creativity showed links to openness to experience and extraversion, whereas tested figural creativity was associated with intelligence and also with openness. Multivariate behavioral genetic analyses indicated that the heritability of tested figural creativity could be accounted for by the genetic component of intelligence and openness, whereas a substantial genetic component in perceived creativity could not be explained. A primary source of individual differences in creativity was due to environmental influences, even after controlling for random error and method variance. The findings are discussed in terms of the multifaceted nature and construct validity of creativity as an individual characteristic. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Security of patient data when decommissioning ultrasound systems

PubMed Central

2017-01-01

Background Although ultrasound systems generally archive to Picture Archiving and Communication Systems (PACS), their archiving workflow typically involves storage to an internal hard disk before data are transferred onwards. Deleting records from the local system will delete entries in the database and from the file allocation table or equivalent but, as with a PC, files can be recovered. Great care is taken with disposal of media from a healthcare organisation to prevent data breaches, but ultrasound systems are routinely returned to lease companies, sold on or donated to third parties without such controls. Methods In this project, five methods of hard disk erasure were tested on nine ultrasound systems being decommissioned: the system’s own delete function; full reinstallation of system software; the manufacturer’s own disk wiping service; open source disk wiping software for full and just blank space erasure. Attempts were then made to recover data using open source recovery tools. Results All methods deleted patient data as viewable from the ultrasound system and from browsing the disk from a PC. However, patient identifiable data (PID) could be recovered following the system’s own deletion and the reinstallation methods. No PID could be recovered after using the manufacturer’s wiping service or the open source wiping software. Conclusion The typical method of reinstalling an ultrasound system’s software may not prevent PID from being recovered. When transferring ownership, care should be taken that an ultrasound system’s hard disk has been wiped to a sufficient level, particularly if the scanner is to be returned with approved parts and in a fully working state. PMID:28228821
Freshwater Biological Traits Database (Data Sources)

EPA Science Inventory

When EPA release the final report, Freshwater Biological Traits Database, it referenced numerous data sources that are included below. The Traits Database report covers the development of a database of freshwater biological traits with additional traits that are relevan...
LAND-deFeND - An innovative database structure for landslides and floods and their consequences.

PubMed

Napolitano, Elisabetta; Marchesini, Ivan; Salvati, Paola; Donnini, Marco; Bianchi, Cinzia; Guzzetti, Fausto

2018-02-01

Information on historical landslides and floods - collectively called "geo-hydrological hazards - is key to understand the complex dynamics of the events, to estimate the temporal and spatial frequency of damaging events, and to quantify their impact. A number of databases on geo-hydrological hazards and their consequences have been developed worldwide at different geographical and temporal scales. Of the few available database structures that can handle information on both landslides and floods some are outdated and others were not designed to store, organize, and manage information on single phenomena or on the type and monetary value of the damages and the remediation actions. Here, we present the LANDslides and Floods National Database (LAND-deFeND), a new database structure able to store, organize, and manage in a single digital structure spatial information collected from various sources with different accuracy. In designing LAND-deFeND, we defined four groups of entities, namely: nature-related, human-related, geospatial-related, and information-source-related entities that collectively can describe fully the geo-hydrological hazards and their consequences. In LAND-deFeND, the main entities are the nature-related entities, encompassing: (i) the "phenomenon", a single landslide or local inundation, (ii) the "event", which represent the ensemble of the inundations and/or landslides occurred in a conventional geographical area in a limited period, and (iii) the "trigger", which is the meteo-climatic or seismic cause (trigger) of the geo-hydrological hazards. LAND-deFeND maintains the relations between the nature-related entities and the human-related entities even where the information is missing partially. The physical model of the LAND-deFeND contains 32 tables, including nine input tables, 21 dictionary tables, and two association tables, and ten views, including specific views that make the database structure compliant with the EC INSPIRE and the Floods Directives. The LAND-deFeND database structure is open, and freely available from http://geomorphology.irpi.cnr.it/tools. Copyright © 2017 The Authors. Published by Elsevier Ltd.. All rights reserved.
Traditional Medicine Collection Tracking System (TM-CTS): a database for ethnobotanically driven drug-discovery programs.

PubMed

Harris, Eric S J; Erickson, Sean D; Tolopko, Andrew N; Cao, Shugeng; Craycroft, Jane A; Scholten, Robert; Fu, Yanling; Wang, Wenquan; Liu, Yong; Zhao, Zhongzhen; Clardy, Jon; Shamu, Caroline E; Eisenberg, David M

2011-05-17

Ethnobotanically driven drug-discovery programs include data related to many aspects of the preparation of botanical medicines, from initial plant collection to chemical extraction and fractionation. The Traditional Medicine Collection Tracking System (TM-CTS) was created to organize and store data of this type for an international collaborative project involving the systematic evaluation of commonly used Traditional Chinese Medicinal plants. The system was developed using domain-driven design techniques, and is implemented using Java, Hibernate, PostgreSQL, Business Intelligence and Reporting Tools (BIRT), and Apache Tomcat. The TM-CTS relational database schema contains over 70 data types, comprising over 500 data fields. The system incorporates a number of unique features that are useful in the context of ethnobotanical projects such as support for information about botanical collection, method of processing, quality tests for plants with existing pharmacopoeia standards, chemical extraction and fractionation, and historical uses of the plants. The database also accommodates data provided in multiple languages and integration with a database system built to support high throughput screening based drug discovery efforts. It is accessed via a web-based application that provides extensive, multi-format reporting capabilities. This new database system was designed to support a project evaluating the bioactivity of Chinese medicinal plants. The software used to create the database is open source, freely available, and could potentially be applied to other ethnobotanically driven natural product collection and drug-discovery programs. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Traditional Medicine Collection Tracking System (TM-CTS): A Database for Ethnobotanically-Driven Drug-Discovery Programs

PubMed Central

Harris, Eric S. J.; Erickson, Sean D.; Tolopko, Andrew N.; Cao, Shugeng; Craycroft, Jane A.; Scholten, Robert; Fu, Yanling; Wang, Wenquan; Liu, Yong; Zhao, Zhongzhen; Clardy, Jon; Shamu, Caroline E.; Eisenberg, David M.

2011-01-01

Aim of the study. Ethnobotanically-driven drug-discovery programs include data related to many aspects of the preparation of botanical medicines, from initial plant collection to chemical extraction and fractionation. The Traditional Medicine-Collection Tracking System (TM-CTS) was created to organize and store data of this type for an international collaborative project involving the systematic evaluation of commonly used Traditional Chinese Medicinal plants. Materials and Methods. The system was developed using domain-driven design techniques, and is implemented using Java, Hibernate, PostgreSQL, Business Intelligence and Reporting Tools (BIRT), and Apache Tomcat. Results. The TM-CTS relational database schema contains over 70 data types, comprising over 500 data fields. The system incorporates a number of unique features that are useful in the context of ethnobotanical projects such as support for information about botanical collection, method of processing, quality tests for plants with existing pharmacopoeia standards, chemical extraction and fractionation, and historical uses of the plants. The database also accommodates data provided in multiple languages and integration with a database system built to support high throughput screening based drug discovery efforts. It is accessed via a web-based application that provides extensive, multi-format reporting capabilities. Conclusions. This new database system was designed to support a project evaluating the bioactivity of Chinese medicinal plants. The software used to create the database is open source, freely available, and could potentially be applied to other ethnobotanically-driven natural product collection and drug-discovery programs. PMID:21420479
BioBarcode: a general DNA barcoding database and server platform for Asian biodiversity resources.

PubMed

Lim, Jeongheui; Kim, Sang-Yoon; Kim, Sungmin; Eo, Hae-Seok; Kim, Chang-Bae; Paek, Woon Kee; Kim, Won; Bhak, Jong

2009-12-03

DNA barcoding provides a rapid, accurate, and standardized method for species-level identification using short DNA sequences. Such a standardized identification method is useful for mapping all the species on Earth, particularly when DNA sequencing technology is cheaply available. There are many nations in Asia with many biodiversity resources that need to be mapped and registered in databases. We have built a general DNA barcode data processing system, BioBarcode, with open source software - which is a general purpose database and server. It uses mySQL RDBMS 5.0, BLAST2, and Apache httpd server. An exemplary database of BioBarcode has around 11,300 specimen entries (including GenBank data) and registers the biological species to map their genetic relationships. The BioBarcode database contains a chromatogram viewer which improves the performance in DNA sequence analyses. Asia has a very high degree of biodiversity and the BioBarcode database server system aims to provide an efficient bioinformatics protocol that can be freely used by Asian researchers and research organizations interested in DNA barcoding. The BioBarcode promotes the rapid acquisition of biological species DNA sequence data that meet global standards by providing specialized services, and provides useful tools that will make barcoding cheaper and faster in the biodiversity community such as standardization, depository, management, and analysis of DNA barcode data. The system can be downloaded upon request, and an exemplary server has been constructed with which to build an Asian biodiversity system http://www.asianbarcode.org.
The Microbe Directory: An annotated, searchable inventory of microbes' characteristics.

PubMed

Shaaban, Heba; Westfall, David A; Mohammad, Rawhi; Danko, David; Bezdan, Daniela; Afshinnekoo, Ebrahim; Segata, Nicola; Mason, Christopher E

2018-01-05

The Microbe Directory is a collective research effort to profile and annotate more than 7,500 unique microbial species from the MetaPhlAn2 database that includes bacteria, archaea, viruses, fungi, and protozoa. By collecting and summarizing data on various microbes' characteristics, the project comprises a database that can be used downstream of large-scale metagenomic taxonomic analyses, allowing one to interpret and explore their taxonomic classifications to have a deeper understanding of the microbial ecosystem they are studying. Such characteristics include, but are not limited to: optimal pH, optimal temperature, Gram stain, biofilm-formation, spore-formation, antimicrobial resistance, and COGEM class risk rating. The database has been manually curated by trained student-researchers from Weill Cornell Medicine and CUNY-Hunter College, and its analysis remains an ongoing effort with open-source capabilities so others can contribute. Available in SQL, JSON, and CSV (i.e. Excel) formats, the Microbe Directory can be queried for the aforementioned parameters by a microorganism's taxonomy. In addition to the raw database, The Microbe Directory has an online counterpart ( https://microbe.directory/) that provides a user-friendly interface for storage, retrieval, and analysis into which other microbial database projects could be incorporated. The Microbe Directory was primarily designed to serve as a resource for researchers conducting metagenomic analyses, but its online web interface should also prove useful to any individual who wishes to learn more about any particular microbe.
Meet Spinky: An Open-Source Spindle and K-Complex Detection Toolbox Validated on the Open-Access Montreal Archive of Sleep Studies (MASS).

PubMed

Lajnef, Tarek; O'Reilly, Christian; Combrisson, Etienne; Chaibi, Sahbi; Eichenlaub, Jean-Baptiste; Ruby, Perrine M; Aguera, Pierre-Emmanuel; Samet, Mounir; Kachouri, Abdennaceur; Frenette, Sonia; Carrier, Julie; Jerbi, Karim

2017-01-01

Sleep spindles and K-complexes are among the most prominent micro-events observed in electroencephalographic (EEG) recordings during sleep. These EEG microstructures are thought to be hallmarks of sleep-related cognitive processes. Although tedious and time-consuming, their identification and quantification is important for sleep studies in both healthy subjects and patients with sleep disorders. Therefore, procedures for automatic detection of spindles and K-complexes could provide valuable assistance to researchers and clinicians in the field. Recently, we proposed a framework for joint spindle and K-complex detection (Lajnef et al., 2015a) based on a Tunable Q-factor Wavelet Transform (TQWT; Selesnick, 2011a) and morphological component analysis (MCA). Using a wide range of performance metrics, the present article provides critical validation and benchmarking of the proposed approach by applying it to open-access EEG data from the Montreal Archive of Sleep Studies (MASS; O'Reilly et al., 2014). Importantly, the obtained scores were compared to alternative methods that were previously tested on the same database. With respect to spindle detection, our method achieved higher performance than most of the alternative methods. This was corroborated with statistic tests that took into account both sensitivity and precision (i.e., Matthew's coefficient of correlation (MCC), F1, Cohen κ). Our proposed method has been made available to the community via an open-source tool named Spinky (for spindle and K-complex detection). Thanks to a GUI implementation and access to Matlab and Python resources, Spinky is expected to contribute to an open-science approach that will enhance replicability and reliable comparisons of classifier performances for the detection of sleep EEG microstructure in both healthy and patient populations.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.